30 September 2009
Tags: jlangdetect nlp
This is a small update that includes the following features:
Ability to detect the language of a document using a subset of the languages used for training
Logging is now handled by log4j
Restricting detection to a subset of the training languages is useful when you know that a document must be written in, for example, French or English, but the detector has been trained on more languages. Restricting detection to that subset guarantees that the detector returns one of those languages.
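The post does not document JLangDetect's API, so what follows is only a hypothetical Java sketch of the idea, not the library's code. The class `SubsetDetector`, its `detect` method, the tiny trigram "profiles" and the dot-product scoring are all illustrative assumptions; the real detector uses n-gram statistics such as the Europarl data. The point it shows is narrow: every language is scored as usual, but any language outside the allowed subset is skipped before the best score is chosen, so the result is always one of the allowed languages.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, NOT the JLangDetect API: subset-restricted detection. */
public class SubsetDetector {

    // Count overlapping character trigrams in the lower-cased text.
    static Map<String, Integer> trigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String t = text.toLowerCase();
        for (int i = 0; i + 3 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Toy score (an assumption): sum of doc count * profile count per trigram.
    static long score(Map<String, Integer> doc, Map<String, Integer> profile) {
        long s = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            s += (long) e.getValue() * profile.getOrDefault(e.getKey(), 0);
        }
        return s;
    }

    // Return the best-scoring language, considering only languages in `allowed`.
    // Returns null if no trained language is in the subset.
    static String detect(String text, Map<String, Map<String, Integer>> profiles,
                         Set<String> allowed) {
        Map<String, Integer> doc = trigrams(text);
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            if (!allowed.contains(e.getKey())) {
                continue; // language was trained but is outside the requested subset
            }
            long s = score(doc, e.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }

    // Hypothetical toy "training data": one sentence per language.
    static Map<String, Map<String, Integer>> toyProfiles() {
        Map<String, Map<String, Integer>> profiles = new HashMap<>();
        profiles.put("en", trigrams("the cat sat on the mat with the hat"));
        profiles.put("fr", trigrams("le chat est sur le tapis avec le chapeau"));
        profiles.put("es", trigrams("el gato esta en la alfombra con el sombrero"));
        return profiles;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> profiles = toyProfiles();
        String text = "el gato esta en la alfombra";
        // All trained languages allowed: the Spanish profile matches best.
        System.out.println(detect(text, profiles, Set.of("en", "fr", "es")));
        // Only French or English allowed: the answer is forced into that subset.
        System.out.println(detect(text, profiles, Set.of("en", "fr")));
    }
}
```

With the full set the sketch picks `es`; with the subset `{en, fr}` it can only answer `en` or `fr`, even though the text is Spanish. That is the trade-off the feature makes: the caller's prior knowledge overrides whatever the detector would otherwise have chosen.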
JLangDetect is licensed under Apache 2.0.
Binary: jlangdetect-0.2.jar
Europarl pre-compiled corpus: ngrams-europarl.zip
The project is hosted on Google Code.