TextCat Settings

OK languages - ok_languages
This option specifies which languages are considered acceptable for incoming mail. SpamAssassin tries to detect the language used in the message body text. The language cannot always be recognized with sufficient confidence; in that case, no action is taken. The rule UNWANTED_LANGUAGE_BODY is triggered if none of the detected languages is in the "ok" list. This is the only effect of the "ok" list: it does not act as a whitelist against any other form of spam scanning. In your configuration, use the two- or three-letter language specifier in lowercase (for example, en), not the English name of the language. Specify all if a desired language is not listed, or if you want to allow any language.
Default: all
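
As a rough illustration, and assuming these settings correspond directly to SpamAssassin directives in a configuration file such as local.cf, restricting the "ok" list to English, French, and German might look like the sketch below. The file name and the score value are assumptions chosen for the example, not recommendations from this page.

```
# Sketch only: assumes the TextCat plugin is already loaded.
# Accept English, French, and German; any other confidently
# detected language triggers UNWANTED_LANGUAGE_BODY.
ok_languages en fr de

# Hypothetical score for the rule; adjust to suit your own setup.
score UNWANTED_LANGUAGE_BODY 2.0
```

Because the rule only fires when a language is detected with sufficient confidence, messages whose language cannot be determined are unaffected by this list.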

Inactive languages - inactive_languages
This option is used to specify which languages will not be considered when trying to guess the language. For performance reasons, supported languages that have fewer than about 5 million speakers are disabled by default.
Default: bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi
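
For example, to let the detector also consider Lithuanian (lt) and Latvian (lv), you could restate the default list without those two codes. This is a minimal sketch that assumes the setting replaces the whole list rather than adding to it.

```
# Default inactive list with lt and lv removed, so that both
# languages take part in language guessing again.
inactive_languages bs cy eo et eu fy ga gd is la rm sa sco sl yi
```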

Textcat max languages - textcat_max_languages
The maximum number of languages any one message can simultaneously match before its classification is considered unknown.
Default: 3

Textcat optimal ngrams - textcat_optimal_ngrams
Ngrams whose count is lower than this number are removed. This can speed up processing of longer inputs. For shorter inputs, set this to 0.
Default: 0

Textcat max ngrams - textcat_max_ngrams
The maximum number of ngrams that should be compared with each of the language models (note that each of those models is used completely).
Default: 400

Textcat acceptable score - textcat_acceptable_score
Include any language that scores at least textcat_acceptable_score in the returned list of languages.
Default: 1.02
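
The four tuning options above can be collected in one place. The sketch below simply restates the documented defaults as configuration directives, assuming the same configuration-file syntax as the other options. Change one value at a time and observe the effect before adjusting another.

```
# Documented defaults, written out explicitly (sketch only).
textcat_max_languages 3        # more simultaneous matches than this => unknown
textcat_optimal_ngrams 0       # 0 is the value advised for shorter inputs
textcat_max_ngrams 400         # ngrams compared against each language model
textcat_acceptable_score 1.02  # threshold for the returned language list
```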
