Learning Settings

Use learner - use_learner
Whether to use any machine-learning classifiers with SpamAssassin, such as the default "BAYES_*" rules. Setting this to 0 will disable use of any and all human-trained classifiers.
Default: 1

Use bayes - use_bayes
Whether to use the naive-Bayesian-style classifier built into SpamAssassin. This is a master on/off switch for all Bayes-related operations.
Default: 1

Use bayes rules - use_bayes_rules
Whether to use rules using the naive-Bayesian-style classifier built into SpamAssassin. This allows you to disable the rules while leaving auto and manual learning enabled.
Default: 1
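
As a minimal sketch of the combination described above, the following fragment keeps the Bayes engine and learning active while switching off the BAYES_* rules. It assumes the settings are written to /etc/mail/spamassassin/local.cf in the same form as the connector examples on this page. In Warden these values are normally set from the Learning Settings page rather than edited by hand.

```
# Keep the Bayes engine and learning on, but do not score mail with BAYES_* rules
use_bayes 1
use_bayes_rules 0
bayes_auto_learn 1
```

One possible use is to let a fresh database train for a while before its rules start to affect scores.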

Bayes auto learn - bayes_auto_learn
Whether SpamAssassin should automatically feed high-scoring mails (or low-scoring mails, for non-spam) into its learning systems. The only learning system supported currently is a naive-Bayesian-style classifier.
Default: 1

Bayes auto learn on error - bayes_auto_learn_on_error
With bayes_auto_learn_on_error turned off, autolearning is performed even if the Bayes classifier already agrees with the new classification (i.e. it yielded BAYES_00 for a message now being learned as ham, or BAYES_99 for one being learned as spam). This is the traditional behavior, and SpamAssassin's own default was chosen to retain backwards compatibility. With bayes_auto_learn_on_error turned on, autolearning is performed only when the Bayes classifier had a different opinion from what the autolearner is now trying to teach it (i.e. it made an error in judgement). This strategy may or may not produce better future classifications, but it usually works very well, while also preventing unnecessary overlearning and slowing database growth.
Default: 1

Bayes auto learn threshold nonspam - bayes_auto_learn_threshold_nonspam
The score threshold below which a mail has to score to be fed into SpamAssassin's learning systems automatically as a non-spam message.
Default: 0.1

Bayes auto learn threshold spam - bayes_auto_learn_threshold_spam
The score threshold above which a mail has to score to be fed into SpamAssassin's learning systems automatically as a spam message. Note: SpamAssassin requires at least 3 points from the header and 3 points from the body to auto-learn as spam, so the minimum working value for this option is 6.
Default: 12
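
The autolearning options above might look like this when written out together. This is only a sketch using the default values listed on this page, and it assumes the settings end up in /etc/mail/spamassassin/local.cf.

```
# Autolearn with the default thresholds listed above
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12

# Learn only when Bayes disagreed with the autolearner's verdict
bayes_auto_learn_on_error 1
```

With these values, a message scoring below 0.1 is a candidate for learning as ham, a message scoring above 12 is a candidate for learning as spam (provided it also meets the 3-point header and 3-point body requirement), and anything in between is not autolearned.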

Bayes min spam num - bayes_min_spam_num
To be accurate, the Bayes system does not activate until at least this many spam messages have been learned.
Default: 200

Bayes min ham num - bayes_min_ham_num
To be accurate, the Bayes system does not activate until at least this many ham messages have been learned.
Default: 200

Bayes learn during report - bayes_learn_during_report
The Bayes system will, by default, learn any reported messages (spamassassin -r) as spam. If you do not want this to happen, set this option to 0.
Default: 1

Bayes use hapaxes - bayes_use_hapaxes
Should the Bayesian classifier use hapaxes (words/tokens that occur only once) when classifying? This produces significantly better hit-rates.
Default: 1

Bayes auto expire - bayes_auto_expire
If enabled, the Bayes system will try to automatically expire old tokens from the database. Auto-expiry occurs when the number of tokens in the database surpasses the bayes_expiry_max_db_size value. If a bayes datastore backend does not implement individual key/value expirations, the setting is silently ignored.
Default: 1

Bayes expiry max db size - bayes_expiry_max_db_size
What should be the maximum size of the Bayes tokens database? When expiry occurs, the Bayes system will keep either 75% of the maximum value or 100,000 tokens, whichever is larger. 150,000 tokens is roughly equivalent to an 8 MB database file.
Default: 150000
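
To make the expiry rule concrete, the comments in the sketch below work through the "75% or 100,000 tokens" calculation for two sizes. The second size (120000) is only an illustrative value and not a recommendation.

```
# Tokens kept after expiry = the larger of (75% of bayes_expiry_max_db_size) and 100,000
#   150000 -> 75% is 112500, which is larger than 100000, so 112500 tokens are kept
#   120000 -> 75% is  90000, which is smaller than 100000, so 100000 tokens are kept
bayes_auto_expire 1
bayes_expiry_max_db_size 150000
```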

Storage Settings

For high-volume servers, we recommend switching from the default MySQL storage backend to the Redis backend for Bayes data. This allows spam training and Bayes lookups to run in memory, which makes them extremely fast.

Bayes store module - bayes_store_module
The storage backend to use for Bayes data. The default backend is MySQL. Using Redis will give the highest performance.
Default: Mail::SpamAssassin::BayesStore::MySQL

Bayes SQL DSN - bayes_sql_dsn
Connection parameters for the MySQL or Redis server. RHEL/CentOS/CloudLinux 8 uses DBI:MariaDB, while all other systems use DBI:mysql.
Default: DBI:mysql:database=danami_warden;host=localhost
Default RHEL/CentOS/CloudLinux 8: DBI:MariaDB:database=danami_warden;host=localhost

Bayes token TTL - bayes_token_ttl
Controls token expiry (ttl value in SECONDS, sent as-is to Redis). The default value is 21 days. Expiry is done internally in Redis using *_ttl settings.
Default: 21d

Bayes seen TTL - bayes_seen_ttl
Controls "seen" expiry (ttl value in SECONDS, sent as-is to Redis). The default value is 8 days. Expiry is done internally in Redis using *_ttl settings.
Default: 8d

Installing Redis

It is up to you to secure your Redis installation. You should make sure that the Redis port 6379 is not exposed to the Internet. We recommend binding Redis to the IPv4 loopback interface 127.0.0.1 and setting a password for it.

RHEL/CentOS/CloudLinux/AlmaLinux

yum install redis
systemctl enable redis --now

Debian/Ubuntu

apt-get install redis-server
systemctl enable redis-server --now

Bayes Storage Connectors

Bayes storage connection information is stored in the /etc/mail/spamassassin/local.cf file. If the connection entries are missing from your config, re-saving the page at Warden -> Settings -> Learning Settings will add them again.

Example MySQL Connector:

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:database=danami_warden;host=localhost
bayes_sql_username danami_warden
bayes_sql_password XXXXXXXXX
bayes_sql_override_username amavis

Example Redis Connector:

The server with port number is required, while the password and database options are optional. The database number refers to the Redis database to use (0-15), with 0 being the default.

bayes_store_module Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn server=127.0.0.1:6379;password=foo;database=0
bayes_token_ttl 120d
bayes_seen_ttl 8d

Learning Ignore Settings

Bayes ignore from - bayes_ignore_from
Bayesian classification and autolearning will not be performed on mail from the listed addresses. The sa-learn program will also ignore the listed addresses if it is invoked with the --use-ignores option. One or more addresses can be listed; see whitelist_from for the format. Spam messages from certain senders may contain many words that frequently occur in ham. For example, one might read messages from a preferred bookstore but also get unwanted spam messages from other bookstores. If the unwanted messages are learned as spam, then any messages discussing books, including those from the preferred bookstore and antiquarian messages, would be in danger of being marked as spam. In that case, the addresses of the annoying bookstores would be listed (assuming they are halfway legitimate and do not send you mail through myriad affiliates). Conversely, those who have pieces of spam in legitimate messages, or who otherwise receive ham messages containing potentially spammy words, might fear that some spam messages would be in danger of being marked as ham. In that case, the addresses of the spam mailing lists, correspondents, etc. would be listed.
Default: empty

Bayes ignore to - bayes_ignore_to
Bayesian classification and autolearning will not be performed on mail to the listed addresses. See bayes_ignore_from for details.
Default: empty

Bayes ignore headers - bayes_ignore_header
If you receive mail filtered by upstream mail systems, like a spam-filtering ISP or mailing list, and that service adds new headers (as most of them do), these headers may provide inappropriate cues to the Bayesian classifier, allowing it to take a "short cut". To avoid this, list the headers using this setting.
Default: empty
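
The three ignore settings could be combined as in the sketch below. Every address and header name here is a hypothetical placeholder and should be replaced with the senders, recipients, and upstream headers seen in your own mail. The fragment assumes the settings are written to /etc/mail/spamassassin/local.cf.

```
# Hypothetical sender addresses; glob patterns follow the whitelist_from format
bayes_ignore_from *@bookstore-offers.example
bayes_ignore_from newsletter@example.com

# Hypothetical recipient address
bayes_ignore_to abuse-reports@example.org

# Hypothetical headers added by an upstream spam filter
bayes_ignore_header X-Upstream-Spamfilter
bayes_ignore_header X-Upstream-SomethingElse
```

Each setting can be repeated on separate lines, as shown, to build up a longer list.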