The TxRep (Reputation) plugin is designed as an improved replacement for the AWL (Auto-Whitelist) plugin. It adjusts the final spam score of a message by looking up the sender's reputation and taking it into consideration.
The TxRep plugin keeps track of the average SpamAssassin score for senders. Senders are tracked using multiple identifiers, or combinations of them: the From: email address, the originating IP address and/or an originating block of IP addresses, the sender's domain name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce the variability in scoring from message to message, and modifies the final score by pushing the result towards the historical average. This improves the accuracy of filtering for most email.
In comparison with the original AWL plugin, several conceptual changes were implemented in TxRep:
Scoring - although AWL tracks the number of messages received from each sender, it does not take that count into account in any way when calculating the corrective score for a new message. For example, if a sender previously sent a single ham message with a score of -5 and then sends a second one with a score of +10, AWL will issue a corrective score that pulls the result towards -5. With the default auto_whitelist_factor of 0.5, the resulting score would be only 2.5. It would be exactly the same even if the sender had previously sent 1,000 messages averaging -5. TxRep tries to take maximal advantage of the collected data, and adjusts the final score not only by the mean reputation score stored in the database, but also according to the number of messages already seen from the sender. You can see the exact formula in the section txrep_factor.
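The AWL arithmetic in the example above can be checked with a short Python sketch. It assumes that AWL moves the current score towards the stored mean by the fraction auto_whitelist_factor, which is consistent with the 2.5 result quoted above. The function and variable names are made up for illustration and are not part of SpamAssassin.

```python
def awl_adjusted_score(current_score, total_score, message_count, factor=0.5):
    """Pull the current score towards the sender's historical mean.

    Assumed AWL-style correction: the message count is used only to
    compute the mean, it does not influence how strong the correction is.
    """
    mean = total_score / message_count
    return current_score + (mean - current_score) * factor


# One earlier message scored -5, the new message scores +10.
one_message = awl_adjusted_score(10.0, total_score=-5.0, message_count=1)

# 1,000 earlier messages averaging -5, the new message scores +10.
thousand_messages = awl_adjusted_score(10.0, total_score=-5000.0, message_count=1000)

print(one_message, thousand_messages)  # 2.5 2.5
```

Both calls return the same value, which is the weakness described above: a history of 1,000 messages carries no more weight than a history of one.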
Learning - AWL ignores any spam/ham learning. In fact it acts against it, which often leads to a frustrating situation where a user repeatedly tags all messages from a given sender as spam (or ham), yet for every new message from that sender AWL adjusts the score back towards the historical average, which does not include the learned scores. TxRep changes this: every spam/ham learning is recorded in the reputation database, and hence taken into consideration for future email from the respective sender. See the section LEARNING SPAM / HAM for more details.
Auto-Learning - in certain situations SpamAssassin may declare a message obvious spam or ham and launch the auto-learning process, so that the message can be re-evaluated. AWL, by design, did not perform any auto-learning adjustments. TxRep readjusts the stored reputation by the value defined by txrep_learn_penalty or txrep_learn_bonus, respectively. The auto-learning score thresholds may be tuned, or auto-learning disabled completely, through the setting txrep_autolearn.
Relearning - messages that were wrongly learned or auto-learned can be relearned. The old reputations are removed from the database and new ones are added in their place. Relearning works better when message tracking is enabled through the txrep_track_messages option. Without it, the relearned score is simply added to the reputation, without removing the old one.
Aging - with AWL, every historical record of a given sender has the same weight. This means that changes in a sender's behavior, or modified SA rules, may take a long time to show an effect, or may be virtually negated by the AWL normalization, especially for senders with a high count of past messages and a low recent frequency. It also turns out to be particularly counterproductive when the administrator detects new patterns in certain messages and applies new rules to better tag such messages as spam or ham. AWL will practically eliminate the effect of the new rules by adjusting the score back towards the (wrong) historical average. Only setting auto_whitelist_factor lower would help, but at the same time it would also reduce the overall impact of AWL and cast doubt on its purpose. Besides txrep_factor (the replacement for auto_whitelist_factor), TxRep also introduces txrep_dilution_factor to help cope with this issue by progressively reducing the impact of past records. More details can be found in the description of the factor below.
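To illustrate why equal weighting of old records is a problem, the following Python sketch compares a plain mean with a mean in which older records count progressively less. The geometric weighting used here is only an assumed stand-in for the idea of dilution. It is not the formula TxRep applies for txrep_dilution_factor, and the names plain_mean, diluted_mean and dilution are hypothetical.

```python
def plain_mean(scores):
    """Every historical record has the same weight (AWL-like behavior)."""
    return sum(scores) / len(scores)


def diluted_mean(scores, dilution=0.98):
    """Weighted mean where each step into the past multiplies the weight
    by `dilution` (assumed scheme for illustration, not TxRep's formula).
    `scores` is ordered from oldest to newest."""
    newest = len(scores) - 1
    weights = [dilution ** (newest - i) for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# 200 old messages scored -5, then new rules make the last 10 score +10.
history = [-5.0] * 200 + [10.0] * 10

print("plain mean:  ", round(plain_mean(history), 2))
print("diluted mean:", round(diluted_mean(history), 2))
```

With this history the plain mean stays close to -5, while the diluted mean moves noticeably towards the recent scores. With dilution set to 1.0 both functions give the same result.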
Blacklisting and Whitelisting - when whitelisting or blacklisting is requested through SpamAssassin's API, AWL adjusts the historical total score of the plain email address without IP (and deletes the records bound to an IP), but since new records with IP are added during reception, the blacklisted entry would cease to act during scanning. TxRep always uses the record of the plain email address without IP together with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight factor for the EMAIL reputation is set to zero). AWL uses a score of 100 (or -100) for blacklisting (or whitelisting). TxRep increases the value proportionally to the weight factor of the EMAIL reputation. This is explained in detail in the section BLACKLISTING / WHITELISTING. TxRep can also blacklist or whitelist IP addresses, domain names, and dotless HELO names.
Sender Identification - AWL identifies a sender on the basis of the email address used and the originating IP address (more precisely, the part of it defined by the mask setting). The main purpose of this measure is to avoid assigning falsely good scores to spammers who spoof known email addresses. The disadvantage shows with senders who send from frequently changing locations, or who connect through dynamic IP addresses that are not within the block defined by the mask setting. Their score is difficult or sometimes impossible to track. Another disadvantage arises, for example, with a spammer persistently sending spam from the same IP address, just under different email addresses. AWL will not find his previous scores unless he reuses the same email address. TxRep uses several identifiers, and creates separate database entries for each of them. It tracks not only the email/IP address combination like AWL, but also the standalone email address (regardless of the originating IP), the standalone IP (regardless of the email address used), the domain name of the email address, the DKIM signature, and the HELO name of the connecting host. The influence of each individual identifier may be tuned with the help of the weight factors described in the section REPUTATION WEIGHTS.
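The effect of tracking several identifiers can be sketched in Python as follows. The key names and key formats are hypothetical and chosen only for illustration. They do not reflect how TxRep stores its database entries, and the /16 block is an arbitrary stand-in for the mask setting.

```python
import ipaddress


def sender_identifiers(email, ip, helo, dkim_domain=None, mask_bits=16):
    """Return one lookup key per identifier (hypothetical key format)."""
    email = email.strip().lower()
    domain = email.rsplit("@", 1)[-1]
    # Reduce the IP to its enclosing block, similar in spirit to a mask setting.
    block = ipaddress.ip_network(f"{ip}/{mask_bits}", strict=False).network_address
    keys = {
        "EMAIL_IP": f"{email}|{block}",  # AWL-style combination
        "EMAIL": email,                  # regardless of originating IP
        "IP": str(ipaddress.ip_address(ip)),  # regardless of email address
        "DOMAIN": domain,
        "HELO": helo.strip().lower(),
    }
    if dkim_domain:
        keys["DKIM"] = f"{email}|{dkim_domain.lower()}"
    return keys


office = sender_identifiers("alice@example.com", "192.0.2.55", "mail.example.com")
travel = sender_identifiers("alice@example.com", "198.51.100.7", "mail.example.com")
spammer = sender_identifiers("bob@example.net", "192.0.2.55", "spam.example.net")

# The roaming sender loses the combined key but keeps the standalone email key.
print(office["EMAIL_IP"] == travel["EMAIL_IP"], office["EMAIL"] == travel["EMAIL"])
# A different address from the same IP is still linked by the standalone IP key.
print(office["IP"] == spammer["IP"])
```

The first comparison shows the case AWL cannot track (same address, different network), the second the case of changing addresses on one IP.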
Message Tracking - TxRep (optionally) keeps track of the message IDs of already scanned and/or learned messages. This prevents the reputation score from being strengthened simply by rescanning or relearning the same message multiple times. At the same time it allows proper relearning of messages that were once learned wrongly, or relearning them after the learn penalty or bonus has been changed. See the option txrep_track_messages.
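The idea behind message tracking can be sketched with a small Python class. This is a simplified model under assumed data structures (one running total per sender and a dictionary keyed by message ID), not a description of TxRep's storage. The class and method names are hypothetical.

```python
class TrackedReputation:
    """Running reputation of one sender with per-message tracking."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.seen = {}  # message ID -> score currently counted for it

    def record(self, message_id, score):
        previous = self.seen.get(message_id)
        if previous is not None:
            # Already counted: remove the old contribution first, so a
            # rescan does not count twice and a relearn replaces the score.
            self.total -= previous
            self.count -= 1
        self.total += score
        self.count += 1
        self.seen[message_id] = score

    def mean(self):
        return self.total / self.count if self.count else 0.0


rep = TrackedReputation()
rep.record("<msg-1@example.com>", 4.0)
rep.record("<msg-1@example.com>", 4.0)   # rescanned, must not count twice
print(rep.count, rep.mean())             # 1 4.0
rep.record("<msg-1@example.com>", -3.0)  # relearned as ham, replaces old score
print(rep.count, rep.mean())             # 1 -3.0
```

Without the `seen` dictionary, each of the three calls would add to the total, which corresponds to the behavior described for setups without message tracking.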
User and Global Storages - a per-user setup of SpamAssassin is usually recommended, because each user may have quite different requirements and may receive quite different kinds of email. Especially with the Bayesian and AWL plugins, the efficiency is much better when SpamAssassin is trained on spam and ham separately for each user. The disadvantage, however, is that senders and emails already learned many times by different users have to be relearned, without any recognized history, whenever they arrive for another user. TxRep combines the advantages of both systems. It can use dual storages: a global common storage, where all email processed by SpamAssassin is recorded, and a local storage separate for each user, with reputation data from that user's email only. See more details at the setting txrep_user2global_ratio.
Outbound Whitelisting - when a local user sends messages to an email address, we assume that he also needs to see any reply, hence the recipient's address should be whitelisted.
When SpamAssassin is also used for scanning outgoing email, that is, when local users send email through the SMTP server where SA is installed and internal networks are defined, TxRep will improve the reputation of all 'To:' and 'CC:' addresses in messages originating from the internal networks. Details can be found at the setting txrep_welcomelist_out.
The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled for TxRep to run. TxRep reuses the database handling of the original AWL module, and some of its parameters bound to the database handler modules. By default, TxRep creates its own database, but the original auto-whitelist can be reused as a starting point: the AWL database can be renamed to the name defined in the TxRep settings, and TxRep will start using it. The original auto-whitelist database should be backed up first, to allow switching back to the original state.
_TXREP_XXX_Y_           TXREP modifier
_TXREP_XXX_Y_MEAN_      Mean score on which TXREP modification is based
_TXREP_XXX_Y_COUNT_     Number of messages on which TXREP modification is based
_TXREP_XXX_Y_PRESCORE_  Score before TXREP
_TXREP_XXX_Y_UNKNOW_    New sender (not found in the TXREP list)