Spam filtering

From MuonPi-Wiki

Our server uses Postfix as its MTA (mail transfer agent). Postfix can forward incoming mail to so-called milters ("mail filters"). One of the milters configured on our server is the spam filter Rspamd.

The WebUI of the service is located at rspamd.muonpi.org. The rspamd service itself is running on localhost:11332 with the UI running on localhost:11334.

Terminology

Spam is unwanted mail, while ham is legitimate mail that should be delivered. A ham message that the filter wrongly classifies as spam is a false positive.

From Wikipedia on Spam (food):

Spam has affected popular culture, including a Monty Python skit, which repeated the name many times, leading to its name being borrowed to describe unsolicited electronic messages, especially email.

Setup

Setup and configuration were done according to this guide. The Postfix configuration is simple: add inet:localhost:11332 to the smtpd_milters parameter in main.cf:

smtpd_milters = unix:/opendkim/opendkim.sock,inet:localhost:11332
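The same change can also be scripted instead of editing main.cf by hand. The following is a minimal sketch, not the procedure that was used on our server: append_milter is a hypothetical helper name, and it assumes the existing value is a comma-separated list without spaces, as in the line above. The commented usage relies on the standard Postfix tools postconf and postfix reload.

```shell
#!/bin/sh
# Hypothetical helper: append a milter to a comma-separated smtpd_milters
# value, unless that milter is already listed.
# Assumption: entries are separated by commas only (no whitespace).
append_milter() {
    current="$1"   # existing smtpd_milters value, may be empty
    new="$2"       # milter to add, e.g. inet:localhost:11332
    case ",$current," in
        *",$new,"*) printf '%s\n' "$current" ;;       # already present: unchanged
        ",,")       printf '%s\n' "$new" ;;           # list was empty
        *)          printf '%s\n' "$current,$new" ;;  # append to the end
    esac
}

# Usage on the mail server (needs root, therefore only shown as a comment):
#   value=$(append_milter "$(postconf -h smtpd_milters)" inet:localhost:11332)
#   postconf -e "smtpd_milters = $value"
#   postfix reload
```

Because the helper leaves the value unchanged when the milter is already listed, running it a second time does not add a duplicate entry.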

Training the filter

Rspamd can learn from example messages to filter spam more accurately. The training data is stored in a Redis database.

Training can be done in two ways: via the Scan/Learn section of the WebUI or via the command line interface rspamc.

The WebUI

After logging into the WebUI you can see the status of the filter. On the right-hand side, a pie chart shows how many mails have been processed and what share of them was rejected or annotated. The table named Bayesian statistics shows how many mails have been learned as 'Spam' or as 'Ham'.

Training is done in the Scan/Learn tab of the WebUI. Paste the raw message source into the text field and click Scan message. The result of the scan appears below. The 'action' indicates what Rspamd would do if it were to receive that mail. The symbols listed are the criteria on which the message was evaluated. Each symbol carries a positive or negative score: positive scores count towards spam, negative scores towards ham.

You can tell rspamd to learn from this message by choosing Upload Ham or Upload Spam.

The CLI

Similar to the WebUI, you can check whether a given mail is spam by calling rspamc suspicious.eml. After the analysis, the symbols found and the total score are shown. To train the filter on this mail, call rspamc learn_spam suspicious.eml if it is spam, or rspamc learn_ham suspicious.eml if it is ham.
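When several mails have been collected, the learn commands above can be applied to a whole directory. This is a minimal sketch under a few assumptions: learn_dir is a hypothetical function name, the mails are stored as *.eml files, and the RSPAMC variable is an invention of this example that only exists so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: train Rspamd on every *.eml file in a directory.
#   $1  class to learn, either "spam" or "ham"
#   $2  directory containing the raw messages
# RSPAMC is an assumption of this sketch: it defaults to rspamc and can be
# set to e.g. "echo" to print the commands instead of running them.
learn_dir() {
    class="$1"
    dir="$2"
    case "$class" in
        spam|ham) ;;
        *) echo "usage: learn_dir spam|ham <directory>" >&2; return 2 ;;
    esac
    for f in "$dir"/*.eml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "${RSPAMC:-rspamc}" "learn_$class" "$f"
    done
}

# Usage (not run here):
#   learn_dir spam ./spam
#   learn_dir ham  ./ham
```

Keeping spam and ham in separate directories makes it harder to train a mail with the wrong class by accident.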

Dovecot CLI: 'doveadm'

Most of the incoming mail is received by osTicket via support@muonpi.org, and osTicket does not provide the raw message source in its tickets. Therefore doveadm is used to get the message source.

NOTE: This can only be done with superuser permissions and gives full read/write access to all users' mail. So be careful!

Use doveadm search [-u <user>|-A] [-S <socket_path>] <search query> to search for mails. See wiki.dovecot.org for command reference and this page for search_query reference.

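This one-liner searches for mails of user <user> that were sent or received on the date <YYYY-MM-DD> and saves each of them as <uid>.eml in the current directory. UIDs are only unique within a mailbox, so mails from different mailboxes can overwrite each other:
doveadm search -u <user> ON <YYYY-MM-DD> | while read -r guid uid; do doveadm fetch -u <user> text mailbox-guid "$guid" uid "$uid" > "$uid.eml"; done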
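For repeated use, the one-liner above can be wrapped in a small function. This is only a sketch: save_mails_on and the DOVEADM variable are hypothetical names introduced here, and the date check merely tests the YYYY-MM-DD shape. Unlike the one-liner, it includes the mailbox GUID in the file name, so that mails with the same UID in different mailboxes do not overwrite each other.

```shell
#!/bin/sh
# Hypothetical helper: save all mails of one user from a given day.
#   $1  user name
#   $2  date in the form YYYY-MM-DD
#   $3  output directory (defaults to the current directory)
# DOVEADM is an assumption of this sketch: it defaults to doveadm and can
# be pointed at a stub for testing.
save_mails_on() {
    user="$1"
    day="$2"
    outdir="${3:-.}"
    case "$day" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "usage: save_mails_on <user> <YYYY-MM-DD> [outdir]" >&2; return 2 ;;
    esac
    "${DOVEADM:-doveadm}" search -u "$user" ON "$day" |
    while read -r guid uid; do
        # stdin is redirected so the fetch cannot consume the search results
        "${DOVEADM:-doveadm}" fetch -u "$user" text \
            mailbox-guid "$guid" uid "$uid" \
            < /dev/null > "$outdir/$guid-$uid.eml"
    done
}

# Usage (needs superuser permissions, not run here):
#   save_mails_on <user> <YYYY-MM-DD> ./mails
```

The saved files can then be checked or learned with rspamc as described in the section on the CLI.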