SpamAssassin, MIMEDefang, and Procmail: Best Trio of 2017

In 2015 and 2016, I awarded "Best Couple" to two open source commands or program types that, combined, make my world a better place. This year, the "Best Couple" prize has turned into the "Best Trio," because resolving the problem I set out to fix—effective server-side email sorting—took three pieces of software working together. Here's how I got everything to work using SpamAssassin, MIMEDefang, and Procmail, three common and freely available open source software packages.

The problem

To make managing my email easier, I like to sort incoming messages into a few folders (in addition to the inbox). Spam is always filed into the spam folder, and I look at it every couple of days in case something I want was marked as spam. I also sort email from a couple of other sources into specific folders. Everything else is filed into the inbox by default.

A quick word about terminology to begin: Sorting is the process of classifying email and storing it in an appropriate folder. Filters like SpamAssassin classify the email. MIMEDefang uses that classification to mark a message as spam by adding a text string to the subject line. That classification allows other software to file the email into the designated folders. I had been using those two applications, and I needed software to do this last bit—the one that does the filing.

I have several email filters set up in Thunderbird, the best client I have found for my personal needs. Both my wife and I use email filters on our computers. When we travel or use our handheld devices, those filters don't always work because Thunderbird—or any other email client with filters—must be running on my computer at home in order to perform the filtering tasks. I can set up filters on my laptop to sort email when I'm traveling, but that means I have to maintain multiple sets of filters.

There was also a technical problem I wanted to fix. Client-side email filtering relies on scanning messages after they are deposited in the inbox. For some unknown reason, sometimes the client does not delete (expunge) the moved messages from the inbox. This may be an issue with Thunderbird (or it may be a problem with my configuration of Thunderbird). I have worked on this problem for years with no success, even through multiple complete re-installations of Fedora and Thunderbird.

Additionally, spam is a major problem for me. I have my own email server, and I use several email addresses. I have had some of those email accounts for a couple of decades, and they have become major spam magnets. In fact, I regularly get between 1,200 and 1,500 spam emails each day (my record is just over 2,500 in a single day), and the numbers keep increasing.

To solve my problems, I needed a method for filing emails (i.e., sorting them into appropriate folders) that was server-based rather than client-based. This would solve several issues: I wouldn't need to leave an email client running on my home workstation just to perform filtering. I wouldn't have to delete or expunge messages—especially spam—from our inboxes. And I wouldn't need to configure filters in multiple locations—I would need them in only one location, the server.

My email server

I chose Sendmail as my email server in about 1997, when I switched from OS/2 to Red Hat Linux 5, as I'd already been using it for several years at work. It's been my mail transfer agent (MTA) ever since, for both business and personal use. (I don't know why Wikipedia refers to MTA as a "message" transfer agent, when all my other references say "mail" transfer agent. The Talk tab of the Wikipedia page has a bit of discussion about this, which generated even more confusion for me.)

I've been using SpamAssassin and MIMEDefang together to score and mark incoming emails as spam, placing a known string in the subject, ####SPAM####, so that I can identify and sort junk email both as a human and with software. I use UW IMAP for client access to emails, but that is not a factor in server-side filtering and sorting.

Yes, I use a lot of old-school software for the server side of email, but it is well known, it works well, and I understand how to make it do the things I need it to do.

Project requirements

I believe having a well-defined set of requirements is imperative before starting a project. Based on my description of the problem, I created five simple requirements for this project:

Sort incoming spam emails into the spam folder on the server side using the identifying text that is already being added to the subject line.
Sort other incoming emails into designated folders.
Circumvent problems with moved messages not being deleted or expunged from the inbox.
Keep the existing SpamAssassin and MIMEDefang software.
Make sure any new software is easy to install and configure.

This set of objectives meant that I would need a sorting program that integrates well with the parts I already have.

Procmail

After extensive research, I settled on the venerable Procmail. I know—more old stuff—and pretty much unsupported these days, too. But it does what I need it to do and is known to work well with the software I am already using. It is stable and has no known serious bugs. It can be configured for use at the system level as well as at the individual user level.

Red Hat and Red Hat-based distributions, such as CentOS and Fedora, use Procmail as the default mail delivery agent (MDA) for Sendmail, so it does not even need to be installed; it is already there. My server runs CentOS, so using Procmail is a real no-brainer.

In addition to delivering email, Procmail can be used to filter and sort it. Procmail rules (known as recipes) can be used to identify spam and delete or sort it into a designated mail folder. Other recipes can identify and sort other mail as well. Procmail can be used for many other things besides sorting email into designated folders, such as automated forwarding, duplication, and much more. Those other tasks are beyond the scope of this article, but understanding sorting should give you a better understanding of how to accomplish those other tasks.

How it works

There are many ways of using SpamAssassin, MIMEDefang, and Procmail together for anti-spam solutions, so I won't go deeply into how to configure them. Instead, I will focus on how I integrated these three packages to implement my own solution.

Incoming email processing begins with Sendmail. I added this line to my sendmail.mc configuration file:

INPUT_MAIL_FILTER(`mimedefang', `S=unix:/var/spool/MIMEDefang/mimedefang.sock, T=S:5m;R:5m')dnl

This line calls MIMEDefang as part of email processing. Be sure to run the make command after changing the Sendmail configuration, then restart Sendmail. (For more information, see Chapter 8 of SpamAssassin: A Practical Guide to Integration and Configuration.)
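
On a Red Hat-style system, the rebuild-and-restart step might look like the following sketch. It assumes the Makefile that ships in /etc/mail (from the sendmail-cf package) and a systemd unit named sendmail; neither detail comes from this article, so adjust for your system. The run helper and the DRY_RUN variable are hypothetical conveniences for previewing the commands before running them.

```shell
#!/bin/sh
# Sketch: regenerate sendmail.cf from sendmail.mc, then restart Sendmail.
# Assumptions (not from the article): /etc/mail/Makefile exists and the
# service is managed by systemd under the unit name "sendmail".

# With DRY_RUN=1 set, commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_sendmail() {
    run make -C /etc/mail &&
    run systemctl restart sendmail
}

# Usage (as root):   rebuild_sendmail
# Preview only:      DRY_RUN=1; rebuild_sendmail
```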

SpamAssassin can run as standalone software in some applications. In this environment, however, it does not run as a daemon; MIMEDefang calls it, so each email is first processed by SpamAssassin to generate a spam score.

SpamAssassin provides a default set of rules, but you can modify the scores for existing rules, add your own rules, and create whitelists and blacklists by modifying the /etc/mail/spamassassin/local.cf file. This file can grow quite large; mine is just over 70KB and still growing.
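
As a rough illustration of those three kinds of changes, the sketch below prints a small local.cf fragment. The directives (score, header, describe, whitelist_from, blacklist_from) are standard SpamAssassin configuration, but the LOCAL_SUBJ_WATCHES rule, the addresses, and the score values are invented for this example and are not taken from my own file.

```shell
#!/bin/sh
# Print a small, illustrative local.cf fragment.
# The rule name LOCAL_SUBJ_WATCHES, the addresses, and all score values
# are made up for illustration only.
sample_local_cf() {
    cat <<'EOF'
# Raise the score of an existing rule
score BAYES_99 4.5

# Custom rule: match a phrase in the subject, ignoring case
header   LOCAL_SUBJ_WATCHES  Subject =~ /cheap watches/i
describe LOCAL_SUBJ_WATCHES  Subject mentions cheap watches
score    LOCAL_SUBJ_WATCHES  2.5

# Always accept, or always flag, specific senders
whitelist_from friend@example.com
blacklist_from *@spammer.example
EOF
}

# Review the fragment, add what you need to local.cf by hand, then check
# the syntax of the resulting configuration:
#   sample_local_cf
#   spamassassin --lint
```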

SpamAssassin uses the set of default and custom rules and scores to generate a total score for each email. MIMEDefang calls SpamAssassin as a subroutine and receives the spam score as a return value.

MIMEDefang is programmed in Perl, so it is easy to hack. I have hacked the last major portion of the code in /etc/mail/mimedefang-filter to provide a filtering breakdown with a little more granularity than the default. Here's how this section of the code looks on my installation (I have made significant changes to this portion of the code, so yours probably will not look much like this):

#####################################################################
# Determine how to handle the email based on its spam score and     #
# add an appropriate X-Spam-Status header and alter the subject.    #
#####################################################################
# Set required_hits in sa-mimedefang.cf to get value for $req       #
#####################################################################
if ($hits >= $req) {
    action_add_header("X-Spam-Status", "Spam, score=$hits required=$req tests=$names");
    action_change_header("Subject", "####SPAM#### ($hits) $Subject");
    action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
    # action_discard();
} elsif ($hits >= 8) {
    action_add_header("X-Spam-Status", "Probably, score=$hits required=$req tests=$names");
    action_change_header("Subject", "####Probably SPAM#### ($hits) $Subject");
    action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} elsif ($hits >= 5) {
    action_add_header("X-Spam-Status", "Possibly, score=$hits required=$req tests=$names");
    action_change_header("Subject", "####Possibly SPAM#### ($hits) $Subject");
    action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} elsif ($hits >= 0.00) {
    action_add_header("X-Spam-Status", "Probably not, score=$hits required=$req tests=$names");
    # action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} else {
    # If score (hits) is less than 0
    action_add_header("X-Spam-Status", "No, score=$hits required=$req tests=$names");
    # action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
}

Here's the line in that code that changes the subject line of the email:

action_change_header("Subject", "####SPAM#### ($hits) $Subject");

Strictly speaking, this line calls another Perl subroutine, passing the new subject string as an argument, but the effect is the same. The subject line now begins with the string ####SPAM#### followed by the spam score (i.e., the variable $hits). Having this known string in the subject line makes further filtering easy.
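
To see which subject tag a given score would receive, the tiers in the filter code above can be mimicked outside MIMEDefang. This is only a sketch: the subject_tag function is hypothetical, awk handles the decimal comparisons, and the required score of 10 used in the examples is a made-up stand-in for whatever required_hits is set to.

```shell
#!/bin/sh
# subject_tag HITS REQ
# Prints the subject prefix that the tiers in the filter code would add
# for a score of HITS when the required score is REQ. Prints an empty
# line when the subject would be left unchanged.
subject_tag() {
    awk -v hits="$1" -v req="$2" 'BEGIN {
        hits += 0; req += 0            # force numeric comparison
        if      (hits >= req) print "####SPAM####"
        else if (hits >= 8)   print "####Probably SPAM####"
        else if (hits >= 5)   print "####Possibly SPAM####"
        else                  print ""
    }'
}

# Examples with a hypothetical required score of 10:
subject_tag 12.3 10    # top tier
subject_tag 8.5 10     # middle tier
subject_tag 2.1 10     # subject left alone
```

Only the top tier produces the exact string ####SPAM####, so only those messages match the spam recipe shown later; the "Probably" and "Possibly" tiers stay in the inbox with a visible warning in the subject.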

The modified email is returned to Sendmail for further processing, and Sendmail calls Procmail to act as the MDA.

Procmail supports both a global configuration file, /etc/procmailrc, and individual user ~/.procmailrc files, but you must create them yourself. The structure of the files is the same, but the global file operates on all incoming email, while user-level files apply only to the individual user's email. Since I don't use a global file, all the sorting is done at the user level. My .procmailrc file is simple:

# .procmailrc file for david@both.org
# Rules are run sequentially - first match wins

PATH=/usr/sbin:/usr/bin
MAILDIR=$HOME/mail #location of your mailboxes
DEFAULT=/var/spool/mail/david

# Send Spam to the spam mailbox
# This is my new style SPAM subject
:0
* ^Subject:.*####SPAM####
$HOME/spam

# Political stuff goes here. Must be using my political email address
:0
* ^To:.*political
$HOME/Political

# SysAdmin stuff goes here. Usually system log messages
:0
* ^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)
$HOME/AdminStuff

# drops messages into the default box
:0
* .*
$DEFAULT

Note that the .procmailrc file must be located in my email account's home directory on the email server, not in the home directory on my workstation. Because most email accounts are not login accounts, they use the nologin program as the default shell, so an admin must create and maintain these files. The other option is to change to a login shell, such as Bash, and set passwords so that knowledgeable users can log in to their email accounts on the server and maintain their .procmailrc files.

Each Procmail recipe starts with :0 (yes, that is a zero) on the first line; each of the recipes in my file contains a total of three lines. The second line starts with * and contains a condition consisting of a regular expression (regex) that Procmail compares, by default, to each line of the incoming email's header. If there is a match, Procmail files the email into the folder specified by the third line. The ^ symbol denotes the beginning of the line when making the comparison.

The first recipe in my .procmailrc file sorts the spam identified in the subject line by MIMEDefang into my spam folder. The second recipe sorts political email (identified by a special email address I use for my volunteer work for various political organizations) into its own folder. The third recipe sorts the huge amount of system emails I receive from the many computers I deal with into a mailbox for my system administrator duties. This setup makes those emails very easy to find.
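
The order of the recipes matters, because the first match wins. The following sketch imitates that decision with grep rather than Procmail itself; the target_folder function and the sample headers are hypothetical, and grep -E -i only approximates Procmail's matching.

```shell
#!/bin/sh
# target_folder: read a message header on stdin and print the folder the
# three recipes would choose. Conditions are tried in file order and the
# first match wins; anything unmatched falls through to the inbox.
target_folder() {
    header=$(cat)
    if printf '%s\n' "$header" | grep -Eiq '^Subject:.*####SPAM####'; then
        echo spam
    elif printf '%s\n' "$header" | grep -Eiq '^To:.*political'; then
        echo Political
    elif printf '%s\n' "$header" |
        grep -Eiq '^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)'; then
        echo AdminStuff
    else
        echo inbox
    fi
}

# A message tagged as spam is filed as spam even though "Cron" would
# also match the third condition:
printf 'To: user@example.com\nSubject: ####SPAM#### (12.0) Cron deals\n' |
    target_folder
```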

Note the use of parentheses to enclose a list of strings to match. Each string is separated by a vertical bar, aka the pipe ( | ), which is used as a logical "or." So the conditional line

* ^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)

reads, "if the Subject line contains Logwatch or rkhunter or ... or Fail2Ban." Since Procmail ignores case by default, there is no need to create recipes that look for various combinations of upper and lower case.
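
A condition like this can be tried against sample header lines before it goes into a recipe. The sketch below uses grep -E -i as a stand-in for Procmail's case-insensitive, egrep-style matching; the header_matches helper and the sample lines are hypothetical, and Procmail's regex dialect differs from grep's in some details.

```shell
#!/bin/sh
# header_matches REGEX LINE
# Succeeds if LINE matches REGEX, ignoring case (grep -E -i).
header_matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

cond='^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)'

# Case does not matter, and any one of the alternatives is enough:
header_matches "$cond" 'Subject: LOGWATCH for host1' && echo "match"
header_matches "$cond" 'Subject: fail2ban banned an address' && echo "match"

# The ^Subject: anchor keeps other header lines from matching:
header_matches "$cond" 'To: cron@example.com' || echo "no match"
```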

The last recipe drops all email that does not match another recipe into the default folder, usually the inbox.

Having the .procmailrc file in my home directory is not enough to make Procmail filter my mail. I have to add one more file, the following ~/.forward file, which tells Sendmail to pipe all of my incoming email through Procmail:

# .forward file
# process all incoming mail through procmail - see .procmailrc for 
# the filter rules.
|/usr/bin/procmail

It is not necessary to restart either Sendmail or MIMEDefang when creating or modifying the Procmail configuration files.

For more detail about the configuration of Procmail and creation of recipes, see the SpamAssassin book and the Procmail information in the RHEL Deployment Guide.

A few additional notes

Note that MIMEDefang must be started first, before Sendmail, so it can create the socket where Sendmail sends emails for processing. I have a short script (automate everything!) I use to stop and restart Sendmail and MIMEDefang in the correct order so that new or modified rules in the local.cf file take effect.
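
A restart script along those lines could look like the sketch below. It is not my actual script: it assumes systemd units named sendmail and mimedefang, reuses the socket path from the sendmail.mc line shown earlier, and the run helper, the DRY_RUN variable, and the ten-second wait are hypothetical choices.

```shell
#!/bin/sh
# Sketch: restart the mail stack so MIMEDefang is up before Sendmail.
# Assumptions: systemd unit names "sendmail" and "mimedefang"; socket path
# as configured in the INPUT_MAIL_FILTER line of sendmail.mc.
SOCKET=/var/spool/MIMEDefang/mimedefang.sock

# With DRY_RUN=1 set, commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Wait up to about ten seconds for MIMEDefang to create its socket.
wait_for_socket() {
    tries=0
    while [ ! -S "$SOCKET" ] && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    [ -S "$SOCKET" ]
}

restart_mail_stack() {
    run systemctl stop sendmail &&
    run systemctl stop mimedefang &&
    run systemctl start mimedefang &&
    run wait_for_socket &&
    run systemctl start sendmail
}

# Usage (as root):   restart_mail_stack
# Preview only:      DRY_RUN=1; restart_mail_stack
```

Stopping Sendmail first and starting it last means it never tries to hand a message to a socket that is not there.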

I already have a large body of rules and score modifiers in my SpamAssassin local.cf file, so although I could have used Procmail by itself for spam filtering and sorting, it would have taken a lot of work to convert all of those rules. I also think SpamAssassin does a better job of scoring because it does not rely on a single rule to match, but rather on the aggregate score from all the rules, as well as scores from Bayesian filtering.

Procmail works very well when matches can be made very explicit with known strings, such as the ones I have configured MIMEDefang to place in the subject line. I think Procmail works better as a final sorting stage in the spam-filtering process than as a complete solution by itself. That said, I know that many admins have made complete spam filtering solutions using nothing more than Procmail.

Now that I have server-side filtering in place, I am somewhat less limited in my choice of email clients, because I no longer need a client that performs filtering and sorting. Nor do I need to leave an email client running all the time to perform that filtering and sorting.

Reports of Procmail's demise are greatly exaggerated

In my research for this article, I found a number of Google results (dating from 2001 to 2013) that declared Procmail to be dead. Evidence includes broken web pages, missing source code, and a sentence on Wikipedia that declares Procmail to be dead and links to more recent replacements. However, all Red Hat, Fedora, and CentOS distributions install Procmail as the MDA for Sendmail. The Red Hat, Fedora, and CentOS repositories all have the source RPMs for Procmail, and the source code is also on GitHub.

Considering Red Hat's continued use of Procmail, I have no problem using this mature software that does its job silently and without fanfare.