Let’s do Postfix slowly and properly – Part 9: Using Rspamd as a spam milter

Continuing on from building up Dovecot to more competently move mail about, we head back into the Postfix series to finally address the issue of spam.

There is a reason that this part of the series didn’t happen sooner: Spam is tricky, spam fighting software setup is possibly even more so. The tools I have been using so far don’t come with inbuilt spam fighting capabilities, so something has to be bolted onto them, and that something is mostly not particularly elegant.

My first port of call was SpamAssassin because that’s what’s mostly talked about. However, I ended up discarding it in favour of Rspamd. I will discuss SpamAssassin mostly to talk through how it works with Postfix and why that left me cold. I have not evaluated how well it does its job, only how easy it was to wrap my head around and set up.

SpamAssassin

What is SpamAssassin? Two or three things, as far as I can tell. It’s a service/daemon, spamd, capable of analysing the contents of an email and assigning a score based on how spammy the email is. It’s a client, spamc, that can talk to the daemon – usually on behalf of an MTA. And it’s some scripts that download shared rules of thumb about spammers and spam to help spamd make more educated guesses. This much was reasonably clear.

What was unclear even after reading a number of tutorials, however, was how it would actually work with Postfix. Sure, I could find copy-paste configs but no real explanation. A lot of trial and error followed and somehow the penny finally dropped. I would need to twist the smtpd daemon to hand over mail to a local socket. That local socket is not really a socket service, however, but a link to a client binary. That client (spamc) then feeds the mail to the SpamAssassin/spamd service, whether on the local machine or somewhere else. Once spamd has processed the mail it then needs to feed the mail back into a later stage of Postfix’s mail processing (so that it doesn’t get caught up in an infinite loop of spam checking). I can see how it works and it probably does, but elegant it is not. It’s also not particularly container friendly because in order to feed back the mail, spamd needs access to Postfix binaries. Which it will not automatically have because it’s obviously in a different container.

Basically, it just felt like a series of hacks. So I was happy to discover that milters, or mail filters, were a thing. Unlike the setup described above (and prescribed in most SpamAssassin tutorials), milters can be set in Postfix’s configuration file, main.cf, and Postfix knows what they are. Rather than hacking at master.cf processes to twist the smtpd daemon, I could have Postfix actually understand what this new setup was.

The problem with this is that SpamAssassin does not ‘speak’ the milter protocol so in order to have it work with Postfix this way, I would need another process altogether, spamass-milter, as a go-between. Configure spamass-milter as a milter in Postfix, configure SpamAssassin as the spam processing service in spamass-milter. Tie it all up with unix sockets. These are all separate and independent processes, mind you.

Not only does this feel heavy in terms of setup – so many processes, so much configuration – it also is not particularly container friendly, as I could not verify that I could use ports rather than sockets. Luckily, I stumbled on an alternative spam processor with an inbuilt milter opening, rspamd.

Rspamd

Rspamd is a newer and seemingly more actively developed project than SpamAssassin. This also helped. What did not help was that, despite it being more suited for hooking up to Postfix, it was not particularly friendly to a basic setup.

Rspamd consists of a lot of modules that each try their own approach to determine if an email is spam or ham. Most of them are preselected. That sounds good in theory – the more angles, the better results – but what it means is that an inexperienced user is thrown in at the very deep end: Tens of modules all shouting at the same time about configuration issues, results, scores, etc.

There is no way that I can get to grips with what they all do. So all I am going to focus on here is the very most basic stuff:

Installing
Setting up a database which is a requirement for a lot of basic functionality
Getting logging under control so I can at least see what it’s shouting at me
Understanding the confusing configuration hierarchy – or at least being able to wrangle results from it.
Connecting it to Postfix
Disabling the advanced or hard-to-setup modules

This will leave me with what is essentially a black box but hopefully one that works and doesn’t break too much. It bugs me immensely that I won’t know what’s going on in there but, simply put, life is too short and spam too boring a subject.

Installation

Normally, I wouldn’t go into detail about installing packages but I found out the hard way that it’s a bad idea to install Rspamd from the official repos of Debian/Ubuntu. They are far behind the official releases and one important development in recent releases (2.x) seems to be an increasing commitment to using a Redis database over other kinds of databases, like SQLite. Now, I think SQLite is perfectly adequate for my needs but older versions of Rspamd, even those in Ubuntu 20.04, still require a Redis database. They just haven’t started using it for everything. Basically, picking an older release means having to deal with both SQLite issues AND Redis issues. With newer releases, like 2.5, I only have to worry about the latter as Redis seems to be the default database for all configuration, including all the modules I saw.

It boils down to this: When installing on a recent (2020) distro, I am definitely better off following the documentation’s suggestion of adding the official Rspamd repos before installing.

2024 note

When using a recent release, like Ubuntu 24.04 Noble Numbat, this can be disregarded. Just install rspamd from the repos.

Redis

I haven’t investigated Redis much but I did sit through this interactive introduction that actually did a swell job of making Redis seem like a jolly good idea. Rather than arcane SQL commands and syntax, Redis interaction felt more like Python, including in its scripting-language-like ability to modify values.

For configuration I disabled protected-mode in order to accept connections from outside the container. Other than that the default options seemed to work well. Once protected mode is off, Redis listens for local and non-local traffic on port 6379. Obviously, if Redis is running on the same machine as Rspamd, protected-mode should probably stay on.

Other containerization issues:

The working directory (the dir setting) is where Redis stores files. So that should obviously go onto a volume when run in containers.
rspamd complains mightily if it cannot find Redis at startup. I don’t think it fails if it has to wait but the output is much nicer if docker compose is told that rspamd depends_on the redis service.
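
As far as I understand it, depends_on on its own only controls the order in which containers are started; it does not wait until Redis actually accepts connections. A minimal sketch of a readiness check that could run before rspamd starts, assuming plain TCP and nothing Redis-specific. The function name wait_for_port is my own, and "redis" in the usage comment is a hypothetical compose service name:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening,
            # not that Redis is healthy or has finished loading its data.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage in an entrypoint script ("redis" is an assumed service name):
#     if not wait_for_port("redis", 6379):
#         raise SystemExit("Redis never became reachable")
```

This only quietens the startup noise. It does not replace whatever retry logic rspamd itself has.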

Logging

As mentioned, rspamd produces a lot of output and I need access to it in order to debug. By default logging goes to a log file. Because of containerization, I prefer it going to stdout so I can inspect it with docker logs. I do this by creating a configuration file called logging.inc containing the following:

type = "console";
level = "notice";

logging.inc goes into the local.d folder under the main rspamd configuration folder, most likely /etc/rspamd/.

This brings me to the subject of how to configure Rspamd, which in itself is so daunting that it nearly put me off using the software.

The configuration system

Rspamd has a main configuration file, rspamd.conf, that sources other config files that in turn source other files, and so on. The sourced files can be divided into two sets: One set that has to do with specific processes and one set that is not tied to a specific process. Logging is an example of the former. The files in this first set all seem to have the .inc suffix, whereas the files in the second set all end in .conf. In rspamd.conf the processes are detailed in sections – there are a number of worker processes, the logging process etc. All other options are sourced via the “commons” section. I am going to leave those alone for now and focus on settings related to the worker processes.

So far, so good. How do I change a setting? Good question. Here is the rspamd.conf section for the ‘normal’ process. I don’t need this process because I run a tiny operation. So my job here is to disable it. Let’s look at the section:

worker "normal" {
    bind_socket = "localhost:11333";
    .include "$CONFDIR/worker-normal.inc"
    .include(try=true; priority=1,duplicate=merge) "$LOCAL_CONFDIR/local.d/worker-normal.inc"
    .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/worker-normal.inc"
}

The normal worker has a default setting to bind to port 11333 on the local host. All other options are gained from the package configuration file worker-normal.inc.

First and foremost: It should not be necessary to edit any of the files in the rspamd package, including the configuration files. That’s what the sourcing is for. So if I want to disable worker-normal, I need to look to the worker-normal.inc file in either the local.d or the override.d directory. Neither file exists as yet; all files in local.d and override.d are completely the user’s own. The obvious advantage is that I can upgrade rspamd all I want without it touching my configuration. The obvious downside is ridiculous complexity.

So which should I pick: local.d or override.d? The official documentation is not very clear (I don’t think the writer is a native English speaker) but I think I can boil it down to this. A section in an override.d file will remove settings from the default configuration if those settings are not explicitly included in my override.d file. The same config file, placed in local.d instead, will add new settings and overwrite default settings but not unset any settings that I don’t mention.

As an example, if the default worker-normal.inc has this section:

a_section {
  a_setting = 123
}

and I write a worker-normal.inc file that looks like this

a_section {
  b_setting = 456
}

… putting it in local.d would meld the two sections into one containing both the a_setting and b_setting settings. If I put it into override.d instead, the result would be only one setting, the one for b_setting. Had I included a setting for a_setting in my file (say, 124), it would have overwritten the 123 value regardless of which directory I had placed it in.
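
To check my own reading of this, here is a small model of the two behaviours using plain Python dictionaries. It is only a sketch of the semantics as I have described them, not of how Rspamd actually loads its configuration, and the function names apply_local and apply_override are made up for the illustration:

```python
import copy

# Package default, mirroring the a_section example above.
DEFAULTS = {"a_section": {"a_setting": 123}}
# The user's own file, which only mentions b_setting.
USER = {"a_section": {"b_setting": 456}}


def apply_local(defaults, user):
    """Model of local.d: merge each section key by key, user values win."""
    result = copy.deepcopy(defaults)
    for section, settings in user.items():
        result.setdefault(section, {}).update(settings)
    return result


def apply_override(defaults, user):
    """Model of override.d: a section in the user file replaces the default section whole."""
    result = copy.deepcopy(defaults)
    for section, settings in user.items():
        result[section] = dict(settings)
    return result


local_result = apply_local(DEFAULTS, USER)
override_result = apply_override(DEFAULTS, USER)
print("local.d:   ", local_result)
print("override.d:", override_result)
```

In this model the local.d result keeps a_setting alongside b_setting, while the override.d result is left with b_setting only.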

Now, in order to disable worker-normal, all I need is the following in a worker-normal.inc:

enabled = false;

Because all other settings are irrelevant once I have disabled it, I don’t think it matters much which directory I pick. Unless I have good cause, though, I generally stick to local.d.

Workers

As mentioned, I don’t need worker-normal because the so-called worker-proxy will do. In a big setup, worker-proxy is just a frontend that talks to the milter and leaves the hard processing work to worker-normals in the background. In my setup the proxy worker does everything itself.

There is a third worker type set up by default, worker-controller, which has to do with statistics and the web UI. Basically, I try to pare Rspamd down as much as possible to keep my sanity so I also disable worker-controller in the same way as I did with worker-normal above. I haven’t seen any evidence of problems as a result of this. Obviously without a web UI and statistics it can be hard knowing what’s going on without parsing logs but for now I will do without.

That leaves worker-proxy. Here’s my worker-proxy.inc that I have copied verbatim from another how-to:

bind_socket = "0.0.0.0:11332";
milter = yes;
timeout = 120s;
upstream "local" {
  default = yes;
  self_scan = yes;
}
count = 4; # Spawn more processes in self-scan mode
max_retries = 5; # How many times master is queried in case of failure
discard_on_reject = false; # Discard message instead of rejection
quarantine_on_reject = false; # Tell MTA to quarantine rejected messages
spam_header = "X-Spam"; # Use the specific spam header
reject_message = "Spam message rejected"; # Use custom rejection message

I bind it to all network interfaces, rather than localhost, obviously because it’s containerized. milter is turned on so it can talk to Postfix. “self_scan” is the important bit: it tells the worker-proxy processes (there are 4 of them, see the count setting) not to hand over mail to worker-normal but to handle the scanning themselves.

Because it replicates every single setting from the default worker-proxy.inc file it is again not important if the file goes in local.d or override.d.

Tying it together

So now I would like to start seeing some results. A few bits are still missing. I need to tell Rspamd how to find my Redis database and I need to tell Postfix how to find Rspamd.

Headers

In order to debug or just get a glimpse of what is going on, it can be helpful to get Rspamd to write some of the results of the processing into email headers. In daily use this may expose some of the system but at least for a start, it’s quite helpful and educational.

use = ["authentication-results"];
extended_spam_headers = true;

The authentication results contain an SPF check much like the one I already did in the last post. Presumably, the email has already passed that check by now, so I should probably disable it here. The interesting stuff happens in the extended spam headers where Rspamd tells what individual modules think of the email. An example:

X-Spamd-Result: default: False [-3.00 / 15.00];
	 RCVD_VIA_SMTP_AUTH(0.00)[];
	 ARC_NA(0.00)[];
	 SUBJECT_ENDS_QUESTION(1.00)[];
	 TO_MATCH_ENVRCPT_ALL(0.00)[];
	 R_SPF_ALLOW(0.00)[+ip4:198.54.127.32/27];
	 MIME_GOOD(0.00)[multipart/alternative,text/plain];
	 TO_DN_NONE(0.00)[];
	 REPLY(-4.00)[];
	 RCPT_COUNT_ONE(0.00)[1];
	 RCVD_COUNT_THREE(0.00)[3];
	 PREVIOUSLY_DELIVERED(0.00)[mail@brokkr.net];
	 FROM_NO_DN(0.00)[];
	 DMARC_NA(0.00)[madsmi.de];
	 FROM_EQ_ENVFROM(0.00)[];
	 R_DKIM_NA(0.00)[];
	 MIME_TRACE(0.00)[0:+,1:+,2:~];
	 ASN(0.00)[asn:22612, ipnet:198.54.127.0/24, country:US];
	 RCVD_TLS_LAST(0.00)[];
	 MID_RHS_MATCH_FROM(0.00)[]

Since this email comes from a legitimate source, it’s not surprising that it’s not marked as spam. The fact that it’s part of an ongoing conversation clearly helps – “REPLY(-4.00)” – but it was interesting to see that apparently using a subject line with a question mark in it (literally: “Does this work?”) is a suspect trait.

Rspamd has a list of headers that can be added to the email. The two settings above go in a file called milter_headers.conf, which can be added to either local.d or override.d as the module sources both and has no default settings at all.

Skipping blacklists

The most demanding checks – and probably the best – that Rspamd can do require access to RBLs or realtime blacklists. This approach is used by two modules, rbl and surbl. For various reasons using the blacklists requires you to set up a local DNS server to deal with the great amount of traffic that using an RBL apparently generates. I’m no stranger to adding extra services to my network but there is a limit and I don’t think the extra few hits that I would get from RBLs are worth it. Frankly, with the records checks that I implemented in the last post, Rspamd has very little material to work with already.

If left to their defaults, rbl and surbl will be used but will be wholly ineffective and will complain loudly in the logs about their lack of access to DNS and lists. There are apparently proper ways to turn them off but I found that, with Docker, the most efficient way is simply to remove their conf files from the modules.d directory. That way they don’t get caught up in the “load all modules in the modules.d directory” wildcard command. From the Dockerfile:

ARG MDIR="/etc/rspamd/modules.d/"
ARG SDIR="/etc/rspamd/scores.d/"
RUN rm "${MDIR}/rbl.conf" "${MDIR}/surbl.conf"
COPY rbl_group.conf surbl_group.conf "$SDIR"

The *_group.conf files are just empty files that overwrite the default definition files that assign scores to various RBL-related symbols. If left in, Rspamd will complain that scores are assigned to symbols that have no use or definition in the setup. Whenever I tried to use override.d files to adjust settings, I always either changed too little (i.e. nothing) or too much (like resetting all group definitions). This way is crude but effective.

Postfix

I add the following to Postfix’s main.cf:

milter_protocol = 6
milter_mail_macros = i {mail_addr} {client_addr} {client_name} {auth_authen}
milter_default_action = accept
smtpd_milters = inet:container_ip:11332

I believe that macros here really means variables – it’s information about the email, not instructions, that I’m passing along – but I haven’t delved much into it. Postfix keeps separate macro lists for the different stages of the SMTP conversation. “milter_mail” means that this list is fed to the milter at the “MAIL FROM” stage. In other words, Postfix does not yet have the contents of the email at that point, so rspamd has to work with what Postfix has got at this stage, as those variables indicate. Milter protocol version 6 is the most recent and the default for Postfix at the time of writing, but I guess there is no harm in spelling it out.

The default action – to accept, i.e. not reject the connection – only applies in the case that the milter application is malfunctioning or unresponsive. Finally, smtpd_milters is a list that currently has only one element in it, the rspamd milter, which can be reached on port 11332 (by default) on either localhost, another machine on the network or a container IP, as is the case here.
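
To make the shape of an smtpd_milters entry a bit more concrete, here is a minimal sketch that splits one entry into its parts. It only handles the two plain forms I am aware of, inet:host:port and unix:pathname, and it assumes a numeric port. The function name parse_milter_endpoint is hypothetical and has nothing to do with Postfix’s own parser:

```python
def parse_milter_endpoint(spec):
    """Split one smtpd_milters entry into (transport, address).

    Returns ("inet", (host, port)) or ("unix", pathname).
    """
    transport, _, rest = spec.strip().partition(":")
    if transport == "unix":
        if not rest:
            raise ValueError(f"missing socket path in {spec!r}")
        return ("unix", rest)
    if transport == "inet":
        # Split on the last colon so that the host part is kept intact.
        host, sep, port = rest.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected inet:host:port, got {spec!r}")
        return ("inet", (host, int(port)))
    raise ValueError(f"unknown transport in {spec!r}")


# The entry from main.cf above; container_ip is the placeholder used there.
print(parse_milter_endpoint("inet:container_ip:11332"))
```

Something like this can be handy in a container health check, to find out where the milter is supposed to be before trying to connect to it.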

Results

I cheated a bit above and previewed some scan results before I had detailed the Postfix setup. Now that I have done that, it’s time to see what Rspamd makes of actual, real-life spam:

X-Spamd-Bar: +++++++++++++
X-Spam-Level: *************
X-Rspamd-Server: fe592012c780
Authentication-Results: brokkr.localdomain;
	dkim=none;
	dmarc=fail reason="No valid SPF, No valid DKIM" header.from=126.com (policy=none);
	spf=softfail (brokkr.localdomain: 49.68.145.31 is neither permitted nor denied by domain of xpwijl@nflpa.com) smtp.mailfrom=xpwijl@nflpa.com
X-Rspamd-Queue-Id: 87D741E29A4
X-Spamd-Result: default: False [13.80 / 15.00];
	 HAS_REPLYTO(0.00)[dengdao2762077750@126.com];
	 RDNS_NONE(1.00)[];
	 SUBJ_EXCESS_BASE64(1.50)[];
	 FREEMAIL_FROM(0.00)[126.com];
	 TO_DN_NONE(0.00)[];
	 MIME_BASE64_TEXT_BOGUS(1.00)[];
	 R_SPF_SOFTFAIL(0.00)[~all];
	 MIME_BASE64_TEXT(0.10)[];
	 FORGED_SENDER(0.30)[dengdao2762077750@126.com,xpwijl@nflpa.com];
	 RCVD_NO_TLS_LAST(0.10)[];
	 R_DKIM_NA(0.00)[];
	 MIME_TRACE(0.00)[0:~];
	 ASN(0.00)[asn:4134, ipnet:49.64.0.0/11, country:CN];
	 FROM_NEQ_ENVFROM(0.00)[dengdao2762077750@126.com,xpwijl@nflpa.com];
	 FAKE_REPLY(1.00)[];
	 ARC_NA(0.00)[];
	 REPLYTO_EQ_FROM(0.00)[];
	 FROM_HAS_DN(0.00)[];
	 TO_MATCH_ENVRCPT_ALL(0.00)[];
	 FREEMAIL_REPLYTO(0.00)[126.com];
	 RCPT_COUNT_ONE(0.00)[1];
	 MISSING_MID(2.50)[];
	 VIOLATED_DIRECT_SPF(3.50)[];
	 MIME_HTML_ONLY(0.20)[];
	 RCVD_COUNT_TWO(0.00)[2];
	 HFILTER_HOSTNAME_UNKNOWN(2.50)[];
	 GREYLIST(0.00)[pass,body];
	 DMARC_POLICY_SOFTFAIL(0.10)[126.com : No valid SPF, No valid DKIM,none]
X-Spam: Yes

Mmm mmm mmmhh now that’s good spam! Note that I had to relax a lot of Postfix restrictions in order to let this one through but it was definitely worth it. The score of 13.8 can be arrived at by simply totalling all the scores in parentheses.

Obviously I could assign scores to some of the symbols that currently don’t come with any penalties to drive it up further. I will not go into detail about the various scores because there simply is not sufficient high quality documentation to make sure that I know what I’m talking about.
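
For anyone who would rather not do the sums by hand, here is a minimal sketch that pulls the symbols and scores out of the header above and totals them. The regular expressions are my own guess at the layout based on this one example, not a documented format, and parse_result is a made-up helper name:

```python
import re
from decimal import Decimal

# X-Spamd-Result value copied from the spam example above.
HEADER = """default: False [13.80 / 15.00];
    HAS_REPLYTO(0.00)[dengdao2762077750@126.com];
    RDNS_NONE(1.00)[];
    SUBJ_EXCESS_BASE64(1.50)[];
    FREEMAIL_FROM(0.00)[126.com];
    TO_DN_NONE(0.00)[];
    MIME_BASE64_TEXT_BOGUS(1.00)[];
    R_SPF_SOFTFAIL(0.00)[~all];
    MIME_BASE64_TEXT(0.10)[];
    FORGED_SENDER(0.30)[dengdao2762077750@126.com,xpwijl@nflpa.com];
    RCVD_NO_TLS_LAST(0.10)[];
    R_DKIM_NA(0.00)[];
    MIME_TRACE(0.00)[0:~];
    ASN(0.00)[asn:4134, ipnet:49.64.0.0/11, country:CN];
    FROM_NEQ_ENVFROM(0.00)[dengdao2762077750@126.com,xpwijl@nflpa.com];
    FAKE_REPLY(1.00)[];
    ARC_NA(0.00)[];
    REPLYTO_EQ_FROM(0.00)[];
    FROM_HAS_DN(0.00)[];
    TO_MATCH_ENVRCPT_ALL(0.00)[];
    FREEMAIL_REPLYTO(0.00)[126.com];
    RCPT_COUNT_ONE(0.00)[1];
    MISSING_MID(2.50)[];
    VIOLATED_DIRECT_SPF(3.50)[];
    MIME_HTML_ONLY(0.20)[];
    RCVD_COUNT_TWO(0.00)[2];
    HFILTER_HOSTNAME_UNKNOWN(2.50)[];
    GREYLIST(0.00)[pass,body];
    DMARC_POLICY_SOFTFAIL(0.10)[126.com : No valid SPF, No valid DKIM,none]
"""

# A symbol line looks like NAME(score)[options]; the summary is [score / threshold].
SYMBOL_RE = re.compile(r"^\s*([A-Z0-9_]+)\((-?\d+\.\d+)\)", re.MULTILINE)
SUMMARY_RE = re.compile(r"\[(-?\d+\.\d+) / (-?\d+\.\d+)\]")


def parse_result(header):
    """Return (reported score, threshold, {symbol: score}) for one header value."""
    summary = SUMMARY_RE.search(header)
    reported, threshold = Decimal(summary.group(1)), Decimal(summary.group(2))
    symbols = {name: Decimal(score) for name, score in SYMBOL_RE.findall(header)}
    return reported, threshold, symbols


reported, threshold, symbols = parse_result(HEADER)
# Decimal avoids the rounding noise that floats would add to the total.
total = sum(symbols.values(), Decimal("0"))
print(f"reported {reported}, summed {total}, threshold {threshold}")
# List only the symbols that actually contribute to the score, biggest first.
for name, score in sorted(symbols.items(), key=lambda kv: kv[1], reverse=True):
    if score:
        print(f"{score:>6}  {name}")
```

The summed total matches the 13.80 that Rspamd reports. Only twelve of the symbols carry any weight, and VIOLATED_DIRECT_SPF, MISSING_MID and HFILTER_HOSTNAME_UNKNOWN account for well over half of it.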

One might question the value of a spam check that skips RBL lookups but I think that this shows that Rspamd does a pretty good job even without external references. Even if it does not tip the scales sufficiently for Postfix to reject the connection there is certainly enough here to use in an email filter.

Final notes

Setting up Rspamd has probably been the most time-consuming single-purpose task in this entire series. It’s complex, it’s not particularly noob-friendly, its documentation is very complete but technical and knotty, and its configuration, while powerful, is a nightmare at the beginning. And unlike Dovecot/Postfix I have this bad feeling that things are going to break at some point, simply because I don’t really know what’s going on inside the black box.

Compared to doing SPF and PTR properly, the payoff per spent hour is minuscule, and understanding those two helps you with your outbound as well as your inbound mail issues. Basically, if you’ve found this post by internet searches or skipped the post on PTR/SPF, do yourself a favour and focus on that because it’s vastly more helpful.

Rspamd is currently set up to recommend various actions to Postfix, such as rejection or greylisting using the default cutoff points on Rspamd’s spamminess scale. I will leave it as an exercise to the reader to use those suggestions in combination with Sieve, that we set up in the last post, to actually filter and move spam away from the inbox.
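
I won’t spoil the exercise with actual Sieve, but the decision such a rule has to make is tiny. Here is a minimal model of it in Python, assuming the X-Spam header configured in worker-proxy.inc above. The folder name "Junk" and the function name target_folder are purely my own placeholders:

```python
from email import message_from_string


def target_folder(raw_message, spam_folder="Junk"):
    """Pick a folder based on the X-Spam header that Rspamd adds."""
    msg = message_from_string(raw_message)
    # Only an explicit "Yes" counts; a missing header means the mail stays put.
    flagged = msg.get("X-Spam", "").strip().lower() == "yes"
    return spam_folder if flagged else "INBOX"


sample = "X-Spam: Yes\nSubject: You have won\n\nClick here\n"
print(target_folder(sample))
```

A real Sieve rule would test the same header and file the message into the spam folder instead of returning a folder name.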

And now…

In parts 6 and 7 of the series we applied basic access controls to Postfix before letting the internet access our MTA. We required authentication as a user in order to get authorized to relay email and we made sure that that authentication happened on an encrypted channel. It is however time to see if there is anything else that needs doing to make sure that Postfix isn’t put to nefarious use.

Let’s do Postfix slowly and properly – Part 10: Restricting access

SPAM logo (transformed by brokkr.net), Public Domain

1 Comment

To me, the REAL power of RSpamD appears to come into play with certain additional modules, all integrated in the whole:
1) It has Bayesian filtering (like Popfile) at the server level
2) It supports DKIM and DMARC signing, checking and reporting
3) It supports ARC signing and verification
That last one is incredibly rare, and of growing importance.

ARC is the relatively new protocol that empowers email auto-forwarding without the forwarder being considered a spammer. Consider the two cases:
* I want to forward various emails to my phone
* I want to run an email list
In both cases, I am not actually the source of those messages. It can be VERY hard to stop what the destination considers spam, at my end… and I’d much prefer the destination knows I am simply passing on certain emails. ARC is intended to solve that.