wiki: add inbound spam filtering guide (spamass-milter + SpamAssassin Bayes)

New 02-selfhosting/services article: the full Postfix/Dovecot inbound spam stack
on Fedora — spamass-milter tag-only wiring (the -r footgun), socket permissions
(sa-milt group + UMask), site-wide Bayes DB, Sieve-to-Junk, and sa-learn training
(folders, spam/ham balance, manual-not-cron). From the majormail setup.

Also extends selinux-dovecot-vmail-context with a Permissive-mode variant + a
postfix_cleanup->mysqld_etc companion-denial note. SUMMARY.md nav updated.
Marcus Summers 2026-06-04 16:16:29 -04:00
parent e6a249403c
commit 110a6d49e5
3 changed files with 195 additions and 0 deletions

@@ -0,0 +1,171 @@
---
title: "Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot (Fedora)"
domain: selfhosting
category: services
tags: [postfix, dovecot, spamassassin, spamass-milter, bayes, spam, sieve, fedora, email, selinux]
status: published
created: 2026-06-04
updated: 2026-06-04
---
# Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot
How to add inbound spam scanning to a Postfix/Dovecot virtual-mailbox server on Fedora: SpamAssassin scans every inbound message via `spamass-milter`, spam is **tagged (never rejected)**, Dovecot's Sieve files it into the user's `Junk` folder, and a **site-wide Bayes database** — shared between the scan path and manual `sa-learn` training — learns from your real mail.
This is a "tag and quarantine" design (not "reject at SMTP"), which is the safe default: a misfire lands a message in Junk for review rather than bouncing legitimate mail.
## Architecture
```
inbound SMTP (25) ─► Postfix smtpd
│ smtpd_milters:
│ 1. OpenDKIM (verify/sign)
│ 2. spamass-milter ─► spamc ─► spamd (SpamAssassin)
│ adds X-Spam-Flag / X-Spam-Status headers
Dovecot LMTP delivery ─► global Sieve
if X-Spam-Flag: YES ─► fileinto "Junk"
else ─► INBOX
Bayes DB /var/lib/spamassassin/bayes/ (site-wide, shared)
├─ spamd auto-learns at scan time
└─ sa-learn manual/scripted training from Maildir folders
```
## 1. Install
```bash
sudo dnf install spamassassin spamass-milter
sudo systemctl enable --now spamassassin # spamd
```
On Fedora the `spamass-milter` unit runs as the unprivileged **`sa-milt`** user and creates its socket at `/run/spamass-milter/spamass-milter.sock`. Remember that user — the Bayes DB ownership and the socket permissions both hinge on it.
## 2. Configure spamass-milter — tag-only
Edit `/etc/sysconfig/spamass-milter`:
```sh
EXTRA_FLAGS="-a -r 999999"
```
> [!warning] The `-r` flag is a footgun
> `-r nn` rejects mail scoring ≥ `nn` at SMTP time. **Omitting `-r` does NOT mean "never reject"** — this build still rejects flagged spam at a low default threshold (a GTUBE test will get `550 Blocked by SpamAssassin`). To get pure tag-only behaviour, set the threshold absurdly high (`-r 999999`) so nothing ever reaches it. Do **not** use `-r -1` — that means "reject anything tagged as spam."
- `-a` — skip messages on **authenticated** connections, so your own outbound/submission mail isn't scanned or tagged.
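
To double-check the flags before restarting the milter, a small helper can pull the `-r` threshold out of the sysconfig file. This is a minimal sketch: the function name `milter_reject_threshold` is made up for illustration, and it assumes `EXTRA_FLAGS` is a single uncommented line in the form shown above.

```bash
# Hypothetical helper: print the -r threshold found in EXTRA_FLAGS,
# or "unset" when no -r flag is present.
# Assumes EXTRA_FLAGS="..." sits on one uncommented line.
milter_reject_threshold() {
    local file=${1:-/etc/sysconfig/spamass-milter}
    local flags i
    local -a words
    # Take the last uncommented EXTRA_FLAGS assignment; strip the key and quotes.
    flags=$(grep -E '^[[:space:]]*EXTRA_FLAGS=' "$file" | tail -n 1 \
            | sed -E 's/^[[:space:]]*EXTRA_FLAGS=//; s/^"//; s/"[[:space:]]*$//')
    # Split into words and look for the value that follows "-r".
    read -r -a words <<<"$flags"
    for ((i = 0; i < ${#words[@]}; i++)); do
        if [ "${words[i]}" = "-r" ] && (( i + 1 < ${#words[@]} )); then
            printf '%s\n' "${words[i+1]}"
            return 0
        fi
    done
    printf 'unset\n'
}
```

With the configuration above, `milter_reject_threshold` should print `999999`. A result of `unset` or a small number suggests the milter may still reject at SMTP time.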
## 3. Socket permissions (so Postfix can connect)
By default the socket is created `0755 sa-milt:sa-milt`, so Postfix (running as `postfix`) cannot write to it. It becomes `0770` only once you relax the unit's umask, and Postfix can use it only once it is in the `sa-milt` group. Two steps:
```bash
# 1. Let the socket be group-accessible
sudo install -d /etc/systemd/system/spamass-milter.service.d
printf '[Service]\nUMask=0007\n' | sudo tee /etc/systemd/system/spamass-milter.service.d/socket-perms.conf
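# 2. Put postfix in the sa-milt group (group membership is read at start,
#    so postfix must be restarted afterwards; see section 4)
sudo usermod -aG sa-milt postfix
sudo systemctl daemon-reload
sudo systemctl enable spamass-milter
sudo systemctl restart spamass-milter   # restart so a running milter picks up the new UMask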
```
Verify: `sudo -u postfix test -w /run/spamass-milter/spamass-milter.sock && echo OK`.
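
If that check fails, it helps to know whether the socket mode or the group is wrong. The sketch below tests both; `socket_group_writable` is a hypothetical name, and it relies on GNU `stat` (coreutils), which is an assumption about the host.

```bash
# Hypothetical helper: succeed only if PATH is group-writable and its
# group matches the expected one (default: sa-milt).
# Uses GNU stat format flags (%a = octal mode, %G = group name).
socket_group_writable() {
    local path=$1 want_group=${2:-sa-milt}
    local mode group
    mode=$(stat -c '%a' "$path") || return 2
    group=$(stat -c '%G' "$path") || return 2
    # Group write is the 020 bit of the octal permission mode.
    (( (8#$mode & 8#020) != 0 )) && [ "$group" = "$want_group" ]
}
```

For example, `socket_group_writable /run/spamass-milter/spamass-milter.sock || echo "check the UMask drop-in"` points at the drop-in, while a passing result with a failing `test -w` points at Postfix's group membership.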
## 4. Wire into Postfix
Append the milter **alongside** OpenDKIM — don't replace it. Inbound (`smtpd`) gets both; local-injected mail (`non_smtpd`) stays DKIM-only.
```bash
sudo postconf -e 'smtpd_milters = local:/run/opendkim/opendkim.sock unix:/run/spamass-milter/spamass-milter.sock'
sudo postconf -e 'milter_default_action = accept'   # if SA is down, accept the mail; never defer/bounce
sudo systemctl restart postfix # restart (not reload) to pick up the new group
```
`milter_default_action = accept` is important: if the milter ever hiccups, mail still flows.
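
After the restart, the resulting settings can be sanity-checked by piping `postconf` output through a small filter. This is only a sketch: `check_milter_config` is a hypothetical name, and it matches on the socket paths used in this article.

```bash
# Hypothetical check: read `postconf` output on stdin and confirm that
# smtpd_milters lists both milters and milter_default_action is accept.
check_milter_config() {
    local conf
    conf=$(cat)
    grep -Eq '^smtpd_milters = .*opendkim' <<<"$conf" \
        || { echo "OpenDKIM missing from smtpd_milters"; return 1; }
    grep -Eq '^smtpd_milters = .*spamass-milter' <<<"$conf" \
        || { echo "spamass-milter missing from smtpd_milters"; return 1; }
    grep -Eq '^milter_default_action = accept$' <<<"$conf" \
        || { echo "milter_default_action is not accept"; return 1; }
    echo "milter wiring looks right"
}
```

Typical use would be `postconf smtpd_milters milter_default_action | check_milter_config`.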
## 5. Site-wide Bayes DB
Put the Bayes DB in one fixed location so the scan path and your training script share it. In `/etc/mail/spamassassin/local.cf`:
```
use_bayes 1
bayes_auto_learn 1
bayes_path /var/lib/spamassassin/bayes/bayes
bayes_file_mode 0660
```
Create the directory owned by the **scanning user** (`sa-milt`), under `/var/lib/spamassassin` so it inherits the correct SELinux type (`spamd_var_lib_t`):
```bash
sudo install -d -m 2770 -o sa-milt -g sa-milt /var/lib/spamassassin/bayes
sudo restorecon -Rv /var/lib/spamassassin/bayes
sudo systemctl restart spamassassin
```
The setgid `2770` directory plus `bayes_file_mode 0660` keeps the DB files group-owned by `sa-milt` and group-writable, so both `spamd` (as `sa-milt`) and `sa-learn` (as `root`, from a training script) can read and write them.
## 6. File spam into Junk (Dovecot Sieve)
A global Sieve before-script files anything SpamAssassin flagged. `/etc/dovecot/sieve/global/spam-to-junk.sieve`:
```sieve
require ["fileinto", "mailbox"];
if anyof (header :contains "X-Spam-Flag" "YES", header :contains "X-Spam-Status" "Yes") {
    fileinto :create "Junk";
    stop;
}
```
Register it as a global script in `dovecot.conf` (e.g. `sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve`) and restart Dovecot.
## 7. Training the Bayes filter
SpamAssassin's Bayes only starts scoring once it has learned **≥ 200 spam AND ≥ 200 ham** (`bayes_min_spam_num` / `bayes_min_ham_num`). Train from your Maildir folders with `sa-learn`. **Run it as `root`** — root can read every user's Maildir *and* write the Bayes DB.
```bash
# Spam — your Junk folder(s) and any dedicated spam mailbox
sa-learn --spam /var/vmail/example.com/user/.Junk/{cur,new}
# Ham — Sent + Inbox (known-good)
sa-learn --ham /var/vmail/example.com/user/{cur,new}
sa-learn --ham /var/vmail/example.com/user/.Sent/{cur,new}
sa-learn --sync
sa-learn --dump magic | grep -E 'nspam|nham'
```
`bayes_path` is read from `local.cf`, so no `--dbpath` is needed.
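
To see at a glance whether the 200/200 floor has been reached, the `--dump magic` output can be summarised with `awk`. A minimal sketch, assuming the usual layout in which the count is the third whitespace-separated column and the line ends in `nspam` or `nham`; `bayes_floor_status` is a hypothetical name.

```bash
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# report whether both counters have reached the learn floor (default 200).
bayes_floor_status() {
    local floor=${1:-200}
    awk -v floor="$floor" '
        $NF == "nspam" { nspam = $3 }   # third column is assumed to hold the count
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d ", nspam, nham
            if (nspam + 0 >= floor + 0 && nham + 0 >= floor + 0) print "ready"
            else print "below-floor"
        }'
}
```

Run it as `sa-learn --dump magic | bayes_floor_status`. An optional argument overrides the floor if `bayes_min_spam_num` / `bayes_min_ham_num` were changed.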
> [!tip] Keep spam and ham roughly balanced
> Bayes accuracy drops when one corpus dwarfs the other (aim for within ~3:1). Don't dump a 90,000-message archive of ham against a few hundred spam — it biases everything toward "ham" and spam slips through. Use Sent + recent Inbox for ham, not your entire archive.
> [!warning] Train manually, not from cron — unless your folders are always clean
> `sa-learn` learns whatever is *in* the folder. If a spam slips into the Inbox, or you haven't yet rescued a false-positive out of Junk, an unattended cron run will mislearn it. Prefer a manual script you run **after** triaging Junk/Inbox. (`sa-learn` is idempotent and re-classifies on re-run, so a mistake is fixable: move the message to the right folder and run again.)
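
A manual training script can guard against a lopsided corpus before it calls `sa-learn`. The helpers below are a sketch under stated assumptions: the names `maildir_count` and `corpus_balanced` are invented here, and the 3:1 limit is simply the rule of thumb from the tip above.

```bash
# Hypothetical helpers for a manual training script.

# Count messages in the cur/ and new/ subdirectories of one Maildir folder.
maildir_count() {
    find "$1/cur" "$1/new" -type f 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Succeed if neither corpus is empty and the larger one is at most
# three times the size of the smaller one.
corpus_balanced() {
    local a=$1 b=$2 big small
    if (( a >= b )); then big=$a; small=$b; else big=$b; small=$a; fi
    (( small > 0 && big <= 3 * small ))
}
```

For example, `corpus_balanced "$(maildir_count /var/vmail/example.com/user/.Junk)" "$(maildir_count /var/vmail/example.com/user)" || echo "unbalanced: trim the ham set first"` would stop the script from training on a skewed set.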
## 8. Test
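Send a [GTUBE](https://spamassassin.apache.org/gtube/) probe through port 25 (unauthenticated), then a normal message for comparison. Run the probe from a different host: mail injected with `sendmail` on the server itself takes the `non_smtpd` path and skips spamass-milter.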
```bash
# from a host that can reach :25 — GTUBE scores ~1000
printf 'Subject: gtube\n\nXJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X\n' \
| sendmail -f test@example.org user@example.com
```
Confirm in `/var/log/maillog` that `spamd` scanned it (`result: Y …`), the message was **delivered** (no `milter-reject`), it landed in `.Junk`, and the stored message has `X-Spam-Flag: YES`.
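
The last check, that the stored message carries the flag, can be scripted. This sketch inspects only the header section of one message file; `has_spam_flag` is a hypothetical name.

```bash
# Hypothetical helper: succeed if the header section of a stored message
# (everything before the first blank line) contains "X-Spam-Flag: YES".
has_spam_flag() {
    awk '
        /^\r?$/ { exit }                                    # blank line ends the headers
        tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

Pointing it at the newest file under `/var/vmail/example.com/user/.Junk/new/` should succeed for the GTUBE probe, and it should fail for the normal message in the Inbox.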
## Gotchas recap
| Symptom | Cause | Fix |
|---|---|---|
| Spam gets `550 Blocked by SpamAssassin` (you wanted Junk) | spamass-milter rejects at a default threshold | `-r 999999` for tag-only |
| Postfix can't reach the milter socket | socket `0755`, postfix not in `sa-milt` group | `UMask=0007` drop-in + `usermod -aG sa-milt postfix` + restart postfix |
| `sa-learn` trains but `spamd` doesn't use it | per-user vs site Bayes mismatch | set `bayes_path` in `local.cf` (site-wide) |
| Bayes never scores (`BAYES_*` absent) | below the 200/200 learn floor | train more, keep spam/ham balanced |
| Your own outbound mail gets tagged | scanning authenticated mail | `-a` flag |
| AVC denials on the Bayes DB (SELinux) | DB outside `/var/lib/spamassassin` | keep it under that path (`spamd_var_lib_t`) + `restorecon` |
## See also
- [[selinux-dovecot-vmail-context|SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)]]
- [[linux-server-hardening-checklist|Linux Server Hardening Checklist]] (basic `sa-learn` section)

@@ -98,6 +98,29 @@ ausearch -m avc -ts recent | grep dovecot
No output = no new denials.
## Variant: a Freshly-Rebuilt Box Left in Permissive Mode
If a server was rebuilt or migrated and came up **Permissive** (check `getenforce`), the symptom flips: mail works fine, but `/var/log/audit/audit.log` quietly fills with thousands of `dovecot_t → var_t` denials that *would* break IMAP/LMTP the instant you switch to Enforcing. The mailstore was created or `rsync`'d onto `/var/vmail` with no fcontext rule, so it defaulted to `var_t`.
Apply the relabel above first, then flip to Enforcing **only after** verifying zero new denials:
```bash
MARK=$(date +%H:%M:%S)
# ...deliver a test message + do an IMAP login...
ausearch -m avc -ts "$MARK" | grep -c denied # expect 0
setenforce 1
sed -i 's/^SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config
```
**Companion denial:** a Postfix virtual-mailbox server that looks up recipients in MySQL also trips `postfix_cleanup_t` reading `/etc/my.cnf*` (`mysqld_etc_t`). Allow it with a small local module:
```bash
ausearch -m avc -c cleanup | audit2allow -M local_postfix_mysql
cat local_postfix_mysql.te          # review the generated rules before loading them
semodule -i local_postfix_mysql.pp
```
See also [[postfix-spamassassin-bayes-spam-filtering|Inbound Spam Filtering]] — the SpamAssassin Bayes DB belongs under `/var/lib/spamassassin` (`spamd_var_lib_t`) for the same labeling reason.
## Key Notes
- **One rule is enough:** `"/var/vmail(/.*)?"` with `mail_spool_t` covers every file and directory under `/var/vmail`, including all `tmp/` subdirectories.

@@ -42,6 +42,7 @@ updated: 2026-05-15T09:00
* [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
* [Mastodon on S3 — Silent Upload Failures (BucketOwnerEnforced/ACLs)](02-selfhosting/services/mastodon-s3-acl-upload-failures.md)
* [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
* [Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes](02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md)
* [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)
* [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
* [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)