From 110a6d49e5ffceb8e0d9c085a4a0405a278cdbcf Mon Sep 17 00:00:00 2001
From: Marcus Summers <marcus@majorshouse.com>
Date: Thu, 4 Jun 2026 16:16:29 -0400
Subject: [PATCH] wiki: add inbound spam filtering guide (spamass-milter +
 SpamAssassin Bayes)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New 02-selfhosting/services article: the full Postfix/Dovecot inbound spam stack
on Fedora — spamass-milter tag-only wiring (the -r footgun), socket permissions
(sa-milt group + UMask), site-wide Bayes DB, Sieve-to-Junk, and sa-learn training
(folders, spam/ham balance, manual-not-cron). From the majormail setup.

Also extends selinux-dovecot-vmail-context with a Permissive-mode variant + a
postfix_cleanup->mysqld_etc companion-denial note. SUMMARY.md nav updated.
---
 ...stfix-spamassassin-bayes-spam-filtering.md | 171 ++++++++++++++++++
 .../selinux-dovecot-vmail-context.md          |  23 +++
 SUMMARY.md                                    |   1 +
 3 files changed, 195 insertions(+)
 create mode 100644 02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md

diff --git a/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
new file mode 100644
index 0000000..81ecb79
--- /dev/null
+++ b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
@@ -0,0 +1,171 @@
+---
+title: "Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot (Fedora)"
+domain: selfhosting
+category: services
+tags: [postfix, dovecot, spamassassin, spamass-milter, bayes, spam, sieve, fedora, email, selinux]
+status: published
+created: 2026-06-04
+updated: 2026-06-04
+---
+# Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot
+
+How to add inbound spam scanning to a Postfix/Dovecot virtual-mailbox server on Fedora: SpamAssassin scans every inbound message via `spamass-milter`, spam is **tagged (never rejected)**, Dovecot's Sieve files it into the user's `Junk` folder, and a **site-wide Bayes database** — shared between the scan path and manual `sa-learn` training — learns from your real mail.
+
+This is a "tag and quarantine" design (not "reject at SMTP"), which is the safe default: a misfire lands a message in Junk for review rather than bouncing legitimate mail.
+
+## Architecture
+
+```
+inbound SMTP (25) ─► Postfix smtpd
+                       │  smtpd_milters:
+                       │    1. OpenDKIM  (verify/sign)
+                       │    2. spamass-milter ─► spamc ─► spamd (SpamAssassin)
+                       │         adds X-Spam-Flag / X-Spam-Status headers
+                       ▼
+                     Dovecot LMTP delivery ─► global Sieve
+                       if X-Spam-Flag: YES ─► fileinto "Junk"
+                       else                ─► INBOX
+
+Bayes DB  /var/lib/spamassassin/bayes/   (site-wide, shared)
+   ├─ spamd      auto-learns at scan time
+   └─ sa-learn   manual/scripted training from Maildir folders
+```
+
+## 1. Install
+
+```bash
+sudo dnf install spamassassin spamass-milter
+sudo systemctl enable --now spamassassin   # spamd
+```
+
+On Fedora the `spamass-milter` unit runs as the unprivileged **`sa-milt`** user and creates its socket at `/run/spamass-milter/spamass-milter.sock`. Remember that user — the Bayes DB ownership and the socket permissions both hinge on it.
+
+## 2. Configure spamass-milter — tag-only
+
+Edit `/etc/sysconfig/spamass-milter`:
+
+```sh
+EXTRA_FLAGS="-a -r 999999"
+```
+
+> [!warning] The `-r` flag is a footgun
+> `-r nn` rejects mail scoring ≥ `nn` at SMTP time. **Omitting `-r` does NOT mean "never reject"** — this build still rejects flagged spam at a low default threshold (a GTUBE test will get `550 Blocked by SpamAssassin`). To get pure tag-only behaviour, set the threshold absurdly high (`-r 999999`) so nothing ever reaches it. Do **not** use `-r -1` — that means "reject anything tagged as spam."
+
+- `-a` — skip messages on **authenticated** connections, so your own outbound/submission mail isn't scanned or tagged.
+
+## 3. Socket permissions (so Postfix can connect)
+
+By default the socket is created `0755 sa-milt:sa-milt`, so Postfix (running as `postfix`) can't write to it and therefore can't connect. Setting the unit's umask to `0007` makes it `0770`, and adding `postfix` to the `sa-milt` group lets it in. Two steps:
+
+```bash
+# 1. Let the socket be group-accessible
+sudo install -d /etc/systemd/system/spamass-milter.service.d
+printf '[Service]\nUMask=0007\n' | sudo tee /etc/systemd/system/spamass-milter.service.d/socket-perms.conf
+
+# 2. Put postfix in the sa-milt group (Postfix reads group membership at start; the restart happens in step 4)
+sudo usermod -aG sa-milt postfix
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now spamass-milter
+```
+
+Verify: `sudo -u postfix test -w /run/spamass-milter/spamass-milter.sock && echo OK`.
+
+## 4. Wire into Postfix
+
+Append the milter **alongside** OpenDKIM; don't replace it. Inbound mail (`smtpd_milters`) gets both; locally injected mail (`non_smtpd_milters`) stays DKIM-only, so leave that setting untouched.
+
+```bash
+sudo postconf -e 'smtpd_milters = local:/run/opendkim/opendkim.sock unix:/run/spamass-milter/spamass-milter.sock'
+sudo postconf -e 'milter_default_action = accept'   # if a milter is unreachable, accept the mail rather than defer it
+sudo systemctl restart postfix                 # restart (not reload) to pick up the new group
+```
+
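+`milter_default_action = accept` matters: the Postfix default is `tempfail`, which defers inbound mail whenever a milter is unreachable. With `accept`, mail still flows (unscanned) if the milter hiccups. The setting applies to every milter, OpenDKIM included.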
+
+## 5. Site-wide Bayes DB
+
+Put the Bayes DB in one fixed location so the scan path and your training script share it. In `/etc/mail/spamassassin/local.cf`:
+
+```
+use_bayes 1
+bayes_auto_learn 1
+bayes_path /var/lib/spamassassin/bayes/bayes
+bayes_file_mode 0660
+```
+
+Create the directory owned by the **scanning user** (`sa-milt`), under `/var/lib/spamassassin` so it inherits the correct SELinux type (`spamd_var_lib_t`):
+
+```bash
+sudo install -d -m 2770 -o sa-milt -g sa-milt /var/lib/spamassassin/bayes
+sudo restorecon -Rv /var/lib/spamassassin/bayes
+sudo systemctl restart spamassassin
+```
+
+The setgid `2770` directory plus `bayes_file_mode 0660` mean that DB files keep the `sa-milt` group and stay group-writable, so both `spamd` (as `sa-milt`) and `sa-learn` (as `root`, from a training script) can read and write the DB.
+
+## 6. File spam into Junk (Dovecot Sieve)
+
+A global Sieve before-script files anything SpamAssassin flagged. `/etc/dovecot/sieve/global/spam-to-junk.sieve`:
+
+```sieve
+require ["fileinto", "mailbox"];
+if anyof (header :contains "X-Spam-Flag" "YES", header :contains "X-Spam-Status" "Yes") {
+    fileinto :create "Junk";
+    stop;
+}
+```
+
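+Register it as a global script in the `plugin {}` block of `dovecot.conf` (e.g. `sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve`), precompile it with `sievec` so Dovecot doesn't need write access to that directory, and restart Dovecot.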
+
+## 7. Training the Bayes filter
+
+SpamAssassin's Bayes only starts scoring once it has learned **≥ 200 spam AND ≥ 200 ham** (`bayes_min_spam_num` / `bayes_min_ham_num`). Train from your Maildir folders with `sa-learn`. **Run it as `root`** — root can read every user's Maildir *and* write the Bayes DB.
+
+```bash
+# Spam — your Junk folder(s) and any dedicated spam mailbox
+sa-learn --spam /var/vmail/example.com/user/.Junk/{cur,new}
+
+# Ham — Sent + Inbox (known-good)
+sa-learn --ham  /var/vmail/example.com/user/{cur,new}
+sa-learn --ham  /var/vmail/example.com/user/.Sent/{cur,new}
+
+sa-learn --sync
+sa-learn --dump magic | grep -E 'nspam|nham'
+```
+
+`bayes_path` is read from `local.cf`, so no `--dbpath` is needed.
+
+> [!tip] Keep spam and ham roughly balanced
+> Bayes accuracy drops when one corpus dwarfs the other (aim for within ~3:1). Don't dump a 90,000-message archive of ham against a few hundred spam — it biases everything toward "ham" and spam slips through. Use Sent + recent Inbox for ham, not your entire archive.
+
+> [!warning] Train manually, not from cron — unless your folders are always clean
+> `sa-learn` learns whatever is *in* the folder. If a spam slips into the Inbox, or you haven't yet rescued a false-positive out of Junk, an unattended cron run will mislearn it. Prefer a manual script you run **after** triaging Junk/Inbox. (`sa-learn` is idempotent and re-classifies on re-run, so a mistake is fixable: move the message to the right folder and run again.)
+
+## 8. Test
+
+Send a [GTUBE](https://spamassassin.apache.org/gtube/) probe through port 25 (unauthenticated) and a normal message:
+
+```bash
+# run on ANOTHER host whose MTA relays to :25 (GTUBE scores ~1000); local injection on the mail server uses non_smtpd_milters and skips the scan
+printf 'Subject: gtube\n\nXJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X\n' \
+  | sendmail -f test@example.org user@example.com
+```
+
+Confirm in `/var/log/maillog` (or with `journalctl -u postfix -u spamassassin` if the host logs only to the journal) that `spamd` scanned it (`result: Y …`), the message was **delivered** (no `milter-reject`), it landed in `.Junk`, and the stored message has `X-Spam-Flag: YES`.
+
+## Gotchas recap
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| Spam gets `550 Blocked by SpamAssassin` (you wanted Junk) | spamass-milter rejects at a default threshold | `-r 999999` for tag-only |
+| Postfix can't reach the milter socket | socket `0755`, postfix not in `sa-milt` group | `UMask=0007` drop-in + `usermod -aG sa-milt postfix` + restart postfix |
+| `sa-learn` trains but `spamd` doesn't use it | per-user vs site Bayes mismatch | set `bayes_path` in `local.cf` (site-wide) |
+| Bayes never scores (`BAYES_*` absent) | below the 200/200 learn floor | train more, keep spam/ham balanced |
+| Your own outbound mail gets tagged | scanning authenticated mail | `-a` flag |
+| AVC denials on the Bayes DB (SELinux) | DB outside `/var/lib/spamassassin` | keep it under that path (`spamd_var_lib_t`) + `restorecon` |
+
+## See also
+
+- [[selinux-dovecot-vmail-context|SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)]]
+- [[linux-server-hardening-checklist|Linux Server Hardening Checklist]] (basic `sa-learn` section)
diff --git a/05-troubleshooting/selinux-dovecot-vmail-context.md b/05-troubleshooting/selinux-dovecot-vmail-context.md
index fdfe379..86389c9 100644
--- a/05-troubleshooting/selinux-dovecot-vmail-context.md
+++ b/05-troubleshooting/selinux-dovecot-vmail-context.md
@@ -98,6 +98,29 @@ ausearch -m avc -ts recent | grep dovecot
 
 No output = no new denials.
 
+## Variant: a Freshly-Rebuilt Box Left in Permissive Mode
+
+If a server was rebuilt or migrated and came up **Permissive** (check `getenforce`), the symptom flips: mail works fine, but `/var/log/audit/audit.log` quietly fills with thousands of `dovecot_t → var_t` denials that *would* break IMAP/LMTP the instant you switch to Enforcing. The mailstore was created or `rsync`'d onto `/var/vmail` with no fcontext rule, so it defaulted to `var_t`.
+
+Apply the relabel above first, then flip to Enforcing **only after** verifying zero new denials:
+
+```bash
+MARK=$(date +%H:%M:%S)
+# ...deliver a test message + do an IMAP login...
+ausearch -m avc -ts "$MARK" | grep -c denied   # expect 0
+setenforce 1   # run this only once the count above is 0
+sed -i 's/^SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config
+```
+
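+**Companion denial:** a Postfix virtual-mailbox server that looks up recipients in MySQL also trips `postfix_cleanup_t` reading `/etc/my.cnf*` (`mysqld_etc_t`). Allow it with a small local module, and check that the generated `local_postfix_mysql.te` contains only that rule before installing it: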
+
+```bash
+ausearch -m avc -c cleanup | audit2allow -M local_postfix_mysql
+semodule -i local_postfix_mysql.pp
+```
+
+See also [[postfix-spamassassin-bayes-spam-filtering|Inbound Spam Filtering]] — the SpamAssassin Bayes DB belongs under `/var/lib/spamassassin` (`spamd_var_lib_t`) for the same labeling reason.
+
 ## Key Notes
 
 - **One rule is enough** — `"/var/vmail(/.*)?"` with `mail_spool_t` covers every file and directory under `/var/vmail`, including all `tmp/` subdirectories.
diff --git a/SUMMARY.md b/SUMMARY.md
index a762b89..8730d92 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -42,6 +42,7 @@ updated: 2026-05-15T09:00
     * [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
     * [Mastodon on S3 — Silent Upload Failures (BucketOwnerEnforced/ACLs)](02-selfhosting/services/mastodon-s3-acl-upload-failures.md)
     * [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
+    * [Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes](02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md)
     * [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)
     * [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
     * [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)