From 5260548caa60ae535b58bed1aa213702539330fb Mon Sep 17 00:00:00 2001
From: MajorLinux
Date: Thu, 4 Jun 2026 20:48:01 -0400
Subject: [PATCH] wiki: spam filtering — add Pigeonhole 2.4 syntax, REDIRECT-to-junk pattern, weekly timer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three updates to the inbound spam filtering guide, all driven by the
2026-06-04 majormail-hetzner Phase 6 cutover and follow-up tuning:

1. Section 6 (Dovecot Sieve): warn explicitly that `plugin/sieve_before`
   was dropped in Pigeonhole 2.4 and silently does nothing — no startup
   warning, spam just keeps landing in INBOX. The 2.4 replacement is a
   top-level `sieve_script { type = before; path = …; }` block. Also
   note the flat-dovecot.conf pitfall on Fedora (some packagings ship
   dovecot.conf without `!include conf.d/*.conf`, so the block has to
   live in the main file directly). Added a `sievec` compile step.

2. New §6b: route spam to a separate `junk@` mailbox via a Postfix
   cleanup `header_checks` REDIRECT. This keeps spam out of the user's
   mailbox entirely, so Spark and other IDLE-based clients don't
   push-notify: the message never reaches a subscribed mailbox at all.
   Includes the `regexp:` vs `pcre:` map-type tip (use regexp on stock
   Fedora to avoid the postfix-pcre package dependency).

3. New §7a: weekly systemd timer for sa-learn. The §7 warning "don't run
   sa-learn from cron unless folders are clean" remains the safe default,
   but once you adopt the §6b REDIRECT-to-junk@ pattern, the junk@ mailbox
   is pure spam by design and a weekly chain of
   `--spam`/`--ham`/`--sync`/`--force-expire` becomes safe and useful.
   Full unit templates included.
Gotchas table gains four entries:

- Pigeonhole 2.4 silent breakage of plugin/sieve_before
- postfix-pcre vs regexp map type confusion
- Why sieve fileinto Junk still pushes a Spark notification
- Why local `sendmail` injection doesn't trigger the REDIRECT (smtpd
  milters skip sendmail-injected mail, so X-Spam-Flag isn't added)

All changes match what's now codified in the `majormail` Ansible role
(commit 7a8b9eb in MajorAnsible).
---
 ...stfix-spamassassin-bayes-spam-filtering.md | 109 +++++++++++++++++-
 1 file changed, 107 insertions(+), 2 deletions(-)

diff --git a/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
index 81ecb79..bd72d8e 100644
--- a/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
+++ b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
@@ -5,7 +5,7 @@ category: services
 tags: [postfix, dovecot, spamassassin, spamass-milter, bayes, spam, sieve, fedora, email, selinux]
 status: published
 created: 2026-06-04
-updated: 2026-06-04
+updated: 2026-06-05
 ---
 
 # Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot
@@ -116,7 +116,59 @@ if anyof (header :contains "X-Spam-Flag" "YES", header :contains "X-Spam-Status"
 }
 ```
 
-Register it as a global script in `dovecot.conf` (e.g. `sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve`) and restart Dovecot.
+Register it as a global before-script in `dovecot.conf` (NOT under `plugin {}` on Pigeonhole 2.4+; see the warning below), then compile it and restart Dovecot:
+
+```bash
+sievec /etc/dovecot/sieve/global/spam-to-junk.sieve   # produces .svbin
+systemctl restart dovecot
+```
+
+> [!warning] Pigeonhole 2.4 dropped `plugin/sieve_before` — it silently does nothing
+> Before Dovecot/Pigeonhole 2.4, the canonical way to register a global before-script was:
+>
+> ```
+> plugin {
+>   sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve
+> }
+> ```
+>
+> On **Dovecot 2.4+**, that setting is gone and **silently ignored**: there is no warning at start-up, the script never runs, and mail tagged with `X-Spam-Flag` keeps landing in INBOX with nothing to file it. The 2.4 replacement is a top-level `sieve_script` block (not inside `plugin {}`):
+>
+> ```
+> sieve_script spam_before {
+>   type = before
+>   path = /etc/dovecot/sieve/global/spam-to-junk.sieve
+> }
+> ```
+>
+> Verify with `doveconf -n | grep -A2 spam_before`. If the block doesn't appear, Dovecot isn't reading the file it lives in: check that dovecot.conf contains `!include conf.d/*.conf` (some Fedora rebuilds ship a flat dovecot.conf without it, in which case the block has to live in dovecot.conf itself).
+
+## 6b. (Optional) Route spam to a separate mailbox — silence iOS push notifications
+
+`fileinto :create "Junk"` moves spam to the user's `.Junk` folder, but the user's IMAP session still sees a new-message event in INBOX (briefly, before Sieve moves it) or in Junk (depending on client subscriptions). For clients with IMAP IDLE and push, that's a notification you don't want: Spark on iPhone/iPad, for example, fires APNS on any new message that touches a subscribed folder.
+
+To make spam **invisible to the user's mailbox entirely**, REDIRECT the envelope at Postfix `cleanup` (after the milter adds `X-Spam-Flag`, before LMTP delivery) so that spam lands in a separate `junk@` mailbox the user doesn't subscribe to. (If the rule never fires on milter-tagged mail, note that postconf(5) documents a separate `milter_header_checks` parameter for headers produced by Milter applications; pointing it at the same file is worth trying.)
+
+```bash
+# /etc/postfix/cleanup_header_checks
+/^X-Spam-Flag:[[:space:]]+YES/ REDIRECT junk@example.com
+```
+
+```bash
+postconf -e 'header_checks = regexp:/etc/postfix/cleanup_header_checks'
+systemctl reload postfix
+```
+
+> [!tip] Use `regexp:`, not `pcre:`, on stock Fedora
+> `pcre:` requires the `postfix-pcre` package. `regexp:` is built into Postfix and supports POSIX extended regex: use `[[:space:]]+` for whitespace and `\\` for a literal backslash. The patterns in cleanup_header_checks are simple enough that regexp is plenty.
+
+The Sieve from §6 still runs as a safety net for any tagged message that escapes the cleanup REDIRECT (e.g. a message addressed to the junk@ mailbox itself, or aliases not covered by the REDIRECT rule). Defense in depth.
+
+Train Bayes from the `junk@` Maildir instead of (or in addition to) per-user Junk folders:
+
+```bash
+sa-learn --spam /var/vmail/example.com/junk/{cur,new}
+```
 
 ## 7. Training the Bayes filter
 
@@ -142,6 +194,55 @@ sa-learn --dump magic | grep -E 'nspam|nham'
 > [!warning] Train manually, not from cron — unless your folders are always clean
 > `sa-learn` learns whatever is *in* the folder. If a spam slips into the Inbox, or you haven't yet rescued a false-positive out of Junk, an unattended cron run will mislearn it. Prefer a manual script you run **after** triaging Junk/Inbox. (`sa-learn` is idempotent and re-classifies on re-run, so a mistake is fixable: move the message to the right folder and run again.)
 
+### 7a. Weekly systemd timer (safe when junk@ is dedicated and INBOX is curated)
+
+The warning above is the safe default. If you use the §6b REDIRECT-to-junk@ pattern, **the junk mailbox is pure spam by design** (only messages tagged `X-Spam-Flag: YES` reach it) and your INBOX is curated by hand, so the misclassification risk drops to near zero and a weekly timer becomes both safe and useful. The remaining risk is a SpamAssassin false positive, which the REDIRECT also sends to junk@: rescue any such message before the timer runs. Add `--force-expire` to age out stale tokens so the Bayes corpus doesn't drift.
+
+```ini
+# /etc/systemd/system/sa-learn-majormail.service
+[Unit]
+Description=SpamAssassin Bayes training from example.com Maildir
+After=spamassassin.service
+Wants=spamassassin.service
+
+[Service]
+Type=oneshot
+Nice=10
+IOSchedulingClass=idle
+ExecStart=/usr/bin/sa-learn --spam --no-sync \
+    /var/vmail/example.com/junk/cur \
+    /var/vmail/example.com/junk/new
+ExecStart=/usr/bin/sa-learn --ham --no-sync \
+    /var/vmail/example.com/user/cur \
+    /var/vmail/example.com/user/new \
+    /var/vmail/example.com/user/.Sent/cur \
+    /var/vmail/example.com/user/.Sent/new
+ExecStart=/usr/bin/sa-learn --sync
+ExecStart=/usr/bin/sa-learn --force-expire
+```
+
+```ini
+# /etc/systemd/system/sa-learn-majormail.timer
+[Unit]
+Description=Weekly SpamAssassin Bayes training + expiry
+
+[Timer]
+OnCalendar=Sun 04:15
+Persistent=true
+RandomizedDelaySec=20min
+
+[Install]
+WantedBy=timers.target
+```
+
+```bash
+systemctl daemon-reload
+systemctl enable --now sa-learn-majormail.timer
+systemctl list-timers sa-learn-majormail.timer
+```
+
+`Persistent=true` runs the missed job on next boot if the host was off at 04:15. `--force-expire` forces an expiry pass on every run, but that pass only removes tokens once the database grows past `bayes_expiry_max_db_size` (150,000 tokens by default), so on a small corpus it is effectively a no-op.
+
 ## 8. Test
 
 Send a [GTUBE](https://spamassassin.apache.org/gtube/) probe through port 25 (unauthenticated) and a normal message:
@@ -164,6 +265,10 @@ Confirm in `/var/log/maillog` that `spamd` scanned it (`result: Y …`), the mes
 | Bayes never scores (`BAYES_*` absent) | below the 200/200 learn floor | train more, keep spam/ham balanced |
 | Your own outbound mail gets tagged | scanning authenticated mail | `-a` flag |
 | AVC denials on the Bayes DB (SELinux) | DB outside `/var/lib/spamassassin` | keep it under that path (`spamd_var_lib_t`) + `restorecon` |
+| `plugin/sieve_before` does nothing — spam keeps reaching INBOX | Pigeonhole 2.4 silently dropped that setting | use the top-level `sieve_script { type = before; path = ...; }` block instead |
+| `postfix reload` fails: `unsupported dictionary type: pcre` | `pcre:` map requires the `postfix-pcre` package | install it, OR use `regexp:` (built-in POSIX) |
+| Sieve `fileinto Junk` still notifies Spark/iOS | client subscribes to Junk; LMTP delivery briefly hits INBOX | REDIRECT envelope at Postfix cleanup (§6b) so the message never reaches the user's mailbox at all |
+| Local `sendmail` test doesn't trigger REDIRECT | `sendmail` bypasses smtpd milters → no `X-Spam-Flag` added | inject through SMTP :25 (e.g. swaks) OR pre-set the header in the test message |
 
 ## See also