From 5260548caa60ae535b58bed1aa213702539330fb Mon Sep 17 00:00:00 2001
From: MajorLinux <marcus@majorshouse.com>
Date: Thu, 4 Jun 2026 20:48:01 -0400
Subject: [PATCH] =?UTF-8?q?wiki:=20spam=20filtering=20=E2=80=94=20add=20Pi?=
 =?UTF-8?q?geonhole=202.4=20syntax,=20REDIRECT-to-junk=20pattern,=20weekly?=
 =?UTF-8?q?=20timer?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three updates to the inbound spam filtering guide, all driven by the 2026-06-04
majormail-hetzner Phase 6 cutover and follow-up tuning:

1. Section 6 (Dovecot Sieve): warn explicitly that `plugin/sieve_before` was
   dropped in Pigeonhole 2.4 and silently does nothing — no startup warning,
   spam just keeps landing in INBOX. The 2.4 replacement is a top-level
   `sieve_script <name> { type = before; path = …; }` block. Also note the
   Fedora-flat-dovecot.conf pitfall (some packagings ship dovecot.conf
   without `!include conf.d/*.conf`, so the block has to live in the main
   file directly). Added a `sievec` compile step.

2. New §6b: route spam to a separate `junk@` mailbox via Postfix cleanup
   `header_checks` REDIRECT. This makes spam invisible to the user's
   mailbox entirely — Spark/IDLE-based clients don't push-notify because
   the message never reaches the subscribed mailbox at all. Includes the
   `regexp:` vs `pcre:` map-type tip (use regexp on stock Fedora to avoid
   the postfix-pcre package dependency).

3. New §7a: weekly systemd timer for sa-learn. The §7 warning about
   "don't run sa-learn from cron unless folders are clean" is correct as
   the safe default — but when you adopt the §6b REDIRECT-to-junk@
   pattern, the junk@ mailbox is pure spam by design and a weekly
   `--spam`/`--ham`/`--sync`/`--force-expire` chain becomes safe and
   useful. Full unit templates included.

Gotchas table gains four entries:
- Pigeonhole 2.4 silent breakage of plugin/sieve_before
- postfix-pcre vs regexp map type confusion
- Why sieve fileinto Junk still pushes a Spark notification
- Why local `sendmail` injection doesn't trigger the REDIRECT (smtpd
  milters skip sendmail-injected mail, so X-Spam-Flag isn't added)

All changes match what's now codified in the `majormail` Ansible role
(commit 7a8b9eb in MajorAnsible).
---
 ...stfix-spamassassin-bayes-spam-filtering.md | 109 +++++++++++++++++-
 1 file changed, 107 insertions(+), 2 deletions(-)
diff --git a/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
index 81ecb79..bd72d8e 100644
--- a/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
+++ b/02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md
@@ -5,7 +5,7 @@ category: services
 tags: [postfix, dovecot, spamassassin, spamass-milter, bayes, spam, sieve, fedora, email, selinux]
 status: published
 created: 2026-06-04
-updated: 2026-06-04
+updated: 2026-06-05
 ---
 # Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot
 
@@ -116,7 +116,59 @@ if anyof (header :contains "X-Spam-Flag" "YES", header :contains "X-Spam-Status"
 }
 ```
 
-Register it as a global script in `dovecot.conf` (e.g. `sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve`) and restart Dovecot.
+Register it as a global before-script in `dovecot.conf` (NOT under `plugin {}` on Pigeonhole 2.4+ — see warning below), then compile and restart Dovecot:
+
+```bash
+sievec /etc/dovecot/sieve/global/spam-to-junk.sieve   # produces .svbin
+systemctl restart dovecot
+```
+
+> [!warning] Pigeonhole 2.4 dropped `plugin/sieve_before` — it silently does nothing
+> Before Dovecot/Pigeonhole 2.4, the canonical way to register a global before-script was:
+>
+> ```
+> plugin {
+>   sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve
+> }
+> ```
+>
+> On **Dovecot 2.4+**, that setting is gone and **silently ignored**: there is no warning at start-up, the script never runs, and mail tagged `X-Spam-Flag: YES` keeps landing in INBOX with no indication of why nothing files it. The 2.4 replacement is a top-level `sieve_script` block (not inside `plugin {}`):
+>
+> ```
+> sieve_script spam_before {
+>   type = before
+>   path = /etc/dovecot/sieve/global/spam-to-junk.sieve
+> }
+> ```
+>
+> Verify with `doveconf -n | grep -A2 spam_before`. If the block doesn't appear, Dovecot isn't reading the file you put it in: check that dovecot.conf contains `!include conf.d/*.conf`. Some Fedora rebuilds ship a flat dovecot.conf without that include, in which case the block has to live in dovecot.conf itself.
+
+## 6b. (Optional) Route spam to a separate mailbox — silence iOS push notifications
+
+`fileinto :create "Junk"` files spam into the user's `.Junk` folder at delivery time, so it never touches INBOX, but the user's IMAP session still sees a new-message event in Junk whenever the client watches that folder. For clients with IMAP IDLE + push, that's a notification you don't want: e.g. Spark on iPhone/iPad fires an APNs push on any new message touching a subscribed folder.
+
+To make spam **invisible to the user's mailbox entirely**, REDIRECT the envelope at Postfix `cleanup` (after the milter adds `X-Spam-Flag`, before LMTP delivery) so spam lands in a separate `junk@` mailbox the user doesn't subscribe to:
+
+```bash
+# /etc/postfix/cleanup_header_checks
+/^X-Spam-Flag:[[:space:]]+YES/  REDIRECT junk@example.com
+```
+
+```bash
+postconf -e 'header_checks = regexp:/etc/postfix/cleanup_header_checks'
+systemctl reload postfix
+```
+
+> [!tip] Use `regexp:`, not `pcre:`, on stock Fedora
+> `pcre:` requires the `postfix-pcre` package. `regexp:` is built into Postfix and supports POSIX extended regular expressions: use `[[:space:]]+` for whitespace (rather than PCRE's `\s`) and `\\` for a literal backslash. The patterns in cleanup_header_checks are simple enough that `regexp:` is plenty.
+
+The Sieve from §6 still runs as a safety net for any tagged message that escapes the cleanup REDIRECT (e.g. a message addressed to the junk@ mailbox itself, or aliases not covered by the REDIRECT rule). Defense in depth.
+
+Train Bayes from the `junk@` Maildir instead of (or in addition to) per-user Junk folders:
+
+```bash
+sa-learn --spam /var/vmail/example.com/junk/{cur,new}
+```
 
 ## 7. Training the Bayes filter
 
@@ -142,6 +194,55 @@ sa-learn --dump magic | grep -E 'nspam|nham'
 > [!warning] Train manually, not from cron — unless your folders are always clean
 > `sa-learn` learns whatever is *in* the folder. If a spam slips into the Inbox, or you haven't yet rescued a false-positive out of Junk, an unattended cron run will mislearn it. Prefer a manual script you run **after** triaging Junk/Inbox. (`sa-learn` is idempotent and re-classifies on re-run, so a mistake is fixable: move the message to the right folder and run again.)
 
+### 7a. Weekly systemd timer (safe when junk@ is dedicated and INBOX is curated)
+
+The warning above is the safe default. If you use the §6b REDIRECT-to-junk@ pattern, **the junk mailbox is pure spam by design** (only messages tagged `X-Spam-Flag: YES` reach it). Provided you also keep INBOX curated by hand, the misclassification risk drops to near zero, and a weekly timer becomes both safe and useful. Add `--force-expire` to age out stale tokens so the Bayes corpus doesn't drift.
+
+```ini
+# /etc/systemd/system/sa-learn-majormail.service
+[Unit]
+Description=SpamAssassin Bayes training from example.com Maildir
+After=spamassassin.service
+Wants=spamassassin.service
+
+[Service]
+Type=oneshot
+Nice=10
+IOSchedulingClass=idle
+ExecStart=/usr/bin/sa-learn --spam --no-sync \
+    /var/vmail/example.com/junk/cur \
+    /var/vmail/example.com/junk/new
+ExecStart=/usr/bin/sa-learn --ham --no-sync \
+    /var/vmail/example.com/user/cur \
+    /var/vmail/example.com/user/new \
+    /var/vmail/example.com/user/.Sent/cur \
+    /var/vmail/example.com/user/.Sent/new
+ExecStart=/usr/bin/sa-learn --sync
+ExecStart=/usr/bin/sa-learn --force-expire
+```
+
+```ini
+# /etc/systemd/system/sa-learn-majormail.timer
+[Unit]
+Description=Weekly SpamAssassin Bayes training + expiry
+
+[Timer]
+OnCalendar=Sun 04:15
+Persistent=true
+RandomizedDelaySec=20min
+
+[Install]
+WantedBy=timers.target
+```
+
+```bash
+systemctl daemon-reload
+systemctl enable --now sa-learn-majormail.timer
+systemctl list-timers sa-learn-majormail.timer
+```
+
+`Persistent=true` runs the missed job on next boot if the host was off at 04:15. `--force-expire` forces an expiry attempt on every run, but an attempt does not guarantee that any tokens are removed: while the database is small relative to `bayes_expiry_max_db_size`, it typically removes nothing.
+
 ## 8. Test
 
 Send a [GTUBE](https://spamassassin.apache.org/gtube/) probe through port 25 (unauthenticated) and a normal message:
@@ -164,6 +265,10 @@ Confirm in `/var/log/maillog` that `spamd` scanned it (`result: Y …`), the mes
 | Bayes never scores (`BAYES_*` absent) | below the 200/200 learn floor | train more, keep spam/ham balanced |
 | Your own outbound mail gets tagged | scanning authenticated mail | `-a` flag |
 | AVC denials on the Bayes DB (SELinux) | DB outside `/var/lib/spamassassin` | keep it under that path (`spamd_var_lib_t`) + `restorecon` |
+| `plugin/sieve_before` does nothing — spam keeps reaching INBOX | Pigeonhole 2.4 silently dropped that setting | use the top-level `sieve_script <name> { type = before; path = ...; }` block instead |
+| `postfix reload` fails: `unsupported dictionary type: pcre` | `pcre:` map requires `postfix-pcre` package | install it, OR use `regexp:` (built-in POSIX) |
+| Sieve `fileinto Junk` still notifies Spark/iOS | Junk is still a folder in the user's mailbox, and the client pushes on new mail in any folder it watches | REDIRECT the envelope at Postfix cleanup (§6b) so the message never reaches the user's mailbox at all |
+| Local `sendmail` test doesn't trigger REDIRECT | `sendmail` bypasses smtpd milters → no `X-Spam-Flag` added | inject through SMTP :25 (e.g. swaks) OR pre-set the header in the test message |
 
 ## See also