wiki: spam filtering — add Pigeonhole 2.4 syntax, REDIRECT-to-junk pattern, weekly timer

Three updates to the inbound spam filtering guide, all driven by the 2026-06-04
majormail-hetzner Phase 6 cutover and follow-up tuning:

1. Section 6 (Dovecot Sieve): warn explicitly that `plugin/sieve_before` was
   dropped in Pigeonhole 2.4 and silently does nothing — no startup warning,
   spam just keeps landing in INBOX. The 2.4 replacement is a top-level
   `sieve_script <name> { type = before; path = …; }` block. Also note the
   Fedora-flat-dovecot.conf pitfall (some packagings ship dovecot.conf
   without `!include conf.d/*.conf`, so the block has to live in the main
   file directly). Added a `sievec` compile step.

2. New §6b: route spam to a separate `junk@` mailbox via Postfix cleanup
   `header_checks` REDIRECT. This makes spam invisible to the user's
   mailbox entirely — Spark/IDLE-based clients don't push-notify because
   the message never reaches the subscribed mailbox at all. Includes the
   `regexp:` vs `pcre:` map-type tip (use regexp on stock Fedora to avoid
   the postfix-pcre package dependency).

3. New §7a: weekly systemd timer for sa-learn. The §7 warning about
   "don't run sa-learn from cron unless folders are clean" is correct as
   the safe default — but when you adopt the §6b REDIRECT-to-junk@
   pattern, the junk@ mailbox is pure spam by design and a weekly
   `--spam`/`--ham`/`--sync`/`--force-expire` chain becomes safe and
   useful. Full unit templates included.

Gotchas table gains four entries:
- Pigeonhole 2.4 silent breakage of plugin/sieve_before
- postfix-pcre vs regexp map type confusion
- Why sieve fileinto Junk still pushes a Spark notification
- Why local `sendmail` injection doesn't trigger the REDIRECT (smtpd
  milters skip sendmail-injected mail, so X-Spam-Flag isn't added)

All changes match what's now codified in the `majormail` Ansible role
(commit 7a8b9eb in MajorAnsible).
Marcus Summers 2026-06-04 20:48:01 -04:00
parent 2e58c4625c
commit 5260548caa

@@ -5,7 +5,7 @@ category: services
tags: [postfix, dovecot, spamassassin, spamass-milter, bayes, spam, sieve, fedora, email, selinux]
status: published
created: 2026-06-04
updated: 2026-06-04
updated: 2026-06-05
---
# Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes on Postfix/Dovecot
@@ -116,7 +116,59 @@ if anyof (header :contains "X-Spam-Flag" "YES", header :contains "X-Spam-Status"
}
```
Register it as a global script in `dovecot.conf` (e.g. `sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve`) and restart Dovecot.
Register it as a global before-script in `dovecot.conf` (NOT under `plugin {}` on Pigeonhole 2.4+ — see warning below), then compile and restart Dovecot:
```bash
sievec /etc/dovecot/sieve/global/spam-to-junk.sieve # produces .svbin
systemctl restart dovecot
```
> [!warning] Pigeonhole 2.4 dropped `plugin/sieve_before` — it silently does nothing
> Before Dovecot/Pigeonhole 2.4, the canonical way to register a global before-script was:
>
> ```
> plugin {
> sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve
> }
> ```
>
> On **Dovecot 2.4+**, that setting is gone and **silently ignored**: there is no warning at start-up, the script never runs, and mail tagged `X-Spam-Flag` lands in INBOX with nothing to file it. The 2.4 replacement is a top-level `sieve_script` block (not inside `plugin {}`):
>
> ```
> sieve_script spam_before {
> type = before
> path = /etc/dovecot/sieve/global/spam-to-junk.sieve
> }
> ```
>
> Verify with `doveconf -n | grep -A2 spam_before`. If it doesn't appear, dovecot.conf isn't reading your file — check that `!include conf.d/*.conf` exists in dovecot.conf (some Fedora rebuilds ship a flat dovecot.conf without it; the block has to live in dovecot.conf directly).
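Because the legacy setting fails silently, it can help to scan the config for it explicitly. The following is a minimal sketch, not part of Dovecot: the helper name `has_legacy_sieve_before` and the throwaway sample file are assumptions for illustration, and it only inspects a single file (it does not follow `!include` lines).

```bash
# Hypothetical helper: succeed if a config file still contains an
# uncommented pre-2.4 "sieve_before = ..." assignment.
has_legacy_sieve_before() {
    grep -Eq '^[[:space:]]*sieve_before[[:space:]]*=' "$1"
}

# Demo against a throwaway sample, not a live /etc/dovecot/dovecot.conf.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
plugin {
  sieve_before = /etc/dovecot/sieve/global/spam-to-junk.sieve
}
EOF

if has_legacy_sieve_before "$sample"; then
    echo "legacy sieve_before found: migrate to a sieve_script block"
else
    echo "no legacy sieve_before setting"
fi
rm -f "$sample"
```

On a real host the same function could be pointed at `/etc/dovecot/dovecot.conf` and each file under `conf.d/`.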
## 6b. (Optional) Route spam to a separate mailbox — silence iOS push notifications
`fileinto :create "Junk"` moves spam to the user's `.Junk` folder, but the user's IMAP session still sees a new-message event in INBOX (briefly, before sieve moves it) or in Junk (depending on client subscriptions). For clients with IMAP IDLE + push, that's a notification you don't want — e.g. Spark on iPhone/iPad fires APNS on any new message touching a subscribed folder.
To make spam **invisible to the user's mailbox entirely**, REDIRECT the envelope at Postfix `cleanup` (after the milter adds `X-Spam-Flag`, before LMTP delivery) so spam lands in a separate `junk@` mailbox the user doesn't subscribe to:
```
# /etc/postfix/cleanup_header_checks
/^X-Spam-Flag:[[:space:]]+YES/ REDIRECT junk@example.com
```
```bash
postconf -e 'header_checks = regexp:/etc/postfix/cleanup_header_checks'
systemctl reload postfix
```
> [!tip] Use `regexp:`, not `pcre:`, on stock Fedora
> `pcre:` requires the `postfix-pcre` package. `regexp:` is built into Postfix and supports POSIX extended regular expressions: use `[[:space:]]+` for whitespace and `\\` for a literal backslash. The patterns in cleanup_header_checks are simple enough that regexp is plenty.
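The REDIRECT pattern can be sanity-checked without touching Postfix, since it is plain POSIX extended regex. The sketch below is illustrative only: `matches_rule` is a hypothetical helper, and `grep -Ei` stands in for Postfix's matcher (the `-i` mirrors the case-insensitive default of `regexp:` tables).

```bash
# The POSIX ERE from cleanup_header_checks, without the surrounding slashes.
pattern='^X-Spam-Flag:[[:space:]]+YES'

# Hypothetical helper: succeed when a header line would match the rule.
matches_rule() {
    printf '%s\n' "$1" | grep -Eiq "$pattern"
}

# The last sample has no whitespace after the colon, so "+" cannot match.
for header in 'X-Spam-Flag: YES' 'X-Spam-Flag: NO' 'X-Spam-Flag:YES'; do
    if matches_rule "$header"; then
        echo "REDIRECT  <- $header"
    else
        echo "no match  <- $header"
    fi
done
```

On a host with Postfix installed, `postmap -q 'X-Spam-Flag: YES' regexp:/etc/postfix/cleanup_header_checks` queries the real table and should print the REDIRECT action when the rule matches.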
The Sieve from §6 still runs as a safety net for any tagged message that escapes the cleanup REDIRECT (e.g. a message addressed to the junk@ mailbox itself, or aliases not covered by the REDIRECT rule). Defense in depth.
Train Bayes from the `junk@` Maildir instead of (or in addition to) per-user Junk folders:
```bash
sa-learn --spam /var/vmail/example.com/junk/{cur,new}
```
## 7. Training the Bayes filter
@@ -142,6 +194,55 @@ sa-learn --dump magic | grep -E 'nspam|nham'
> [!warning] Train manually, not from cron — unless your folders are always clean
> `sa-learn` learns whatever is *in* the folder. If a spam slips into the Inbox, or you haven't yet rescued a false-positive out of Junk, an unattended cron run will mislearn it. Prefer a manual script you run **after** triaging Junk/Inbox. (`sa-learn` is idempotent and re-classifies on re-run, so a mistake is fixable: move the message to the right folder and run again.)
### 7a. Weekly systemd timer (safe when junk@ is dedicated and INBOX is curated)
The warning above is the safe default. If you use the §6b REDIRECT-to-junk@ pattern, **the junk mailbox is pure spam by design** (only messages tagged `X-Spam-Flag: YES` reach it). If your INBOX is also curated by hand, the misclassification risk drops to near zero, and a weekly timer becomes both safe and useful. Add `--force-expire` to age out stale tokens so the Bayes corpus doesn't drift.
```ini
# /etc/systemd/system/sa-learn-majormail.service
[Unit]
Description=SpamAssassin Bayes training from majorshouse.com Maildir
After=spamassassin.service
Wants=spamassassin.service
[Service]
Type=oneshot
Nice=10
IOSchedulingClass=idle
ExecStart=/usr/bin/sa-learn --spam --no-sync \
    /var/vmail/example.com/junk/cur \
    /var/vmail/example.com/junk/new
ExecStart=/usr/bin/sa-learn --ham --no-sync \
    /var/vmail/example.com/user/cur \
    /var/vmail/example.com/user/new \
    /var/vmail/example.com/user/.Sent/cur \
    /var/vmail/example.com/user/.Sent/new
ExecStart=/usr/bin/sa-learn --sync
ExecStart=/usr/bin/sa-learn --force-expire
```
```ini
# /etc/systemd/system/sa-learn-majormail.timer
[Unit]
Description=Weekly SpamAssassin Bayes training + expiry
[Timer]
OnCalendar=Sun 04:15
Persistent=true
RandomizedDelaySec=20min
[Install]
WantedBy=timers.target
```
```bash
systemctl daemon-reload
systemctl enable --now sa-learn-majormail.timer
systemctl list-timers sa-learn-majormail.timer
```
`Persistent=true` runs the missed job on next boot if the host was off at 04:15. `--force-expire` forces an expiry attempt on each run, but that does not guarantee any tokens are removed: SA's expiry heuristic still decides whether tokens are due (typically every few weeks for the default `bayes_expiry_max_db_size`).
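The timer is only safe while the junk mailbox really is pure spam, so a pre-flight check can be worth having. This is a rough sketch under stated assumptions: `count_unflagged` is a hypothetical helper, and a temporary directory stands in for a Maildir `new` or `cur` folder. It counts messages whose header block lacks `X-Spam-Flag: YES`.

```bash
# Hypothetical pre-flight: count messages in a directory that do NOT carry
# an "X-Spam-Flag: YES" header. A non-zero count suggests the junk mailbox
# is not pure spam and unattended training could mislearn.
count_unflagged() (
    count=0
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue   # skip when the glob matches nothing
        # Inspect only the header block (up to the first blank line).
        if ! sed '/^$/q' "$msg" | grep -Eiq '^X-Spam-Flag:[[:space:]]+YES'; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Demo on a throwaway directory standing in for .../junk/new.
demo="$(mktemp -d)"
printf 'X-Spam-Flag: YES\nSubject: buy now\n\nbody\n' > "$demo/msg1"
printf 'Subject: hello\n\nX-Spam-Flag: YES in the body only\n' > "$demo/msg2"
echo "unflagged messages: $(count_unflagged "$demo")"
rm -rf "$demo"
```

One possible use is a small wrapper script that exits non-zero when the count is above zero, run as an `ExecStartPre=` line in the service so training is skipped when something unexpected has landed in `junk@`.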
## 8. Test
Send a [GTUBE](https://spamassassin.apache.org/gtube/) probe through port 25 (unauthenticated) and a normal message:
@@ -164,6 +265,10 @@ Confirm in `/var/log/maillog` that `spamd` scanned it (`result: Y …`), the mes
| Bayes never scores (`BAYES_*` absent) | below the 200/200 learn floor | train more, keep spam/ham balanced |
| Your own outbound mail gets tagged | scanning authenticated mail | `-a` flag |
| AVC denials on the Bayes DB (SELinux) | DB outside `/var/lib/spamassassin` | keep it under that path (`spamd_var_lib_t`) + `restorecon` |
| `plugin/sieve_before` does nothing — spam keeps reaching INBOX | Pigeonhole 2.4 silently dropped that setting | use the top-level `sieve_script <name> { type = before; path = ...; }` block instead |
| `postfix reload` fails: `unsupported dictionary type: pcre` | `pcre:` map requires `postfix-pcre` package | install it, OR use `regexp:` (built-in POSIX) |
| Sieve `fileinto Junk` still notifies Spark/iOS | client subscribes to Junk; LMTP delivery briefly hits INBOX | REDIRECT envelope at Postfix cleanup (§6b) so the message never reaches the user's mailbox at all |
| Local `sendmail` test doesn't trigger REDIRECT | `sendmail` bypasses smtpd milters → no `X-Spam-Flag` added | inject through SMTP :25 (e.g. swaks) OR pre-set the header in the test message |
## See also