diff --git a/02-selfhosting/monitoring/logwatch-fleet-setup.md b/02-selfhosting/monitoring/logwatch-fleet-setup.md
index 3e7659d..d563d3b 100644
--- a/02-selfhosting/monitoring/logwatch-fleet-setup.md
+++ b/02-selfhosting/monitoring/logwatch-fleet-setup.md
@@ -9,7 +9,7 @@ tags:
   - ubuntu
 status: published
 created: 2026-05-09
-updated: 2026-05-10
+updated: 2026-05-10T13:00
 ---
 
 # Logwatch Fleet Setup — Surviving Package Upgrades
@@ -173,6 +173,81 @@ A subtle related class of bug: services like Watchtower, fail2ban, cron, and Net
 
 Fix it once at the source: set `WATCHTOWER_NOTIFICATION_EMAIL_FROM`, fail2ban's `sender =`, and similar to a **real mailbox** on your mail server (e.g., `marcus@majorshouse.com`). Bounces then land somewhere a human can read them, and the noise disappears.
 
+## Per-host config drift on cloud-image-derived servers
+
+When fleet hosts are spun up from images (DigitalOcean droplet snapshots, Packer artifacts, cloud-init templates), three specific config-drift patterns silently break notification mail. Each one looks fine in isolation; the combination produces "mail leaves the host with `250 OK queued` and disappears."
+
+### 1. Packer/snapshot-leftover `myhostname` in postfix
+
+A host built from a Packer-baked image often has a `packer-`-prefixed `myhostname` hardcoded in postfix's `main.cf` by the build process. The system hostname may have been set correctly by Terraform/cloud-init at first boot, but postfix's `myhostname` was never overridden. Result: every outbound Message-ID and EHLO carries the Packer artifact name (e.g., `<20260509120011.7EB6ABD83C@packer-641079bc-bc17-b5e1-1425-be745d012d0b>`), no SPF/DKIM matches that name, and remote spam filters score it as suspicious.
+
+**Detect:**
+
+```bash
+postconf myhostname | grep -E 'packer-|builder-'
+```
+
+**Fix:**
+
+```bash
+NEW_FQDN=host.example.com  # placeholder: replace with this host's real FQDN
+hostnamectl set-hostname "$NEW_FQDN"
+postconf -e "myhostname = $NEW_FQDN"
+sed -i '/^127\.0\.1\.1/d' /etc/hosts && echo "127.0.1.1 $NEW_FQDN" >> /etc/hosts
+systemctl reload postfix
+```
+
+### 2. Empty `relayhost` quietly forces public-MX delivery
+
+If `postconf relayhost` returns an empty value, postfix doesn't fail; it does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 165.227.187.191:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.
+
+The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX applies to external traffic. Internal Tailscale-IP traffic typically gets a faster trust shortcut (e.g., bypassing the spamchk pipe). So this single configuration drift sends one host's mail down a different code path than its siblings, where it is then silently filtered.
+
+**Detect:** look for fleet hosts where `postconf relayhost` returns blank and compare to known-good siblings.
+
+**Fix:** set `relayhost = [RELAY_HOST]:587`, where `RELAY_HOST` stands for your fleet's internal relay (use whatever port your fleet convention uses).
+
+### 3. Stale SASL passwd map referencing a missing file
+
+Postfix configurations migrated from a previous setup often retain `smtp_sasl_auth_enable = yes` and `smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd` even when no SASL is needed for the current relay path. If the `sasl_passwd` file, or the `sasl_passwd.db` that `postmap` builds from it, isn't there (because the migration didn't carry it, or the new relay doesn't require auth), every send attempt produces:
+
+```
+error: open database /etc/postfix/sasl_passwd.db: No such file or directory
+warning: smtp_sasl_password_maps lookup error
+status=deferred (local data error while talking to )
+```
+
+This is especially common after migrating from external SMTP (SendGrid, Mailgun, etc., which use SASL) to an internal Tailscale relay (which doesn't).
+
+**Detect:**
+
+```bash
+postconf -n | grep -E 'smtp_sasl_(auth_enable|password_maps)'
+for f in /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db; do [ -f "$f" ] || echo "$f missing"; done
+```
+
+**Fix (disable SASL if the new relay doesn't need it):**
+
+```bash
+postconf -e 'smtp_sasl_auth_enable = no'
+postconf -e 'smtp_tls_wrappermode = no' # if switching from port 465 to 587
+postconf -X 'smtp_sasl_password_maps'
+systemctl reload postfix
+```
+
+### Audit shortcut
+
+For a quick per-host comparison across the fleet:
+
+```bash
+for host in ${FLEET_HOSTS:?set FLEET_HOSTS to a space-separated host list}; do
+  echo "=== $host ==="
+  ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable 2>&1' | head -3
+done
+```
+
+Anomalies (Packer hostnames, blank relayhost, SASL enabled where siblings have it disabled) jump out immediately.
+
 ## Lesson Learned
 
 Never customize `/usr/share/logwatch/default.conf/logwatch.conf`. Always use `/etc/logwatch/conf/logwatch.conf`. This applies to any software that has a "defaults" file and an "override" file — the override survives upgrades, the defaults file does not.
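Note on the audit shortcut in the drift section: the comparison there is done by eye. As a rough sketch only, the same three checks could be applied mechanically to each host's `postconf` output. The function name `flag_postfix_drift` and its messages are hypothetical and not part of the article; the `packer-` and `builder-` prefixes are taken from the detect pattern in section 1, and the parsing assumes the usual `name = value` output format of `postconf`.

```shell
# Hypothetical helper (not from the original article): reads the output of
#   postconf myhostname relayhost smtp_sasl_auth_enable
# on stdin and prints one line per suspected drift pattern.
flag_postfix_drift() {
  local line name value
  while IFS= read -r line; do
    name=${line%% =*}   # text before " ="
    value=${line#*=}    # text after the first "="
    value=${value# }    # drop the single space that follows "="
    case "$name" in
      myhostname)
        case "$value" in
          packer-*|builder-*) echo "myhostname looks like a build artifact: $value" ;;
        esac
        ;;
      relayhost)
        if [ -z "$value" ]; then
          echo "relayhost is empty: mail goes out via direct MX lookup"
        fi
        ;;
      smtp_sasl_auth_enable)
        if [ "$value" = "yes" ]; then
          echo "SASL is enabled: confirm the password map file exists"
        fi
        ;;
    esac
  done
  return 0
}
```

One possible use is to pipe each host's settings through it, for example `ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable' | flag_postfix_drift`. A host that prints nothing shows none of the three patterns; enabled SASL is only flagged for follow-up, since some relays legitimately require it.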
diff --git a/index.md b/index.md
index 52ecba4..d3306ed 100644
--- a/index.md
+++ b/index.md
@@ -217,6 +217,7 @@ updated: 2026-05-10T01:30
 
 | Date | Article | Domain |
 |---|---|---|
+| 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added "Per-host config drift on cloud-image-derived servers" section: Packer-leftover myhostname, empty relayhost forcing public-MX path, stale SASL passwd maps from prior relays | Self-Hosting |
 | 2026-05-10 | [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](05-troubleshooting/php-84-vendor-implicit-nullable-patch.md) — generalized from a Castopod/UuidModel incident; covers the substring-match gotcha that turns a 30-second fix into a 30-minute one | Troubleshooting |
 | 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added Fedora CA bundle missing diagnosis, journald-vs-mail.log methodology note, and bounce-source-must-be-real-mailbox section | Self-Hosting |
 | 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets (with follow-up note: per-droplet relaxed alert can still trip; accept-the-page decision) | Self-Hosting |