Logwatch fleet article: add cloud-image config-drift section

Documents three more patterns surfaced in the 2026-05-10 fleet-mail
investigation, all hitting hosts derived from cloud images or
cross-provider migrations:

- Packer/snapshot-leftover myhostname (postfix EHLO + message-id
  identifies the build artifact, not the production hostname; remote
  spam scorers hate it)
- Empty relayhost silently routes mail via the public MX instead of
  the Tailscale-internal path, exposing it to spamchk that internal
  traffic bypasses
- Stale SASL passwd map referencing a missing file from a previous
  external-SMTP relay setup, deferring every send with "local data
  error"

Each looks benign in isolation. Together they made dcaprod's Logwatch
disappear into spamchk for weeks while showing 250 OK on the source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marcus Summers 2026-05-10 12:58:00 -04:00
parent 724ae2a5e3
commit 9c62e7f804
2 changed files with 77 additions and 1 deletions

@@ -9,7 +9,7 @@ tags:
- ubuntu
status: published
created: 2026-05-09
-updated: 2026-05-10
+updated: 2026-05-10T13:00
---
# Logwatch Fleet Setup — Surviving Package Upgrades
@@ -173,6 +173,81 @@ A subtle related class of bug: services like Watchtower, fail2ban, cron, and Net
Fix it once at the source: set `WATCHTOWER_NOTIFICATION_EMAIL_FROM`, fail2ban's `sender =`, and similar to a **real mailbox** on your mail server (e.g., `marcus@majorshouse.com`). Bounces then land somewhere a human can read them, and the noise disappears.
## Per-host config drift on cloud-image-derived servers
When fleet hosts are spun up from images (DigitalOcean droplet snapshots, Packer artifacts, cloud-init templates), three specific config drift patterns silently break notification mail. Each one looks fine in isolation; the combination produces "mail leaves the host with `250 OK queued` and disappears."
### 1. Packer/snapshot-leftover `myhostname` in postfix
A host built from a Packer-baked image often has `myhostname = packer-<uuid>` baked into postfix's `main.cf` by the build process. Terraform or cloud-init may have set the system hostname correctly at first boot, but postfix's `myhostname` was hardcoded during the image build and never overridden. Result: every outbound message-id and EHLO carries the Packer artifact name (e.g., `<20260509120011.7EB6ABD83C@packer-641079bc-bc17-b5e1-1425-be745d012d0b>`), no SPF or DKIM identity matches that name, and remote spam filters score the message as suspicious.
**Detect:**
```bash
postconf myhostname | grep -E 'packer-|builder-|<image-build-prefix>'
```
**Fix:**
```bash
hostnamectl set-hostname <real-fqdn>
postconf -e 'myhostname = <real-fqdn>'
sed -i '/^127\.0\.1\.1/d' /etc/hosts && \
echo "127.0.1.1 <real-fqdn> <short-name>" >> /etc/hosts
systemctl reload postfix
```
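The pattern match in the detect step only catches build prefixes you already know about. A more general check, sketched here, compares postfix's `myhostname` with the name the system reports and flags any mismatch. The function name `myhostname_drift` is hypothetical, and the sketch assumes `hostname -f` returns the name you actually want mail to carry.

```bash
# Hypothetical helper: exit 0 when postfix's myhostname ($1) differs from
# the expected FQDN ($2), ignoring case and a single trailing dot.
myhostname_drift() {
  mh_postfix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  mh_expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "${mh_postfix%.}" != "${mh_expected%.}" ]
}

# Usage on a live host (assumes postconf and hostname -f are available):
#   if myhostname_drift "$(postconf -h myhostname)" "$(hostname -f)"; then
#     echo "myhostname drift: postfix=$(postconf -h myhostname) system=$(hostname -f)"
#   fi
```

Comparing against the system name rather than a prefix list means the check keeps working when the image pipeline changes its naming scheme.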
### 2. Empty `relayhost` quietly forces public-MX delivery
If `postconf relayhost` returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 165.227.187.191:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.
The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX applies to external traffic. Traffic from internal Tailscale IPs typically takes a trusted shortcut (e.g., bypassing the spamchk pipe). So this single configuration drift puts one host's mail on a different code path than its siblings, where it is silently filtered.
**Detect:** look for fleet hosts where `postconf relayhost` returns blank and compare to known-good siblings.
**Fix:** set `relayhost = [<mailserver-tailscale-ip>]:587` (or whatever port your fleet convention uses).
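As a rough sketch of the detect step, the helper below classifies the value printed by `postconf -h relayhost`. The name `relayhost_mode` and its output labels are hypothetical. The sketch also ignores `transport_maps`, which can route mail independently of `relayhost`.

```bash
# Hypothetical helper: classify a relayhost value as printed by
# `postconf -h relayhost`.
relayhost_mode() {
  # Strip all whitespace so a value made only of spaces counts as empty.
  rh_value=$(printf '%s' "$1" | tr -d '[:space:]')
  if [ -z "$rh_value" ]; then
    echo "direct-mx"        # postfix will look up the destination's MX itself
  else
    echo "relay:$rh_value"  # mail is handed to the configured relay
  fi
}

# Usage on a live host:
#   relayhost_mode "$(postconf -h relayhost)"
```

Run across the fleet, any host that prints `direct-mx` while its siblings print a `relay:` value is a candidate for this drift.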
### 3. Stale SASL passwd map referencing a missing file
Postfix configurations migrated from a previous setup often retain `smtp_sasl_auth_enable = yes` and `smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd` even when the current relay path needs no SASL. If the compiled map `sasl_passwd.db` isn't there (because the migration didn't carry it, or the new relay doesn't require auth), every send attempt produces:
```
error: open database /etc/postfix/sasl_passwd.db: No such file or directory
warning: smtp_sasl_password_maps lookup error
status=deferred (local data error while talking to <relay>)
```
This is especially common after migrating from external SMTP (SendGrid, Mailgun, etc., which use SASL) to an internal Tailscale relay (which doesn't).
**Detect:**
```bash
postconf -n | grep -E 'smtp_sasl_(auth_enable|password_maps)'
# postfix opens the compiled .db for a hash: map, so check that file
[ -f /etc/postfix/sasl_passwd.db ] || echo "sasl_passwd.db missing"
```
**Fix — disable SASL if the new relay doesn't need it:**
```bash
postconf -e 'smtp_sasl_auth_enable = no'
postconf -e 'smtp_tls_wrappermode = no' # if switching from port 465 to 587
postconf -X 'smtp_sasl_password_maps'
systemctl reload postfix
```
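The two detect commands can be folded into a single check. The sketch below reads `postconf -n` output on stdin and reports whether SASL is enabled while the compiled map is absent. The function name `sasl_map_status` and its output labels are hypothetical, and the path you pass should match whatever `smtp_sasl_password_maps` points at on your hosts.

```bash
# Hypothetical helper: read `postconf -n` output on stdin and report whether
# SASL is enabled while the compiled hash map (path in $1) is missing.
sasl_map_status() {
  sasl_db=$1
  if grep -Eq '^smtp_sasl_auth_enable[[:space:]]*=[[:space:]]*yes[[:space:]]*$'; then
    if [ -f "$sasl_db" ]; then
      echo "sasl-enabled-map-present"
    else
      echo "sasl-enabled-map-missing"   # the "local data error" case
    fi
  else
    echo "sasl-disabled"
  fi
}

# Usage on a live host:
#   postconf -n | sasl_map_status /etc/postfix/sasl_passwd.db
```

Only the `sasl-enabled-map-missing` result corresponds to the deferral shown in the log excerpt. A host that reports `sasl-enabled-map-present` may still be worth comparing against its siblings.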
### Audit shortcut
For a quick per-host comparison across the fleet:
```bash
# Replace the placeholder names with your own inventory.
FLEET_HOSTS=(host-a host-b host-c)
for host in "${FLEET_HOSTS[@]}"; do
  echo "=== $host ==="
  ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable 2>&1' | head -3
done
```
Anomalies (Packer hostnames, blank relayhost, SASL enabled where siblings have it disabled) jump out immediately.
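On a larger fleet, scanning the output by eye gets tedious. The filter sketched below reads the audit loop's output on stdin and prints one line per suspicious value. The name `flag_anomalies` is hypothetical, the `packer-` and `builder-` prefixes are only the ones used in the detect step earlier, and an enabled SASL setting is reported for comparison with siblings rather than as an error in itself.

```bash
# Hypothetical filter: read the audit loop's output on stdin and print one
# line per suspicious value. Adjust the prefixes to your image pipeline.
flag_anomalies() {
  awk '
    /^=== /                                        { host = $2; next }
    /^myhostname = / && $3 ~ /^(packer-|builder-)/ { print host ": build-artifact myhostname " $3 }
    /^relayhost =[ \t]*$/                          { print host ": empty relayhost" }
    /^smtp_sasl_auth_enable = yes/                 { print host ": SASL enabled" }
  '
}

# Usage: save the audit loop's output to a file (audit.txt is a placeholder
# name), then run:
#   flag_anomalies < audit.txt
```

Hosts with no flagged lines produce no output, so an empty result means nothing matched these three patterns.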
## Lesson Learned
Never customize `/usr/share/logwatch/default.conf/logwatch.conf`. Always use `/etc/logwatch/conf/logwatch.conf`. This applies to any software that has a "defaults" file and an "override" file: the override survives upgrades, the defaults file does not.

@@ -217,6 +217,7 @@ updated: 2026-05-10T01:30
| Date | Article | Domain |
|---|---|---|
| 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added "Per-host config drift on cloud-image-derived servers" section: Packer-leftover myhostname, empty relayhost forcing public-MX path, stale SASL passwd maps from prior relays | Self-Hosting |
| 2026-05-10 | [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](05-troubleshooting/php-84-vendor-implicit-nullable-patch.md) — generalized from a Castopod/UuidModel incident; covers the substring-match gotcha that turns a 30-second fix into a 30-minute one | Troubleshooting |
| 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added Fedora CA bundle missing diagnosis, journald-vs-mail.log methodology note, and bounce-source-must-be-real-mailbox section | Self-Hosting |
| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets (with follow-up note: per-droplet relaxed alert can still trip; accept-the-page decision) | Self-Hosting |