Logwatch fleet article: add cloud-image config-drift section
Documents three more patterns surfaced in the 2026-05-10 fleet-mail investigation, all hitting hosts derived from cloud images or cross-provider migrations:

- Packer/snapshot-leftover myhostname (postfix EHLO + message-id identifies the build artifact, not the production hostname; remote spam scorers hate it)
- Empty relayhost silently routes mail via the public MX instead of the Tailscale-internal path, exposing it to spamchk that internal traffic bypasses
- Stale SASL passwd map referencing a missing file from a previous external-SMTP relay setup, deferring every send with "local data error"

Each looks benign in isolation. Together they made dcaprod's Logwatch disappear into spamchk for weeks while showing 250 OK on the source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 724ae2a5e3
commit 9c62e7f804
2 changed files with 77 additions and 1 deletions
@ -9,7 +9,7 @@ tags:
- ubuntu
status: published
created: 2026-05-09
updated: 2026-05-10
updated: 2026-05-10T13:00
---

# Logwatch Fleet Setup — Surviving Package Upgrades
@ -173,6 +173,81 @@ A subtle related class of bug: services like Watchtower, fail2ban, cron, and Net
Fix it once at the source: set `WATCHTOWER_NOTIFICATION_EMAIL_FROM`, fail2ban's `sender =`, and similar to a **real mailbox** on your mail server (e.g., `marcus@majorshouse.com`). Bounces then land somewhere a human can read them, and the noise disappears.
## Per-host config drift on cloud-image-derived servers

When fleet hosts are spun up from images (DigitalOcean droplet snapshots, Packer artifacts, cloud-init templates), three specific config-drift patterns silently break notification mail. Each one looks fine in isolation; the combination produces "mail leaves the host with `250 OK queued` and disappears."

### 1. Packer/snapshot-leftover `myhostname` in postfix

A host built from a Packer-baked image often has `myhostname = packer-<uuid>` hardcoded in postfix's `main.cf` by the image build. Terraform or cloud-init may have set the system hostname correctly at first boot, but nothing ever overrode postfix's `myhostname`. Result: every outbound EHLO and message-id carries the Packer artifact name (e.g., `<20260509120011.7EB6ABD83C@packer-641079bc-bc17-b5e1-1425-be745d012d0b>`), no SPF/DKIM matches that name, and remote spam filters score it as suspicious.

**Detect:**

```bash
postconf myhostname | grep -E 'packer-|builder-|<image-build-prefix>'
```
**Fix:**

```bash
# Set the system hostname and postfix's own idea of it to the same FQDN
hostnamectl set-hostname <real-fqdn>
postconf -e 'myhostname = <real-fqdn>'
# Replace any stale 127.0.1.1 entry left over from the image
sed -i '/^127\.0\.1\.1/d' /etc/hosts && \
    echo "127.0.1.1 <real-fqdn> <short-name>" >> /etc/hosts
systemctl reload postfix
```
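
The prefix grep above only catches leftovers with a known prefix; comparing postfix's name against the system FQDN catches the rest. A minimal sketch, assuming the `packer-`/`builder-` prefixes used above. The function names `looks_like_build_artifact` and `check_myhostname` are hypothetical helpers, not postfix tooling:

```bash
# Hypothetical helpers, not part of postfix.

# Succeeds when the name starts with a known image-build prefix.
# The prefix list is an assumption; extend it for your own image pipeline.
looks_like_build_artifact() {
    case "$1" in
        packer-*|builder-*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_myhostname POSTFIX_NAME SYSTEM_FQDN: prints a one-line verdict.
check_myhostname() {
    if looks_like_build_artifact "$1"; then
        echo "DRIFT: build-artifact name: $1"
    elif [ "$1" != "$2" ]; then
        echo "DRIFT: myhostname is $1 but system FQDN is $2"
    else
        echo "OK: $1"
    fi
}

# Query the live host only when postfix is installed.
if command -v postconf >/dev/null 2>&1; then
    check_myhostname "$(postconf -h myhostname)" "$(hostname -f)"
fi
```

A mismatch is not always wrong (some setups deliberately give postfix a different name), so treat the second verdict as a prompt to look, not as proof of drift.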
### 2. Empty `relayhost` quietly forces public-MX delivery

If `postconf relayhost` returns an empty value, postfix doesn't fail; it does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 165.227.187.191:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.

The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX applies to external traffic. Internal Tailscale-IP traffic typically gets a trust shortcut (e.g., it bypasses the spamchk pipe). So this single piece of drift sends one host's mail down a different code path than its siblings, where it is silently filtered.

**Detect:** look for fleet hosts where `postconf relayhost` returns blank and compare to known-good siblings.

**Fix:** set `relayhost = [<mailserver-tailscale-ip>]:587` (or whatever port your fleet convention uses), then reload postfix.

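
The three cases a `relayhost` value can fall into can be sketched as a small classifier. This is an illustration only: `classify_relayhost` is a hypothetical name, and the wording of the verdicts is mine, not postfix output.

```bash
# classify_relayhost VALUE: hypothetical helper that describes what postfix
# will do with a given relayhost setting.
classify_relayhost() {
    case "$1" in
        "")
            # No relay configured: postfix looks up the destination's MX itself.
            echo "EMPTY: direct delivery via the destination MX" ;;
        \[*\]*)
            # Bracketed form: postfix connects to this host without an MX lookup.
            echo "RELAY: $1" ;;
        *)
            # Bare name: postfix does an MX lookup on the relay name first.
            echo "RELAY (MX lookup on name): $1" ;;
    esac
}

# Query the live host only when postfix is installed.
if command -v postconf >/dev/null 2>&1; then
    classify_relayhost "$(postconf -h relayhost)"
fi
```

Running this on every fleet host and grepping for `EMPTY` gives the same signal as the manual comparison against known-good siblings.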
### 3. Stale SASL passwd map referencing a missing file

Postfix configurations migrated from a previous setup often retain `smtp_sasl_auth_enable = yes` and `smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd` even when the current relay path needs no SASL. If the `sasl_passwd` file (or its compiled `sasl_passwd.db`) isn't there, because the migration didn't carry it or the new relay doesn't require auth, every send attempt produces:

```
error: open database /etc/postfix/sasl_passwd.db: No such file or directory
warning: smtp_sasl_password_maps lookup error
status=deferred (local data error while talking to <relay>)
```

Especially common after migrating from external SMTP (SendGrid, Mailgun, etc., which use SASL) to an internal Tailscale relay (which doesn't).

**Detect:**

```bash
postconf -n | grep -E 'smtp_sasl_(auth_enable|password_maps)'
# postfix opens the compiled .db for a hash: map, so check both files
[ -f /etc/postfix/sasl_passwd ] || echo "sasl_passwd file missing"
[ -f /etc/postfix/sasl_passwd.db ] || echo "sasl_passwd.db missing"
```
**Fix (disable SASL if the new relay doesn't need it):**

```bash
postconf -e 'smtp_sasl_auth_enable = no'
postconf -e 'smtp_tls_wrappermode = no'   # if switching from port 465 to 587
postconf -X 'smtp_sasl_password_maps'     # remove the stale map from main.cf
systemctl reload postfix
```
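
The detect step can be made independent of the hardcoded path by deriving the `.db` file from whatever map is configured. A minimal sketch, assuming a single `hash:` map as in the example above; `sasl_db_path` and `check_sasl` are hypothetical names:

```bash
# sasl_db_path MAPS: print the on-disk file behind a "hash:/path" map.
# Only the hash: type is handled; anything else returns non-zero.
sasl_db_path() {
    case "$1" in
        hash:*) echo "${1#hash:}.db" ;;
        *) return 1 ;;
    esac
}

# check_sasl AUTH_ENABLE MAPS: report whether the SASL settings are consistent.
check_sasl() {
    enabled="$1"; maps="$2"
    [ "$enabled" = "yes" ] || { echo "OK: SASL disabled"; return 0; }
    db=$(sasl_db_path "$maps") || { echo "SKIP: unhandled map type: $maps"; return 0; }
    if [ -f "$db" ]; then
        echo "OK: $db present"
    else
        echo "STALE: $db missing"
    fi
}

# Query the live host only when postfix is installed.
if command -v postconf >/dev/null 2>&1; then
    check_sasl "$(postconf -h smtp_sasl_auth_enable)" \
               "$(postconf -h smtp_sasl_password_maps)"
fi
```

A `STALE` verdict corresponds to the "local data error" deferral shown earlier; whether to restore the file or disable SASL depends on what the current relay expects.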
### Audit shortcut

For a quick per-host comparison across the fleet:

```bash
# Replace the placeholder names with your fleet hosts
for host in host1 host2 host3; do
    echo "=== $host ==="
    ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable 2>&1' | head -3
done
```
Anomalies (Packer hostnames, blank relayhost, SASL enabled where siblings have it disabled) jump out immediately.
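
On a larger fleet, the eyeballing can be replaced by a filter that prints only the suspicious lines. A rough sketch, assuming the `name = value` format that `postconf` prints and the same `packer-`/`builder-` prefixes as above; `flag_anomalies` is a hypothetical helper:

```bash
# flag_anomalies HOST: read postconf output on stdin and print one line per
# suspicious setting, prefixed with the host name.
flag_anomalies() {
    host="$1"
    while IFS= read -r line; do
        case "$line" in
            "myhostname = packer-"*|"myhostname = builder-"*)
                echo "$host: build-artifact myhostname" ;;
            "relayhost ="|"relayhost = ")
                echo "$host: empty relayhost" ;;
            "smtp_sasl_auth_enable = yes")
                echo "$host: SASL enabled (compare with siblings)" ;;
        esac
    done
}
```

Inside the loop above it would be used as `ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable' | flag_anomalies "$host"`. SASL being enabled is only flagged for comparison, since some fleets legitimately use it.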
## Lesson Learned
Never customize `/usr/share/logwatch/default.conf/logwatch.conf`. Always use `/etc/logwatch/conf/logwatch.conf`. This applies to any software that has a "defaults" file and an "override" file — the override survives upgrades, the defaults file does not.
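
One way to make the override habit mechanical is to route every setting change through a helper that only ever writes the override file. A minimal sketch: `set_override` is a hypothetical function, and `MailTo` and `Detail` are used here as example Logwatch settings.

```bash
# set_override FILE KEY VALUE: hypothetical helper that sets KEY in an
# override file, replacing any earlier assignment of the same key.
set_override() {
    file="$1"; key="$2"; value="$3"
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # Keep every line except existing assignments of this key.
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$file.tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

For example, `set_override /etc/logwatch/conf/logwatch.conf MailTo admin@example.com` (address is a placeholder) leaves the packaged defaults file untouched, so a package upgrade has nothing to overwrite.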
1
index.md
@ -217,6 +217,7 @@ updated: 2026-05-10T01:30
| Date | Article | Domain |
|---|---|---|
| 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added "Per-host config drift on cloud-image-derived servers" section: Packer-leftover myhostname, empty relayhost forcing public-MX path, stale SASL passwd maps from prior relays | Self-Hosting |
| 2026-05-10 | [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](05-troubleshooting/php-84-vendor-implicit-nullable-patch.md) — generalized from a Castopod/UuidModel incident; covers the substring-match gotcha that turns a 30-second fix into a 30-minute one | Troubleshooting |
| 2026-05-10 | [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md) — added Fedora CA bundle missing diagnosis, journald-vs-mail.log methodology note, and bounce-source-must-be-real-mailbox section | Self-Hosting |
| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets (with follow-up note: per-droplet relaxed alert can still trip; accept-the-page decision) | Self-Hosting |