Marcus Summers 0d1697c0d6 wiki: Logwatch wrong hostname (<host>-hetzner) after migration

New troubleshooting runbook for Logwatch reports titled with the Hetzner
provisioning label instead of the real hostname; cross-linked from the
logwatch fleet-setup and VPS migration baseline articles, plus a new
'set system hostname' step in the post-migration checklist.

2026-06-12 10:58:17 -04:00

Logwatch Fleet Setup — Surviving Package Upgrades

Logwatch ships with a defaults file at /usr/share/logwatch/default.conf/logwatch.conf. On Fedora, package upgrades silently reset this file — wiping any customizations. The fix is to put all settings in the local override file at /etc/logwatch/conf/logwatch.conf, which is never touched by package managers.

The Problem

Fedora 44's logwatch 7.14-1 upgrade (April 2026) reset Output from mail back to stdout in the defaults file. Servers that had been emailing daily reports for months went silent with zero errors. rpm -V logwatch shows the defaults file was modified (S.5....T.), but there's no warning during upgrade.

Ubuntu is less affected because its /etc/cron.daily/00logwatch script passes --output mail explicitly, overriding the config. Fedora's cron script does not.

The Fix

Write all settings to the override file (/etc/logwatch/conf/logwatch.conf):

# Managed by Ansible — do not edit manually.
# Local overrides — survives package upgrades.
Output = mail
MailTo = marcus@majorshouse.com
MailFrom = Logwatch@hostname.majorshouse.com
Detail = Low

Key settings:

| Setting | Value | Why |
| --- | --- | --- |
| `Output` | `mail` | Must be `mail`, not `stdout`. Fedora's cron script doesn't pass `--output mail` like Ubuntu's does. |
| `MailTo` | recipient address | Where reports go. |
| `MailFrom` | per-host sender | Makes it easy to identify which server sent the report. |
| `Detail` | `Low` | Keeps emails scannable. Raise to `Med` or `High` for debugging. |
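To confirm which value actually wins on a host, the defaults-then-override precedence can be sketched as a small shell helper. This is a simplified illustration, not Logwatch's own parser: the function name `effective_setting` is hypothetical, it assumes keys are matched case-insensitively, and it ignores any other config layers and command-line flags such as `--output mail`.

```shell
# effective_setting KEY FILE...
# Print the value of KEY after reading FILEs in order; later files win,
# which mirrors defaults-then-override precedence. Unreadable files are skipped.
effective_setting() (
  key=$1
  shift
  for f in "$@"; do
    if [ -r "$f" ]; then cat "$f"; fi
  done | awk -v key="$key" '
    /^[ \t]*#/ { next }              # skip comment lines
    index($0, "=") == 0 { next }     # skip lines without an assignment
    {
      eq = index($0, "=")
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/[ \t]/, "", k)           # strip whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v) # trim the value
      if (tolower(k) == tolower(key)) val = v
    }
    END { if (val != "") print val }'
)
```

Typical use would be `effective_setting Output /usr/share/logwatch/default.conf/logwatch.conf /etc/logwatch/conf/logwatch.conf`, which should print `mail` once the override file is in place.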

Ansible Playbook

The logwatch.yml playbook handles both OS families:

- name: Install and configure logwatch
  hosts: all
  become: true
  gather_facts: true
  tasks:
    - name: Install logwatch (Debian/Ubuntu)
      ansible.builtin.apt:
        name: logwatch
        state: present
      when: ansible_facts['os_family'] == "Debian"

    - name: Install logwatch (Fedora)
      ansible.builtin.dnf:
        name: logwatch
        state: present
      when: ansible_facts['os_family'] == "RedHat"

    - name: Ensure logwatch override directory exists
      ansible.builtin.file:
        path: /etc/logwatch/conf
        state: directory
        mode: '0755'

    - name: Configure logwatch override (survives package upgrades)
      ansible.builtin.copy:
        dest: /etc/logwatch/conf/logwatch.conf
        mode: '0644'
        content: |
          # Managed by Ansible — do not edit manually.
          Output = mail
          MailTo = {{ logwatch_email }}
          MailFrom = Logwatch@{{ inventory_hostname }}.majorshouse.com
          Detail = Low

Include it in harden.yml so every new server gets logwatch as part of the baseline.

Verifying

After deploying, test immediately:

# Verify crond is actually running — cronie can be "enabled" but not "active"
systemctl is-active crond   # Fedora
systemctl is-active cron    # Ubuntu

# If inactive, start it
sudo systemctl start crond   # Fedora; the unit is "cron" on Ubuntu

# Then test logwatch manually
sudo logwatch --output mail --range today

Check that the email arrives. If it doesn't, verify:

crond is running — if inactive, cron.daily never fires and logwatch never runs. No errors anywhere.
Postfix is installed and relaying — logwatch depends on a working local MTA.
CA bundle exists (Fedora) — missing /etc/pki/tls/certs/ca-bundle.crt breaks Postfix TLS relay. See Fedora CA bundle fix.

Diagnosing Silent Failures

# Check if the defaults file was modified by a package upgrade
rpm -V logwatch  # Fedora
dpkg -V logwatch  # Debian

# Look for S.5....T. on the defaults file — means it was replaced
# S = size differs, 5 = digest (formerly MD5 sum) differs, T = mtime differs

# Check if logwatch produces any output at all
logwatch --output stdout --range yesterday | wc -l
# If 0 lines — logwatch has no log data to report (see rsyslog section below)
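When `rpm -V` prints many lines, the digest flag can be filtered mechanically. The helper below is a minimal sketch; the name `changed_content` is hypothetical, and it assumes paths contain no spaces, since it takes the last whitespace-separated field as the path.

```shell
# changed_content: read `rpm -V` output on stdin and print the path of every
# file whose digest flag (5) is set, i.e. whose content differs from what the
# package shipped. Lines reporting "missing" files are ignored.
changed_content() {
  awk '$1 != "missing" && $1 ~ /5/ { print $NF }'
}
```

Usage would look like `rpm -V logwatch | changed_content`. Seeing the defaults file in the output means its on-disk content does not match the packaged copy.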

Fedora: rsyslog Missing — Logwatch Produces Zero Output

Fedora 44 cloud images (Hetzner, possibly others) ship with journald only — no rsyslog. This means /var/log/messages, /var/log/secure, and /var/log/cron do not exist. Logwatch scans those files, finds nothing, produces empty output, and sends no email. Exit code is still 0 — no error anywhere.

This is particularly insidious because everything else can be correct (crond running, postfix relaying, logwatch config pointing to the right recipient) and you'll still get silence.

# Diagnose
rpm -q rsyslog          # "package rsyslog is not installed"
ls /var/log/messages    # "No such file or directory"

# Fix
dnf install -y rsyslog
systemctl enable --now rsyslog

# Verify log files appear
ls /var/log/messages /var/log/secure /var/log/cron

# Test logwatch
logwatch --output stdout --range today | wc -l   # should be >0
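The same check can be wrapped so it is easy to run over SSH or from a baseline script. This is only a sketch: the function name `missing_syslog_files` is hypothetical, and the list of three files is taken from the section above, not from Logwatch itself.

```shell
# missing_syslog_files [DIR]
# Print each classic syslog file that does not exist under DIR
# (default: /var/log). Prints nothing when all three are present.
missing_syslog_files() (
  dir=${1:-/var/log}
  for name in messages secure cron; do
    [ -e "$dir/$name" ] || echo "$dir/$name"
  done
)
```

Any output from `missing_syslog_files` on a Fedora host is a hint that rsyslog is absent or not running, so Logwatch may have nothing to scan.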

Fedora CA Bundle Missing — Postfix TLS Engine Unavailable

If the Fedora half of your fleet is silent but the Debian/Ubuntu half is fine, and your relayhost requires TLS, suspect a missing CA bundle. Symptom on the sending host:

postfix/error: status=deferred (delivery temporarily suspended:
TLS is required, but our TLS engine is unavailable)

The tell that this is the CA bundle and not a postfix-internal problem: dnf and curl are also broken on the box. Run sudo dnf list or sudo curl https://... and look for:

Curl error (77): Problem with the SSL CA cert (path? access rights?)
[error adding trust anchors from file: /etc/pki/tls/certs/ca-bundle.crt]

That's the same path Fedora's packaged postfix configuration uses for smtp_tls_CAfile (confirm with postconf smtp_tls_CAfile). Every TLS client on the box is failing because a single symlink is missing.

Diagnosis

# Is the consumer-path symlink there?
ls -la /etc/pki/tls/certs/ca-bundle.crt
# Expected: lrwxrwxrwx ... -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem

# Is the extracted bundle itself intact?
ls -la /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
sudo grep -c 'BEGIN CERTIFICATE' /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
# Expected: ~140-150 certs, ~220 KB

If the extracted bundle exists but the consumer-path symlink is gone, you've found it. update-ca-trust extract regenerates the extracted/ paths but does not recreate the upstream-style symlink at /etc/pki/tls/certs/ca-bundle.crt — that symlink is shipped by the ca-certificates package and can be lost during a partial upgrade or a stray rm.
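A plain `ls` does not distinguish a missing symlink from one that points at a target that no longer exists. The sketch below separates those cases; the function name `ca_link_status` and its four labels are hypothetical, chosen for illustration.

```shell
# ca_link_status [PATH]
# Classify the consumer-path CA bundle as missing, dangling, empty, or ok.
ca_link_status() (
  path=${1:-/etc/pki/tls/certs/ca-bundle.crt}
  if [ ! -e "$path" ] && [ ! -L "$path" ]; then
    echo "missing"     # nothing at the path at all
  elif [ ! -e "$path" ]; then
    echo "dangling"    # symlink exists but its target does not
  elif [ ! -s "$path" ]; then
    echo "empty"       # resolves to a zero-byte file
  else
    echo "ok"          # resolves to a non-empty file
  fi
)
```

Only `ok` means TLS clients have something to read. `missing` matches the failure described above, while `dangling` and `empty` point at the extracted bundle instead of the symlink.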

Fix

sudo ln -sfn /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem \
            /etc/pki/tls/certs/ca-bundle.crt
sudo systemctl reload postfix
sudo postqueue -f       # drain deferred mail

Verify with sudo grep -c 'BEGIN CERTIFICATE' /etc/pki/tls/certs/ca-bundle.crt (should match the extracted bundle's count) and sudo dnf list --installed postfix (should no longer show the curl error).

Audit the rest of the Fedora fleet

Once you find one host with this issue, check the others — package events that broke one box may have broken its siblings:

# FEDORA_HOSTS is a placeholder: set it to a space-separated list of your Fedora hosts.
for host in $FEDORA_HOSTS; do
  echo "$host: $(ssh "$host" 'ls /etc/pki/tls/certs/ca-bundle.crt 2>&1' | tail -1)"
done

Hosts returning "No such file or directory" are silently broken. They won't fail loudly until something asks them to do TLS — which on a small homelab might be never until logwatch tries to mail you weeks later.

Methodology note: postfix logs differ between distros

Don't trust a single log source when surveying a mixed fleet. Fedora and majormail log postfix to journald (journalctl -u postfix); Debian/Ubuntu log to /var/log/mail.log (and rotated mail.log.1 / mail.log.*.gz). Querying journalctl on Ubuntu returns "no entries" even when mail is flowing, which makes it easy to declare a working host broken. Always run tail /var/log/mail.log on Debian-family hosts and journalctl -u postfix on Fedora-family hosts.
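One way to avoid querying the wrong source is to let the host's os-release file pick the command. The helper below is a sketch under the assumptions of this fleet (Debian family logs to mail.log, Fedora family to journald). The name `postfix_log_hint` is hypothetical, and a host with a non-default syslog setup may not follow the rule.

```shell
# postfix_log_hint [OS_RELEASE_FILE]
# Print the command to use for postfix logs, based on ID / ID_LIKE.
postfix_log_hint() (
  file=${1:-/etc/os-release}
  ID= ID_LIKE=
  # os-release is a shell-compatible KEY=value file, so it can be sourced.
  [ -r "$file" ] && . "$file"
  case " $ID $ID_LIKE " in
    *" debian "*|*" ubuntu "*) echo "tail /var/log/mail.log" ;;
    *" fedora "*|*" rhel "*)   echo "journalctl -u postfix" ;;
    *)                         echo "unknown: check both" ;;
  esac
)
```

Running it remotely, for example `ssh "$host" "$(typeset -f postfix_log_hint); postfix_log_hint"` from bash, prints the right command for each host before you conclude that no mail is flowing.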

Bounce-source addresses must be real mailboxes

A subtle related class of bug: services like Watchtower, fail2ban, cron, and Netdata default to sending notifications from an identity that doesn't exist as a recipient — watchtower@majorshouse.com, fail2ban@<host>.majorshouse.com, root@<host>.localdomain. While the relayhost is healthy, nobody notices. The moment any delivery fails (network blip, recipient typo, queue overflow, the CA bundle bug above), the local MTA tries to bounce the original message back to that sender — finds no mailbox — and the bounce itself bounces. You get MAILER-DAEMON queue churn and 5.7.1 Relay access denied rejections in your mail server logs.

Fix it once at the source: set WATCHTOWER_NOTIFICATION_EMAIL_FROM, fail2ban's sender =, and similar to a real mailbox on your mail server (e.g., marcus@majorshouse.com). Bounces then land somewhere a human can read them, and the noise disappears.

Per-host config drift on cloud-image-derived servers

When fleet hosts are spun up from images (DigitalOcean droplet snapshots, Packer artifacts, cloud-init templates), three specific config drift patterns silently break notification mail. Each one looks fine in isolation; the combination produces "mail leaves the host with 250 OK queued and disappears."

1. Packer/snapshot-leftover `myhostname` in postfix

A host built from a Packer-baked image often has postfix myhostname = packer-<uuid> baked into main.cf from the build process. The system hostname might have been correctly set by terraform/cloud-init at first boot, but postfix's myhostname was hardcoded during image build and was never overridden. Result: every outbound message-id and EHLO carries the Packer artifact name (e.g., <20260509120011.7EB6ABD83C@packer-641079bc-bc17-b5e1-1425-be745d012d0b>), no SPF/DKIM matches that name, and remote spam filters score it as suspicious.

Detect:

postconf myhostname | grep -E 'packer-|builder-|<image-build-prefix>'

Fix:

hostnamectl set-hostname <real-fqdn>
postconf -e 'myhostname = <real-fqdn>'
sed -i '/^127\.0\.1\.1/d' /etc/hosts && \
  echo "127.0.1.1 <real-fqdn> <short-name>" >> /etc/hosts
systemctl reload postfix

[!tip] Same drift, different symptom: the Logwatch title
Hetzner provisions boxes with <host>-hetzner as the system hostname. When that's never corrected, Logwatch (which reads the live hostname at runtime) mails reports titled Logwatch for <host>-hetzner, with no postfix involvement needed. Same hostnamectl set-hostname + /etc/hosts fix as above. See Logwatch wrong hostname after migration.
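Both symptoms come down to a name that still carries a build or provisioning label, so one pattern check can cover the system hostname and postfix's myhostname. This is a minimal sketch: the function name `looks_like_build_label` is hypothetical, and the patterns are only the ones mentioned in this section, so extend them for your own image pipeline.

```shell
# looks_like_build_label NAME
# Succeed (exit 0) when NAME matches a provisioning or image-build pattern.
looks_like_build_label() {
  case $1 in
    packer-*|builder-*|*-hetzner|*-hetzner.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On a host you might call it as `looks_like_build_label "$(hostname)" && echo "fix system hostname"` and again with `"$(postconf -h myhostname)"` to catch the postfix side.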

2. Empty `relayhost` quietly forces public-MX delivery

If postconf relayhost returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the public MX (the domain's external MX record, e.g., mail.majorshouse.com → 165.227.187.191:25) instead of the internal/Tailscale relay path the rest of the fleet uses.

The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX has configured for external traffic. Internal Tailscale-IP traffic typically gets a faster trust shortcut (e.g., bypassing the spamchk pipe). So this single configuration drift causes one host's mail to land in a different code path than its siblings, where it then gets silently filtered.

Detect: look for fleet hosts where postconf relayhost returns blank and compare to known-good siblings.

Fix: set relayhost = [<mailserver-tailscale-ip>]:587 (or whatever port your fleet convention uses).
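For fleet surveys it can help to classify the relayhost value instead of eyeballing it. The sketch below takes the value as printed by `postconf -h relayhost`. The function name `relayhost_kind` and its labels are hypothetical, but the bracket rule itself is standard postfix behavior: a bracketed destination suppresses the MX lookup on the relay name.

```shell
# relayhost_kind VALUE
# Classify a relayhost value as printed by `postconf -h relayhost`.
relayhost_kind() {
  case $1 in
    "")      echo "blank: direct MX delivery" ;;
    \[*\]*)  echo "bracketed: no MX lookup on relay" ;;
    *)       echo "bare: MX lookup on relay name" ;;
  esac
}
```

A call such as `relayhost_kind "$(postconf -h relayhost)"` on each host makes a blank value stand out next to siblings that report a bracketed relay.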

3. Stale SASL passwd map referencing a missing file

Postfix configurations migrated from a previous setup often retain smtp_sasl_auth_enable = yes and smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd even when no SASL is needed for the current relay path. If the actual sasl_passwd file isn't there (because the migration didn't carry it, or the new relay doesn't require auth), every send attempt produces:

error: open database /etc/postfix/sasl_passwd.db: No such file or directory
warning: smtp_sasl_password_maps lookup error
status=deferred (local data error while talking to <relay>)

Especially common after migrating from external SMTP (SendGrid, Mailgun, etc., which use SASL) to an internal Tailscale relay (which doesn't).

Detect:

postconf -n | grep -E 'smtp_sasl_(auth_enable|password_maps)'
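[ -f /etc/postfix/sasl_passwd ] || echo "sasl_passwd source file missing"
[ -f /etc/postfix/sasl_passwd.db ] || echo "sasl_passwd.db missing (the compiled map postfix opens)"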

Fix — disable SASL if the new relay doesn't need it:

postconf -e 'smtp_sasl_auth_enable = no'
postconf -e 'smtp_tls_wrappermode = no'   # if switching from port 465 to 587
postconf -X 'smtp_sasl_password_maps'
systemctl reload postfix
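The error above names sasl_passwd.db, not sasl_passwd, because postfix opens the compiled map for indexed table types. A small helper can derive that file from the configured map spec. This is a sketch: the function names are hypothetical, only the hash, btree, and lmdb suffixes are handled, and any other type falls through with the path unchanged as a simplification.

```shell
# map_backing_file MAPSPEC
# Print the on-disk file postfix opens for a type:path lookup table.
map_backing_file() {
  case $1 in
    hash:*|btree:*) echo "${1#*:}.db" ;;    # Berkeley DB tables
    lmdb:*)         echo "${1#*:}.lmdb" ;;  # LMDB tables
    *:*)            echo "${1#*:}" ;;       # other types: path as-is (simplification)
    *)              echo "$1" ;;            # no type prefix given
  esac
}

# sasl_map_missing MAPSPEC: succeed when the backing file is absent.
sasl_map_missing() {
  [ ! -e "$(map_backing_file "$1")" ]
}
```

For example, `sasl_map_missing "$(postconf -h smtp_sasl_password_maps)" && echo "stale SASL map"` flags the drift before the first deferred message shows up. It assumes the parameter holds a single map.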

Audit shortcut

For a quick per-host comparison across the fleet:

# FLEET_HOSTS is a placeholder: set it to a space-separated list of your hosts.
for host in $FLEET_HOSTS; do
  echo "=== $host ==="
  ssh "$host" 'postconf myhostname relayhost smtp_sasl_auth_enable 2>&1' | head -3
done

Anomalies (Packer hostnames, blank relayhost, SASL enabled where siblings have it disabled) jump out immediately.

Lesson Learned

Never customize /usr/share/logwatch/default.conf/logwatch.conf. Always use /etc/logwatch/conf/logwatch.conf. This applies to any software that has a "defaults" file and an "override" file — the override survives upgrades, the defaults file does not.

A second, broader lesson from the 2026-05-10 fleet outage: silent fleet-wide email gaps are usually a stack of unrelated failures, not one cause. That morning's investigation surfaced a missing CA bundle on two Fedora hosts, a postfix relayhost using a name that postfix's resolver couldn't handle, two services with non-mailbox sender addresses generating bounce churn, and a mistaken syslog-vs-journald assumption that made working hosts look broken. Each was minor in isolation. Together they made all seven hosts look broken when in fact only two were. Triage by ground truth (what arrived in the destination mailbox) before assuming what's broken at the source.
