Covers enabling the [mailer] for password recovery (relay via a tailnet mail
server, no-auth/mynetworks, FORCE_TRUST_SERVER_CERT for IP targets), CLI password
reset + the must-change-password=true gotcha, adding an SSH key via the basic-auth
API when locked out, and ruling out a server-side cause for a 'changing' password.
New troubleshooting runbook for Logwatch reports titled with the Hetzner
provisioning label instead of the real hostname; cross-linked from the
logwatch fleet-setup and VPS migration baseline articles, plus a new
'set system hostname' step in the post-migration checklist.
Documents the Ansible-by-IP known_hosts gap: interactive ssh works (key
stored under hostname) but Ansible connects by inventory IP and fails with
UNREACHABLE/Host key verification failed. Includes a tailnet-safe ssh-keyscan
fix and prevention notes. Surfaced by the Hetzner migration IP churn.
Documents why WSL2 hosts fail an Ansible reboot play at privilege
escalation (Timeout waiting for privilege escalation prompt) — WSL2 has
no real reboot semantics + become stalls over the Windows OpenSSH->WSL2
bridge — and the fix: scope reboot.yml to hosts: all:!wsl. Registered
in SUMMARY.md and 05-troubleshooting/index.md.
New troubleshooting/networking article covering the three SSH failure modes
after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path
teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names +
known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat.
Cross-links the existing host-key article (adds a 'when pinning the IP is
wrong' callout) and adds the SUMMARY nav entry.
Document that VAAPI HEVC on Polaris can't beat already-efficient H.264 (YouTube/
Twitch/stream archives), so output comes out larger and lands in hevc_failed.txt.
Add already_failed() guard so the batch skips known-bad files on queue rebuilds
instead of re-attempting them. Also: MIN_FREE_GB note (start-only check) and a
source-bitrate triage snippet for picking real encode candidates.
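
A minimal Python sketch of the skip-known-bad guard described above, for illustration only. The batch script's actual language and the on-disk format of hevc_failed.txt are not stated here; this assumes one source path per line, and `parse_failed` and `build_queue` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable, List, Set


def parse_failed(text: str) -> Set[str]:
    """Parse the failure list. ASSUMPTION: one source path per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def already_failed(source: str, failed: Set[str]) -> bool:
    """True if this source was recorded as a failed encode on an earlier run."""
    return source in failed


def build_queue(candidates: Iterable[str], failed: Set[str]) -> List[str]:
    """Rebuild the queue, keeping order and dropping known-bad files."""
    return [c for c in candidates if not already_failed(c, failed)]


def load_failed(path: Path) -> Set[str]:
    """Read the failure list from disk; a missing file means nothing failed yet."""
    return parse_failed(path.read_text()) if path.exists() else set()
```

Loading the list once per queue rebuild (for example `load_failed(Path("hevc_failed.txt"))`) keeps the membership check cheap for each candidate.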
New 05-troubleshooting/networking article covering the case where ssh <alias>
fails host-key verification because no Host block exists and the alias resolves
via Tailscale MagicDNS to a name with no known_hosts entry (key stored under the
IP). Registered in SUMMARY.md and the troubleshooting index.
Backing up two unpublished draft articles that existed only in a working-tree
stash. Drafts — NOT in SUMMARY.md nav and NOT merged to main, so not published
to notes.majorshouse.com. Pre-commit nav check bypassed intentionally (--no-verify).
- 05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
- 05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration. Historical incident narrative (dated callouts, commit
refs) preserved.
- clamav-fleet-deployment: override + re-run -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
-> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora)
- freshclam-logwatch-false-no-updates: codify refs -> clamav role
dcaprod-hetzner + tttpod-hetzner were missing tailscale-wait-ready.service
(inert ssh.service gate -> latent bind race); corrected playbook applied to
both. teelia uses Tailscale SSH (no sshd, immune). All Ubuntu hosts now on
the dependency-free-socket + ssh.service-gate pattern.
- Fedora hosts are NOT automatically immune: a leftover manual
`ListenAddress <tailscale-ip>` drop-in reintroduces the sshd boot bind-race
even under firewalld (hit on majordiscord 2026-06-07; fix = remove it).
- The Ubuntu playbook kept shipping the cycle-causing [Unit] gate on
ssh.socket despite the 2026-06-04 resolution; re-running it re-armed the
ordering cycle (clobbered majorlinux; majortoot-hetzner found armed).
Corrected in MajorAnsible e0d35aa. Fleet ssh-lockdown state is inconsistent
(dcaprod/tttpod lack wait-ready; teelia no override) — needs a per-host audit.
Document the majormail 2026-06-07 incident: when userdb home == maildir
root, the LDA/Sieve duplicate database (.dovecot.lda-dupes + .locks) lands
inside the mail store and the maildir lister exposes it as phantom
mailboxes ("dovecot.lda-dupes"), logging stat(.../tmp) "Not a directory".
Fix: point home at a non-dotted subdir. Wired into the troubleshooting
index and SUMMARY.
Document the majormail spam-routing failure (2026-06-06): a cleanup
header_checks REDIRECT keyed on the milter-added X-Spam-Flag never fired for
real inbound mail (only locally-injected), so spam kept reaching the inbox.
Fix is to route in Sieve at delivery (after the milter), with a redirect +
loop guard. Includes the 'local-injection tests lie' warning.
Document the daily /etc/cron.daily/clamav-freshness watchdog as the real
detector for stale signatures, and the key gotcha that 'mail' is absent on
most fleet hosts so alert scripts must use /usr/sbin/sendmail -t.
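
As a rough illustration of the `sendmail -t` approach (not the watchdog script itself, whose language is not stated here), a Python sketch might look like the following. `build_alert` and `send_alert` are hypothetical names; the only detail taken from the note above is the `/usr/sbin/sendmail -t` invocation, which reads recipients from the message headers.

```python
import subprocess
from email.message import EmailMessage

SENDMAIL = "/usr/sbin/sendmail"  # path taken from the note above


def build_alert(to_addr: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text alert; sendmail -t takes recipients from the To: header."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage) -> None:
    """Pipe the message to sendmail -t; raises CalledProcessError on failure."""
    subprocess.run([SENDMAIL, "-t"], input=msg.as_bytes(), check=True)
```

Because the recipient lives in the headers, no address is passed on the command line, which avoids depending on a `mail` binary at all.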
logwatch's clam-update counts only 'process started' lines (emitted only at
daemon restart), so daemon-mode freshclam false-alarms on quiet days despite
signatures updating. Fix: $ignore_no_updates=1 drop-in. Includes the
real-vs-false check (a daemonless box with freshclam disabled is a TRUE alert).
New page documenting the majormail (2026-06-05) issue: /etc/localtime
shipped labeled etc_t instead of locale_t on the Hetzner image, so SELinux
denied systemd-timedated and timedatectl/community.general.timezone reported
success while the symlink stayed at UTC. Fix: restorecon before setting TZ.
Indexed in index.md (SELinux) + SUMMARY.md.
- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log
(152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct
Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
Three updates to the inbound spam filtering guide, all driven by the 2026-06-04
majormail-hetzner Phase 6 cutover and follow-up tuning:
1. Section 6 (Dovecot Sieve): warn explicitly that `plugin/sieve_before` was
dropped in Pigeonhole 2.4 and silently does nothing — no startup warning,
spam just keeps landing in INBOX. The 2.4 replacement is a top-level
`sieve_script <name> { type = before; path = …; }` block. Also note the
Fedora-flat-dovecot.conf pitfall (some packagings ship dovecot.conf
without `!include conf.d/*.conf`, so the block has to live in the main
file directly). Added a `sievec` compile step.
2. New §6b: route spam to a separate `junk@` mailbox via Postfix cleanup
`header_checks` REDIRECT. This makes spam invisible to the user's
mailbox entirely — Spark/IDLE-based clients don't push-notify because
the message never reaches the subscribed mailbox at all. Includes the
`regexp:` vs `pcre:` map-type tip (use regexp on stock Fedora to avoid
the postfix-pcre package dependency).
3. New §7a: weekly systemd timer for sa-learn. The §7 warning about
"don't run sa-learn from cron unless folders are clean" is correct as
the safe default — but when you adopt the §6b REDIRECT-to-junk@
pattern, the junk@ mailbox is pure spam by design and a weekly
`--spam`/`--ham`/`--sync`/`--force-expire` chain becomes safe and
useful. Full unit templates included.
Gotchas table gains four entries:
- Pigeonhole 2.4 silent breakage of plugin/sieve_before
- postfix-pcre vs regexp map type confusion
- Why sieve fileinto Junk still pushes a Spark notification
- Why local `sendmail` injection doesn't trigger the REDIRECT (smtpd
milters skip sendmail-injected mail, so X-Spam-Flag isn't added)
All changes match what's now codified in the `majormail` Ansible role
(commit 7a8b9eb in MajorAnsible).
New 02-selfhosting/services article: the full Postfix/Dovecot inbound spam stack
on Fedora — spamass-milter tag-only wiring (the -r footgun), socket permissions
(sa-milt group + UMask), site-wide Bayes DB, Sieve-to-Junk, and sa-learn training
(folders, spam/ham balance, manual-not-cron). From the majormail setup.
Also extends selinux-dovecot-vmail-context with a Permissive-mode variant + a
postfix_cleanup->mysqld_etc companion-denial note. SUMMARY.md nav updated.
The weekly media-prune cron (and monthly accounts refresh --all) were
removed 2026-06-01 after repeatedly breaking avatars. Update the
majortoot sections: the 648->7GB shrink was a one-time safe attachment
cleanup; automation is now disabled; prune attachments manually if ever
needed, never profiles. Cross-link the two new troubleshooting articles.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New article mastodon-s3-acl-upload-failures.md: a BucketOwnerEnforced S3
bucket plus a stale S3_PERMISSION/S3_ACL in .env.production makes every
Mastodon upload fail with AccessControlListNotSupported, silently. Covers
symptoms (incl. why a missing object returns 403 not 404), diagnosis,
the fix (S3_PERMISSION= empty, public read via bucket policy), recovery,
a synthetic-write health check, and Ansible enforcement.
Extend mastodon-prune-profiles-trap.md: add a "Bulk restore at scale"
procedure (list existing keys, null missing DB refs, enqueue
RedownloadAvatar/HeaderWorker), a "storage-level deletion without DB
de-ref" section, and a stronger recommendation to disable automated
profile pruning (and scheduled accounts refresh --all) entirely.
Link both from SUMMARY.md and the selfhosting index.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.
ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service + Requires=tailscaled.service order against the
service becoming active, not against tailscale0 having an IPv4 address, so
hosts kept losing SSH intermittently after reboots (incident: majorlinux +
majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
`ip -4 -o addr show tailscale0` until an address is present, with
ssh.socket After=/Requires= that service. Document the full evolution
(2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.
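
The polling logic behind the wait-ready gate can be sketched in Python as below. This is an illustration of the idea, not the shipped unit: the real service presumably runs a shell loop, and the attempt count, delay, and function names here are assumptions. Only the `ip -4 -o addr show tailscale0` command comes from the notes above.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Matches "inet 100.64.0.1/32" in one-line (-o) ip output.
_INET = re.compile(r"\binet\s+(\d{1,3}(?:\.\d{1,3}){3})/")


def ipv4_of(ip_output: str) -> Optional[str]:
    """Return the first IPv4 address found in `ip -4 -o addr` output, if any."""
    match = _INET.search(ip_output)
    return match.group(1) if match else None


def query_tailscale0() -> str:
    """Run the same command the wait-ready service polls; empty stdout if no address."""
    result = subprocess.run(
        ["ip", "-4", "-o", "addr", "show", "tailscale0"],
        capture_output=True,
        text=True,
    )
    return result.stdout


def wait_for_ipv4(
    query: Callable[[], str] = query_tailscale0,
    attempts: int = 60,   # ASSUMPTION: illustrative limit, not the real unit's value
    delay: float = 1.0,   # ASSUMPTION: illustrative poll interval
) -> Optional[str]:
    """Poll until tailscale0 has an IPv4 address; None if it never appears."""
    for _ in range(attempts):
        addr = ipv4_of(query())
        if addr:
            return addr
        time.sleep(delay)
    return None
```

The point of the gate is that success means "an address is bound", which is a stronger condition than "tailscaled.service is active".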
mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
traverse → every static asset 403s → unstyled "purple screen" in the
browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS,
SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even
when chained, Ubuntu's .bashrc returns early for non-interactive
shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
.bashrc, so it works for both interactive and non-interactive logins.
All four codified in MajorAnsible configure_mastodon_permissions.yml
with self-asserting verification steps.
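
Gaps (1) and (2) come down to the "other" permission bits, which a short Python sketch can make concrete. This is not the playbook's verification step; the function names are hypothetical, and it assumes nginx's www-data user is not in the mastodon group, so only the "other" bits decide access.

```python
import os
import stat


def others_can_traverse(mode: int) -> bool:
    """True if users outside owner/group may traverse a directory (o+x)."""
    return bool(mode & stat.S_IXOTH)


def world_readable(mode: int) -> bool:
    """True if users outside owner/group may read the file (o+r)."""
    return bool(mode & stat.S_IROTH)


def mode_of(path: str) -> int:
    """Permission bits of a path, e.g. 0o750."""
    return stat.S_IMODE(os.stat(path).st_mode)


def audit(home: str, env_file: str) -> list:
    """Return a list of findings for the two permission gaps described above."""
    findings = []
    if not others_can_traverse(mode_of(home)):
        findings.append(f"{home}: not traversable by others (static assets will 403)")
    if world_readable(mode_of(env_file)):
        findings.append(f"{env_file}: world-readable secrets")
    return findings
```

With the defaults listed above, 0750 on the home directory fails the traverse check and 0644 on .env.production fails the secrecy check, which is why fixing (1) alone exposes (2).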
02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BindsTo=tailscaled.service causes a systemd ordering cycle that
prevents ssh.socket from starting on reboot. Updated the recommended
fix to use Requires= and added a warning admonition explaining why
BindsTo must not be used. Added tttpod-hetzner to affected hosts
list and linked the 2026-05-23 dcaprod incident.
Added Race 2: tailscaled starts before network-online.target, causing
Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu
ssh.socket and cross-platform tailscaled ordering issues. Updated
references to include majordiscord incident and new Ansible playbook.
Documents the systemd socket activation race where ssh.socket binds
to the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
- netdata-apps-fds-group-false-positive: apps_group_file_descriptors_utilization
reports a false 100% on forking/root app groups (tailscaled on MajorToot
2026-05-15); covers the not-a-privilege gotcha and the fleet-wide silence fix
in MajorAnsible.
- obs-stale-script-paths: pending from prior session (not on remote).
- SUMMARY.md: link both (re-applied onto upstream after concurrent rebase).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fedora 44 Hetzner images ship without rsyslog, so logwatch produces
zero output because /var/log/messages doesn't exist. Added rsyslog
to the baseline table and a new diagnostic section to the logwatch article.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>