Client-side fix for OG Steam Deck (RTL8822CE/rtw88) flapping ~once a minute on
SteamOS: disable IWD periodic scan + disable Wi-Fi power save via NM dispatcher.
Cross-linked with the 160MHz airtime article; registered in SUMMARY.md nav.
New 05-troubleshooting/networking article covering the per-host nature of
authorized_keys: rotating a workstation SSH key requires backfilling the new
pubkey to every host, or hosts holding only the old key reject it with
Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent
backed-up backfill via a still-trusted transit user, and prevention. Wired
into SUMMARY.md nav.
Documents the Ansible-by-IP known_hosts gap: interactive ssh works (key
stored under hostname) but Ansible connects by inventory IP and fails with
UNREACHABLE/Host key verification failed. Includes tailnet-safe ssh-keyscan
fix and prevention notes. Surfaced by the Hetzner migration IP churn.
New troubleshooting/networking article covering the three SSH failure modes
after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path
teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names +
known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat.
Cross-links the existing host-key article (adds a 'when pinning the IP is
wrong' callout) and adds the SUMMARY nav entry.
New 05-troubleshooting/networking article covering the case where ssh <alias>
fails host-key verification because no Host block exists and the alias resolves
via Tailscale MagicDNS to a name with no known_hosts entry (key stored under the
IP). Registered in SUMMARY.md and the troubleshooting index.
Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration. Historical incident narrative (dated callouts, commit
refs) preserved.
- clamav-fleet-deployment: override + re-run -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
-> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora)
- freshclam-logwatch-false-no-updates: codify refs -> clamav role
dcaprod-hetzner + tttpod-hetzner were missing tailscale-wait-ready.service
(inert ssh.service gate -> latent bind race); corrected playbook applied to
both. teelia uses Tailscale SSH (no sshd, immune). All Ubuntu hosts now on
the dependency-free-socket + ssh.service-gate pattern.
- Fedora hosts are NOT automatically immune: a leftover manual
`ListenAddress <tailscale-ip>` drop-in reintroduces the sshd boot bind-race
even under firewalld (hit on majordiscord 2026-06-07; fix = remove it).
- The Ubuntu playbook kept shipping the cycle-causing [Unit] gate on
ssh.socket despite the 2026-06-04 resolution; re-running it re-armed the
ordering cycle (clobbered majorlinux; majortoot-hetzner found armed).
Corrected in MajorAnsible e0d35aa. Fleet ssh-lockdown state is inconsistent
(dcaprod/tttpod lack wait-ready; teelia no override) — needs a per-host audit.
Document the majormail 2026-06-07 incident: when userdb home == maildir
root, the LDA/Sieve duplicate database (.dovecot.lda-dupes + .locks) lands
inside the mail store and the maildir lister exposes it as phantom
mailboxes ("dovecot.lda-dupes"), logging stat(.../tmp) "Not a directory".
Fix: point home at a non-dotted subdir. Wired into the troubleshooting
index and SUMMARY.
Document the majormail spam-routing failure (2026-06-06): a cleanup
header_checks REDIRECT keyed on the milter-added X-Spam-Flag never fired for
real inbound mail (only locally-injected), so spam kept reaching the inbox.
Fix is to route in Sieve at delivery (after the milter), with a redirect +
loop guard. Includes the 'local-injection tests lie' warning.
- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log
(152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct
Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.
ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service Requires=tailscaled.service orders against the
service becoming active, not against tailscale0 having an IPv4 — hosts
kept losing SSH intermittently after reboots (incident: majorlinux +
majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
`ip -4 -o addr show tailscale0` until an address is present, with
ssh.socket After=/Requires= that service. Document the full evolution
(2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.
mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
traverse → every static asset 403s → unstyled "purple screen" in the
browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS,
SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even
when chained, Ubuntu's .bashrc returns early for non-interactive
shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
.bashrc, so it works for both interactive and non-interactive logins.
All four codified in MajorAnsible configure_mastodon_permissions.yml
with self-asserting verification steps.
02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BindsTo=tailscaled.service causes a systemd ordering cycle that
prevents ssh.socket from starting on reboot. Updated the recommended
fix to use Requires= and added a warning admonition explaining why
BindsTo must not be used. Added tttpod-hetzner to affected hosts
list and linked the 2026-05-23 dcaprod incident.
Added Race 2: tailscaled starts before network-online.target, causing
Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu
ssh.socket and cross-platform tailscaled ordering issues. Updated
references to include majordiscord incident and new Ansible playbook.
Documents the systemd socket activation race where ssh.socket binds
to the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent,
defaults-debian.conf gotcha added
- netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table
- New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails,
tailscale-status-json-hostname-localhost-ios
- Various article updates and nav index refreshes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two articles surfaced during the v8 deploy + eval on 2026-04-25:
- Ollama: `ollama run` with piped stdin bypasses the chat template and
SYSTEM prompt — output looks like raw base-model completion. Caught
during initial v8 smoke test. Fix: use /api/chat HTTP endpoint.
- rsync over Tailscale can hang in TCP teardown after the data has
fully transferred. Verify with md5sum, then kill the hung pipeline.
Includes a watcher-threshold gotcha (set below true file size, not
above) and prevention tips.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 files had stale unmerged index entries from a prior aborted merge —
working tree was already clean (no conflict markers), differences were
purely `updated:` field timestamps that drifted across machines via
Obsidian Sync. Working-tree timestamps (most recent) are kept as the
resolution.
Sixth file (mastodon-instance-tuning.md) preserves staged S3 cache
management content that a working-tree revert would have lost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fixed 4 broken markdown links (bad relative paths in See Also sections)
- Corrected n8n port binding to 127.0.0.1:5678 (matches actual deployment)
- Updated SnapRAID article with actual majorhome paths (/majorRAID, disk1-3)
- Converted 67 Obsidian wikilinks to relative markdown links or plain text
- Added YAML frontmatter to 35 articles missing it entirely
- Completed frontmatter on 8 articles with missing fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the 2026-03-14 incident where MajorAir's public IP was banned
by the postfix-sasl jail after repeated SASL auth failures, silently
blocking all IMAP connections from Spark Desktop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>