Commit graph

37 commits

Author SHA1 Message Date
cb90bb69a2 wiki: add Steam Deck Wi-Fi flapping runbook (IWD periodic scan + rtw88 power save)
Client-side fix for OG Steam Deck (RTL8822CE/rtw88) flapping ~once a minute on
SteamOS: disable IWD periodic scan + disable Wi-Fi power save via NM dispatcher.
Cross-linked with the 160MHz airtime article; registered in SUMMARY.md nav.
2026-06-19 11:36:06 -04:00
e1767bc19e Add troubleshooting article: Permission denied (publickey) after key rotation
New 05-troubleshooting/networking article covering the per-host nature of
authorized_keys: rotating a workstation SSH key requires backfilling the new
pubkey to every host, or hosts holding only the old key reject it with
Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent
backed-up backfill via a still-trusted transit user, and prevention. Wired
into SUMMARY.md nav.
2026-06-17 13:14:41 -04:00
MajorLinux
27ea2dc62b Add troubleshooting article: Wi-Fi 160 MHz airtime saturation breaking game streaming 2026-06-13 09:48:43 -04:00
11b455a0e2 Add runbook: Ansible host-key verification failed after host rebuild/migration
Documents the Ansible-by-IP known_hosts gap: interactive ssh works (key
stored under hostname) but Ansible connects by inventory IP and fails with
UNREACHABLE/Host key verification failed. Includes tailnet-safe ssh-keyscan
fix and prevention notes. Surfaced by the Hetzner migration IP churn.
2026-06-12 09:30:09 -04:00
950759da52 wiki: add MagicDNS-names-vs-pinned-IPs Tailscale SSH article
New troubleshooting/networking article covering the three SSH failure modes
after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path
teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names +
known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat.
Cross-links the existing host-key article (adds a 'when pinning the IP is
wrong' callout) and adds the SUMMARY nav entry.
2026-06-12 01:33:31 -04:00
513d94aa84 Merge branch 'code/majorrig/wiki-ssh-magicdns-article' 2026-06-11 20:12:34 -04:00
9b066d0e54 Add troubleshooting article: SSH alias MagicDNS fall-through host-key failure
New 05-troubleshooting/networking article covering the case where ssh <alias>
fails host-key verification because no Host block exists and the alias resolves
via Tailscale MagicDNS to a name with no known_hosts entry (key stored under the
IP). Registered in SUMMARY.md and the troubleshooting index.
2026-06-11 20:12:22 -04:00
06a794316b docs: point Ansible references at the new roles (clamav/ssh_hardening/tailscale)
Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration. Historical incident narrative (dated callouts, commit
refs) preserved.

- clamav-fleet-deployment: override + re-run -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
  -> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora)
- freshclam-logwatch-false-no-updates: codify refs -> clamav role
2026-06-11 11:33:42 -04:00
c3045e33dd troubleshooting: ssh-race article — fleet audited & reconciled 2026-06-07
dcaprod-hetzner + tttpod-hetzner were missing tailscale-wait-ready.service
(inert ssh.service gate -> latent bind race); corrected playbook applied to
both. teelia uses Tailscale SSH (no sshd, immune). All Ubuntu hosts now on
the dependency-free-socket + ssh.service-gate pattern.
2026-06-07 06:22:35 -04:00
8d4dee5da3 troubleshooting: correct ssh tailscale-race article (Fedora ListenAddress variant + playbook cycle landmine)
- Fedora hosts are NOT automatically immune: a leftover manual
  `ListenAddress <tailscale-ip>` drop-in reintroduces the sshd boot bind-race
  even under firewalld (hit on majordiscord 2026-06-07; fix = remove it).
- The Ubuntu playbook kept shipping the cycle-causing [Unit] gate on
  ssh.socket despite the 2026-06-04 resolution; re-running it re-armed the
  ordering cycle (clobbered majorlinux; majortoot-hetzner found armed).
  Corrected in MajorAnsible e0d35aa. Fleet ssh-lockdown state is inconsistent
  (dcaprod/tttpod lack wait-ready; teelia no override) — needs a per-host audit.
2026-06-07 05:56:25 -04:00
01ae62e621 troubleshooting: Dovecot phantom mailboxes from .dovecot.lda-dupes (mail_home overlapping maildir root)
Document the majormail 2026-06-07 incident: when userdb home == maildir
root, the LDA/Sieve duplicate database (.dovecot.lda-dupes + .locks) lands
inside the mail store and the maildir lister exposes it as phantom
mailboxes ("dovecot.lda-dupes"), logging stat(.../tmp) "Not a directory".
Fix: point home at a non-dotted subdir. Wired into the troubleshooting
index and SUMMARY.
2026-06-07 05:06:43 -04:00
662741e7ad troubleshooting: Postfix header_checks can't act on milter-added headers
Document the majormail spam-routing failure (2026-06-06): a cleanup
header_checks REDIRECT keyed on the milter-added X-Spam-Flag never fired for
real inbound mail (only locally-injected), so spam kept reaching the inbox.
Fix is to route in Sieve at delivery (after the milter), with a redirect +
loop guard. Includes the 'local-injection tests lie' warning.
2026-06-06 10:38:04 -04:00
26eb13ab2f troubleshooting: document majormail client-connectivity incident (2026-06-05)
- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log
  (152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct
  Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
2026-06-05 14:04:22 -04:00
155651c373 wiki: ssh.socket wait-ready gate + mastodon post-install hardening
Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.

ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service Requires=tailscaled.service orders against the
  service becoming active, not against tailscale0 having an IPv4 — hosts
  kept losing SSH intermittently after reboots (incident: majorlinux +
  majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
  `ip -4 -o addr show tailscale0` until an address is present, with
  ssh.socket After=/Requires= that service. Document the full evolution
  (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
  future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.

mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
   traverse → every static asset 403s → unstyled "purple screen" in the
   browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS,
   SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even
   when chained, Ubuntu's .bashrc returns early for non-interactive
   shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
   .bashrc, so it works for both interactive and non-interactive logins.

All four codified in MajorAnsible configure_mastodon_permissions.yml
with self-asserting verification steps.

02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:08:24 -04:00
52ca8a0413 wiki: batch update — 4 new articles + 4 updates
New articles:
- Postfix SendGrid TLS handshake failure (port 465 vs 587)
- Plex transcoding troubleshooting
- Ansible Ubuntu reboot detection kernel mismatch
- WSL2 PyTorch checkpoint Windows filesystem deadlock

Updated:
- AWS S3 cost management (expanded)
- Network overview (IP updates)
- HEVC VAAPI batch encode (progress + fixes)
- SUMMARY.md (new entries)
2026-05-25 13:55:10 -04:00
3b8c8b0597 ssh.socket wiki: correct BindsTo→Requires, add warning
BindsTo=tailscaled.service causes a systemd ordering cycle that
prevents ssh.socket from starting on reboot. Updated the recommended
fix to use Requires= and added a warning admonition explaining why
BindsTo must not be used. Added tttpod-hetzner to affected hosts
list and linked the 2026-05-23 dcaprod incident.
2026-05-23 02:40:04 -04:00
65b0aa4567 wiki: expand Tailscale race condition article with network-online race
Added Race 2: tailscaled starts before network-online.target, causing
Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu
ssh.socket and cross-platform tailscaled ordering issues. Updated
references to include majordiscord incident and new Ansible playbook.
2026-05-19 20:39:18 -04:00
7dc591d257 wiki: add ssh.socket Tailscale race condition troubleshooting article
Documents the systemd socket activation race where ssh.socket binds
to the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
2026-05-19 19:35:16 -04:00
4126656c05 wiki: update fail2ban digest + netdata docker health + 3 new articles
- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent,
  defaults-debian.conf gotcha added
- netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table
- New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails,
  tailscale-status-json-hostname-localhost-ios
- Various article updates and nav index refreshes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 14:58:07 -04:00
0996861512 wiki: add troubleshooting articles from MajorTwin v8 cycle
Two articles surfaced during the v8 deploy + eval on 2026-04-25:

- Ollama: `ollama run` with piped stdin bypasses the chat template and
  SYSTEM prompt — output looks like raw base-model completion. Caught
  during initial v8 smoke test. Fix: use /api/chat HTTP endpoint.

- rsync over Tailscale can hang in TCP teardown after the data has
  fully transferred. Verify with md5sum, then kill the hung pipeline.
  Includes a watcher-threshold gotcha (set below true file size, not
  above) and prevention tips.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 12:57:39 -04:00
ef91865f88 wiki: resolve Obsidian Sync timestamp conflicts
5 files had stale unmerged index entries from a prior aborted merge —
working tree was already clean (no conflict markers), differences were
purely `updated:` field timestamps that drifted across machines via
Obsidian Sync. Working-tree timestamps (most recent) are kept as the
resolution.

Sixth file (mastodon-instance-tuning.md) preserves staged S3 cache
management content that a working-tree revert would have lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 12:57:39 -04:00
ae563efc9e docs: add Pi-hole AI blocklist / claude.ai ERR_CONNECTION_REFUSED article
- New: 05-troubleshooting/networking/pihole-blocks-claude-desktop.md
  Covers diagnosis via FTL SQLite query log, gravity DB adlist lookup,
  fix via type-0 domainlist whitelist entry + pihole reloaddns, and
  why NULL blocking mode produces TCP refused instead of NXDOMAIN.
- Updated SUMMARY.md and 05-troubleshooting/index.md with new entry
2026-04-22 18:12:08 -04:00
668891082e wiki: updates to existing articles
- 01-linux/networking/ssh-config-key-management.md
- 02-selfhosting/services/mastodon-instance-tuning.md (expanded)
- 05-troubleshooting/ansible-check-mode-false-positives.md — adds "Related
  pattern: command/shell skipped in check mode" section with the ebtables
  usrmerge diagnosis as worked example
- 05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
- 05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
- 05-troubleshooting/yt-dlp-fedora-js-challenge.md
2026-04-21 09:15:31 -04:00
b40e484aae Add 5 wiki articles from 2026-04-17/18 work
- ghost-smtp-mailgun-setup: two-system email config (newsletter API + transactional SMTP)
- firewalld-fleet-hardening: Fedora fleet firewall audit-and-harden pattern with Ansible
- clamav-fleet-deployment: fleet deployment with nice/ionice throttling + quarantine
- ansible-check-mode-false-positives: when: not ansible_check_mode guard for verify/assert tasks
- ghost-emailanalytics-lag-warning: submitted status, lag counter, fetchMissing skip explained
2026-04-18 11:13:39 -04:00
5af934a6c6 wiki: update SSH docs with bash.exe default shell fix and Windows admin key auth
- ssh-config-key-management: add Windows OpenSSH admin user key auth section
  (administrators_authorized_keys, BOM-free writing, ACL requirements)
- windows-openssh-wsl-default-shell: add bash.exe as recommended fix (Option 1),
  demote PowerShell to Option 2, add shell-not-found diagnostic tip
- windows-sshd-stops-after-reboot: fix stale wsl.exe reference to bash.exe
- index/README: update Recently Updated table and article descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:01:36 -04:00
daa771760b wiki: add WSL OpenSSH default shell + Ansible world-writable mount articles
Two new troubleshooting articles from today's MajorRig/MajorMac Ansible setup:
- Windows OpenSSH WSL default shell breaks remote SSH commands
- Ansible silently ignores ansible.cfg on WSL2 world-writable mounts

Article count: 76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:23:02 -04:00
6592eb4fea wiki: audit fixes — broken links, wikilinks, frontmatter, stale content (66 files)
- Fixed 4 broken markdown links (bad relative paths in See Also sections)
- Corrected n8n port binding to 127.0.0.1:5678 (matches actual deployment)
- Updated SnapRAID article with actual majorhome paths (/majorRAID, disk1-3)
- Converted 67 Obsidian wikilinks to relative markdown links or plain text
- Added YAML frontmatter to 35 articles missing it entirely
- Completed frontmatter on 8 articles with missing fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 11:16:29 -04:00
9acd083577 wiki: add fail2ban UFW rule bloat and Apache dirscan jail articles (56 articles)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:54:06 -04:00
8c22ee708d merge: resolve conflicts, add SELinux AVC chart article; update indexes to 53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:36:49 -04:00
fb2e3f6168 wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:34:33 -04:00
c4d3f8e974 wiki: add Tailscale SSH reauth article; update Netdata Docker alarm tuning (50 articles)
- New: Tailscale SSH unexpected re-authentication prompt — diagnosis and fix
- Updated: netdata-docker-health-alarm-tuning — add delay: up 3m to suppress
  Nextcloud AIO PHP-FPM ~90s startup false alerts; update settings table and notes
- Updated: 05-troubleshooting/index.md and SUMMARY.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 00:12:52 -04:00
59a5cc530e wiki: add Windows sshd and Ollama/Tailscale sleep articles; update indexes to 47
- 05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
- 05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md
- SUMMARY.md, index.md, README.md: count 45 → 47, add 5 missing articles (3 from 2026-03-16 + 2 today)
- MajorWiki-Deploy-Status.md: session update 2026-03-17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 21:20:15 -04:00
279c094afc wiki: add firewalld mail ports reset article + session updates
- New article: firewalld mail ports wiped after reload (IMAP + webmail outage)
- New article: Plex 4K codec compatibility (Apple TV)
- New article: mdadm RAID recovery after USB hub disconnect
- Updated yt-dlp article
- Updated all index files: SUMMARY.md, index.md, README.md, category indexes
- Article count: 41 → 42

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:15:02 -04:00
3159bbfb48 merge: resolve conflicts, keep new IMAP self-ban article
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:03:16 -04:00
1d8be8669e troubleshooting: add Fail2ban IMAP self-ban article
Documents the 2026-03-14 incident where MajorAir's public IP was banned
by the postfix-sasl jail after repeated SASL auth failures, silently
blocking all IMAP connections from Spark Desktop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:57:01 -04:00
639b23f861 vault backup: 2026-03-13 01:31:25 2026-03-13 01:31:25 -04:00
9c22a661ea chore: link vault wiki to Gitea 2026-03-11 11:20:12 -04:00