Commit graph

95 commits

Author SHA1 Message Date
623f04720c Add macOS guide: auditing & cleaning Background App Activity (sfltool dumpbtm) 2026-06-21 13:00:35 -04:00
e767ebffcb Add runbook: WordPress 6.7 _load_textdomain_just_in_time notice
Covers the WP 6.7 doing_it_wrong notice fired when a theme/plugin
translates before init (e.g. nav-menu labels on after_setup_theme).
Documents source fix (defer to init) and the update-safe mu-plugin
suppression via doing_it_wrong_trigger_error, plus the renamed-theme
domain gotcha. Derived from the majorlinux.com kappa/marstheme triage.
2026-06-21 11:44:48 -04:00
cb90bb69a2 wiki: add Steam Deck Wi-Fi flapping runbook (IWD periodic scan + rtw88 power save)
Client-side fix for OG Steam Deck (RTL8822CE/rtw88) flapping ~once a minute on
SteamOS: disable IWD periodic scan + disable Wi-Fi power save via NM dispatcher.
Cross-linked with the 160MHz airtime article; registered in SUMMARY.md nav.
2026-06-19 11:36:06 -04:00
cfff75af1c Add troubleshooting article: Time Machine orphaned APFS .previous blocks backups 2026-06-18 10:09:45 -04:00
e1767bc19e Add troubleshooting article: Permission denied (publickey) after key rotation
New 05-troubleshooting/networking article covering the per-host nature of
authorized_keys: rotating a workstation SSH key requires backfilling the new
pubkey to every host, or hosts holding only the old key reject it with
Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent
backed-up backfill via a still-trusted transit user, and prevention. Wired
into SUMMARY.md nav.
2026-06-17 13:14:41 -04:00
2121d3ff1b yt-dlp: document -U trap and avoid duplicate pip installs
Add a Maintenance subsection covering why 'yt-dlp -U' fails on PyPI
builds and how to update via pip, plus how to detect/remove a duplicate
user+system install (the issue hit on majorhome 2026-06-16).
2026-06-16 19:12:06 -04:00
34d9ee42b1 Add wiki: Claude Code keychain prompt keeps reappearing on macOS
New troubleshooting article for the recurring 'security wants to access
Claude Code-credentials' prompt that persists even after Always Allow
(ACL invalidation on binary-signature change / token refresh / post-boot
churn). Covers triage, the reset-and-relogin fix, and the file-based
credentials workaround with its plaintext tradeoff. Registered in
SUMMARY + troubleshooting index; cross-linked with the corrupt-credential
login-failure article (distinct symptom).
2026-06-15 20:12:11 -04:00
a5df9e4873 Correct iPhone Mirroring article: regressed on 27.0 beta, not a Tailscale fix
2026-06-15: mirroring is reproducibly stuck on Connecting again with
Tailscale accept-routes still off, so the 06-14 it-works conclusion was
wrong. _asquic endpoint resolves but the QUIC/AWDL datapath never
completes; awdl0 bounce, full reboot, and phone radio cycle all failed.
Reframed as an intermittent macOS 27.0 beta AWDL bug; QuickTime USB
remains the workaround.
2026-06-15 19:58:20 -04:00
75154ff80c iPhone Mirroring: correct transport finding (video on llw0 not awdl0), it works on ch44, what-changed + MajorMac open test (2026-06-14 evening) 2026-06-14 19:10:06 -04:00
805c0f0a8f iPhone Mirroring AWDL article: refined root cause, Tailscale/congestion ruled out, ch36+ch44 both fail, QuickTime USB workaround, revisit checklist (2026-06-14) 2026-06-14 04:30:22 -04:00
852375ddf0 logwatch-hostname wiki: add hostname-correct-but-config-baked variant
majormail (2026-06-14) had the correct system hostname but still mailed
from majormail-hetzner — the old provisioning label was hardcoded in
logwatch.conf MailFrom and fail2ban jail.local sender. Add a variant
section covering the config grep sweep and the templated-vs-static
Ansible regression caveat.
2026-06-14 04:00:18 -04:00
9dd730fc29 Add nav entries for Warp keychain login + iPhone Mirroring AWDL articles 2026-06-13 09:58:26 -04:00
e0595c04fd Publish drafts: Warp keychain login + iPhone Mirroring AWDL stall 2026-06-13 09:57:37 -04:00
MajorLinux
27ea2dc62b Add troubleshooting article: Wi-Fi 160 MHz airtime saturation breaking game streaming 2026-06-13 09:48:43 -04:00
14cc1ba4b8 wiki: Forgejo account recovery & CLI admin when locked out of the GUI
Covers enabling the [mailer] for password recovery (relay via a tailnet mail
server, no-auth/mynetworks, FORCE_TRUST_SERVER_CERT for IP targets), CLI password
reset + the must-change-password=true gotcha, adding an SSH key via the basic-auth
API when locked out, and ruling out a server-side cause for a 'changing' password.
2026-06-12 17:36:54 -04:00
0d1697c0d6 wiki: Logwatch wrong hostname (<host>-hetzner) after migration
New troubleshooting runbook for Logwatch reports titled with the Hetzner
provisioning label instead of the real hostname; cross-linked from the
logwatch fleet-setup and VPS migration baseline articles, plus a new
'set system hostname' step in the post-migration checklist.
2026-06-12 10:58:17 -04:00
11b455a0e2 Add runbook: Ansible host-key verification failed after host rebuild/migration
Documents the Ansible-by-IP known_hosts gap: interactive ssh works (key
stored under hostname) but Ansible connects by inventory IP and fails with
UNREACHABLE/Host key verification failed. Includes tailnet-safe ssh-keyscan
fix and prevention notes. Surfaced by the Hetzner migration IP churn.
2026-06-12 09:30:09 -04:00
bc4ff144df wiki: add Ansible reboot.yml become-timeout-on-WSL2 troubleshooting article
Documents why WSL2 hosts fail an Ansible reboot play at privilege
escalation (Timeout waiting for privilege escalation prompt) — WSL2 has
no real reboot semantics + become stalls over the Windows OpenSSH->WSL2
bridge — and the fix: scope reboot.yml to hosts: all:!wsl. Registered
in SUMMARY.md and 05-troubleshooting/index.md.
2026-06-12 03:57:17 -04:00
950759da52 wiki: add MagicDNS-names-vs-pinned-IPs Tailscale SSH article
New troubleshooting/networking article covering the three SSH failure modes
after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path
teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names +
known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat.
Cross-links the existing host-key article (adds a 'when pinning the IP is
wrong' callout) and adds the SUMMARY nav entry.
2026-06-12 01:33:31 -04:00
513d94aa84 Merge branch 'code/majorrig/wiki-ssh-magicdns-article' 2026-06-11 20:12:34 -04:00
9b066d0e54 Add troubleshooting article: SSH alias MagicDNS fall-through host-key failure
New 05-troubleshooting/networking article covering the case where ssh <alias>
fails host-key verification because no Host block exists and the alias resolves
via Tailscale MagicDNS to a name with no known_hosts entry (key stored under the
IP). Registered in SUMMARY.md and the troubleshooting index.
2026-06-11 20:12:22 -04:00
5ef0fdfad4 draft: WIP wiki articles (warp keychain credential, iPhone Mirroring AWDL stall)
Backing up two unpublished draft articles that existed only in a working-tree
stash. Drafts — NOT in SUMMARY.md nav and NOT merged to main, so not published
to notes.majorshouse.com. Pre-commit nav check bypassed intentionally (--no-verify).

- 05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
- 05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
2026-06-11 15:41:28 -04:00
06a794316b docs: point Ansible references at the new roles (clamav/ssh_hardening/tailscale)
Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration. Historical incident narrative (dated callouts, commit
refs) preserved.

- clamav-fleet-deployment: override + re-run -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
  -> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora)
- freshclam-logwatch-false-no-updates: codify refs -> clamav role
2026-06-11 11:33:42 -04:00
c3045e33dd troubleshooting: ssh-race article — fleet audited & reconciled 2026-06-07
dcaprod-hetzner + tttpod-hetzner were missing tailscale-wait-ready.service
(inert ssh.service gate -> latent bind race); corrected playbook applied to
both. teelia uses Tailscale SSH (no sshd, immune). All Ubuntu hosts now on
the dependency-free-socket + ssh.service-gate pattern.
2026-06-07 06:22:35 -04:00
8d4dee5da3 troubleshooting: correct ssh tailscale-race article (Fedora ListenAddress variant + playbook cycle landmine)
- Fedora hosts are NOT automatically immune: a leftover manual
  `ListenAddress <tailscale-ip>` drop-in reintroduces the sshd boot bind-race
  even under firewalld (hit on majordiscord 2026-06-07; fix = remove it).
- The Ubuntu playbook kept shipping the cycle-causing [Unit] gate on
  ssh.socket despite the 2026-06-04 resolution; re-running it re-armed the
  ordering cycle (clobbered majorlinux; majortoot-hetzner found armed).
  Corrected in MajorAnsible e0d35aa. Fleet ssh-lockdown state is inconsistent
  (dcaprod/tttpod lack wait-ready; teelia no override) — needs a per-host audit.
2026-06-07 05:56:25 -04:00
01ae62e621 troubleshooting: Dovecot phantom mailboxes from .dovecot.lda-dupes (mail_home overlapping maildir root)
Document the majormail 2026-06-07 incident: when userdb home == maildir
root, the LDA/Sieve duplicate database (.dovecot.lda-dupes + .locks) lands
inside the mail store and the maildir lister exposes it as phantom
mailboxes ("dovecot.lda-dupes"), logging stat(.../tmp) "Not a directory".
Fix: point home at a non-dotted subdir. Wired into the troubleshooting
index and SUMMARY.
2026-06-07 05:06:43 -04:00
662741e7ad troubleshooting: Postfix header_checks can't act on milter-added headers
Document the majormail spam-routing failure (2026-06-06): a cleanup
header_checks REDIRECT keyed on the milter-added X-Spam-Flag never fired for
real inbound mail (only locally-injected), so spam kept reaching the inbox.
Fix is to route in Sieve at delivery (after the milter), with a redirect +
loop guard. Includes the 'local-injection tests lie' warning.
2026-06-06 10:38:04 -04:00
d8f07e8e2e wiki: add ClamAV freshness watchdog + sendmail (not mail) alert guidance
Document the daily /etc/cron.daily/clamav-freshness watchdog as the real
detector for stale signatures, and the key gotcha that 'mail' is absent on
most fleet hosts so alert scripts must use /usr/sbin/sendmail -t.
2026-06-06 07:17:56 -04:00
5d7354e856 troubleshooting: freshclam daemon-mode logwatch false 'no updates' alert
logwatch's clam-update counts only 'process started' lines (emitted only at
daemon restart), so daemon-mode freshclam false-alarms on quiet days despite
signatures updating. Fix: $ignore_no_updates=1 drop-in. Includes the
real-vs-false check (a daemonless box with freshclam disabled is a TRUE alert).
2026-06-06 07:06:29 -04:00
d755b77126 troubleshooting: SELinux /etc/localtime mislabel silently breaks timezone
New page documenting the majormail (2026-06-05) issue: /etc/localtime
shipped labeled etc_t instead of locale_t on the Hetzner image, so SELinux
denied systemd-timedated and timedatectl/community.general.timezone reported
success while the symlink stayed at UTC. Fix: restorecon before setting TZ.
Indexed in index.md (SELinux) + SUMMARY.md.
2026-06-05 14:22:00 -04:00
26eb13ab2f troubleshooting: document majormail client-connectivity incident (2026-06-05)
- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log
  (152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct
  Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
2026-06-05 14:04:22 -04:00
110a6d49e5 wiki: add inbound spam filtering guide (spamass-milter + SpamAssassin Bayes)
New 02-selfhosting/services article: the full Postfix/Dovecot inbound spam stack
on Fedora — spamass-milter tag-only wiring (the -r footgun), socket permissions
(sa-milt group + UMask), site-wide Bayes DB, Sieve-to-Junk, and sa-learn training
(folders, spam/ham balance, manual-not-cron). From the majormail setup.

Also extends selinux-dovecot-vmail-context with a Permissive-mode variant + a
postfix_cleanup->mysqld_etc companion-denial note. SUMMARY.md nav updated.
2026-06-04 16:31:14 -04:00
155651c373 wiki: ssh.socket wait-ready gate + mastodon post-install hardening
Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.

ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service Requires=tailscaled.service orders against the
  service becoming active, not against tailscale0 having an IPv4 — hosts
  kept losing SSH intermittently after reboots (incident: majorlinux +
  majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
  `ip -4 -o addr show tailscale0` until an address is present, with
  ssh.socket After=/Requires= that service. Document the full evolution
  (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
  future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.

mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
   traverse → every static asset 403s → unstyled "purple screen" in the
   browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS,
   SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even
   when chained, Ubuntu's .bashrc returns early for non-interactive
   shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
   .bashrc, so it works for both interactive and non-interactive logins.

All four codified in MajorAnsible configure_mastodon_permissions.yml
with self-asserting verification steps.

02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:08:24 -04:00
52ca8a0413 wiki: batch update — 4 new articles + 4 updates
New articles:
- Postfix SendGrid TLS handshake failure (port 465 vs 587)
- Plex transcoding troubleshooting
- Ansible Ubuntu reboot detection kernel mismatch
- WSL2 PyTorch checkpoint Windows filesystem deadlock

Updated:
- AWS S3 cost management (expanded)
- Network overview (IP updates)
- HEVC VAAPI batch encode (progress + fixes)
- SUMMARY.md (new entries)
2026-05-25 13:55:10 -04:00
3b8c8b0597 ssh.socket wiki: correct BindsTo→Requires, add warning
BindsTo=tailscaled.service causes a systemd ordering cycle that
prevents ssh.socket from starting on reboot. Updated the recommended
fix to use Requires= and added a warning admonition explaining why
BindsTo must not be used. Added tttpod-hetzner to affected hosts
list and linked the 2026-05-23 dcaprod incident.
2026-05-23 02:40:04 -04:00
65b0aa4567 wiki: expand Tailscale race condition article with network-online race
Added Race 2: tailscaled starts before network-online.target, causing
Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu
ssh.socket and cross-platform tailscaled ordering issues. Updated
references to include majordiscord incident and new Ansible playbook.
2026-05-19 20:39:18 -04:00
eb39da9a26 Merge cowork/majorair/ssh-socket-wiki: ssh.socket Tailscale race condition article 2026-05-19 19:36:19 -04:00
7dc591d257 wiki: add ssh.socket Tailscale race condition troubleshooting article
Documents the systemd socket activation race where ssh.socket binds
to the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
2026-05-19 19:35:16 -04:00
Marcus (via Claude Code)
28518e403e Add troubleshooting articles: Netdata apps-group FD false-positive + OBS stale script paths
- netdata-apps-fds-group-false-positive: the apps_group_file_descriptors_utilization
  false 100% on forking/root app groups (tailscaled on MajorToot 2026-05-15),
  the not-a-privilege gotcha, fleet-wide silence fix in MajorAnsible.
- obs-stale-script-paths: pending from prior session (not on remote).
- SUMMARY.md: link both (re-applied onto upstream after concurrent rebase).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 03:22:12 -04:00
ac84610380 wiki: add 1 vCPU nice/ionice limitation note to ClamAV article
nice -n 19 only yields when other processes compete; on single-core
VPS boxes the scan still saturates CPU. Document the expectation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 18:32:01 -04:00
3df0979786 Merge branch 'code/majorair/logwatch-ca-bundle-docs'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 07:37:48 -04:00
de9b661b9d wiki: add Fedora CA bundle article, update migration checklist and logwatch docs
New article documenting missing /etc/pki/tls/certs/ca-bundle.crt symlink
on Hetzner Fedora images breaking Postfix TLS, curl, and dnf. Updated
VPS migration baseline checklist with timezone, CA bundle, and crond
verification steps. Updated logwatch fleet setup with crond check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 07:35:42 -04:00
724ae2a5e3 Add troubleshooting article: PHP 8.4 implicit-nullable vendor patch
Generalizes the Castopod/UuidModel incident from 2026-05-10. PHP 8.4
deprecated implicit-nullable parameters (`function f(int $x = null)`).
Old vendor libraries spam E_DEPRECATED warnings; CodeIgniter wraps each
in a 23-frame stack trace; per-minute spark cron amplifies into 53-80
MB/day log bleed and 22% sustained CPU floor on small VPS boxes.

Documents the four-line sed fix AND the substring-match gotcha that
extended the fix from 30 seconds to 30 minutes — bare `int \$limit = null`
patterns substring-match `?int \$limit = null` elsewhere in the file
and produce illegal `??type` syntax. Covers anchored sed patterns,
reference-parameter handling (&\$db), the lint-after-every-edit rule,
and a bonus section on hunting stray developer debug prints
(`log_message('critical', 'ITS HEEEEEEEEEEEERE')`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 12:52:25 -04:00
545df9f5c6 Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot
Documents the failure mode where issuing a synchronous `ssh host reboot`
through Claude Desktop's shell MCP poisons the local MCP transport when
the target severs its session before responding cleanly — eventually
force-disconnecting every MCP at once. Covers diagnostic chain, recovery,
fire-and-forget reboot patterns, and worked example from the 2026-05-10
majorhome AMD-card reboot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:28:11 -04:00
7c566cda50 Add: diagnosing Castopod posts that don't appear on Mastodon
Walks the four-step diagnostic chain (post created → activity delivered →
follower exists → notification semantics) for the common confusion where
a Castopod admin's auto-broadcast "doesn't show up" on a Mastodon account
they expected. Most cases are not federation bugs but the difference
between favouriting/boosting (no follow required) and following + the
fact that Mastodon notifications fire only for mentions/follows/favs/
boosts/etc., not for new posts from people you follow. Documents the bell
icon and `@`-mention escape hatches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:05:18 -04:00
1c17bdb60a Add: Castopod federation — stale cached avatar URL fix
When a remote actor updates their avatar, Mastodon (Paperclip) deletes the
old S3 object and stores only the new filename. Castopod 2.0.0 caches the
URL of every federated actor in cp_fediverse_actors and never refetches,
so its admin templates emit a dead link forever (the resulting S3 403 is
anti-enumeration, hiding what is really a 404). Article documents the
diagnosis pattern and three fixes (manual UPDATE, DELETE-and-refetch,
bulk audit), plus the Mastodon-side query for sourcing the correct URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:51:18 -04:00
393df3cc45 Add: tuning Netdata web_log_1m_successful for redirect-heavy WordPress
The stock alarm definition counts only 1xx/2xx/304/401/429 as successful,
which causes false CRITICALs on WP sites where 301 canonicalization is
normal traffic (legacy /?p=NNNN, slug edits, host/TLS upgrades, etc.).
Article documents the root cause, verification steps via the access log,
and an in-place threshold retune that keeps the alarm useful as an
"obvious meltdown" floor while delegating real outage detection to the
5xx and 4xx alarms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:12:21 -04:00
3c4cc74aef Merge: wiki — Ansible regex_search set_fact gotcha 2026-05-06 08:28:22 -04:00
ca123b0312 wiki: add troubleshooting article — Ansible regex_search capture group fails in set_fact
Documents the gotcha hit during the 2026-05-06 update.yml refactor:
the second-positional-argument back-reference form of regex_search
('\1') doesn't reliably select capture groups when used inside
set_fact. The fix is to match the broader substring and use
.split()[0] (or [-1], etc.) to peel off the value, with a default()
bridge for the no-match case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:28:21 -04:00
213a84ed79 wiki: add .keep files for 04-streaming and 05-troubleshooting subdirs 2026-05-02 17:50:22 -04:00