Commit graph

40 commits

Author SHA1 Message Date
631d7e8bc5 Logwatch fleet article: add Fedora CA bundle diagnosis + bounce-source guidance
Documents three lessons from the 2026-05-10 fleet outage where the
Fedora half (majorhome, majorlab) had been silently failing to send
notification mail for days:

- Missing /etc/pki/tls/certs/ca-bundle.crt symlink (extracted bundle
  exists at /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem but the
  consumer-path symlink was lost during a ca-certificates package
  event). Diagnosis includes the cross-tool tell — dnf and curl break
  with the same path. Fix is a single ln -sfn.
- Methodology: Fedora and majormail log postfix to journald; Debian and
  Ubuntu log to /var/log/mail.log. Querying the wrong source returns
  false negatives for healthy hosts.
- Bounce-source addresses (Watchtower NOTIFICATION_EMAIL_FROM,
  fail2ban sender, root@<host>.localdomain) must resolve to real
  mailboxes — otherwise the first failed delivery generates
  bounce-of-bounce churn.

Also promoting the article from untracked to committed; it had been
authored on 2026-05-09 and not yet added to the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 12:08:15 -04:00
a852f7b7bd ClamAV fleet caveat: add follow-up on the polite-CPU-on-1vCPU edge case
Same-day correction. The proposed per-droplet relaxed alert (>95%/30m)
turned out to also trip on a 1 vCPU box during low-traffic weekly scans,
because there's literally no real load for nice 19 to yield to —
clamscan opportunistically fills the vCPU and DO sees 100% utilization
regardless of `%nice` vs `%user` split. Documents the three realistic
options (accept page / switch to clamdscan / disable alert) and the
underlying limit (no DO threshold can distinguish polite from impolite
CPU when the box is fully utilized).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:32:35 -04:00
af14e36caf ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets
DO's hypervisor-level CPU metric doesn't know about nice/ionice — a
"polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization
and trips a default >85%/5m alert. Adds a new section explaining the
trade-off and providing the DO API recipe (PUT existing alert with
explicit entities, POST a new relaxed alert scoped to the small
droplet) plus when not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired
the fleet-wide CPU alert despite the cron script already wrapping
clamscan in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:24:17 -04:00
545df9f5c6 Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot
Documents the failure mode where issuing a synchronous `ssh host reboot`
through Claude Desktop's shell MCP poisons the local MCP transport when
the target severs its session before responding cleanly — eventually
force-disconnecting every MCP at once. Covers diagnostic chain, recovery,
fire-and-forget reboot patterns, and worked example from the 2026-05-10
majorhome AMD-card reboot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:28:11 -04:00
7c566cda50 Add: diagnosing Castopod posts that don't appear on Mastodon
Walks the four-step diagnostic chain (post created → activity delivered →
follower exists → notification semantics) for the common confusion where
a Castopod admin's auto-broadcast "doesn't show up" on a Mastodon account
they expected. Most cases are not federation bugs but the difference
between favouriting/boosting (no follow required) and following + the
fact that Mastodon notifications fire only for mentions/follows/favs/
boosts/etc., not for new posts from people you follow. Documents the bell
icon and `@`-mention escape hatches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:05:18 -04:00
1c17bdb60a Add: Castopod federation — stale cached avatar URL fix
When a remote actor updates their avatar, Mastodon (Paperclip) deletes the
old S3 object and stores only the new filename. Castopod 2.0.0 caches the
URL of every federated actor in cp_fediverse_actors and never refetches,
so its admin templates emit a dead link forever (the resulting S3 403 is
anti-enumeration, hiding what is really a 404). Article documents the
diagnosis pattern and three fixes (manual UPDATE, DELETE-and-refetch,
bulk audit), plus the Mastodon-side query for sourcing the correct URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:51:18 -04:00
393df3cc45 Add: tuning Netdata web_log_1m_successful for redirect-heavy WordPress
The stock alarm definition counts only 1xx/2xx/304/401/429 as successful,
which causes false CRITICALs on WP sites where 301 canonicalization is
normal traffic (legacy /?p=NNNN, slug edits, host/TLS upgrades, etc.).
Article documents the root cause, verification steps via the access log,
and an in-place threshold retune that keeps the alarm useful as an
"obvious meltdown" floor while delegating real outage detection to the
5xx and 4xx alarms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:12:21 -04:00
3bcc58a805 services: add Mastodon --prune-profiles trap and recovery article
Documents the long-standing UX regression caused by
`tootctl media remove --prune-profiles` (and `--remove-headers`)
running on a schedule: cached remote avatars are deleted, but
Mastodon does not auto-refetch on profile view, so quiet remote
accounts stay broken indefinitely.

Article covers:
- The mutually-exclusive flag bug (silent skip if combined)
- Mastodon's actual avatar-refresh trigger model (Update activities,
  not profile views)
- A `refresh-my-follows.sh` pattern with a defensible WHERE clause
  (avatar NULL AND avatar_remote_url present) to avoid infinite
  retry on accounts whose origin has no avatar
- Why header_file_name IS NULL is a bad signal (~20% of users
  legitimately have no custom header)
- The cron decision: most admins should drop --prune-profiles
2026-05-07 12:01:47 -04:00
c5b4de4184 wiki: full index refresh — 106 articles, 17 new since Apr 18
Updated article count (89 → 106), domain counts, per-section
listings, and Recently Updated table. Added all articles published
since 2026-04-18 including Pi-hole, Mastodon, fail2ban digest,
LoRA GGUF, Tailscale iOS, and more.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 16:45:30 -04:00
4126656c05 wiki: update fail2ban digest + netdata docker health + 3 new articles
- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent,
  defaults-debian.conf gotcha added
- netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table
- New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails,
  tailscale-status-json-hostname-localhost-ios
- Various article updates and nav index refreshes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 14:58:07 -04:00
063bfa53d7 Resolve Obsidian Sync conflicts on main 2026-04-29 22:47:46 -04:00
f9c61fbac3 wiki: publish 3 unpushed articles and catch nav up
Articles from prior sessions that were written locally but never shipped:
- 02-selfhosting/cloud/aws-s3-cost-management.md — lifecycle rules, storage class selection, bucket inventory, unexpected-growth investigation
- 02-selfhosting/dns-networking/wake-on-lan-router-ssh.md — WOL magic packets via Asus router SSH + ether-wake, Ansible vault integration
- 02-selfhosting/services/claude-code-remote-control.md — mobile access to a persistent host Claude Code session

Nav updated (index.md + SUMMARY.md):
- Added Cloud subsection under Self-Hosting for aws-s3
- Added wake-on-lan and aws-s3 entries to SUMMARY
- Added claude-code-remote-control to index's Services section
- Added ansible-ssh-host-alias-bypass nav entry (article shipped in 2dbeb22)
- Article count 87 → 89, self-hosting 30 → 32, troubleshooting 33 → 34
2026-04-21 09:17:31 -04:00
76f29a46e5 SUMMARY + index: add 10 missing articles to nav, update counts to 86 2026-04-18 18:48:31 -04:00
b40e484aae Add 5 wiki articles from 2026-04-17/18 work
- ghost-smtp-mailgun-setup: two-system email config (newsletter API + transactional SMTP)
- firewalld-fleet-hardening: Fedora fleet firewall audit-and-harden pattern with Ansible
- clamav-fleet-deployment: fleet deployment with nice/ionice throttling + quarantine
- ansible-check-mode-false-positives: when: not ansible_check_mode guard for verify/assert tasks
- ghost-emailanalytics-lag-warning: submitted status, lag counter, fetchMissing skip explained
2026-04-18 11:13:39 -04:00
4f66955d33 wiki: correct article counts in index and README
Local and remote both have 76 articles on disk, but the counter
and per-domain table were stale (74 total / self-hosting 21 /
troubleshooting 29). Trued up to 76 / 22 / 30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:17:03 -04:00
c0837b7e89 wiki: add fail2ban jail for Apache PHP webshell probes
Documents the 2026-04-09 scanner incident where 301-redirected PHP probes
bypassed the existing apache-404scan jail, leaving the scanner unbanned
and firing Netdata web_log_1m_redirects alerts. New jail catches 301/302/
403/404 PHP responses while excluding legitimate WordPress endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:17:24 -04:00
326c87421f wiki: add troubleshooting article on /var/run heartbeat reboot false alarm
Captures the majorlab incident where the backup watchdog emailed a missing
heartbeat after a kernel-update reboot wiped /var/run, even though the
backup had actually completed cleanly. Documents the tmpfs root cause and
the fix of storing heartbeats under /var/lib instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:11:24 -04:00
56f1014f73 Add troubleshooting article: wget/curl URLs with special characters
Covers shell quoting for URLs containing &, ?, #, and other characters
that Bash interprets as operators. Common gotcha when downloading from
CDNs with token-based URLs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:18:34 -04:00
5af934a6c6 wiki: update SSH docs with bash.exe default shell fix and Windows admin key auth
- ssh-config-key-management: add Windows OpenSSH admin user key auth section
  (administrators_authorized_keys, BOM-free writing, ACL requirements)
- windows-openssh-wsl-default-shell: add bash.exe as recommended fix (Option 1),
  demote PowerShell to Option 2, add shell-not-found diagnostic tip
- windows-sshd-stops-after-reboot: fix stale wsl.exe reference to bash.exe
- index/README: update Recently Updated table and article descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:01:36 -04:00
84a1893e80 wiki: fix article count to 73, update frontmatter timestamps
Corrected inflated article count (was 76, actual is 73).
Updated domain breakdown and frontmatter timestamps from Obsidian.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:51:23 -04:00
daa771760b wiki: add WSL OpenSSH default shell + Ansible world-writable mount articles
Two new troubleshooting articles from today's MajorRig/MajorMac Ansible setup:
- Windows OpenSSH WSL default shell breaks remote SSH commands
- Ansible silently ignores ansible.cfg on WSL2 world-writable mounts

Article count: 76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:23:02 -04:00
1a00fef199 Update wiki indexes for WordPress login jail article
Article count 73 → 74. Added to SUMMARY.md, index.md, README.md,
and 02-selfhosting/index.md (which was also missing 5 other security
articles from prior sessions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:07:08 -04:00
6f53b7c6db wiki: fix broken wikilinks in index and README related sections
Removed Obsidian [[wikilinks]] pointing to vault-only docs (01-Phases, majorlab)
that don't resolve on the MkDocs site. Kept deploy status as a proper markdown link.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:59:17 -04:00
6d81e7f020 wiki: add 4 new articles from archive, merge 8 archive notes into existing articles (73 articles)
New: mdadm RAID rebuild, Mastodon instance tuning, Ventoy, Fedora networking/kernel recovery.
Merged: Glacier Deep Archive into rsync, SpamAssassin into hardening checklist,
OBS captions/VLC capture into OBS setup, yt-dlp subtitles/temp fix into yt-dlp.
Updated index.md, README.md, SUMMARY.md with 21 previously missing articles.
Fixed merge conflict in index.md Recently Updated table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:55:53 -04:00
8c22ee708d merge: resolve conflicts, add SELinux AVC chart article; update indexes to 53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:36:49 -04:00
fb2e3f6168 wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:34:33 -04:00
4d59856c1e wiki: add Netdata new server deployment guide (49 articles) 2026-03-18 11:00:41 -04:00
38fe720e63 wiki: add Netdata Docker health alarm tuning article; update indexes to 48
- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md — new
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 00:10:36 -04:00
59a5cc530e wiki: add Windows sshd and Ollama/Tailscale sleep articles; update indexes to 47
- 05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
- 05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md
- SUMMARY.md, index.md, README.md: count 45 → 47, add 5 missing articles (3 from 2026-03-16 + 2 today)
- MajorWiki-Deploy-Status.md: session update 2026-03-17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 21:20:15 -04:00
279c094afc wiki: add firewalld mail ports reset article + session updates
- New article: firewalld mail ports wiped after reload (IMAP + webmail outage)
- New article: Plex 4K codec compatibility (Apple TV)
- New article: mdadm RAID recovery after USB hub disconnect
- Updated yt-dlp article
- Updated all index files: SUMMARY.md, index.md, README.md, category indexes
- Article count: 41 → 42

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:15:02 -04:00
0bcc2c822a wiki: add SELinux vmail and gitea-runner articles; update indexes
- New: SELinux Fixing Dovecot Mail Spool Context (/var/vmail)
  Corrected fix — mail_spool_t only, no dovecot_tmp_t on tmp/ dirs.
  Includes warning and recovery steps for the Postfix delivery outage.
- New: Gitea Actions Runner Boot Race Condition Fix
  network-online.target dependency, RestartSec=10, /etc/hosts workaround.
- Updated SUMMARY.md, index.md, README.md, 05-troubleshooting/index.md
- Article count: 37 → 39; MajorWiki-Deploy-Status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:49:01 -04:00
1d8be8669e troubleshooting: add Fail2ban IMAP self-ban article
Documents the 2026-03-14 incident where MajorAir's public IP was banned
by the postfix-sasl jail after repeated SASL auth failures, silently
blocking all IMAP connections from Spark Desktop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:57:01 -04:00
66fcca8028 docs: update index and README — 22 articles, 2026-03-12, new runbook 2026-03-12 18:10:15 -04:00
f35d1abdc6 docs: sync local vault content to remote
Update index pages, troubleshooting articles, README, and deploy status
to match current local vault state.
2026-03-12 18:03:52 -04:00
999e1107f0 vault backup: 2026-03-11 22:00:24 2026-03-11 22:00:24 -04:00
cc7044e9c7 vault backup: 2026-03-11 21:51:13 2026-03-11 21:51:13 -04:00
5513b28efb vault backup: 2026-03-11 21:44:06 2026-03-11 21:44:06 -04:00
59d25e589b vault backup: 2026-03-11 21:19:44 2026-03-11 21:19:44 -04:00
fbb58fdf3e vault backup: 2026-03-11 11:52:33 2026-03-11 11:52:33 -04:00
9c22a661ea chore: link vault wiki to Gitea 2026-03-11 11:20:12 -04:00