Commit graph

42 commits

Author SHA1 Message Date
9c62e7f804 Logwatch fleet article: add cloud-image config-drift section
Documents three more patterns surfaced in the 2026-05-10 fleet-mail
investigation, all hitting hosts derived from cloud images or
cross-provider migrations:

- Packer/snapshot-leftover myhostname (postfix EHLO + message-id
  identifies the build artifact, not the production hostname; remote
  spam scorers hate it)
- Empty relayhost silently routes mail via the public MX instead of
  the Tailscale-internal path, exposing it to spamchk that internal
  traffic bypasses
- Stale SASL passwd map referencing a missing file from a previous
  external-SMTP relay setup, deferring every send with "local data
  error"

Each looks benign in isolation. Together they made dcaprod's Logwatch
disappear into spamchk for weeks while showing 250 OK on the source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 12:58:00 -04:00
724ae2a5e3 Add troubleshooting article: PHP 8.4 implicit-nullable vendor patch
Generalizes the Castopod/UuidModel incident from 2026-05-10. PHP 8.4
deprecated implicit-nullable parameters (`function f(int $x = null)`).
Old vendor libraries spam E_DEPRECATED warnings; CodeIgniter wraps each
in a 23-frame stack trace; per-minute spark cron amplifies into 53-80
MB/day log bleed and 22% sustained CPU floor on small VPS boxes.

Documents the four-line sed fix AND the substring-match gotcha that
extended the fix from 30 seconds to 30 minutes — bare `int \$limit = null`
patterns substring-match `?int \$limit = null` elsewhere in the file
and produce illegal `??type` syntax. Covers anchored sed patterns,
reference-parameter handling (&\$db), the lint-after-every-edit rule,
and a bonus section on hunting stray developer debug prints
(`log_message('critical', 'ITS HEEEEEEEEEEEERE')`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 12:52:25 -04:00
631d7e8bc5 Logwatch fleet article: add Fedora CA bundle diagnosis + bounce-source guidance
Documents three lessons from the 2026-05-10 fleet outage where the
Fedora half (majorhome, majorlab) had been silently failing to send
notification mail for days:

- Missing /etc/pki/tls/certs/ca-bundle.crt symlink (extracted bundle
  exists at /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem but the
  consumer-path symlink was lost during a ca-certificates package
  event). Diagnosis includes the cross-tool tell — dnf and curl break
  with the same path. Fix is a single ln -sfn.
- Methodology: Fedora and majormail log postfix to journald; Debian and
  Ubuntu log to /var/log/mail.log. Querying the wrong source returns
  false negatives for healthy hosts.
- Bounce-source addresses (Watchtower NOTIFICATION_EMAIL_FROM,
  fail2ban sender, root@<host>.localdomain) must resolve to real
  mailboxes — otherwise the first failed delivery generates
  bounce-of-bounce churn.

Also promoting the article from untracked to committed; it had been
authored on 2026-05-09 and not yet added to the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 12:08:15 -04:00
a852f7b7bd ClamAV fleet caveat: add follow-up on the polite-CPU-on-1vCPU edge case
Same-day correction. The proposed per-droplet relaxed alert (>95%/30m)
turned out to also trip on a 1 vCPU box during low-traffic weekly scans,
because there's literally no real load for nice 19 to yield to —
clamscan opportunistically fills the vCPU and DO sees 100% utilization
regardless of `%nice` vs `%user` split. Documents the three realistic
options (accept page / switch to clamdscan / disable alert) and the
underlying limit (no DO threshold can distinguish polite from impolite
CPU when the box is fully utilized).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:32:35 -04:00
af14e36caf ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets
DO's hypervisor-level CPU metric doesn't know about nice/ionice — a
"polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization
and trips a default >85%/5m alert. Adds a new section explaining the
trade-off and providing the DO API recipe (PUT existing alert with
explicit entities, POST a new relaxed alert scoped to the small
droplet) plus when not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired
the fleet-wide CPU alert despite the cron script already wrapping
clamscan in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:24:17 -04:00
545df9f5c6 Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot
Documents the failure mode where issuing a synchronous `ssh host reboot`
through Claude Desktop's shell MCP poisons the local MCP transport when
the target severs its session before responding cleanly — eventually
force-disconnecting every MCP at once. Covers diagnostic chain, recovery,
fire-and-forget reboot patterns, and worked example from the 2026-05-10
majorhome AMD-card reboot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:28:11 -04:00
7c566cda50 Add: diagnosing Castopod posts that don't appear on Mastodon
Walks the four-step diagnostic chain (post created → activity delivered →
follower exists → notification semantics) for the common confusion where
a Castopod admin's auto-broadcast "doesn't show up" on a Mastodon account
they expected. Most cases are not federation bugs but the difference
between favouriting/boosting (no follow required) and following + the
fact that Mastodon notifications fire only for mentions/follows/favs/
boosts/etc., not for new posts from people you follow. Documents the bell
icon and `@`-mention escape hatches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:05:18 -04:00
1c17bdb60a Add: Castopod federation — stale cached avatar URL fix
When a remote actor updates their avatar, Mastodon (Paperclip) deletes the
old S3 object and stores only the new filename. Castopod 2.0.0 caches the
URL of every federated actor in cp_fediverse_actors and never refetches,
so its admin templates emit a dead link forever (the resulting S3 403 is
anti-enumeration, hiding what is really a 404). Article documents the
diagnosis pattern and three fixes (manual UPDATE, DELETE-and-refetch,
bulk audit), plus the Mastodon-side query for sourcing the correct URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:51:18 -04:00
393df3cc45 Add: tuning Netdata web_log_1m_successful for redirect-heavy WordPress
The stock alarm definition counts only 1xx/2xx/304/401/429 as successful,
which causes false CRITICALs on WP sites where 301 canonicalization is
normal traffic (legacy /?p=NNNN, slug edits, host/TLS upgrades, etc.).
Article documents the root cause, verification steps via the access log,
and an in-place threshold retune that keeps the alarm useful as an
"obvious meltdown" floor while delegating real outage detection to the
5xx and 4xx alarms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:12:21 -04:00
3bcc58a805 services: add Mastodon --prune-profiles trap and recovery article
Documents the long-standing UX regression caused by
`tootctl media remove --prune-profiles` (and `--remove-headers`)
running on a schedule: cached remote avatars are deleted, but
Mastodon does not auto-refetch on profile view, so quiet remote
accounts stay broken indefinitely.

Article covers:
- The mutually-exclusive flag bug (silent skip if combined)
- Mastodon's actual avatar-refresh trigger model (Update activities,
  not profile views)
- A `refresh-my-follows.sh` pattern with a defensible WHERE clause
  (avatar NULL AND avatar_remote_url present) to avoid infinite
  retry on accounts whose origin has no avatar
- Why header_file_name IS NULL is a bad signal (~20% of users
  legitimately have no custom header)
- The cron decision: most admins should drop --prune-profiles
2026-05-07 12:01:47 -04:00
c5b4de4184 wiki: full index refresh — 106 articles, 17 new since Apr 18
Updated article count (89 → 106), domain counts, per-section
listings, and Recently Updated table. Added all articles published
since 2026-04-18 including Pi-hole, Mastodon, fail2ban digest,
LoRA GGUF, Tailscale iOS, and more.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 16:45:30 -04:00
4126656c05 wiki: update fail2ban digest + netdata docker health + 3 new articles
- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent,
  defaults-debian.conf gotcha added
- netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table
- New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails,
  tailscale-status-json-hostname-localhost-ios
- Various article updates and nav index refreshes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 14:58:07 -04:00
063bfa53d7 Resolve Obsidian Sync conflicts on main 2026-04-29 22:47:46 -04:00
f9c61fbac3 wiki: publish 3 unpushed articles and catch nav up
Articles from prior sessions that were written locally but never shipped:
- 02-selfhosting/cloud/aws-s3-cost-management.md — lifecycle rules, storage class selection, bucket inventory, unexpected-growth investigation
- 02-selfhosting/dns-networking/wake-on-lan-router-ssh.md — WOL magic packets via Asus router SSH + ether-wake, Ansible vault integration
- 02-selfhosting/services/claude-code-remote-control.md — mobile access to a persistent host Claude Code session

Nav updated (index.md + SUMMARY.md):
- Added Cloud subsection under Self-Hosting for aws-s3
- Added wake-on-lan and aws-s3 entries to SUMMARY
- Added claude-code-remote-control to index's Services section
- Added ansible-ssh-host-alias-bypass nav entry (article shipped in 2dbeb22)
- Article count 87 → 89, self-hosting 30 → 32, troubleshooting 33 → 34
2026-04-21 09:17:31 -04:00
76f29a46e5 SUMMARY + index: add 10 missing articles to nav, update counts to 86 2026-04-18 18:48:31 -04:00
b40e484aae Add 5 wiki articles from 2026-04-17/18 work
- ghost-smtp-mailgun-setup: two-system email config (newsletter API + transactional SMTP)
- firewalld-fleet-hardening: Fedora fleet firewall audit-and-harden pattern with Ansible
- clamav-fleet-deployment: fleet deployment with nice/ionice throttling + quarantine
- ansible-check-mode-false-positives: when: not ansible_check_mode guard for verify/assert tasks
- ghost-emailanalytics-lag-warning: submitted status, lag counter, fetchMissing skip explained
2026-04-18 11:13:39 -04:00
4f66955d33 wiki: correct article counts in index and README
Local and remote both have 76 articles on disk, but the counter
and per-domain table were stale (74 total / self-hosting 21 /
troubleshooting 29). Trued up to 76 / 22 / 30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:17:03 -04:00
c0837b7e89 wiki: add fail2ban jail for Apache PHP webshell probes
Documents the 2026-04-09 scanner incident where 301-redirected PHP probes
bypassed the existing apache-404scan jail, leaving the scanner unbanned
and firing Netdata web_log_1m_redirects alerts. New jail catches 301/302/
403/404 PHP responses while excluding legitimate WordPress endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:17:24 -04:00
326c87421f wiki: add troubleshooting article on /var/run heartbeat reboot false alarm
Captures the majorlab incident where the backup watchdog emailed a missing
heartbeat after a kernel-update reboot wiped /var/run, even though the
backup had actually completed cleanly. Documents the tmpfs root cause and
the fix of storing heartbeats under /var/lib instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:11:24 -04:00
56f1014f73 Add troubleshooting article: wget/curl URLs with special characters
Covers shell quoting for URLs containing &, ?, #, and other characters
that Bash interprets as operators. Common gotcha when downloading from
CDNs with token-based URLs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:18:34 -04:00
5af934a6c6 wiki: update SSH docs with bash.exe default shell fix and Windows admin key auth
- ssh-config-key-management: add Windows OpenSSH admin user key auth section
  (administrators_authorized_keys, BOM-free writing, ACL requirements)
- windows-openssh-wsl-default-shell: add bash.exe as recommended fix (Option 1),
  demote PowerShell to Option 2, add shell-not-found diagnostic tip
- windows-sshd-stops-after-reboot: fix stale wsl.exe reference to bash.exe
- index/README: update Recently Updated table and article descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:01:36 -04:00
84a1893e80 wiki: fix article count to 73, update frontmatter timestamps
Corrected inflated article count (was 76, actual is 73).
Updated domain breakdown and frontmatter timestamps from Obsidian.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:51:23 -04:00
daa771760b wiki: add WSL OpenSSH default shell + Ansible world-writable mount articles
Two new troubleshooting articles from today's MajorRig/MajorMac Ansible setup:
- Windows OpenSSH WSL default shell breaks remote SSH commands
- Ansible silently ignores ansible.cfg on WSL2 world-writable mounts

Article count: 76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:23:02 -04:00
1a00fef199 Update wiki indexes for WordPress login jail article
Article count 73 → 74. Added to SUMMARY.md, index.md, README.md,
and 02-selfhosting/index.md (which was also missing 5 other security
articles from prior sessions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:07:08 -04:00
6f53b7c6db wiki: fix broken wikilinks in index and README related sections
Removed Obsidian [[wikilinks]] pointing to vault-only docs (01-Phases, majorlab)
that don't resolve on the MkDocs site. Kept deploy status as a proper markdown link.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:59:17 -04:00
6d81e7f020 wiki: add 4 new articles from archive, merge 8 archive notes into existing articles (73 articles)
New: mdadm RAID rebuild, Mastodon instance tuning, Ventoy, Fedora networking/kernel recovery.
Merged: Glacier Deep Archive into rsync, SpamAssassin into hardening checklist,
OBS captions/VLC capture into OBS setup, yt-dlp subtitles/temp fix into yt-dlp.
Updated index.md, README.md, SUMMARY.md with 21 previously missing articles.
Fixed merge conflict in index.md Recently Updated table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:55:53 -04:00
8c22ee708d merge: resolve conflicts, add SELinux AVC chart article; update indexes to 53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:36:49 -04:00
fb2e3f6168 wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:34:33 -04:00
4d59856c1e wiki: add Netdata new server deployment guide (49 articles) 2026-03-18 11:00:41 -04:00
38fe720e63 wiki: add Netdata Docker health alarm tuning article; update indexes to 48
- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md — new
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 00:10:36 -04:00
59a5cc530e wiki: add Windows sshd and Ollama/Tailscale sleep articles; update indexes to 47
- 05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
- 05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md
- SUMMARY.md, index.md, README.md: count 45 → 47, add 5 missing articles (3 from 2026-03-16 + 2 today)
- MajorWiki-Deploy-Status.md: session update 2026-03-17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 21:20:15 -04:00
279c094afc wiki: add firewalld mail ports reset article + session updates
- New article: firewalld mail ports wiped after reload (IMAP + webmail outage)
- New article: Plex 4K codec compatibility (Apple TV)
- New article: mdadm RAID recovery after USB hub disconnect
- Updated yt-dlp article
- Updated all index files: SUMMARY.md, index.md, README.md, category indexes
- Article count: 41 → 42

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:15:02 -04:00
0bcc2c822a wiki: add SELinux vmail and gitea-runner articles; update indexes
- New: SELinux Fixing Dovecot Mail Spool Context (/var/vmail)
  Corrected fix — mail_spool_t only, no dovecot_tmp_t on tmp/ dirs.
  Includes warning and recovery steps for the Postfix delivery outage.
- New: Gitea Actions Runner Boot Race Condition Fix
  network-online.target dependency, RestartSec=10, /etc/hosts workaround.
- Updated SUMMARY.md, index.md, README.md, 05-troubleshooting/index.md
- Article count: 37 → 39; MajorWiki-Deploy-Status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:49:01 -04:00
1d8be8669e troubleshooting: add Fail2ban IMAP self-ban article
Documents the 2026-03-14 incident where MajorAir's public IP was banned
by the postfix-sasl jail after repeated SASL auth failures, silently
blocking all IMAP connections from Spark Desktop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:57:01 -04:00
66fcca8028 docs: update index and README — 22 articles, 2026-03-12, new runbook 2026-03-12 18:10:15 -04:00
f35d1abdc6 docs: sync local vault content to remote
Update index pages, troubleshooting articles, README, and deploy status
to match current local vault state.
2026-03-12 18:03:52 -04:00
999e1107f0 vault backup: 2026-03-11 22:00:24 2026-03-11 22:00:24 -04:00
cc7044e9c7 vault backup: 2026-03-11 21:51:13 2026-03-11 21:51:13 -04:00
5513b28efb vault backup: 2026-03-11 21:44:06 2026-03-11 21:44:06 -04:00
59d25e589b vault backup: 2026-03-11 21:19:44 2026-03-11 21:19:44 -04:00
fbb58fdf3e vault backup: 2026-03-11 11:52:33 2026-03-11 11:52:33 -04:00
9c22a661ea chore: link vault wiki to Gitea 2026-03-11 11:20:12 -04:00