majorwiki

Author	SHA1	Message	Date
MajorLinux	4e63d8546c	mastodon: document S3 ACL upload failures + bulk avatar restore New article mastodon-s3-acl-upload-failures.md: a BucketOwnerEnforced S3 bucket plus a stale S3_PERMISSION/S3_ACL in .env.production makes every Mastodon upload fail with AccessControlListNotSupported, silently. Covers symptoms (incl. why a missing object returns 403 not 404), diagnosis, the fix (S3_PERMISSION= empty, public read via bucket policy), recovery, a synthetic-write health check, and Ansible enforcement. Extend mastodon-prune-profiles-trap.md: add a "Bulk restore at scale" procedure (list existing keys, null missing DB refs, enqueue RedownloadAvatar/HeaderWorker), a "storage-level deletion without DB de-ref" section, and a stronger recommendation to disable automated profile pruning (and scheduled accounts refresh --all) entirely. Link both from SUMMARY.md and the selfhosting index. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 15:45:23 -04:00
MajorLinux	155651c373	wiki: ssh.socket wait-ready gate + mastodon post-install hardening Two related additions covering the 2026-05-31 cutover-night incidents on majorlinux and majortoot-hetzner. ssh-socket-tailscale-race-condition.md (update Race 1 fix): - After=tailscaled.service Requires=tailscaled.service orders against the service becoming active, not against tailscale0 having an IPv4; hosts kept losing SSH intermittently after reboots (incident: majorlinux + majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot). - Canonical fix: a oneshot tailscale-wait-ready.service that polls `ip -4 -o addr show tailscale0` until an address is present, with ssh.socket After=/Requires= that service. Document the full evolution (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so future readers don't try the half-fixes thinking they're sufficient. - Add majortoot-hetzner to affected hosts. mastodon-post-install-hardening.md (new): Four upstream-install gaps that bit during the majortoot-hetzner cutover: 1. /home/mastodon at 0750 (useradd default) → nginx www-data can't traverse → every static asset 403s → unstyled "purple screen" in the browser while API/HTML still work through the puma proxy. 2. .env.production at 0644 (mastodon-setup default) → DB_PASS, SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed. 3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked. 4. rbenv init in .bashrc only → login shells don't source .bashrc; even when chained, Ubuntu's .bashrc returns early for non-interactive shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile + .bashrc, so it works for both interactive and non-interactive logins. All four codified in MajorAnsible configure_mastodon_permissions.yml with self-asserting verification steps. 02-selfhosting/index.md + SUMMARY.md: Add a "Services" section to the selfhosting index linking the mastodon-post-install-hardening article (and the other orphaned services/ entries while there). SUMMARY.md gains one new entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 11:08:24 -04:00
majorlinux	73c10111e0	Merge branch 'cowork/majorair/wiki-batch-may25'	2026-05-25 13:56:23 -04:00
majorlinux	52ca8a0413	wiki: batch update — 4 new articles + 4 updates New articles: - Postfix SendGrid TLS handshake failure (port 465 vs 587) - Plex transcoding troubleshooting - Ansible Ubuntu reboot detection kernel mismatch - WSL2 PyTorch checkpoint Windows filesystem deadlock Updated: - AWS S3 cost management (expanded) - Network overview (IP updates) - HEVC VAAPI batch encode (progress + fixes) - SUMMARY.md (new entries)	2026-05-25 13:55:10 -04:00
majorlinux	dc897d4a67	Merge branch 'cowork/majorair/ssh-socket-bindsto-fix'	2026-05-23 02:40:45 -04:00
majorlinux	3b8c8b0597	ssh.socket wiki: correct BindsTo→Requires, add warning BindsTo=tailscaled.service causes a systemd ordering cycle that prevents ssh.socket from starting on reboot. Updated the recommended fix to use Requires= and added a warning admonition explaining why BindsTo must not be used. Added tttpod-hetzner to affected hosts list and linked the 2026-05-23 dcaprod incident.	2026-05-23 02:40:04 -04:00
majorlinux	318f50c50b	Merge branch 'cowork/majorair/tailscale-boot-race-wiki'	2026-05-19 20:39:19 -04:00
majorlinux	65b0aa4567	wiki: expand Tailscale race condition article with network-online race Added Race 2: tailscaled starts before network-online.target, causing Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu ssh.socket and cross-platform tailscaled ordering issues. Updated references to include majordiscord incident and new Ansible playbook.	2026-05-19 20:39:18 -04:00
majorlinux	eb39da9a26	Merge cowork/majorair/ssh-socket-wiki: ssh.socket Tailscale race condition article	2026-05-19 19:36:19 -04:00
majorlinux	7dc591d257	wiki: add ssh.socket Tailscale race condition troubleshooting article Documents the systemd socket activation race where ssh.socket binds to the Tailscale IP before tailscaled is ready, causing SSH to become unreachable after a Tailscale reconnect. Includes diagnosis steps and the After=/BindsTo= fix.	2026-05-19 19:35:16 -04:00
MajorLinux	64ac418a36	wiki: add ClamAV daemonless mode section + HEVC VAAPI article link	2026-05-15 09:02:24 -04:00
Marcus (via Claude Code)	28518e403e	Add troubleshooting articles: Netdata apps-group FD false-positive + OBS stale script paths - netdata-apps-fds-group-false-positive: the apps_group_file_descriptors_utilization false 100% on forking/root app groups (tailscaled on MajorToot 2026-05-15), the not-a-privilege gotcha, fleet-wide silence fix in MajorAnsible. - obs-stale-script-paths: pending from prior session (not on remote). - SUMMARY.md: link both (re-applied onto upstream after concurrent rebase). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 03:22:12 -04:00
majorlinux	a785e85821	Merge branch 'code/majorair/rsyslog-logwatch-fix'	2026-05-13 10:36:06 -04:00
majorlinux	4ec481c584	wiki: add rsyslog requirement to migration checklist and logwatch docs Fedora 44 Hetzner images ship without rsyslog — logwatch produces zero output because /var/log/messages doesn't exist. Added rsyslog to baseline table and new diagnostic section to logwatch article. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 10:36:00 -04:00
majorlinux	c22457f1aa	Merge branch 'code/majorair/teelia-cpu-docs'	2026-05-11 18:32:18 -04:00
majorlinux	ac84610380	wiki: add 1 vCPU nice/ionice limitation note to ClamAV article nice -n 19 only yields when other processes compete; on single-core VPS boxes the scan still saturates CPU. Document the expectation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 18:32:01 -04:00
majorlinux	3df0979786	Merge branch 'code/majorair/logwatch-ca-bundle-docs' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 07:37:48 -04:00
majorlinux	de9b661b9d	wiki: add Fedora CA bundle article, update migration checklist and logwatch docs New article documenting missing /etc/pki/tls/certs/ca-bundle.crt symlink on Hetzner Fedora images breaking Postfix TLS, curl, and dnf. Updated VPS migration baseline checklist with timezone, CA bundle, and crond verification steps. Updated logwatch fleet setup with crond check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 07:35:42 -04:00
MajorLinux	9c62e7f804	Logwatch fleet article: add cloud-image config-drift section Documents three more patterns surfaced in the 2026-05-10 fleet-mail investigation, all hitting hosts derived from cloud images or cross-provider migrations: - Packer/snapshot-leftover myhostname (postfix EHLO + message-id identifies the build artifact, not the production hostname; remote spam scorers hate it) - Empty relayhost silently routes mail via the public MX instead of the Tailscale-internal path, exposing it to spamchk that internal traffic bypasses - Stale SASL passwd map referencing a missing file from a previous external-SMTP relay setup, deferring every send with "local data error" Each looks benign in isolation. Together they made dcaprod's Logwatch disappear into spamchk for weeks while showing 250 OK on the source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 12:58:00 -04:00
MajorLinux	724ae2a5e3	Add troubleshooting article: PHP 8.4 implicit-nullable vendor patch Generalizes the Castopod/UuidModel incident from 2026-05-10. PHP 8.4 deprecated implicit-nullable parameters (`function f(int $x = null)`). Old vendor libraries spam E_DEPRECATED warnings; CodeIgniter wraps each in a 23-frame stack trace; per-minute spark cron amplifies into 53-80 MB/day log bleed and 22% sustained CPU floor on small VPS boxes. Documents the four-line sed fix AND the substring-match gotcha that extended the fix from 30 seconds to 30 minutes — bare `int \$limit = null` patterns substring-match `?int \$limit = null` elsewhere in the file and produce illegal `??type` syntax. Covers anchored sed patterns, reference-parameter handling (&\$db), the lint-after-every-edit rule, and a bonus section on hunting stray developer debug prints (`log_message('critical', 'ITS HEEEEEEEEEEEERE')`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 12:52:25 -04:00
MajorLinux	631d7e8bc5	Logwatch fleet article: add Fedora CA bundle diagnosis + bounce-source guidance Documents three lessons from the 2026-05-10 fleet outage where the Fedora half (majorhome, majorlab) had been silently failing to send notification mail for days: - Missing /etc/pki/tls/certs/ca-bundle.crt symlink (extracted bundle exists at /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem but the consumer-path symlink was lost during a ca-certificates package event). Diagnosis includes the cross-tool tell — dnf and curl break with the same path. Fix is a single ln -sfn. - Methodology: Fedora and majormail log postfix to journald; Debian and Ubuntu log to /var/log/mail.log. Querying the wrong source returns false negatives for healthy hosts. - Bounce-source addresses (Watchtower NOTIFICATION_EMAIL_FROM, fail2ban sender, root@<host>.localdomain) must resolve to real mailboxes — otherwise the first failed delivery generates bounce-of-bounce churn. Also promoting the article from untracked to committed; it had been authored on 2026-05-09 and not yet added to the repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 12:08:15 -04:00
MajorLinux	a852f7b7bd	ClamAV fleet caveat: add follow-up on the polite-CPU-on-1vCPU edge case Same-day correction. The proposed per-droplet relaxed alert (>95%/30m) turned out to also trip on a 1 vCPU box during low-traffic weekly scans, because there's literally no real load for nice 19 to yield to — clamscan opportunistically fills the vCPU and DO sees 100% utilization regardless of `%nice` vs `%user` split. Documents the three realistic options (accept page / switch to clamdscan / disable alert) and the underlying limit (no DO threshold can distinguish polite from impolite CPU when the box is fully utilized). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 02:32:35 -04:00
MajorLinux	af14e36caf	ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets DO's hypervisor-level CPU metric doesn't know about nice/ionice — a "polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization and trips a default >85%/5m alert. Adds a new section explaining the trade-off and providing the DO API recipe (PUT existing alert with explicit entities, POST a new relaxed alert scoped to the small droplet) plus when not to bother (2+ vCPU boxes won't trip). Triggered by the 2026-05-10 teelia incident where the weekly cron fired the fleet-wide CPU alert despite the cron script already wrapping clamscan in nice 19 + ionice idle + cgroup memory limits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 02:24:17 -04:00
MajorLinux	545df9f5c6	Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot Documents the failure mode where issuing a synchronous `ssh host reboot` through Claude Desktop's shell MCP poisons the local MCP transport when the target severs its session before responding cleanly — eventually force-disconnecting every MCP at once. Covers diagnostic chain, recovery, fire-and-forget reboot patterns, and worked example from the 2026-05-10 majorhome AMD-card reboot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 01:28:11 -04:00
MajorLinux	7c566cda50	Add: diagnosing Castopod posts that don't appear on Mastodon Walks the four-step diagnostic chain (post created → activity delivered → follower exists → notification semantics) for the common confusion where a Castopod admin's auto-broadcast "doesn't show up" on a Mastodon account they expected. Most cases are not federation bugs but the difference between favouriting/boosting (no follow required) and following + the fact that Mastodon notifications fire only for mentions/follows/favs/ boosts/etc., not for new posts from people you follow. Documents the bell icon and `@`-mention escape hatches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:05:18 -04:00
MajorLinux	1c17bdb60a	Add: Castopod federation — stale cached avatar URL fix When a remote actor updates their avatar, Mastodon (Paperclip) deletes the old S3 object and stores only the new filename. Castopod 2.0.0 caches the URL of every federated actor in cp_fediverse_actors and never refetches, so its admin templates emit a dead link forever (the resulting S3 403 is anti-enumeration, hiding what is really a 404). Article documents the diagnosis pattern and three fixes (manual UPDATE, DELETE-and-refetch, bulk audit), plus the Mastodon-side query for sourcing the correct URL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 01:51:18 -04:00
MajorLinux	393df3cc45	Add: tuning Netdata web_log_1m_successful for redirect-heavy WordPress The stock alarm definition counts only 1xx/2xx/304/401/429 as successful, which causes false CRITICALs on WP sites where 301 canonicalization is normal traffic (legacy /?p=NNNN, slug edits, host/TLS upgrades, etc.). Article documents the root cause, verification steps via the access log, and an in-place threshold retune that keeps the alarm useful as an "obvious meltdown" floor while delegating real outage detection to the 5xx and 4xx alarms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 01:12:21 -04:00
Marcus Summers	306e5f1f16	Merge branch 'cowork/majormac/mastodon-prune-profiles-trap'	2026-05-07 12:01:48 -04:00
Marcus Summers	3bcc58a805	services: add Mastodon --prune-profiles trap and recovery article Documents the long-standing UX regression caused by `tootctl media remove --prune-profiles` (and `--remove-headers`) running on a schedule: cached remote avatars are deleted, but Mastodon does not auto-refetch on profile view, so quiet remote accounts stay broken indefinitely. Article covers: - The mutually-exclusive flag bug (silent skip if combined) - Mastodon's actual avatar-refresh trigger model (Update activities, not profile views) - A `refresh-my-follows.sh` pattern with a defensible WHERE clause (avatar NULL AND avatar_remote_url present) to avoid infinite retry on accounts whose origin has no avatar - Why header_file_name IS NULL is a bad signal (~20% of users legitimately have no custom header) - The cron decision: most admins should drop --prune-profiles	2026-05-07 12:01:47 -04:00
Marcus Summers	5f31a57ae6	Merge branch 'cowork/majormac/githooks-executable'	2026-05-06 09:44:08 -04:00
Marcus Summers	7e422ee332	githooks: mark pre-commit executable The pre-commit hook (which enforces SUMMARY.md links for new articles) was tracked at mode 100644, so even with `core.hooksPath=.githooks` configured, git silently skipped it. Bump tracked mode to 100755 so fresh clones get the working hook without a manual chmod step. Discovered while installing the wiki-commit/hooks setup on MajorMac. No content change; .githooks/ is outside the MkDocs source so this will not alter the rendered notes.majorshouse.com site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 09:42:06 -04:00
Marcus Summers	3c4cc74aef	Merge: wiki — Ansible regex_search set_fact gotcha	2026-05-06 08:28:22 -04:00
Marcus Summers	ca123b0312	wiki: add troubleshooting article — Ansible regex_search capture group fails in set_fact Documents the gotcha hit during the 2026-05-06 update.yml refactor: the second-positional-argument back-reference form of regex_search ('\1') doesn't reliably select capture groups when used inside set_fact. The fix is to match the broader substring and use .split()[0] (or [-1], etc.) to peel off the value, with a default() bridge for the no-match case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 08:28:21 -04:00
majorlinux	488268ccd1	Merge branch 'cowork/majorair/rescue-stash-keep-files-may02'	2026-05-02 17:50:28 -04:00
majorlinux	213a84ed79	wiki: add .keep files for 04-streaming and 05-troubleshooting subdirs	2026-05-02 17:50:22 -04:00
majorlinux	ae864452f8	wiki: add Fail2Ban Digest Mode nav entry to SUMMARY.md	2026-05-02 17:17:04 -04:00
majorlinux	49a1173dfc	Merge cowork/majorair/index-refresh-may02 — full index refresh (106 articles)	2026-05-02 16:45:37 -04:00
majorlinux	c5b4de4184	wiki: full index refresh — 106 articles, 17 new since Apr 18 Updated article count (89 → 106), domain counts, per-section listings, and Recently Updated table. Added all articles published since 2026-04-18 including Pi-hole, Mastodon, fail2ban digest, LoRA GGUF, Tailscale iOS, and more. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 16:45:30 -04:00
majorlinux	021c7f6539	Merge cowork/majorair/wiki-updates-may02 — fail2ban digest + netdata docker health + 3 new articles	2026-05-02 16:28:48 -04:00
majorlinux	4126656c05	wiki: update fail2ban digest + netdata docker health + 3 new articles - fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent, defaults-debian.conf gotcha added - netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table - New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails, tailscale-status-json-hostname-localhost-ios - Various article updates and nav index refreshes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:58:07 -04:00
Marcus Summers	264f1f64c3	Merge: wiki — 2026-04-30 SNI filter article update	2026-04-30 13:08:37 -04:00
Marcus Summers	74c4ed9959	wiki: 2026-04-30 update to ISP SNI filtering article Re-diagnoses today's notes.majorshouse.com outage. Original framing was "ISP filter expanded to include 'notes'" — but the actual root cause was a stale A record pointing at 136.54.3.248 (not majorlab's current home IP). Corrects the comparison table to show CNAMEs to apex resolve to 136.56.0.55, and recommends a Cloudflare-proxied CNAME as the durable shape so the apex follows home IP automatically and ISP-level SNI weirdness is bypassed at the same time. Includes the working CF API payload used to flip the record, and an audit checklist for any new *.majorshouse.com subdomain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 13:08:36 -04:00
Marcus Summers	34cc5c3d0b	Merge: wiki — LoRA→GGUF troubleshooting article	2026-04-30 11:24:38 -04:00
Marcus Summers	6e7a0ca21f	wiki: add troubleshooting article — LoRA adapter GGUF conversion fails Documents the gotcha where convert_hf_to_gguf.py crashes with 'config.json not found' because the training output directory holds only the LoRA adapter, not a merged HF model. Includes inline save_pretrained_merged() fix snippet, verification checklist, and resume-pipeline-without-retraining pattern. Discovered today during the MajorTwin v8c pipeline failure (Step 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 11:22:59 -04:00
majorlinux	85f8a5df2d	Merge pull request 'wiki: add troubleshooting article: iOS Tailscale clients report HostName="localhost"' (#1) from code/majormac/tailscale-ios-hostname-fix into main Reviewed-on: #1	2026-04-30 10:00:01 -04:00
Marcus Summers	9de839066c	wiki: add troubleshooting article: iOS Tailscale clients report HostName="localhost" Documents the non-obvious failure mode where /etc/hosts generator scripts using `tailscale status --json | jq '.HostName'` get poisoned by iOS peers, which always report HostName as the literal string "localhost" because iOS doesn't expose the device name to apps. Includes the buggy and fixed jq filter (use .DNSName first label instead), a real-world Postfix outage example, and a verification checklist. Linked from troubleshooting index and SUMMARY. Discovered while diagnosing a 24h Postfix outage on majordiscord. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 09:57:40 -04:00
MajorLinux	ad2f12c16e	wiki: document wiki-commit wrapper + pre-commit hook setup Adds a 'One-liner wrapper' tip and a 'Pre-Commit Hook (in repo)' section to MajorWiki-Deploy-Status.md describing the per-clone setup needed on each workstation: git config core.hooksPath .githooks git config pull.rebase true Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 05:30:45 -04:00
MajorLinux	c4fba631e4	wiki: add pre-commit hook — block new articles missing from SUMMARY.md The hook fails any commit that adds (or renames) a .md article without a matching SUMMARY.md entry, addressing the recurring 'article exists but isn't navigable' drift. Excludes meta files (README/index/SUMMARY, category index.md, MajorWiki-Deploy-Status). Bypass with --no-verify. Hook lives in .githooks/ (tracked). Each clone needs: git config core.hooksPath .githooks Companion wrapper ~/bin/wiki-commit (workstation-only, not in repo) does pull --rebase --autostash + add -A + commit + push so cowork pushes don't surprise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 05:30:05 -04:00
MajorLinux	9e96ebb110	wiki: add wp-fail2ban logpath on Debian/Ubuntu (auth.log not syslog) Documents the gotcha discovered during the 2026-04-30 DCAProd XML-RPC outage triage: wp-fail2ban plugin emits via PHP syslog(LOG_AUTH) which lands in /var/log/auth.log on Debian/Ubuntu, not /var/log/syslog. wordpress-{hard,soft,extra} jails configured with logpath=/var/log/syslog (common in tutorials and ansible roles) silently catch zero events. Article includes diagnostic steps, the fix, fail2ban-regex verification, distro cheat sheet (Debian/Ubuntu vs RHEL/Fedora vs systemd-journal-only), and a note on why wordpress-login is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 05:21:50 -04:00
majorlinux	f40f497b46	Add .gitattributes with obsidian-timestamps merge driver	2026-04-29 22:52:07 -04:00

147 commits