28 changed files with 16 additions and 2595 deletions
--- a/01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md
+++ b/01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md
@@ -1,119 +0,0 @@
----
-title: WSL2 In-Place Upgrade to Fedora 44 (with gcc14 Blocker + CUDA Repo Swap)
-domain: linux
-category: distro-specific
-tags:
-  - wsl2
-  - fedora
-  - windows
-  - upgrade
-  - dnf
-  - cuda
-  - majorrig
-status: published
-created: 2026-06-11
-updated: 2026-06-11
----
-
-# WSL2 In-Place Upgrade to Fedora 44 (with gcc14 Blocker + CUDA Repo Swap)
-
-In-place upgrade of the FedoraLinux-43 WSL2 instance on MajorRig to Fedora 44 using `dnf system-upgrade` + `dnf5 offline reboot`. Hit one transaction blocker (`gcc14` compat package retired in F44) and swapped the stale `cuda-fedora39` repo to `cuda-fedora44` afterward. Performed 2026-06-11.
-
-## The Short Answer
-
-```powershell
-# PowerShell — backup first
-wsl --shutdown
-wsl --export FedoraLinux-43 D:\backups\fedora43.tar
-```
-
-```bash
-# Inside Fedora
-sudo dnf upgrade --refresh -y
-sudo shutdown -h now
-# relaunch, then:
-sudo dnf remove gcc14-c++ gcc14        # F44 dropped gcc14 — blocks the transaction
-sudo dnf system-upgrade download --releasever=44
-sudo dnf5 offline reboot               # applies offline upgrade, shuts distro down
-# wait a few minutes, relaunch:
-cat /etc/fedora-release                # → Fedora release 44 (Forty Four)
-```
-
-```powershell
-# PowerShell — keep WSL itself current
-wsl --update
-```
-
-## Steps
-
-1. **Back up the instance** (PowerShell). The export tar is roughly the size of the installed system — this one was 86 GB. The target directory must already exist or you get `Wsl/ERROR_PATH_NOT_FOUND`.
-
-```powershell
-wsl --shutdown
-mkdir D:\backups
-wsl --export FedoraLinux-43 D:\backups\fedora43.tar
-```
-
-2. **Fully update the current release, then restart the distro**
-
-```bash
-sudo dnf upgrade --refresh -y
-sudo shutdown -h now
-```
-
-3. **Remove upgrade blockers.** `gcc14`/`gcc14-c++` (compat packages) were retired in Fedora 44, so the transaction fails with "does not belong to a distupgrade repository". Remove them (or use `--allowerasing` and review the summary):
-
-```bash
-sudo dnf remove gcc14-c++ gcc14
-```
-
-4. **Download and apply the upgrade**
-
-```bash
-sudo dnf system-upgrade download --releasever=44
-sudo dnf5 offline reboot
-```
-
-The "reboot" applies the offline transaction and shuts the distro down — there's no real systemd reboot in WSL. Wait a couple of minutes, then relaunch. If it errors on `systemctl`, the fallback is:
-
-```bash
-export DNF_SYSTEM_UPGRADE_NO_REBOOT=1
-sudo -E dnf system-upgrade reboot
-```
-
-5. **Verify and tidy up**
-
-```bash
-cat /etc/fedora-release      # Fedora release 44 (Forty Four)
-sudo dnf upgrade --refresh   # catch post-upgrade updates
-gcc --version                # F44 ships gcc 16; reinstall with `dnf install gcc gcc-c++` if removed
-```
-
-```powershell
-wsl --update   # fixes the post-upgrade Wsl/Service/E_UNEXPECTED catastrophic failure some users hit
-```
-
-## CUDA Repo Swap
-
-`dnf repolist` still showed `cuda-fedora39-x86_64` — NVIDIA repos are pinned per Fedora release and don't follow distro upgrades. NVIDIA publishes a fedora44 repo:
-
-```bash
-sudo rm /etc/yum.repos.d/cuda-fedora39*.repo
-sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora44/x86_64/cuda-fedora44.repo
-sudo dnf upgrade --refresh
-sudo dnf repolist   # confirm cuda-fedora44-x86_64
-```
-
-**WSL caveat:** never install the NVIDIA *driver* inside WSL — the Windows host driver provides the GPU. Only install toolkit packages (e.g. `cuda-toolkit`).
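
Because NVIDIA repos are pinned per Fedora release, a stale one can be spotted by comparing the release number embedded in the repo id with the running release. The following is a minimal sketch, assuming repo ids follow the `cuda-fedoraNN-x86_64` shape seen in the `dnf repolist` output above; the function name and the sample repo list are hypothetical.

```python
import re

# Assumed repo-id shape, taken from the `cuda-fedora39-x86_64` example above.
CUDA_REPO_ID = re.compile(r"^cuda-fedora(\d+)-")


def stale_cuda_repos(repo_ids, releasever):
    """Return CUDA repo ids whose pinned Fedora release differs from releasever."""
    stale = []
    for repo_id in repo_ids:
        match = CUDA_REPO_ID.match(repo_id)
        if match and int(match.group(1)) != releasever:
            stale.append(repo_id)
    return stale


if __name__ == "__main__":
    # Hypothetical repolist contents, for illustration only.
    print(stale_cuda_repos(["fedora", "updates", "cuda-fedora39-x86_64"], 44))
```

Anything the function returns is a candidate for the remove-and-re-add swap shown above.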
-
-## Gotchas & Notes
-
-- **Don't jump more than two releases at once** (e.g. 42 → 44 is the limit). From anything older, upgrade in stages.
-- **The WSL distro name is just a Windows label** — it still says "FedoraLinux-43" after the upgrade. Cosmetic fixes: Windows Terminal profile name, Start Menu shortcut, and `DistributionName`/`ShortcutPath` under `HKCU\Software\Microsoft\Windows\CurrentVersion\Lxss\{uuid}`.
-- **Keep the backup tar** until the upgraded instance has proven stable for a few days, then delete to reclaim the space.
-- **Restore path if needed:** `wsl --import FedoraRestore C:\WSL\FedoraRestore D:\backups\fedora43.tar` — remember imports default to root; fix via `/etc/wsl.conf` `[user] default=majorlinux`.
-
-## See Also
-
-- [WSL2 Instance Migration (Fedora 43)](wsl2-instance-migration-fedora43.md)
-- [WSL2 Backup via PowerShell](wsl2-backup-powershell.md)
--- a/01-linux/index.md
+++ b/01-linux/index.md
@@ -23,14 +23,7 @@ A collection of guides covering Linux administration, shell scripting, networkin
 - [Ansible Getting Started](shell-scripting/ansible-getting-started.md)
 - [Bash Scripting Patterns](shell-scripting/bash-scripting-patterns.md)

-## Storage
-
-- [SnapRAID & MergerFS Storage Setup](storage/snapraid-mergerfs-setup.md)
-- [mdadm — Rebuilding a RAID Array After Reinstall](storage/mdadm-raid-rebuild.md)
-- [Growing an LVM Volume by Absorbing Another Disk](storage/lvm-grow-volume-absorb-disk.md)
-
 ## Distro-Specific

 - [Linux Distro Guide for Beginners](distro-specific/linux-distro-guide-beginners.md)
 - [WSL2 Instance Migration to Fedora 43](distro-specific/wsl2-instance-migration-fedora43.md)
-- [WSL2 In-Place Upgrade to Fedora 44](distro-specific/wsl2-fedora44-inplace-upgrade.md)
--- a/01-linux/storage/lvm-grow-volume-absorb-disk.md
+++ b/01-linux/storage/lvm-grow-volume-absorb-disk.md
@@ -1,159 +0,0 @@
----
-title: "Growing an LVM Volume by Absorbing Another Disk"
-domain: linux
-category: storage
-tags: [lvm, lvextend, vgextend, pvcreate, resize2fs, ext4, storage, disk, homelab]
-status: published
-created: 2026-06-17
-updated: 2026-06-17
----
-
-# Growing an LVM Volume by Absorbing Another Disk
-
-When an LVM-backed filesystem fills up and its volume group (VG) has no free
-extents, you can grow it by adding a second physical disk as a new physical
-volume (PV), extending the VG onto it, then extending the logical volume (LV)
-and its filesystem. With ext4 this can be done **online** — no unmount, no
-downtime for the volume being grown.
-
-This guide covers the common case where the disk you want to absorb is currently
-in use by its own LVM volume (you must evacuate and tear that down first), and
-the precautions that keep it safe.
-
-> [!warning] This enlarges your failure domain
-> A single LV spanning two disks linearly (the default — no RAID/mirror) means
-> **losing either disk loses the entire volume.** ext4 has no parity. Only do
-> this for data you can rebuild, or layer redundancy (mdadm/LVM RAID) underneath.
-> Back up anything irreplaceable first.
-
-## The Short Answer
-
-If the target disk (`/dev/sdX`) is already empty and unused:
-
-```bash
-sudo pvcreate /dev/sdX
-sudo vgextend myvg /dev/sdX
-sudo lvextend -l +100%FREE /dev/myvg/mylv
-sudo resize2fs /dev/mapper/myvg-mylv      # ext4, online; use xfs_growfs for XFS
-```
-
-The rest of this article handles the harder case: the target disk is currently
-holding its own LVM volume with data on it.
-
-## Step-by-Step
-
-### 1. Survey the current layout
-
-```bash
-sudo pvs                       # physical volumes → which VG each belongs to
-sudo vgs                       # volume groups, free extents (VFree)
-sudo lvs                       # logical volumes and sizes
-lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
-df -h
-```
-
-Confirm:
-
-- The VG you want to grow (`myvg`) has `0` `VFree` (that's why you're here).
-- The disk you want to absorb (`/dev/sdX`) is a **standalone** PV — not a member
-  of an mdadm array, a mergerfs branch, or a SnapRAID parity disk. Repurposing a
-  disk that something else depends on will break that thing silently.
-
-### 2. Evacuate the disk you're about to absorb
-
-Anything on the target disk will be **destroyed**. Move it somewhere with room to
-spare, then prove the copy is intact before you trust it.
-
-```bash
-# Copy preserving permissions/timestamps
-sudo rsync -a /mnt/olddisk/important /destination/with/space/
-
-# Verify byte-for-byte — empty output + exit code 0 means identical
-sudo diff -rq /mnt/olddisk/important /destination/with/space/important && echo OK
-```
-
-For large trees the `diff -rq` (full byte comparison) is slow but is the
-authoritative check — don't skip it before the destructive phase. If an
-application tracks files by path (databases, media servers), update its path
-references to the new location *now*, while the old copy still exists as a
-fallback.
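
The same byte-for-byte check can be scripted when you want a list of offending paths rather than `diff` output. This is a minimal sketch using only the Python standard library; the function name is hypothetical, it checks one direction only (everything under the source must exist and match in the destination), and it assumes regular files without broken symlinks.

```python
import filecmp
import os


def differing_paths(src, dst):
    """List relative paths under src that are missing from dst or differ in content."""
    problems = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            source_file = os.path.join(src, rel)
            copied_file = os.path.join(dst, rel)
            # shallow=False forces a content comparison instead of trusting size/mtime.
            if not os.path.isfile(copied_file) or not filecmp.cmp(
                source_file, copied_file, shallow=False
            ):
                problems.append(rel)
    return sorted(problems)


if __name__ == "__main__":
    # Paths reuse the placeholders from the rsync example above.
    print(differing_paths("/mnt/olddisk/important", "/destination/with/space/important"))
```

An empty list plays the same role as the empty `diff -rq` output: only then is the copy safe to rely on.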
-
-### 3. Unmount and remove the old disk from fstab
-
-```bash
-sudo fuser -m /mnt/olddisk          # confirm nothing holds it open
-sudo umount /mnt/olddisk
-mountpoint -q /mnt/olddisk && echo "STILL MOUNTED" || echo "unmounted"
-
-sudo cp /etc/fstab /etc/fstab.bak-$(date +%Y%m%d)   # always back up fstab
-sudo sed -i '/olddisk/d' /etc/fstab                 # remove the stale entry
-grep olddisk /etc/fstab || echo "fstab line gone"
-```
-
-> [!tip] Verify your `sed` pattern only matches the line you mean
-> A too-broad pattern can delete the wrong fstab entry. Check the file before and
-> after, and keep the backup until you've confirmed the system still boots.
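
One way to act on that tip is to list the lines a pattern would hit before running `sed -i`. The sketch below is an illustration only: the function name and the sample fstab are hypothetical, and it treats the pattern as a plain substring, which matches `sed '/olddisk/d'` only when the pattern contains no regex metacharacters.

```python
def lines_matching(fstab_text, needle):
    """Return (line_number, line) pairs that a substring delete on needle would remove."""
    return [
        (number, line)
        for number, line in enumerate(fstab_text.splitlines(), start=1)
        if needle in line
    ]


if __name__ == "__main__":
    # Hypothetical fstab: the second entry is an unrelated mount that also matches.
    sample = (
        "/dev/mapper/oldvg-oldlv /mnt/olddisk ext4 defaults 0 2\n"
        "/dev/mapper/myvg-mylv /mnt/olddisk-archive ext4 defaults 0 2\n"
    )
    for number, line in lines_matching(sample, "olddisk"):
        print(number, line)
```

If more than the intended line shows up, tighten the pattern (for example by including the full mountpoint and the whitespace around it) before deleting anything.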
-
-### 4. Tear down the old disk's LVM
-
-```bash
-sudo lvremove -y /dev/oldvg/oldlv
-sudo vgremove -y oldvg
-sudo pvremove -y /dev/sdX        # wipes the LVM label off the disk
-```
-
-This is the point of no return for the old disk's data — which is why steps 2–3
-verified the copy first.
-
-### 5. Add the disk to the target VG and extend
-
-```bash
-sudo pvcreate -y /dev/sdX
-sudo vgextend myvg /dev/sdX
-sudo lvextend -l +100%FREE /dev/myvg/mylv
-```
-
-`lvs`/`vgs` should now show the LV grown to span both disks and `0` free extents.
-
-### 6. Grow the filesystem (online)
-
-```bash
-# ext4 — works while mounted
-sudo resize2fs /dev/mapper/myvg-mylv
-
-# XFS — grows online too, but takes the mountpoint, not the device
-sudo xfs_growfs /mountpoint
-```
-
-`resize2fs` is idempotent — if it gets interrupted, just run it again; it reports
-"Nothing to do!" once the filesystem already fills the LV.
-
-### 7. Verify
-
-```bash
-df -h /mountpoint     # should reflect the new larger size
-sudo pvs              # /dev/sdX now listed under myvg
-sudo vgs myvg         # two PVs, larger VSize
-```
-
-## Notes & Gotchas
-
-- **Online resize works for the volume being grown, not the one being removed.**
-  The disk you absorb must be unmounted and torn down; the destination LV stays
-  mounted throughout.
-- **`resize2fs` interruption is safe.** ext4 online resize is journaled; re-run it.
-- **macOS cruft on evacuated disks.** Trees touched by macOS often carry
-  `._*` AppleDouble files and `.DS_Store` — harmless to drop, but they inflate
-  file counts in `diff`/`rsync` output. Don't mistake them for real data.
-- **Check SMART on a disk you're promoting into a bigger role.** A disk with a
-  pending-sector history is riskier once it's in the critical path for a whole
-  multi-disk volume than it was holding a small isolated one.
-- **Mountpoint cleanup.** After the old disk is gone, its former mountpoint
-  directory may reappear (it was shadowed by the mount). `rmdir` it if empty.
-  Note `ls -A` exits `0` on an empty directory, so don't gate cleanup on its exit
-  status — test contents explicitly.
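
Testing contents explicitly, as the last note recommends, can look like the following. This is a minimal sketch with a hypothetical function name; it refuses to touch anything that is still a mountpoint or is not a directory, and removes the directory only when it has no entries at all.

```python
import os


def remove_if_empty(path):
    """Remove path only when it is an unmounted directory with no entries."""
    if os.path.ismount(path) or not os.path.isdir(path):
        return False
    with os.scandir(path) as entries:
        # Any entry at all, including dotfiles, means the directory is not empty.
        if any(True for _ in entries):
            return False
    os.rmdir(path)
    return True


if __name__ == "__main__":
    # /mnt/olddisk is the placeholder mountpoint used throughout this article.
    print(remove_if_empty("/mnt/olddisk"))
```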
-
-## Related
-
-- [SnapRAID & MergerFS Storage Setup](snapraid-mergerfs-setup.md) — add redundancy/parity instead of a linear span
-- [mdadm — Rebuilding a RAID Array After Reinstall](mdadm-raid-rebuild.md)
--- a/02-selfhosting/cloud/vps-migration-baseline-checklist.md
+++ b/02-selfhosting/cloud/vps-migration-baseline-checklist.md
@@ -66,15 +66,14 @@ Every server in the fleet should have these. Check each one after migration:
 ### After Migration

 1. **Set the timezone** — `timedatectl set-timezone America/New_York` (US) or `Europe/London` (UK). Hetzner images default to UTC.
-2. **Set the system hostname** — Hetzner provisions the box as `<host>-hetzner`. Run `hostnamectl set-hostname <host>` and fix the loopback line: `sed -i "s/127.0.1.1.*/127.0.1.1 <host> <host>/" /etc/hosts`. Skip this and **Logwatch emails arrive titled `Logwatch for <host>-hetzner`** weeks later. Do it alongside the Tailscale node rename and Postfix `myhostname` — all three read from the provisioning label. See [Logwatch wrong hostname after migration](../../05-troubleshooting/logwatch-wrong-hostname-after-migration.md).
-3. **Verify CA bundle (Fedora)** — `ls /etc/pki/tls/certs/ca-bundle.crt`. If missing, Postfix TLS, curl, and dnf will all fail silently. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
-4. **Run `harden.yml` against the new host** — catches most gaps in one pass
-5. **Send a test email** — `echo test | mail -s "test" marcus@majorshouse.com` — if this fails, nothing else can alert you
-6. **Verify crond is running** — `systemctl is-active crond` (Fedora) or `systemctl is-active cron` (Ubuntu). cronie can be `enabled` but not `active` after provisioning.
-7. **Check Netdata Cloud** — verify the new node appears and alerts are flowing
-8. **Compare fail2ban jails** — `fail2ban-client status` on both old and new
-9. **Verify logwatch sends** — `sudo logwatch --output mail --range today`
-10. **Keep the old box powered off but not destroyed** for at least 7 days after remediation
+2. **Verify CA bundle (Fedora)** — `ls /etc/pki/tls/certs/ca-bundle.crt`. If missing, Postfix TLS, curl, and dnf will all fail silently. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
+3. **Run `harden.yml` against the new host** — catches most gaps in one pass
+4. **Send a test email** — `echo test | mail -s "test" marcus@majorshouse.com` — if this fails, nothing else can alert you
+5. **Verify crond is running** — `systemctl is-active crond` (Fedora) or `systemctl is-active cron` (Ubuntu). cronie can be `enabled` but not `active` after provisioning.
+6. **Check Netdata Cloud** — verify the new node appears and alerts are flowing
+7. **Compare fail2ban jails** — `fail2ban-client status` on both old and new
+8. **Verify logwatch sends** — `sudo logwatch --output mail --range today`
+9. **Keep the old box powered off but not destroyed** for at least 7 days after remediation

 ### Using doctl to Manage Old Droplets

--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -38,7 +38,6 @@ Guides for running your own services at home, including Docker, reverse proxies,
 - [Mastodon Federation](services/mastodon-federation.md)
 - [Mastodon `--prune-profiles` Trap](services/mastodon-prune-profiles-trap.md)
 - [Mastodon on S3 — Silent Upload Failures](services/mastodon-s3-acl-upload-failures.md)
-- [Mastodon — Triaging Crowdfunding / Mention-Spam Accounts](services/mastodon-mention-spam-crowdfunding.md)
 - [Ghost SMTP via Mailgun](services/ghost-smtp-mailgun-setup.md)
 - [Updating n8n Docker](services/updating-n8n-docker.md)
 - [Claude Code Remote Control](services/claude-code-remote-control.md)
--- a/02-selfhosting/monitoring/logwatch-fleet-setup.md
+++ b/02-selfhosting/monitoring/logwatch-fleet-setup.md
@@ -235,12 +235,9 @@ sed -i '/^127\.0\.1\.1/d' /etc/hosts && \
 systemctl reload postfix
 ```

-> [!tip] Same drift, different symptom: the Logwatch **title**
-> Hetzner provisions boxes with `<host>-hetzner` as the *system* hostname. When that's never corrected, Logwatch (which reads the live hostname at runtime) mails reports titled `Logwatch for <host>-hetzner` — no postfix involvement needed. Same `hostnamectl set-hostname` + `/etc/hosts` fix as above. See [Logwatch wrong hostname after migration](../../05-troubleshooting/logwatch-wrong-hostname-after-migration.md).
-
 ### 2. Empty `relayhost` quietly forces public-MX delivery

-If `postconf relayhost` returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 203.0.113.10:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.
+If `postconf relayhost` returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 165.227.187.191:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.

 The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX has configured for external traffic. Internal Tailscale-IP traffic typically gets a faster trust shortcut (e.g., bypass spamchk pipe). So this single configuration drift causes one host's mail to land in a different code path than its siblings — and then silently get filtered.

--- a/02-selfhosting/security/ansible-flat-playbooks-to-roles.md
+++ b/02-selfhosting/security/ansible-flat-playbooks-to-roles.md
@@ -1,130 +0,0 @@
----
-title: "Migrating Flat Ansible Playbooks to Roles (Safely)"
-domain: selfhosting
-category: security
-tags: [ansible, roles, refactor, fleet, migration, fail2ban, infrastructure]
-status: published
-created: 2026-06-18
-updated: 2026-06-18
----
-# Migrating Flat Ansible Playbooks to Roles (Safely)
-
-## Overview
-
-A fleet repo tends to grow a sprawl of flat `configure_*.yml` playbooks — one per subsystem, plus near-duplicates for variants (e.g. ~10 `configure_fail2ban_*` playbooks), all sharing a single overloaded top-level `templates/` directory. It works, but it resists reuse: there is no clean `defaults/` precedence, no encapsulation, and no way to compose a host's full configuration in one place.
-
-Ansible **roles** fix this — but migrating a *live* fleet is where it gets dangerous. The risk is not the refactor itself; it's accidentally changing deployed behaviour while you "just reorganize." This article covers the incremental, regression-free approach used to migrate an 11-host fleet, including the two techniques that keep it safe: **byte-identical migration** and **capture-based reconciliation**.
-
-> This is a process/pattern article. For the specific roles in this fleet, see the internal runbook. The techniques here generalize to any flat-playbook → role migration.
-
-## Decide What Becomes a Role vs. What Stays a Playbook
-
-Not everything should be a role. Draw the line by purpose:
-
-| Becomes a role | Stays a playbook |
-|---|---|
-| Reusable host **configuration** (a subsystem you converge to a desired state) | **Ops / one-off** actions: `update`, `reboot`, `harden`, `bootstrap`, `provision`, `fix_*`, `verify_*` |
-| Has templates/files, defaults, handlers | Orchestrators that just `import_playbook` other things |
-| Applied repeatedly and idempotently | Run-once or run-as-needed remediation |
-
-Roles get the standard `roles/<name>/` layout (`tasks/`, `defaults/`, `handlers/`, `templates/`, `files/`, `meta/`). Name them after the **subsystem noun** (`fail2ban`, `clamav`, `firewall`) — drop the `configure_` verb prefix.
-
-## The Incremental Loop (one role per branch)
-
-Migrate **one subsystem per branch** and validate before merging. This keeps every change small enough to diff by eye and roll back cleanly:
-
-1. `git mv` the templates/files into `roles/<name>/` so **git tracks them as renames** (history preserved, 100% rename score).
-2. Move task bodies into `roles/<name>/tasks/` (split by lifecycle: install → service → config → verify).
-3. Lift tunables into `roles/<name>/defaults/main.yml`; keep per-host overrides in `group_vars`/`host_vars`.
-4. Add a thin entry playbook `<name>.yml` (`hosts: <group>` + `roles: [<name>]`).
-5. Validate with `--check --diff` against a single host **before** merging.
-6. Merge, then move to the next subsystem.
-
-## Technique 1: Byte-Identical Migration
-
-When the goal is "reorganize without changing behaviour," **prove** it. After moving a playbook into a role, the rendered task bodies should be identical to the original. Verify with a normalized diff against `main`:
-
-```bash
-# Compare the role's task body against the original flat playbook,
-# ignoring only comments/whitespace you intend to change.
-git show main:configure_clamav.yml > /tmp/old.yml
-# ...extract the task list from roles/clamav/tasks/*.yml and diff
-diff <(yq '.[] | .tasks' /tmp/old.yml) <(cat roles/clamav/tasks/*.yml)
-```
-
-The acceptance bar: `--check --diff` against a real host returns **`changed=0`** (or only the diffs you explicitly intended, like a doc-comment line). If a "faithful" migration shows unexpected `changed=N`, you altered behaviour — stop and reconcile before merging. Templates moved via `git mv` show as **100% renames** in `git show --stat`, which is your proof the deployed content is unchanged.
-
-## Technique 2: Consolidating Near-Duplicates with Feature Flags
-
-The big win is collapsing a family of near-duplicate playbooks (the ~10 `configure_fail2ban_*`) into **one role with flag-gated task files**:
-
-```yaml
-# group_vars/<group>.yml — hosts self-select which jails/components they get
-fail2ban_jail_sshd: true
-fail2ban_jail_wordpress: true
-fail2ban_jail_nginx_bad_request: false
-```
-
-```yaml
-# roles/fail2ban/tasks/main.yml
-- import_tasks: jail_wordpress.yml
-  when: fail2ban_jail_wordpress | default(false)
-```
-
-> **Critical gotcha — key flags to inventory GROUPS, not `ansible_os_family`.** It is tempting to gate OS-specific task files on `ansible_os_family == 'Debian'`. Don't. Inventory groups frequently include hosts the *original playbooks deliberately excluded* (e.g. a LAN-only Debian box that should get the network-wait step but **not** the public SSH bind, or a WSL host in the `fedora` group that must be skipped). Keep the original curated host patterns and set the flag per play/group. Keying on `os_family` silently widens a play's host set and is exactly how a "refactor" pushes config to a host that never had it.
-
-## Technique 3: Capture-Based Reconciliation (the safety net)
-
-This is the one that prevents an outage. Sometimes a role gets written as a **fresh re-implementation** of a subsystem rather than a faithful move — a cleaner `jail.local`, new drop-ins, a different default set. It may even be merged into `site.yml`. The trap: that role has **never been rolled out**, and its config *diverges* from what's actually deployed.
-
-Running it would push divergent config to a live, security-sensitive subsystem (intrusion protection, firewall) across the whole fleet on the next `harden.yml`.
-
-The check that catches it:
-
-```bash
-ansible-playbook fail2ban.yml --check --diff --limit <host>
-# Divergent role => changed=8-12 per host + failures (missing filters/timers)
-# Faithful role  => changed=0, failed=0
-```
-
-**Capture-based reconciliation** is the fix: instead of pushing the role's idea of "correct," bring the **role into parity with the live, working config** first. Capture what's actually deployed, fold it into the role's templates/defaults until `--check` is clean fleet-wide, *then* switch the orchestrator over and retire the old playbooks. Order of operations:
-
-1. **Decide the source of truth** — the live config or the new role. For security subsystems, the live (working) config wins.
-2. **Reconcile** the role to match live until `--check` shows `changed=0, failed=0` on every host.
-3. **Roll out host-by-host** with real runs; verify the service restarts cleanly and (for fail2ban) jails are actually active.
-4. **Only then** delete the old playbooks, rewire `harden.yml`/`bootstrap.yml`, and remove the orphaned top-level templates.
-
-Never delete the old mechanism until the new one is proven converged everywhere. "It's in `site.yml`" is not the same as "it's been rolled out."
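
The `changed=0, failed=0` gate in step 2 can be checked mechanically across many hosts by reading the `PLAY RECAP` block of a `--check` run. The following is a minimal sketch, assuming uncoloured output in the usual `host : ok=N changed=N ...` recap layout; the function names and the sample host names are hypothetical.

```python
import re

# One recap line: "<host> : ok=12 changed=0 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def recap_counters(output):
    """Parse the PLAY RECAP section into {host: {counter_name: int}}."""
    results = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def not_converged(output):
    """Hosts whose recap shows any changed, failed, or unreachable count."""
    return sorted(
        host
        for host, counters in recap_counters(output).items()
        if counters.get("changed", 0)
        or counters.get("failed", 0)
        or counters.get("unreachable", 0)
    )


if __name__ == "__main__":
    sample = (
        "PLAY RECAP *********************************\n"
        "web01 : ok=12 changed=0 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0\n"
        "web02 : ok=9 changed=8 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0\n"
    )
    print(not_converged(sample))
```

An empty result for every host is the signal that the role matches live config and the rollout in step 3 can begin.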
-
-## Composition: `site.yml`, `harden.yml`, `bootstrap.yml`
-
-Once subsystems are roles, compose them with thin orchestrators that `import_playbook` the role entry points — so each subsystem keeps a **single source of truth** for its host mapping:
-
-```yaml
-# site.yml — day-to-day fleet convergence, in dependency order
-- import_playbook: swap.yml
-- import_playbook: tailscale.yml
-- import_playbook: ssh_hardening.yml
-- import_playbook: firewall.yml
-- import_playbook: fail2ban.yml
-- import_playbook: clamav.yml
-```
-
-Order matters: base layer (swap) → networking (tailscale) → access (ssh_hardening) → perimeter (firewall) → intrusion protection (fail2ban). Bootstrap-only roles (guest agent, root password, provisioning prerequisites) belong in `bootstrap.yml`, not `site.yml`.
-
-## Verification Checklist
-
-- [ ] Templates moved with `git mv` (show as 100% renames)
-- [ ] `--check --diff` on a real host = `changed=0` (or only intended diffs)
-- [ ] Consolidation flags keyed to **inventory groups**, not `ansible_os_family`
-- [ ] Re-implemented roles reconciled to live parity **before** rollout (no surprise `changed=N`)
-- [ ] Security subsystems rolled out host-by-host with service-active verification
-- [ ] Old playbooks/templates deleted **only after** the role is converged fleet-wide
-- [ ] Orchestrators (`site.yml`/`harden.yml`/`bootstrap.yml`) rewired; stale references swept
-
-## Related
-
-- [SSH Hardening Fleet-Wide with Ansible](ssh-hardening-ansible-fleet.md)
-- [ClamAV Fleet Deployment with Ansible](clamav-fleet-deployment.md)
-- [Firewall Hardening with firewalld on Fedora Fleet](firewalld-fleet-hardening.md)
-- [Standardizing unattended-upgrades with Ansible](ansible-unattended-upgrades-fleet.md)
--- a/02-selfhosting/services/mastodon-mention-spam-crowdfunding.md
+++ b/02-selfhosting/services/mastodon-mention-spam-crowdfunding.md
@@ -1,170 +0,0 @@
----
-title: "Mastodon — Triaging Crowdfunding / Mention-Spam Accounts"
-description: How to tell broadcast fundraising solicitation from genuine mentions, investigate the account and its origin instance with SQL + nodeinfo, and pick a proportionate moderation action.
-tags:
-  - mastodon
-  - moderation
-  - abuse
-  - federation
-  - self-hosting
-created: 2026-06-22
-updated: 2026-06-22
----
-
-# Mastodon — Triaging Crowdfunding / Mention-Spam Accounts
-
-If you run a Mastodon instance, sooner or later you (or your users) start getting tagged by accounts you've never interacted with, posting donation appeals with a link and a wall of hashtags. Some are real people in desperate situations; some are recycled-link scams. Either way, when an account is **broadcasting a solicitation at you** rather than replying to you, it's a moderation question, not a conversation.
-
-This article is the runbook for telling the two apart, investigating both the **account** and its **origin instance**, and choosing an action that's proportionate instead of nuking eight years of legit federation over two bad actors.
-
-## TL;DR
-
-- A mention is **broadcast spam**, not engagement, when it's a *standalone post* (not a reply) that *tags a large fixed list* of accounts and carries a *donation link*, usually from a *throwaway profile* on an *open-registration instance*.
-- Investigate before acting: pull the account's age/stats/bio and check whether the post is a reply or a 40-way blast (SQL below). Profile the origin instance via its public `nodeinfo`.
-- **Default action is an account-level block**, which also federates and removes their follow of you. Escalate to domain-limit / domain-block only when *one instance* produces *repeat offenders*.
-- Keep a log so single incidents that are actually a pattern become visible.
-
-## Signals that a mention is broadcast solicitation
-
-Score it on how many of these hold:
-
-| Signal | Why it matters |
-|---|---|
-| **Standalone post, not a reply** (`in_reply_to_account_id IS NULL`) but still tags you | They're broadcasting, not responding |
-| **Tags a large fixed recipient list** (e.g. 40+) | Mass distribution; the same list reused across senders = coordination |
-| **Donation link** in post or bio (`chuffed.org`, `gofundme`, `paypal.me`, `ko-fi`) | The payload |
-| **Throwaway profile** — days old, few followers, follows you but you don't follow back | Disposable, baiting a profile view |
-| **Mass-follow ratio** — following thousands / few hundred followers | Engagement farming |
-| **"I am not a scammer" disclaimer** in bio | Known red-flag phrase |
-| **Origin instance: open registration, no approval** | Easy throwaway-account farm |
-
-> [!warning] Judgment, not a purity test
-> Many of these accounts are real people. The goal is not to adjudicate need — it's to stop *broadcast solicitation aimed at you* and track the *source instances*. Prefer the lightest action that stops it.
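
Scoring against the table can be kept honest by recording each signal as an explicit yes/no and counting them. This is a minimal sketch: the class, field, and function names are hypothetical, one field stands for each table row, and no threshold is implied since the decision remains a judgment call.

```python
from dataclasses import dataclass, fields


@dataclass
class MentionSignals:
    """One boolean per row of the signals table."""
    standalone_post: bool = False
    large_recipient_list: bool = False
    donation_link: bool = False
    throwaway_profile: bool = False
    mass_follow_ratio: bool = False
    not_a_scammer_disclaimer: bool = False
    open_registration_origin: bool = False


def signal_score(signals):
    """Return (count, names) of the signals that hold, in table order."""
    held = [f.name for f in fields(signals) if getattr(signals, f.name)]
    return len(held), held


if __name__ == "__main__":
    suspect = MentionSignals(standalone_post=True, large_recipient_list=True, donation_link=True)
    print(signal_score(suspect))
```

Logging the tuple alongside the account handle also feeds the incident log mentioned in the TL;DR.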
-
-## Investigate the account
-
-Connect to the DB on the instance:
-
-```bash
-ssh <your-mastodon-host>
-sudo -u postgres psql mastodon_production
-```
-
-**Profile + stats for a suspect** (age, post count, follower ratio, bio):
-
-```sql
-SELECT a.username||'@'||a.domain,
-       to_char(a.created_at,'YYYY-MM-DD') AS first_seen_locally,
-       st.statuses_count, st.followers_count, st.following_count,
-       left(regexp_replace(COALESCE(a.note,''),'<[^>]+>','','g'),200) AS bio
-FROM accounts a LEFT JOIN account_stats st ON st.account_id=a.id
-WHERE a.domain='<INSTANCE>' AND a.username='<HANDLE>';
-```
-
-**Is the mention a reply or a blast?** `standalone=t` with a high `num_tagged` is the tell:
-
-```sql
-SELECT a.username, to_char(s.created_at,'YYYY-MM-DD HH24:MI') AS posted,
-       s.in_reply_to_account_id IS NULL AS standalone,
-       (SELECT count(*) FROM mentions mm WHERE mm.status_id=s.id) AS num_tagged
-FROM mentions m JOIN statuses s ON s.id=m.status_id
-JOIN accounts a ON a.id=s.account_id
-JOIN accounts me ON me.id=m.account_id AND me.username='<YOU>' AND me.domain IS NULL
-WHERE a.username='<HANDLE>' AND a.domain='<INSTANCE>'
-ORDER BY s.created_at DESC;
-```
-
-**All recent direct mentions of you** (sweep for the wider pattern):
-
-```sql
-SELECT to_char(n.created_at,'YYYY-MM-DD HH24:MI') AS when,
-       a.username||COALESCE('@'||a.domain,'@local') AS who,
-       COALESCE(s.uri,'') AS uri,
-       left(regexp_replace(COALESCE(s.text,''),'<[^>]+>','','g'),200) AS body
-FROM notifications n
-JOIN accounts recip ON recip.id=n.account_id AND recip.username='<YOU>' AND recip.domain IS NULL
-JOIN accounts a ON a.id=n.from_account_id
-LEFT JOIN mentions m ON m.id=n.activity_id AND n.activity_type='Mention'
-LEFT JOIN statuses s ON s.id=m.status_id
-WHERE n.type='mention' ORDER BY n.created_at DESC LIMIT 40;
-```
-
-## Profile the origin instance
-
-Don't judge an instance by one bad account. Pull its public metadata — no auth needed:
-
-```bash
-# Software, version, user counts, registration policy
-NI=$(curl -s https://<INSTANCE>/.well-known/nodeinfo | python3 -c 'import sys,json;print(json.load(sys.stdin)["links"][-1]["href"])')
-curl -s "$NI" | python3 -m json.tool         # software, openRegistrations, usage.users
-
-# Title, contact/admin, rules, registration approval flag
-curl -s https://<INSTANCE>/api/v2/instance | python3 -m json.tool
-```
-
-What to read off it:
-
-- **`openRegistrations: true` + `approval_required: false`** → throwaway-account farm; expect more of the same.
-- **`totalUsers` vs `activeMonth`** → a huge dormant base is typical of sign-up-and-leave farms.
-- **Federation age on your side** — how long you've known the instance, how many of its accounts you cache. A long, broad relationship argues *against* a domain block.
-- **The instance's own rules** — many ban "backlink accounts" / harassment, which the mass-tag fundraising violates. That makes **reporting to its admin a legitimate, in-policy path.**
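
The nodeinfo lookup above takes the last entry in `links`; picking the highest schema version explicitly is a little more robust, and the fields worth reading can be pulled into one summary. This is a sketch under assumptions: the function names are hypothetical, and the field names (`software.name`, `openRegistrations`, `usage.users.total`, `usage.users.activeMonth`) follow the NodeInfo 2.x schema as I understand it. It works on already-fetched JSON, so it makes no network calls.

```python
SCHEMA_PREFIX = "nodeinfo.diaspora.software/ns/schema/"


def _schema_version(link):
    """Turn a rel ending in '2.1' into (2, 1); unknown shapes sort lowest."""
    tail = link.get("rel", "").rstrip("/").rsplit("/", 1)[-1]
    try:
        return tuple(int(part) for part in tail.split("."))
    except ValueError:
        return (0,)


def pick_nodeinfo_href(discovery):
    """Return the href of the highest-versioned nodeinfo schema link."""
    links = [l for l in discovery.get("links", []) if SCHEMA_PREFIX in l.get("rel", "")]
    if not links:
        raise ValueError("no nodeinfo schema links in discovery document")
    return max(links, key=_schema_version)["href"]


def summarize(nodeinfo):
    """Extract the registration and usage fields discussed above."""
    users = nodeinfo.get("usage", {}).get("users", {})
    return {
        "software": nodeinfo.get("software", {}).get("name"),
        "open_registrations": nodeinfo.get("openRegistrations"),
        "total_users": users.get("total"),
        "active_month": users.get("activeMonth"),
    }


if __name__ == "__main__":
    # Hypothetical discovery document for illustration.
    discovery = {"links": [
        {"rel": "http://nodeinfo.diaspora.software/ns/schema/2.0", "href": "https://example.social/nodeinfo/2.0"},
    ]}
    print(pick_nodeinfo_href(discovery))
```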
-
-```sql
--- What your instance already knows about the domain
-SELECT (SELECT count(*) FROM accounts WHERE domain='<INSTANCE>') AS known_accounts,
-       (SELECT count(*) FROM statuses s JOIN accounts a ON a.id=s.account_id WHERE a.domain='<INSTANCE>') AS cached_statuses,
-       (SELECT to_char(min(created_at),'YYYY-MM-DD') FROM accounts WHERE domain='<INSTANCE>') AS first_seen,
-       (SELECT count(*) FROM domain_blocks WHERE domain='<INSTANCE>') AS is_domain_blocked;
-```
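-
-The nodeinfo fields discussed above can be condensed into one line per instance. This is a minimal sketch, assuming `python3` is available as in the `curl` commands earlier; `summarize_nodeinfo` is a hypothetical helper name, not part of Mastodon or any other tool:
-
-```bash
-# Hypothetical helper: reads a NodeInfo 2.x document on stdin and prints
-# the registration flag and user counts on a single line.
-summarize_nodeinfo() {
-  python3 -c '
-import json, sys
-doc = json.load(sys.stdin)
-users = doc.get("usage", {}).get("users", {})
-print("software=%s open_registrations=%s total=%s active_month=%s" % (
-    doc.get("software", {}).get("name"),
-    doc.get("openRegistrations"),
-    users.get("total"),
-    users.get("activeMonth"),
-))'
-}
-
-# Usage (NI is the nodeinfo URL resolved earlier):
-#   curl -s "$NI" | summarize_nodeinfo
-```
-
-Missing fields print as `None` rather than raising, so a non-conforming instance still yields a line you can paste into your log.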
-
-## The escalation ladder
-
-| Level | Action | Effect | When |
-|---|---|---|---|
-| 1 | **Mute** | You stop seeing them; silent | Borderline; you don't want to cut them off |
-| 2 | **Block (account)** | Cuts mentions, removes their follow, federates to their instance | **Default first action** |
-| 3 | **Report** to source admin | Forwards the offending posts to their moderators | Repeat or egregious; in-policy on most instances |
-| 4 | **Domain-limit (silence)** | Their posts show only if you follow that account | One instance, multiple offenders |
-| 5 | **Domain-block (suspend)** | Severs all known accounts + federation | Instance is predominantly abuse |
-
-### Blocking from a user account (federates + removes follow)
-
-There is no `tootctl accounts block`. Do it through Mastodon's `BlockService` service object so it tears down the relationship and federates correctly:
-
-```ruby
-# run as the mastodon user:
-#   sudo -u mastodon bash -c 'cd /home/mastodon/live && RAILS_ENV=production bin/rails runner /tmp/block.rb'
-me = Account.find_by(username: "<YOU>", domain: nil)
-%w[Handle1 Handle2].each do |u|
-  t = Account.find_by(username: u, domain: "<INSTANCE>")
-  next puts("NOTFOUND #{u}") if t.nil?
-  BlockService.new.call(me, t)
-  puts "BLOCKED #{u} blocking=#{me.blocking?(t)} they_follow_me=#{t.following?(me)}"
-end
-```
-
-`blocking=true` with `they_follow_me=false` confirms the block landed and the follow was severed.
-
-### Instance-level actions
-
-Domain-limit / domain-block live in the admin UI (**Moderation → Federation**) or via `tootctl`:
-
-```bash
-# Silence (limit) — posts hidden unless followed
-RAILS_ENV=production bin/tootctl domains ... # or set severity=silence in the admin UI
-# Suspend (block) the whole instance
-RAILS_ENV=production bin/tootctl ... # admin UI "Add domain block" is the safe path
-```
-
-> [!tip] Reach for the lightest hammer
-> A domain block is rarely the right first move against an established instance — you lose every legit account and years of federation to swat a couple of accounts. Block the accounts, report them to the source admin, and only escalate the *instance* when it demonstrates a sustained, multi-actor pattern.
-
-## Keep a log
-
-Track offenders and source instances over time so a "one-off" that's actually a campaign becomes visible, and so domain-level decisions are evidence-based. A simple table — date, account, instance, signals, action — plus an instance-watch table with each source's registration policy and offender count is enough.
-
-## Related
-
- [Mastodon `--prune-profiles` Trap](mastodon-prune-profiles-trap.md)
- [Mastodon DB Maintenance](mastodon-db-maintenance.md)
- [Mastodon Federation](mastodon-federation.md)
--- a/02-selfhosting/storage-backup/restic-b2-fleet-backups.md
+++ b/02-selfhosting/storage-backup/restic-b2-fleet-backups.md
@ -1,137 +0,0 @@
---
-title: "App-Consistent Fleet Backups with restic + Backblaze B2"
-domain: selfhosting
-category: storage-backup
-tags: [restic, backblaze, b2, backup, ansible, systemd, postgresql, mysql, sqlite, docker, disaster-recovery]
-status: published
-created: 2026-06-19
-updated: 2026-06-19
---
-
-# App-Consistent Fleet Backups with restic + Backblaze B2
-
-A repeatable pattern for backing up a mixed fleet (Ubuntu + Fedora, VPS + homelab, bare services + Docker) to Backblaze B2 with [restic](https://restic.net) — encrypted, deduplicated, and **app-consistent** (databases are dumped before the snapshot, not copied live). Driven by Ansible and a per-host `systemd` timer.
-
-## The Short Answer
-
-Per host, nightly: **dump every database to a staging dir → `restic backup` that staging dir plus the data paths → apply retention → wipe staging.** A monthly timer runs `restic prune`. Anything that fails emails the admin. One B2 bucket holds a separate repo per host at `b2:<bucket>:<hostname>`.
-
-Retention is `--keep-daily 7 --keep-weekly 4 --keep-monthly 6` (~6 months of history).
-
-## Why dump databases first
-
-Copying a live database's files (`/var/lib/mysql`, a running SQLite file, a Postgres data dir) gives you a *crash-consistent* copy at best — restorable only if you're lucky. Logical dumps are guaranteed consistent:
-
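- **MySQL / MariaDB:** `mysqldump --single-transaction --routines --triggers --databases <db>` (the consistent snapshot from `--single-transaction` applies to transactional engines such as InnoDB)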
- **PostgreSQL:** `pg_dump -Fc <db>` (custom format) via the `postgres` system user (peer auth)
- **SQLite:** `sqlite3 <file> ".backup '<out>'"` — uses the online backup API, safe against a running writer
- **Dockerized DBs:** `docker exec <container> sh -c '<dump cmd>'`, letting the container's own shell expand its root-password env var
-
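-restic then backs up the dump files. Plain-text dumps dedupe well, so only the changed blocks upload each night; compressed output such as `pg_dump -Fc` tends to dedupe less effectively, because a small data change can alter much of the compressed stream.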
-
-## Repository layout
-
- **One private B2 bucket** (e.g. `majorshouse-backups`).
- **One repo per host:** `b2:majorshouse-backups:<hostname>`.
- The application key needs **read + write + delete** for the bucket. restic deletes objects during `forget`/`prune`, so a pure *append-only* key will break retention. (True append-only requires splitting `forget`/`prune` onto a separate maintenance key — a worthwhile hardening step, but not the default.)
- Credentials live in an `EnvironmentFile` (`/etc/restic/restic-env`, mode `0600`, root): `RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, `B2_ACCOUNT_ID`, `B2_ACCOUNT_KEY`.
-
-## The backup script (shape)
-
-```bash
-set -uo pipefail
-STAGING=/var/backups/restic-staging
-rm -rf "$STAGING"; mkdir -p "$STAGING"; chmod 700 "$STAGING"
-
-# per-engine dumps into $STAGING ...
-mysqldump --single-transaction --routines --triggers --databases wordpress > "$STAGING/mysql-wordpress.sql"
-sudo -u postgres pg_dump -Fc mastodon_production            > "$STAGING/pg-mastodon_production.dump"
-sqlite3 /opt/phantombot/config/phantombot.db ".backup '$STAGING/sqlite-phantombot.db'"
-
-restic backup --tag fleet-backup --host "$(hostname -s)" \
-  "$STAGING" /var/www /etc/letsencrypt --exclude /path/to/already-offsite/media
-
-restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6
-rm -rf "$STAGING"
-```
-
-Wrap each step so a failure mails the admin and aborts (don't silently back up a half-state). On hosts where the `mail` CLI is absent, pipe a message to `/usr/sbin/sendmail -t` instead.
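-
-One way to do that wrapping is sketched below. Both `run_step` and `notify_admin` are hypothetical names introduced here for illustration; the `notify_admin` body only prints to stderr and stands in for your real `mail` or `sendmail -t` call:
-
-```bash
-# Hypothetical notifier: replace the printf with your mail/sendmail command.
-notify_admin() {
-  printf 'restic backup FAILED on %s: %s\n' "${HOSTNAME:-$(uname -n)}" "$1" >&2
-}
-
-# Run one named step. On failure, notify the admin and return the step's
-# exit status so the caller can decide to abort.
-run_step() {
-  local name=$1
-  shift
-  local rc=0
-  "$@" || rc=$?
-  if [ "$rc" -ne 0 ]; then
-    notify_admin "step '$name' exited with status $rc"
-  fi
-  return "$rc"
-}
-
-# Usage inside the backup script (abort on the first failed step):
-#   run_step "restic backup" restic backup "$STAGING" /var/www || exit 1
-```
-
-Returning the status instead of calling `exit` inside the helper keeps the abort decision visible at each call site.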
-
-## systemd units
-
-A oneshot service + a timer. Stagger `OnCalendar` per host to spread B2 load, and **always set `RESTIC_CACHE_DIR`** (see Gotchas):
-
-```ini
-# restic-backup.service
-[Service]
-Type=oneshot
-EnvironmentFile=/etc/restic/restic-env
-Environment=RESTIC_CACHE_DIR=/var/cache/restic
-ExecStart=/usr/local/sbin/restic-backup.sh
-Nice=10
-IOSchedulingClass=idle
-```
-
-```ini
-# restic-backup.timer
-[Timer]
-OnCalendar=*-*-* 02:30:00
-RandomizedDelaySec=20m
-Persistent=true
-[Install]
-WantedBy=timers.target
-```
-
-A second `restic-prune.timer` runs `restic prune` monthly (`OnCalendar=*-*-01 04:00:00`).
-
-## Restore procedure
-
-The whole point. From the target host (or any host with the repo creds):
-
-```bash
-# load repo + B2 creds without echoing them
-set -a; . /etc/restic/restic-env; set +a
-
-restic snapshots                      # list; note the snapshot ID or use 'latest'
-
-# restore specific paths to a scratch dir (never restore in place blindly)
-restic restore latest --target /tmp/restore \
-  --include /var/backups/restic-staging \
-  --include /var/www/html/wp-config.php
-
-# verify before doing anything with it
-ls -la /tmp/restore/var/backups/restic-staging/
-head -1 /tmp/restore/var/backups/restic-staging/mysql-wordpress.sql   # "-- MySQL dump 10.13 ..."
-```
-
-To recover a database, restore the dump then load it: `mysql <db> < mysql-<db>.sql`, `pg_restore -d <db> pg-<db>.dump`, or copy the SQLite file back. **Test restores periodically** — a backup you've never restored is a hope, not a backup. Restore the highest-stakes data (password manager, mail) first in any drill.
-
-## Adding a host
-
-1. Add it to the `backups` inventory group.
-2. Give it a `host_vars` scope — which DBs to dump and which paths to back up:
-
-   ```yaml
-   restic_backup_oncalendar: "*-*-* 02:40:00"   # stagger
-   restic_mysql_dbs: [castopod_db]
-   restic_paths: [/var/www/html/castopod]
-   restic_excludes: [/var/www/html/castopod/public/media]   # already offsite
-   ```
-3. Run the playbook against that host. The role installs restic, deploys the script + units, `restic init`s the repo if absent, and enables the timers.
-
-## Gotchas & Notes
-
- **`RESTIC_CACHE_DIR` is mandatory under systemd.** systemd services run with no `$HOME`, so restic can't find its cache and warns *"unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined"* — and re-reads **every file** each run (no incremental). Point it at `/var/cache/restic` in the unit.
- **`sqlite3` may not be installed.** A host that runs a SQLite-backed app (e.g. a bot) often lacks the `sqlite3`/`sqlite` CLI. Install it where `restic_sqlite_paths` is set, or the `.backup` step fails.
- **Docker DB password env-var names vary.** Don't assume: the MariaDB image may use `MYSQL_ROOT_PASSWORD` (not `MARIADB_ROOT_PASSWORD`), and a Postgres container's superuser is whatever `POSTGRES_USER` is set to — reference `"$POSTGRES_USER"` rather than hardcoding `postgres`. Check with `docker exec <c> sh -c 'env | grep -oE "^(MYSQL|MARIADB|POSTGRES)_[A-Z_]*"'` (name only).
- **B2 key needs delete capability.** Otherwise `forget`/`prune` fail. Scope the key to the bucket; reach for per-host `namePrefix`-restricted keys for blast-radius isolation.
- **Exclude data that's already offsite.** Media already synced to object storage (S3/B2 via the app or `rclone`) should be `--exclude`d so you don't pay to store it twice.
- **First upload is slow, the rest are fast.** The initial snapshot reads and uploads everything; subsequent runs only ship changed blocks. For a large first run, fire it detached and watch from a transient unit that emails you on completion.
- **Keep secrets out of git.** The repo password and B2 key belong in an Ansible vault (committed encrypted), referenced into the role — never in plaintext vars.
- **Changing a host's backup paths starts a new snapshot group.** `restic forget` groups snapshots by `host`+`paths` by default, so adding or removing a path on an existing host creates a *separate* lineage: the old path-set and the new one each retain their own 7d/4w/6m snapshots, and `restic snapshots` shows both. Expected, not a bug — but it means the old-path snapshots age out on their own schedule rather than being superseded. To collapse everything into one retention bucket, run `forget` with `--group-by host` (be deliberate: it then treats *any* path-set on that host as the same group).
-
-## See Also
-
- [rsync Backup Patterns](rsync-backup-patterns.md)
- [SnapRAID & MergerFS Storage Setup](../../01-linux/storage/snapraid-mergerfs-setup.md)
- [restic documentation](https://restic.readthedocs.io)
--- a/04-streaming/plex/hevc-vaapi-batch-encode.md
+++ b/04-streaming/plex/hevc-vaapi-batch-encode.md
@ -5,7 +5,7 @@ category: plex
 tags: [plex, ffmpeg, hevc, vaapi, amd, gpu, encode, storage, rx480]
 status: published
 created: 2026-05-15
-updated: 2026-06-05
+updated: 2026-05-22
 ---
 # HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)

@ -121,7 +121,7 @@ Each file logs:

 ### Space guard

-The script aborts if free space on the Plex volume drops below 10GB (`MIN_FREE_GB`). Worst-case headroom needed is `source_size + tmp_size` simultaneously — on a 4GB source file that's ~8GB peak. Note: the space check only runs at the **start** of each encode, not during — a large file can still consume significant disk mid-encode.
+The script aborts if free space on the Plex volume drops below 20GB (`MIN_FREE_GB`). Worst-case headroom needed is `source_size + tmp_size` simultaneously — on a 4GB source file that's ~8GB peak.

 ---

@ -278,54 +278,3 @@ local tmp="${dir}/${safe_stem}.hevc.tmp.${ext}"

 After patching, delete the affected entries from `hevc_failed.txt` (or leave them — they'll be re-queued on the next run since they're not in `hevc_done.txt`) and restart the batch.

---
-
-### Many files failing: output larger than source (streaming content)
-
-**Symptom:** A large portion of the queue ends up in `hevc_failed.txt` with log lines like:
-
-```
-[2026-06-05 ...] Output: 4.7G  savings=0 (output larger than source)
-[2026-06-05 ...] WARN: output is larger than source — skipping swap, keeping original
-```
-
-**Cause:** These files are YouTube downloads or streaming archives (Giant Bomb, Twitch VODs, etc.) that were already encoded with an efficient H.264 encoder (typically YouTube's VP9-to-AVC pipeline or a broadcast H.264 encoder at a reasonable bitrate). VAAPI HEVC encoding at QP 28 on a Polaris GPU (RX 480/580) is a hardware encoder with limited rate control precision — it cannot beat a well-tuned software H.264 encode on already-compressed talking-head/gaming content. The output reliably comes out 15–25% *larger* than the source.
-
-The script handles this correctly: it detects output > source, deletes the tmp, keeps the original, and writes to `hevc_failed.txt`. The files are not corrupted. However, without the `already_failed()` guard, the script will re-attempt these files on every queue rebuild, wasting CPU time and briefly consuming 4–8 GB of disk per failed attempt.
-
-**Fix — add `already_failed()` skip logic:**
-
-Patch `~/hevc_batch.sh` to skip files already in `hevc_failed.txt`:
-
-```bash
-# After the existing already_done() function, add:
-already_failed() {
-  [[ -f "$FAILED" ]] && grep -qF "$1" "$FAILED"
-}
-
-# In build_queue(), after the already_done "$f" && continue line:
-already_failed "$f" && continue
-
-# In the main loop, after the already_done "$file" check:
-already_failed "$file" && { log "SKIP (already failed): $file"; continue; }
-```
-
-After patching, the batch will skip all 132+ known-bad files on the next pass and only attempt fresh queue entries.
-
-**Tuning options to improve savings on dense content:**
-
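- Raise QP rather than lower it: a higher value such as `--qp 30` or `--qp 32` quantizes more coarsely and produces smaller output, so it has a better chance of beating source size. Trade-off: lower quality on every file. Lowering QP (`--qp 24` or `--qp 22`) improves quality but makes the output larger.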
- Accept the failures: for streaming content archives, the source is already "good enough." Only files that are genuinely oversized H.264 (old stream captures at very high bitrate) will benefit from HEVC re-encode.
-
-**Identifying which files are worth encoding:**
-
-```bash
-# Show source bitrate for all queued files — high-bitrate sources are candidates
-while IFS= read -r f; do
-  bitrate=$(ffprobe -v quiet -show_entries format=bit_rate -of csv=p=0 "$f" 2>/dev/null)
-  echo "$bitrate $f"
-done < ~/hevc_queue.txt | sort -rn | head -20
-```
-
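-`ffprobe` reports `bit_rate` in bits per second, so divide by 1,000 to compare against these thresholds. Files above ~8,000 kbit/s (about 8000000 in the output) are typically good encode candidates. Files at 3,000–5,000 kbit/s (typical YouTube/Twitch 1080p) will usually fail.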
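-
-The bitrate rule of thumb can be turned into a small pre-filter. This is a sketch under stated assumptions: `classify_bitrate` is a hypothetical helper, it takes the raw bits-per-second value that `ffprobe` prints, and the `borderline` band between the two ranges quoted here is an assumption rather than a measured result:
-
-```bash
-# Hypothetical helper: classify a source by its ffprobe bit_rate (bits/s).
-# Prints one of: candidate, borderline, skip, unknown.
-classify_bitrate() {
-  local bps=$1
-  # ffprobe can print an empty or non-numeric value (for example N/A).
-  case $bps in
-    ''|*[!0-9]*) echo unknown; return 0 ;;
-  esac
-  if [ "$bps" -ge 8000000 ]; then
-    echo candidate      # roughly 8,000 kbit/s and up
-  elif [ "$bps" -le 5000000 ]; then
-    echo skip           # 5,000 kbit/s or less: likely to come out larger
-  else
-    echo borderline     # assumption: untested middle band
-  fi
-}
-
-# Usage: classify_bitrate "$bitrate"   # inside the ffprobe loop
-```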
-
--- a/05-troubleshooting/ansible-reboot-become-timeout-wsl2.md
+++ b/05-troubleshooting/ansible-reboot-become-timeout-wsl2.md
@ -1,103 +0,0 @@
---
-title: "Ansible reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)"
-domain: troubleshooting
-category: ansible
-tags: [ansible, wsl, wsl2, windows, reboot, become, privilege-escalation, openssh, inventory]
-status: published
-created: 2026-06-12
-updated: 2026-06-12
---
-
-# Ansible reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)
-
-## Problem
-
-Running a reboot play across a Fedora fleet that includes a WSL2 "host" fails on the WSL2 box at privilege escalation — before the reboot command ever runs:
-
-```console
-$ ansible-playbook reboot.yml --limit fedora
-
-TASK [Reboot the server] *******************************************************
-changed: [majorhome]
-changed: [majorlab]
-changed: [majormail]
-changed: [majordiscord]
-[ERROR]: Task failed: Action failed: Timeout (62s) waiting for privilege
-escalation prompt:
-fatal: [majorrig-wsl]: FAILED! => {"changed": false,
-  "msg": "Timeout (62s) waiting for privilege escalation prompt:",
-  "reboot": false}
-```
-
-Every real server reboots fine. Only the WSL2 host fails, and `"reboot": false` confirms the shutdown command never executed.
-
-## Cause
-
-Two independent problems, either of which is enough to break a reboot play against WSL2:
-
-1. **WSL2 has no real reboot semantics.** `ansible.builtin.reboot` issues a shutdown, then blocks up to `reboot_timeout` (e.g. 900s) waiting for SSH to come back. A WSL2 distro doesn't reboot — it just terminates, and nothing relaunches it automatically. The task would hang the full timeout and then fail.
-
-2. **`become` times out over the Windows OpenSSH → WSL2 bridge.** When a WSL2 box is reached as `majorlinux@host` through Windows' built-in OpenSSH Server (which forwards into WSL via the default shell), Ansible's privilege-escalation handshake watches the SSH stream for the sudo prompt/success marker. Across the Windows-intercept pty, that marker detection stalls until the 62s `timeout`. This happens **even with passwordless sudo** — `NOPASSWD` is configured and correct; Ansible simply never sees the handshake complete.
-
-The error surfaces as #2 (it fails at escalation first), but #1 is the deeper reason WSL2 doesn't belong in a reboot play at all.
-
-## Solution
-
-**Exclude the WSL group from the reboot play.** A WSL2 instance is a managed *workstation environment*, not a server — it belongs in package/update plays but not in server lifecycle operations like reboot.
-
-Scope the play to exclude the `wsl` group so even a broad `--limit` skips it:
-
-```yaml
-# reboot.yml
- name: Reboot servers
-  hosts: all:!wsl     # was: hosts: all
-  become: true
-  tasks:
-    - name: Reboot the server
-      ansible.builtin.reboot:
-        msg: "Reboot initiated by Ansible"
-        reboot_timeout: 900
-```
-
-This assumes your WSL2 hosts are in a dedicated inventory group:
-
-```yaml
-wsl:
-  hosts:
-    majorrig-wsl:
-      ansible_host: 100.98.47.29
-```
-
-Verify the targeting before running — the WSL host should be gone:
-
-```console
-$ ansible-playbook reboot.yml --limit fedora --list-hosts
-  play #1 (all:!wsl): Reboot servers
-    hosts (4):
-      majorhome
-      majorlab
-      majordiscord
-      majormail
-```
-
-### Rebooting the WSL2 instance itself
-
-When you genuinely need to "reboot" WSL2, do it from the Windows side — not Ansible:
-
-```powershell
-wsl --shutdown
-```
-
-The distro relaunches on next access (next SSH login or `wsl` invocation). WSL2 stays in `update.yml` (dnf upgrades) and other package plays; it's only excluded from reboot and other server-specific roles.
-
-## Why not just fix the become timeout?
-
-You *could* raise `timeout` or tweak the become flow, but it doesn't address problem #1 — even a successful escalation would leave the reboot task hanging the full `reboot_timeout` because WSL2 never comes back the way the module expects. Excluding WSL from server lifecycle plays is the correct fix, not a workaround.
-
-## Related
-
- [Ansible: ansible.cfg Ignored on WSL2 Windows Mounts](ansible-wsl2-world-writable-mount-ignores-cfg.md)
- [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
- [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](ansible-ssh-timeout-dnf-upgrade.md)
--- a/05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md
+++ b/05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md
@ -1,73 +0,0 @@
---
-title: "Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)"
-domain: troubleshooting
-category: claude-code
-tags: [claude-code, authentication, oauth, keychain, macos, acl, security]
-status: published
-created: 2026-06-15
-updated: 2026-06-15
---
-
-# Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)
-
-## Symptom
-A macOS dialog repeatedly pops up:
-
-> **security wants to access key "Claude Code-credentials" in your keychain.**
-> To allow this, enter the "login" keychain password. — `[Always Allow] [Deny] [Allow]`
-
-The tell-tale sign: it **comes back even after clicking "Always Allow"** — the usual "trust forever" button doesn't make it stop. Login still works; it's the *permission prompt* that won't quiet down. This is **distinct** from [Claude Code won't log in](claude-code-warp-login-corrupt-keychain-credential.md), where the stored credential is corrupt and login itself fails.
-
-## Cause
-Claude Code stores its OAuth token in the macOS **login keychain** as `Claude Code-credentials`, read via `/usr/bin/security`. macOS binds an "Always Allow" grant (the keychain item's ACL) to the **code-signing identity** of the requesting binary. That grant is silently invalidated when:
-
- **Claude Code updates** — the new binary's signature no longer matches the saved ACL. This is the most common trigger (see claude-code issues #48162, #9403).
- **The credential item is recreated on token refresh** — wipes the ACL.
- **Post-reboot keychain churn** — right after boot, the just-unlocked login keychain plus a concurrent token refresh can race ahead of the ACL settling, producing a *burst* of prompts that stops once a clean refresh completes.
-
-It is **not** a lock-timeout issue if `security show-keychain-info` reports `no-timeout` (below).
-
-## Triage (non-destructive — these do not trigger a prompt)
-```bash
-# Confirm the item exists (metadata only; no secret read)
-security find-generic-password -l "Claude Code-credentials" | grep -E "svce|acct"
-
-# Confirm the login keychain isn't auto-locking
-security show-keychain-info ~/Library/Keychains/login.keychain-db
-# -> "no-timeout" means it won't relock; so recurring prompts = ACL invalidation, not locking
-```
-
-## Fixes
-
-### One-off burst (e.g. right after a reboot)
-Click **Always Allow** (not Allow) once a clean token refresh has completed. With a `no-timeout` keychain the grant then holds, and the post-boot prompt storm usually self-clears within a minute. *Observed exactly this on MajorAir 2026-06-15 — a reboot triggered a burst that stopped on its own.*
-
-### Keeps returning after updates (durable) — reset the credential
-Deleting and re-creating the item rebinds a fresh ACL to the current binary. Costs one re-login.
-```bash
-security delete-generic-password -s "Claude Code-credentials"
-# then re-authenticate inside Claude Code: /login   (or relaunch `claude`)
-```
-
-### Bypass the keychain entirely (workaround)
-Claude Code falls back to `~/.claude/.credentials.json` in non-GUI contexts (SSH, tmux). On a local Mac this can be repurposed to stop keychain prompts for good:
-```bash
-# pipe straight to the file — never echo the token into a shared terminal
-security find-generic-password -s "Claude Code-credentials" -w > ~/.claude/.credentials.json
-chmod 600 ~/.claude/.credentials.json
-security delete-generic-password -s "Claude Code-credentials"
-```
-**Caveats:**
- Token is then **plaintext at rest** (mode 600) instead of encrypted in the keychain.
- A future Claude Code update may rewrite the keychain item.
- GUI-session behaviour for the file fallback is **less documented** than the SSH/tmux case — **verify it holds for your setup before relying on it.**
- Do **not** substitute `CLAUDE_CODE_OAUTH_TOKEN` — it is known to delete credentials on exit (issue #37512).
-
-## Notes
- Same keychain item as the corrupt-credential login failure; if login itself breaks, see the related article.
- Always redirect `-w` output straight to a file — never into a terminal whose scrollback feeds shared context.
-
-## Related
- [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](claude-code-warp-login-corrupt-keychain-credential.md)
- Config: `~/.claude.json`, login keychain item `Claude Code-credentials`
- First observed: MajorAir, 2026-06-15 (post-reboot prompt burst; self-cleared)
--- a/05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
+++ b/05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
@ -61,6 +61,5 @@ Resolved on step 1+2 — login succeeded after deleting the corrupt Keychain ite
  If that errors with "Expecting value", the stored secret is empty/corrupt — delete and re-login.

 ## Related
- [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](claude-code-keychain-prompt-recurring-macos.md) — different symptom: login works but the permission prompt won't stop
 - Config: `~/.claude.json` (oauthAccount, userID), login Keychain item `Claude Code-credentials`
 - Other Claude Code note: `claude-mem-setting-sources-empty-arg.md`
--- a/05-troubleshooting/forgejo-mailer-and-cli-recovery.md
+++ b/05-troubleshooting/forgejo-mailer-and-cli-recovery.md
@ -1,105 +0,0 @@
---
-title: "Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI"
-domain: troubleshooting
-category: general
-tags: [forgejo, gitea, smtp, docker, account-recovery, self-hosting]
-status: published
-created: 2026-06-12
-updated: 2026-06-12
---
-# Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI
-
-Two related problems on a single-admin self-hosted **Forgejo** (or Gitea): the GUI *"Forgot password"* is disabled, and you can't log in to fix it. Here's how to (1) enable account recovery properly, and (2) recover from the command line when you're already locked out.
-
-## Symptoms
-
- The *Forgot password* page shows: **"Account recovery is only available when email is set up. Please set up email to enable account recovery."**
- You can't log in (wrong/forgotten password), so you can't add an SSH key or change settings in the GUI either.
-
-## Part 1 — Enable account recovery (configure the mailer)
-
-Account recovery needs SMTP. If you already run a mail server on your tailnet, relay through it — **no app password needed** when the Forgejo host is `mynetworks`-trusted by that mail server.
-
-Edit `app.ini` (in the data volume, e.g. `/data/gitea/conf/app.ini`):
-
-```ini
-[mailer]
-ENABLED = true
-PROTOCOL = smtp+starttls
-SMTP_ADDR = 100.x.y.z           ; mail server's tailnet IP
-SMTP_PORT = 587
-FROM = forgejo@example.com
-FORCE_TRUST_SERVER_CERT = true  ; required when connecting by IP (cert CN won't match)
-```
-
-Notes:
-
- `FORCE_TRUST_SERVER_CERT = true` is needed when you target the relay by **IP** — the TLS cert is issued for a hostname, not the IP, so verification would otherwise fail. Acceptable on a trusted internal hop.
- Omit `USER`/`PASSWD` if the relay accepts your host via `mynetworks` (no SASL). Otherwise add SMTP auth.
- `app.ini` lives in the persistent volume, so the change **survives container re-creation** (e.g. Watchtower's nightly pull).
-
-Apply and verify:
-
-```bash
-docker restart forgejo
-docker logs forgejo 2>&1 | grep -i "Mail Service Enabled"   # confirms the mailer loaded
-```
-
-Test the SMTP path **before** trusting it (run from the host, mimicking Forgejo's connection):
-
-```bash
-python3 - <<'EOF'
-import smtplib, ssl
-ctx = ssl.create_default_context(); ctx.check_hostname = False; ctx.verify_mode = ssl.CERT_NONE
-s = smtplib.SMTP("100.x.y.z", 587, timeout=15)
-s.ehlo(); s.starttls(context=ctx); s.ehlo()
-s.sendmail("forgejo@example.com", ["you@example.com"],
-           "Subject: test\r\n\r\nForgejo relay path test")
-s.quit(); print("SENT_OK")
-EOF
-```
-
-`SENT_OK` means the relay accepted the message. `/user/forgot_password` should now show the reset form instead of the email error.
-
-> **Container can't reach the tailnet IP?** Docker bridge networks usually route to Tailscale via the host (SNAT to the host's tailnet IP). Confirm with:
-> `docker exec forgejo nc -w5 100.x.y.z 587 </dev/null && echo REACHABLE`
-
-## Part 2 — Recover from the CLI (already locked out)
-
-Forgejo's admin CLI runs inside the container as the git user (UID 1000) and needs no login.
-
-**Reset a password:**
-
-```bash
-docker exec -u 1000 forgejo forgejo admin user change-password -u <user> -p '<newpass>'
-```
-
-> ⚠️ **Gotcha:** `change-password` sets `must_change_password=true` by default. That **forces a change on next GUI login _and_ returns HTTP 403 on the API** (`"You must change your password"`). Clear it:
-> ```bash
-> docker exec -u 1000 forgejo forgejo admin user must-change-password --unset <user>
-> ```
-
-**Add an SSH key without the GUI** (basic-auth API — works only if 2FA is off):
-
-```bash
-curl -u <user>:'<pass>' -X POST -H 'Content-Type: application/json' \
-  -d '{"title":"laptop","key":"ssh-ed25519 AAAA... you@host"}' \
-  http://localhost:3004/api/v1/user/keys
-# HTTP 201 = created
-```
-
-Forgejo regenerates the git user's `authorized_keys` from the database, so `ssh -p <port> git@host` authenticates immediately afterward — no restart needed.
-
-## "The password keeps changing" — it (probably) isn't
-
-A stock Forgejo container does **not** reset admin passwords. If a self-hosted admin password *seems* to reset itself, rule out the server first:
-
- the compose has **no** admin/password env and no custom entrypoint;
- **no** cron, systemd timer, or script runs `forgejo admin user change-password`;
- the data volume is persistent (re-creation keeps the DB, password included).
-
-If all three hold, nothing server-side is changing it — the "changing" password is a **client-side** artifact: a duplicate or stale entry in your password manager autofilling different values. Delete the duplicates and keep one.
-
-## See also
-
- Forgejo — [Config Cheat Sheet → mailer](https://forgejo.org/docs/latest/admin/config-cheat-sheet/)
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@ -11,7 +11,6 @@ Practical fixes for common Linux, networking, and application problems.
 - [LoRA adapter — GGUF conversion fails with 'config.json not found'](gpu-display/lora-adapter-gguf-conversion-fails.md)

 ## 🌐 Networking & Web
- [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](networking/wifi-160mhz-airtime-saturation-game-streaming.md)
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
@ -19,7 +18,6 @@ Practical fixes for common Linux, networking, and application problems.
 - [Postfix header_checks Can't Act on Milter-Added Headers (Use Sieve)](networking/postfix-header-checks-vs-milter-headers.md)
 - [Dovecot Phantom Mailboxes from .dovecot.lda-dupes (mail_home Overlapping the Maildir Root)](networking/dovecot-mail-home-maildir-root-phantom-mailboxes.md)
 - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
- [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](networking/ssh-missing-host-block-magicdns-host-key-failure.md)
 - [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](networking/tailscale-status-json-hostname-localhost-ios.md)
 - [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)
 - [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
@ -33,7 +31,6 @@ Practical fixes for common Linux, networking, and application problems.
 - [Vault Password File Missing](ansible-vault-password-file-missing.md)
 - [ansible.cfg Ignored on WSL2 Windows Mounts](ansible-wsl2-world-writable-mount-ignores-cfg.md)
 - [regex_search — capture-group argument doesn't work in set_fact](ansible-regex-search-set-fact-capture-group.md)
- [reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)](ansible-reboot-become-timeout-wsl2.md)

 ## 📦 Docker & Systems
 - [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
@ -52,12 +49,9 @@ Practical fixes for common Linux, networking, and application problems.
 ## 📝 Application Specific
 - [Obsidian Vault Recovery — Loading Cache Hang](obsidian-cache-hang-recovery.md)
 - [Gemini CLI Manual Update](gemini-cli-manual-update.md)
- [iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)](iphone-mirroring-connecting-hang-awdl-stall-beta.md)

 ## 🤖 AI / Local LLM
 - [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)
 - [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](ollama-chat-template-pipe-stdin-bypass.md)
 - [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
 - [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md)
- [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](claude-code-warp-login-corrupt-keychain-credential.md)
- [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](claude-code-keychain-prompt-recurring-macos.md)
--- a/05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
+++ b/05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
@ -2,61 +2,14 @@
 title: "iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)"
 domain: troubleshooting
 category: macos
-tags: [macos, iphone-mirroring, continuity, awdl, rapport, quic, tailscale, mullvad, beta, channel-validation, aimesh, quicktime, usb]
+tags: [macos, iphone-mirroring, continuity, awdl, rapport, quic, tailscale, mullvad, beta]
 status: published
 created: 2026-06-09
-updated: 2026-06-15
+updated: 2026-06-09
 ---

 # iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)

-## Update 2026‑06‑15 — REGRESSED; reproducibly stuck on "Connecting", and Tailscale was **not** the cure
-
-> **Correction to the 2026‑06‑14 "it WORKS" update below.** On 2026‑06‑15 iPhone Mirroring is **reproducibly stuck on "Connecting to iPhone 16 Pro"** on MajorAir again — with Tailscale `accept-routes` *still* `false`. So the accept‑routes change was **correlation, not the fix**: this is an **intermittent macOS 27.0 beta AWDL bug, independent of Tailscale**.
->
-> **Tried this round — all failed to establish a session:** Tailscale `accept-routes=false` (already in place) · `sudo ifconfig awdl0 down/up` · **full Mac reboot** · cycling the iPhone's Wi‑Fi + Bluetooth.
->
-> **Log signature:** `rapportd` resolves the phone's `_asquic._udp.local` endpoint and `_companion-link` registers (discovery *succeeds*), but the QUIC‑over‑AWDL **datapath never completes into a live session** — `wifip2pd` loops on `AWDLDiscoveryTimeout (hasAdvertises=false)`. Each reset advanced the handshake one stage further (no‑advertises → resolve‑started → endpoint‑resolved) yet none reached a streaming session. **`llw0` never went active (0 bytes)** — confirming no A/V ever flowed, regardless of what the 06‑14 note measured.
->
-> **Stance:** beta OS bug, **no reliable user‑side fix**. Use the **QuickTime USB mirror** workaround (below) when you actually need the phone on screen. The 06‑14 "it works on `llw0`" measurements were real *for that one session* but are **not reproducible** across seeds/sessions — treat mirroring as intermittently broken on the 27.0 betas. This re‑confirms the original **Root cause (conclusion)** section further down (a beta bug, "nothing in local config wrong"), which the 06‑14 update had prematurely overridden.
-
-## Update 2026‑06‑14 (evening) — it WORKS; the "AWDL starvation" finding was the wrong interface
-
-> iPhone Mirroring is now **working** on MajorAir — stable session, clean video, no missing icons — on **ch44/80** with Tailscale `accept-routes=false`. An earlier pass the same day blamed an "AWDL bulk‑path starving at ~90 B/s"; that was **measuring the wrong interface** and is corrected here.
-
-**The video transport is `llw0` (low‑latency WLAN), not `awdl0`.**
-Measured during an active session: **`llw0` ≈ 800 KB/s** (≈6 Mbps of real video), `en0` ~60 KB/s, **`awdl0` ~1 KB/s**. `awdl0` only ever carries AWDL *discovery/control* (~90 B/s) — whether mirroring works or not. So "90 B/s on `awdl0` = starved bulk path" was a **red herring**: the A/V stream rides `llw0`, which the earlier pass never measured.
-
-**What was actually broken was session *stability*.** The `XPC_ERROR_CONNECTION_INTERRUPTED` / `MediaContinuityKit.TaskTimeoutError` teardown loop kept the `llw0` stream from ever sustaining (→ glitchy / missing icons). When the session holds, `llw0` streams clean.
-
-**What changed (not cleanly isolated):** three things differed between the broken and working states — (1) the network fully **settled on ch44** over ~15 h (the failing ch44 test was minutes after a chaotic AiMesh re‑sync + reconnect scramble), (2) Tailscale **`accept-routes` was turned off** (it had been polluting IPv4 routing + the Continuity control plane), and (3) both devices slept/woke. Which one mattered is not yet proven.
-
-**Open test — isolates Tailscale's role:** repro on **MajorMac** with *unaltered* Tailscale (`accept-routes` still **ON**). If mirroring breaks there but works on MajorAir (accept‑routes OFF), that pins Tailscale's accepted routes as the trigger. See [[MajorAir#Known Issues]] for the `accept-routes=false` fix.
-
-**Still valid from earlier today:** congestion ruled out (router `chanim_stats` ch36 = 90 % idle, 86 % txop); the AiMesh / router infra notes below; and iPhone Mirroring is **wireless‑only — no USB transport** (for a wired screen view, use QuickTime, below).
-
-> ⚠️ The iPhone‑radio `isValidChannel`/`awdl0` evidence cited in the original 2026‑06‑09 write‑up below describes AWDL *discovery* health, **not** the video path — read it in light of this correction.
-
-**Wired workaround (works today, no AWDL):**
-iPhone Mirroring is **wireless‑only — there is no USB transport** (confirmed: cable connected throughout, every attempt still used `awdl0`). For a wired view of the screen:
-> **QuickTime Player → File → New Movie Recording → ⌄ next to record → select the iPhone** = full‑rate USB‑C screen mirror (view + record). Does **not** give remote control (tap/type) — that's unique to iPhone Mirroring.
-
-**Infra notes (RT‑AX82U, AiMesh controller):**
- Router SSH is on **port 1025** (not 22); creds in Ansible vault (`router_username` / `router_password`).
- The 5 GHz channel is **AiMesh‑coordinated** and **resists CLI changes** — `wl chanspec` / nvram `wl1_chanspec` get re‑asserted by `acsd2` + AiMesh within seconds, even after `restart_wireless`. Only setting Control Channel to an **explicit value in the Web UI** holds mesh‑wide. Left "Auto" → acsd2 picks **36** (the cleanest channel).
- Any channel change triggers a **mesh re‑sync (~1 min) that drops all Wi‑Fi**; during it MajorAir falls back to the iPhone's **USB Personal Hotspot** (`en7` / `172.20.10.x`) and won't auto‑rejoin home Wi‑Fi while the hotspot feeds it internet (manual Wi‑Fi‑menu join needed).
- **Current state: 5 GHz on ch44/80** (same clean UNII‑1 spectrum as 36; left here to avoid another re‑sync — the Deck streams identically on 44).
-
-**If it breaks again — troubleshooting checklist:**
-1. **It's session stability, not bandwidth.** Look for teardown loops: `log show --last 3m --predicate 'process == "iPhone Mirroring"' | grep -iE "interrupt|timeout|endpoint"`.
-2. **Measure the right interface** — video rides **`llw0`** (hundreds of KB/s when the screen is active), *not* `awdl0` (~90 B/s control is normal): `netstat -ib | awk '/<Link#/{print $1, $7}'` before/after a few seconds.
-3. **Tailscale:** confirm `accept-routes=false` on the Mac (`tailscale debug prefs | grep RouteAll`) — see [[MajorAir#Known Issues]].
-4. **Let the network settle** after any Wi‑Fi/channel change — an AiMesh re‑sync churns AWDL/Continuity state for a minute+; retry once stable.
-5. iPhone: on home Wi‑Fi, near the Mac, **Personal Hotspot off**, not in Low Power Mode.
-6. **Wired fallback that always works:** QuickTime → New Movie Recording → select the iPhone (USB‑C; view/record only, no control).
-
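Checklist item 2 compares two `netstat -ib` samples by eye. The sketch below does the same subtraction in Python, assuming each sample is the two-column `name bytes` text that the `awk` filter in that item prints. The helper names and the sample figures are illustrative, not measurements from this incident.

```python
def parse_sample(text):
    """Parse 'name bytes' lines (the awk output above) into {interface: bytes}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def byte_rates(before, after, seconds):
    """Return bytes/second per interface that appears in both samples."""
    first, second = parse_sample(before), parse_sample(after)
    return {
        name: (second[name] - first[name]) / seconds
        for name in first
        if name in second
    }


# Illustrative numbers only: two samples taken 5 seconds apart.
sample_a = "awdl0 1000\nllw0 2000000\n"
sample_b = "awdl0 1450\nllw0 6000000\n"
rates = byte_rates(sample_a, sample_b, 5)
print(rates)
```

A rate in the hundreds of KB/s on `llw0` is the "video is flowing" signal described above; a near-zero `llw0` rate with a normal `awdl0` trickle matches the stuck-on-"Connecting" state.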
---
-
 ## Symptom
 iPhone Mirroring on the Mac sits on **"Connecting…"** forever and never shows the iPhone screen.
 - Mac: **macOS 27.0 dev beta** (build 26A5353q), MajorAir
--- a/05-troubleshooting/logwatch-wrong-hostname-after-migration.md
+++ b/05-troubleshooting/logwatch-wrong-hostname-after-migration.md
@ -1,150 +0,0 @@
---
-title: "Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration"
-domain: troubleshooting
-category: monitoring
-tags: [logwatch, hostname, hetzner, migration, monitoring, provisioning, fail2ban]
-status: published
-created: 2026-06-12
-updated: 2026-06-14
---
-
-# Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration
-
-## Symptom
-
-Daily Logwatch emails from a recently migrated server arrive titled with the
-provisioning label instead of the real hostname:
-
-```
-Logwatch for tttpod-hetzner (Linux)
-Logwatch for dcaprod-hetzner (Linux)
-```
-
-Everything else works — the report is generated, mailed, and delivered. Only the
-**name in the title is wrong**, which makes reports harder to scan and breaks any
-filter or rule that keys on the expected hostname.
-
-## Cause
-
-Logwatch titles each report with the box's **live system hostname**
-(`hostnamectl --static` / `/etc/hostname`) read at runtime — it does *not* keep
-its own copy of the name.
-
-Hetzner Cloud servers are provisioned with a temporary node label as the system
-hostname — `<host>-hetzner` (e.g. `tttpod-hetzner`). The migration runbook renames
-the **Tailscale node** back to the bare name and sets Postfix `myhostname`, but the
-**OS hostname** itself is easy to miss because nothing surfaces it day to day. It
-stays `<host>-hetzner` until something reads `hostname` — Logwatch is usually the
-first thing to do so, weeks later.
-
-Confirm the box is actually mislabelled:
-
-```bash
-ssh root@<host> 'hostnamectl --static; cat /etc/hostname; grep 127.0.1.1 /etc/hosts'
-# static: tttpod-hetzner
-# /etc/hostname: tttpod-hetzner
-# 127.0.1.1 tttpod-hetzner tttpod-hetzner
-```
-
-## Fix
-
-Set the real hostname and fix the matching `/etc/hosts` loopback line:
-
-```bash
-ssh root@<host> '
-  hostnamectl set-hostname <host>
-  sed -i "s/^127\.0\.1\.1[[:space:]].*/127.0.1.1 <host> <host>/" /etc/hosts
-  hostnamectl --static          # verify -> <host>
-'
-```
-
-That's it. **Logwatch has no hardcoded hostname override** — verify with:
-
-```bash
-grep -ri hostname /etc/logwatch/ /etc/cron.daily/0logwatch /etc/cron.daily/logwatch 2>/dev/null
-cat /etc/mailname 2>/dev/null
-```
-
-If those are empty (the normal case), Logwatch reads the live hostname on its next
-run, so the **next daily report self-corrects** — no service restart, no logwatch
-config change needed.
-
-> [!note] If `grep` *does* find a hostname pinned in `/etc/logwatch/conf/logwatch.conf`
-> (e.g. a `HostLimit`/`MailFrom` line baked in by Ansible), update it there too —
-> the override file wins over the live hostname.
-
-## Sweep the whole fleet
-
-This is a per-box provisioning leftover, so check every migrated host at once —
-more than one is usually affected:
-
-```bash
-for ip in 100.98.223.93 100.95.137.38 100.64.169.62 100.112.127.0 100.73.85.46; do
-  echo -n "$ip -> "
-  ssh -o ConnectTimeout=8 -o BatchMode=yes root@$ip 'hostnamectl --static' 2>/dev/null \
-    || echo '(unreachable)'
-done
-```
-
-Any value ending in `-hetzner` (or your provider's build label) needs the fix above.
-In the 2026-06 sweep, `tttpod` and `dcaprod` were still `*-hetzner` at the OS
-level; `majortoot`, `majormail`, and `majorlinux` had the correct system hostname
-— but see the variant below: `majormail`'s *configs* were still stale even though
-its hostname wasn't.
-
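The sweep loop prints one `ip -> hostname` line per box. For a larger fleet, that output can be filtered down to the hosts that still carry a build label. This is a minimal sketch, assuming the `-hetzner` suffix used here; `needs_rename` is a hypothetical helper, and the sample lines pair made-up IPs with hostnames purely for illustration.

```python
def needs_rename(sweep_output, suffix="-hetzner"):
    """Return (ip, hostname) pairs whose static hostname still ends with suffix."""
    stale = []
    for line in sweep_output.splitlines():
        ip, sep, name = line.partition(" -> ")
        name = name.strip()
        if sep and name.endswith(suffix):
            stale.append((ip.strip(), name))
    return stale


# Made-up sample in the same "ip -> hostname" shape the loop prints.
sample = (
    "100.64.0.1 -> tttpod-hetzner\n"
    "100.64.0.2 -> majormail\n"
    "100.64.0.3 -> (unreachable)\n"
)
stale_hosts = needs_rename(sample)
print(stale_hosts)
```

Unreachable hosts are deliberately not reported as stale; they need a separate retry before anything can be concluded about them.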
-## Variant: hostname is correct, but a config has the old name baked in
-
-A second, sneakier form of this drift: the **system hostname is already right**, so
-the sweep above passes and the Logwatch report *title* is correct — yet mail still
-arrives **from** `<host>-hetzner` because the old label is hardcoded in a service's
-`From`/`sender` field. These fields are static text, not derived from the live
-hostname, so fixing `hostnamectl` does nothing for them.
-
-Seen on `majormail` (2026-06-14): system hostname was `majormail`, but
-`Logwatch@majormail-hetzner...` was still the sender. Two configs held it:
-
-```bash
-# sweep a box for the old provisioning label in any send-related config
-ssh root@<host> 'grep -rsn "<host>-hetzner" /etc/logwatch/ /etc/fail2ban/ \
-  /etc/postfix/ /etc/aliases /etc/mailname 2>/dev/null'
-# /etc/logwatch/conf/logwatch.conf:MailFrom = Logwatch@<host>-hetzner.majorshouse.com
-# /etc/fail2ban/jail.local:sender         = fail2ban@<host>-hetzner.majorshouse.com
-```
-
-Fix in place (no restart needed for Logwatch; reload fail2ban for its change):
-
-```bash
-ssh root@<host> '
-  sed -i "s/<host>-hetzner/<host>/g" /etc/logwatch/conf/logwatch.conf /etc/fail2ban/jail.local
-  systemctl reload fail2ban
-'
-```
-
-> [!warning] Check the Ansible source, or it comes back
-> A live `sed` is undone by the next playbook run if the repo still carries the old
-> value. Distinguish two cases:
-> - **Templated** (safe): e.g. `logwatch.yml` sets `MailFrom = Logwatch@{{ inventory_hostname }}...`. If the inventory host is named correctly, a run *regenerates* the right value — it even self-heals a stale box.
-> - **Static file** (will regress): e.g. `roles/fail2ban/files/hosts/<host>/jail.local` with the literal `sender = ...@<host>-hetzner...`. Grep the repo (`grep -rn "<host>-hetzner" .`) and fix the file too, or every deploy re-pushes the stale sender.
-
-Inert backups (`jail.local.bak*`, `*~`) may still contain the old string — they
-don't send mail, so leave them.
-
-## Prevention
-
-Fold "set the system hostname" into the migration bootstrap so it never drifts:
-
-```bash
-hostnamectl set-hostname <host>
-sed -i "s/^127\.0\.1\.1[[:space:]].*/127.0.1.1 <host> <host>/" /etc/hosts
-```
-
-Do this in the **same step** that renames the Tailscale node and sets Postfix
-`myhostname` — all three read from the provisioning label and all three must be
-corrected together. See the
-[VPS Migration Baseline Checklist](../02-selfhosting/cloud/vps-migration-baseline-checklist.md).
-
-## Related
-
- [Logwatch Fleet Setup — Surviving Package Upgrades](../02-selfhosting/monitoring/logwatch-fleet-setup.md) — the broader "logwatch went silent / wrong-source" class, including the Packer `myhostname` variant of this same drift
- [VPS Migration Baseline Checklist](../02-selfhosting/cloud/vps-migration-baseline-checklist.md) — the full post-migration verification list
- [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](networking/ansible-host-key-verification-failed-rebuilt-host.md) — another IP/identity-drift gotcha from the same Hetzner migration
--- a/05-troubleshooting/macos-background-app-activity-audit-sfltool.md
+++ b/05-troubleshooting/macos-background-app-activity-audit-sfltool.md
@ -1,154 +0,0 @@
---
-title: "Auditing & Cleaning macOS Background App Activity (sfltool dumpbtm)"
-domain: troubleshooting
-category: general
-tags: [macos, background-tasks, btm, sfltool, login-items, system-extensions, uninstall, little-snitch]
-status: published
-created: 2026-06-21
-updated: 2026-06-21
---
-# Auditing & Cleaning macOS Background App Activity (`sfltool dumpbtm`)
-
-## Overview
-macOS tracks every login item, agent, daemon, helper, and extension that may run in the background in its **Background Task Management (BTM)** database. The GUI shows this under **System Settings → General → Login Items & Extensions** ("Allow in the Background"), but the GUI is summarised and hides paths, identifiers, and orphans.
-
-`sfltool dumpbtm` prints the full BTM database from the command line — and the per-user records need **no `sudo`**. This is the fastest way to answer "what is allowed to run in the background, and does each entry still map to an installed app?"
-
-## List what's registered
-
-```bash
-sfltool dumpbtm        # per-user records, no sudo required
-```
-
-Each record looks like:
-
-```
-Name: CleanMyMac Menu
-Type: login item (0x4)
-Disposition: [enabled, allowed, notified] (0xb)
-Identifier: 4.com.macpaw.CleanMyMac-mas.Menu
-URL: Contents/Library/LoginItems/CleanMyMac_5_MAS_Menu.app
-Bundle Identifier: com.macpaw.CleanMyMac-mas.Menu
-Parent Identifier: 2.com.macpaw.CleanMyMac-mas
-```
-
-### Reading the fields
- **Disposition** — `enabled` = actively allowed to run in the background. `disabled` = present but off.
- **Type** — what kind of item it is:
-
-| Type | Meaning |
-|---|---|
-| `app (0x2)` | A normal application entry |
-| `login item (0x4)` | Launches at login (menu-bar apps, helpers) |
-| `agent (0x8)` / `legacy agent` | Per-user background agent |
-| `legacy daemon (0x10010)` | System-wide background daemon |
-| `background tasks (0x2000)` | Abstract background-task registration owned by a parent app — **has no file path of its own** |
-| `developer (0x20)` | A per-developer grouping header (the collapsible row in Settings), **not an app** |
-| `quicklook` / `spotlight` / `dock tile` | Plugins/extensions — not really "background apps" |
-
-## Map entries to installed apps (find orphans)
-
-Two gotchas make naïve path-checking fail:
-
-1. **Absolute paths are stored as `file://` URLs**, not plain `/…`. Strip the `file://` prefix and URL-decode (`%20` → space).
-2. **Child items store a *relative* `URL`** (e.g. `Contents/Library/LoginItems/…`) that must be joined to the **parent record's** absolute path, found via `Parent Identifier`.
-
-A small parser that resolves each record to a real path and flags true orphans:
-
-```python
-import sys, re, os, urllib.parse
-items, cur = [], None
-def push():
-    global cur
-    if cur is not None: items.append(cur)
-for line in sys.stdin:
-    s = line.strip()
-    if re.match(r"^#\d+:$", s): push(); cur = {}; continue
-    if cur is None: continue
-    m = re.match(r"^([A-Za-z][A-Za-z /]+):\s*(.*)$", s)
-    if m: cur[m.group(1).strip()] = m.group(2).strip()
-push()
-byid = {it["Identifier"]: it for it in items if it.get("Identifier")}
-def abspath(it, d=0):
-    if d > 8: return None
-    u = it.get("URL", "")
-    if u and u != "(null)":
-        if u.startswith("file://"): return urllib.parse.unquote(u[7:]).rstrip("/")
-        if u.startswith("/"): return u.rstrip("/")
-        par = byid.get(it.get("Parent Identifier", ""))
-        if par:
-            b = abspath(par, d + 1)
-            if b: return os.path.join(b, urllib.parse.unquote(u)).rstrip("/")
-    return None
-for it in items:
-    if not it.get("Name"): continue
-    p = abspath(it)
-    if p and not os.path.exists(p):
-        print("ORPHAN:", it["Name"], "->", p)
-```
-
-```bash
-sfltool dumpbtm | python3 btm_check.py
-```
-
-> **Expected non-orphans:** `background tasks (0x2000)` and `developer (0x20)` rows legitimately store no path — they are not missing apps. Helpers/daemons that resolve *inside* a parent bundle (e.g. `/Applications/Foo.app/Contents/Library/LoginItems/…`) or in `/Library/…` are also fine; they just don't appear as a top-level `.app`. That is usually why an entry "has no application you can find."
-
-## Disable background for an app
-
-This **cannot be scripted** — Apple deliberately gates the toggle behind the GUI:
-
-**System Settings → General → Login Items & Extensions → "Allow in the Background"** → switch the app off.
-
-Disabling a `developer (0x20)` grouping header turns off all of that developer's sub-items at once.
-
-## Uninstall cleanly — the system-extension trap
-
-**Dragging an app to the Trash is not a full uninstall.** Apps that install a **network/system extension** plus a privileged daemon (firewalls and VPNs especially — Little Snitch, Mullvad, etc.) leave their `/Library` daemon **still loaded and running** after the app is trashed. The BTM entry persists and the background service keeps working.
-
-### 1. Prefer the app's own uninstaller
- **Bundled uninstall script** (Mullvad): runs cleanly, deactivates the system extension, resets the firewall.
-  ```bash
-  sudo "/Applications/Mullvad VPN.app/Contents/Resources/uninstall.sh"
-  ```
- Some apps ship an uninstaller in their DMG or a CLI tool. **Note:** Little Snitch 6.x has **no DMG uninstaller and no `littlesnitch uninstall` subcommand** — manual removal is the supported route there.
-
-### 2. Check whether a system extension is still active
-```bash
-systemextensionsctl list
-```
-If the app's extension is **not** listed (only unrelated ones like Tailscale/Canon remain), the extension is already deactivated, so removing the remaining files by hand is safe and completes the uninstall.
-
-### 3. Manual removal (when no uninstaller exists)
-Find every component first:
-```bash
-ls /Library/LaunchDaemons/<id>* /Library/LaunchAgents/<id>* 2>/dev/null
-ls -d "/Library/Application Support/<Vendor>" 2>/dev/null
-ls ~/Library/Preferences/<id>* 2>/dev/null
-```
-Then boot out the daemon and remove the files:
-```bash
-sudo launchctl bootout system /Library/LaunchDaemons/<id>.daemon.plist 2>/dev/null
-sudo rm -f /Library/LaunchDaemons/<id>.daemon.plist /Library/LaunchAgents/<id>.agent.plist
-sudo rm -rf "/Library/Application Support/<Vendor>" "$HOME/.Trash/<App>.app"
-rm -f ~/Library/Preferences/<id>*.plist     # user-owned, no sudo
-```
-
-> **Shared-container caution:** before deleting `~/Library/Group Containers/*`, check it isn't shared. Microsoft apps share `UBF8T346G9.com.microsoft.oneauth`, `…entrabroker`, and `…teams` across Office/Teams/RDP — delete only the app-specific container (e.g. `…com.microsoft.rdc`), never the shared auth ones.
-
-## Stale BTM "ghost" entries
-
-After a manual uninstall, `sfltool dumpbtm` may still list the removed app, pointing at now-deleted paths. These are harmless orphans (nothing left to load). **BTM reconciles them on the next reboot / login cycle** — a reboot also finalises any system-extension teardown.
-
-## Quick reference
-
-```bash
-sfltool dumpbtm                       # full per-user BTM dump (no sudo)
-sfltool dumpbtm | grep -A6 'Name:'    # browse records
-systemextensionsctl list              # active network/system extensions
-# Verify a removal:
-sfltool dumpbtm | grep -i <vendor>    # should be empty after a reboot
-```
-
-## See also
- Apple gates "Allow in the Background" behind System Settings — there is no supported CLI toggle for BTM dispositions.
- For VPN/firewall apps, always reach for the vendor uninstaller first; manual `rm` alone can leave a registered system extension behind.
--- a/05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md
+++ b/05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md
@ -1,94 +0,0 @@
---
-title: "Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration"
-domain: troubleshooting
-category: networking
-tags: [ansible, ssh, known-hosts, tailscale, host-key, migration]
-status: published
-created: 2026-06-12
-updated: 2026-06-12
---
-
-# Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration
-
-## Symptom
-
-A subset of hosts in an Ansible run fail at **Gathering Facts** while the rest succeed:
-
-```
-[ERROR]: Task failed: Data could not be sent to remote host "100.112.127.0".
-Make sure this host can be reached over ssh: Host key verification failed.
-fatal: [majormail]: UNREACHABLE! => {"unreachable": true, ...}
-```
-
-The failing hosts are exactly the ones that were recently **rebuilt or migrated** (new server, new OS install, or a cloud move that issued a new Tailscale IP). Hosts that were never rebuilt connect fine.
-
-Confusingly, **interactive `ssh root@<host>` works perfectly** for the same boxes — only Ansible fails.
-
-## Cause
-
-SSH stores each accepted host key in `~/.ssh/known_hosts` keyed by the **exact address you connected with**. A key accepted for `ssh root@tttpod` is saved under the hostname `tttpod`; it is *not* indexed under that node's IP.
-
-Ansible inventories almost always set `ansible_host` to a **literal IP** (here, the Tailscale `100.x.x.x` address). So Ansible's SSH lookup is by IP, finds no matching entry, and with `StrictHostKeyChecking=yes` (or `accept-new` already exhausted) it refuses the connection:
-
-```
-No ED25519 host key is known for 100.112.127.0 and you have requested strict checking.
-Host key verification failed.
-```
-
-The hostname-form and IP-form entries are independent. Fixing interactive SSH (e.g. converting aliases to MagicDNS names and re-accepting keys) does **nothing** for Ansible, because Ansible never uses the hostname.
-
-A rebuilt host also generates **brand-new host keys**, so any old IP-form entry would additionally be a mismatch — but the common case after a migration to a *new* IP is simply that no IP entry exists at all.
-
-## Diagnosis
-
-```bash
-# 1. Is there any known_hosts entry for the failing IP? (no output = no entry)
-ssh-keygen -F 100.112.127.0
-
-# 2. Reproduce the exact failure without an interactive prompt:
-ssh -o BatchMode=yes -o StrictHostKeyChecking=yes root@100.112.127.0 true
-# -> "Host key verification failed."  confirms the gap
-
-# 3. Confirm the inventory IP is actually the host's CURRENT address
-#    (guards against stale-IP drift, a separate problem):
-tailscale status | grep majormail
-ssh-keyscan -t ed25519 100.112.127.0 | ssh-keygen -lf -   # fingerprint it
-```
-
-If step 3 shows the inventory IP matches the live Tailscale node and the box answers `ssh-keyscan`, the only problem is the missing IP-form key.
-
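The hostname-form versus IP-form gap can also be checked for a whole inventory at once. The sketch below is a rough illustration, not a replacement for `ssh-keygen -F`: it assumes an unhashed `known_hosts` (no `HashKnownHosts`), exact-match names only (no wildcard patterns or `[host]:port` forms), and placeholder key material. Both function names are hypothetical.

```python
def known_hosts_names(text):
    """Collect every plain-text host name listed in known_hosts content."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # The first field is a comma-separated list of names for one key.
        names.update(fields[0].split(","))
    return names


def missing_entries(text, addresses):
    """Return the addresses that have no exact-match entry."""
    present = known_hosts_names(text)
    return [addr for addr in addresses if addr not in present]


# Placeholder keys; the first entry is hostname-form only, so its IP is "missing".
sample = (
    "majormail ssh-ed25519 AAAAC3placeholder\n"
    "100.98.223.93 ssh-ed25519 AAAAC3placeholder\n"
)
missing = missing_entries(sample, ["100.98.223.93", "100.112.127.0"])
print(missing)
```

Any address this reports is a candidate for the keyscan in the fix below, after the `tailscale status` cross-check in step 3.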
-## Fix
-
-Add the **IP-form** host keys to the `known_hosts` of the user that runs Ansible. Back up first, scan over the tailnet, de-dup:
-
-```bash
-cp ~/.ssh/known_hosts ~/.ssh/known_hosts.bak.$(date +%Y%m%d)
-
-for ip in 100.98.223.93 100.112.127.0 100.73.85.46 100.95.137.38 100.76.51.16 100.64.169.62; do
-  ssh-keyscan -T 5 -t rsa,ecdsa,ed25519 "$ip" >> ~/.ssh/known_hosts
-done
-sort -u ~/.ssh/known_hosts -o ~/.ssh/known_hosts
-```
-
-Verify before re-running the playbook:
-
-```bash
-ansible <hosts> -m ping        # expect "pong" from each
-```
-
-### Why `ssh-keyscan` is safe here
-
-`ssh-keyscan` trusts whatever answers on the wire — normally a MITM risk. Over **Tailscale**, the connection rides WireGuard, which cryptographically authenticates the peer by its tailnet identity: reaching `100.x.x.x` *guarantees* you are talking to the node that owns that tailnet address. Scanning and trusting the key over the tailnet is therefore as trustworthy as the tailnet itself. Always cross-check the IP against `tailscale status` first (step 3) so you scan the right node.
-
-## Prevention
-
- **Per-workstation, not fleet-wide.** `known_hosts` is local to each machine + user. After a migration, *every* host that runs Ansible (each workstation, plus any control node like `majorlab`) needs the IP keys added independently. Adding them on one Mac does not help the others.
- **Sweep on every migration phase.** A rolling migration changes one node's IP at a time; fold the keyscan above into the post-cutover checklist so Ansible never breaks mid-rollout.
- **Alternative: disable host-key checking.** Setting `host_key_checking = False` in `ansible.cfg` (or `ANSIBLE_HOST_KEY_CHECKING=False`) sidesteps the failure but trades away host-key verification entirely. This is not the same as OpenSSH's `StrictHostKeyChecking=accept-new`, which still rejects a changed key. Prefer the explicit keyscan: it keeps strict checking on for every *future* run while accepting the new key exactly once, under your control.
-
-## Related
-
- SSH-Aliases — Fleet SSH access; the MagicDNS-vs-pinned-IP strategy and the Ansible-by-IP `known_hosts` note
- Network Overview — Tailscale fleet inventory and current IPs
- Hetzner-Migration-Status — the migration that triggered the fleet-wide IP churn
- [[ssh-socket-tailscale-race-condition]] — a different "SSH unreachable after reboot" failure mode
--- a/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
+++ b/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
@ -1,133 +0,0 @@
---
-title: "SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)"
-domain: selfhosting
-category: troubleshooting
-tags:
-  - ssh
-  - ssh-config
-  - tailscale
-  - magicdns
-  - known-hosts
-  - host-key
-  - troubleshooting
-status: published
-created: 2026-06-11
-updated: 2026-06-12
---
-
-# SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)
-
-## The Problem
-
-You `ssh` to a host you've reached many times before, but now it dies before any
-auth happens:
-
-```
-$ ssh MyMac
-ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
-Host key verification failed.
-```
-
-On a headless box (WSL, a server, a CI runner) there's no askpass binary, so the
-prompt can't even be shown — SSH just aborts. Connecting **by Tailscale IP** works
-fine:
-
-```
-$ ssh user@100.74.124.81      # works
-$ ssh MyMac                   # Host key verification failed
-```
-
-## Why It Happens
-
-There is **no `Host MyMac` block in `~/.ssh/config` at all** — and there never was.
-The connection only ever worked by IP, or interactively (where you answered `yes` at
-the first-connect prompt without noticing).
-
-When no `Host` block matches, SSH uses the literal argument as the hostname. With
-Tailscale MagicDNS, `MyMac` (or `mymac`) resolves to the node — so the *connection*
-succeeds — but the host key it presents is checked against `known_hosts` under the
-name **`mymac`**, which has no entry. Meanwhile the key you actually trust is stored
-under the **IP**:
-
-```
-$ ssh-keygen -F 100.74.124.81      # found — line 67
-$ ssh-keygen -F mymac              # nothing
-```
-
-So strict host-key checking has nothing to match, tries to prompt to accept the
-"new" key, and on a headless host that prompt fails → `Host key verification failed`.
-
-Confirm there's no block (and that `ssh -G` is just echoing defaults):
-
-```
-$ ssh -G MyMac | grep -E '^(hostname|user|port) '
-hostname mymac          # lowercased literal — NOT an explicit HostName
-user youruser           # your local username default — not from a block
-port 22                 # default
-```
-
-If `hostname` equals the arg you typed (just lowercased) and `user` is your local
-login name, there is no matching `Host` block.
-
-## The Fix
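The "is this just defaults?" test above can be written down as a small heuristic. The sketch below assumes the `ssh -G` output has already been captured as text; `fell_through` is a hypothetical name, and the check cannot tell a missing block apart from a block that happens to set the same hostname and user.

```python
def fell_through(ssh_g_output, typed_alias, local_user):
    """Heuristic: True when `ssh -G` output looks like pure defaults."""
    settings = {}
    for line in ssh_g_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key in ("hostname", "user"):
            settings[key] = value.strip()
    return (
        settings.get("hostname") == typed_alias.lower()
        and settings.get("user") == local_user
    )


# Output shaped like the example above: no Host block matched.
defaults = "hostname mymac\nuser youruser\nport 22\n"
# Output after adding a block that pins the IP.
pinned = "hostname 100.74.124.81\nuser youruser\nport 22\n"

no_block = fell_through(defaults, "MyMac", "youruser")
has_block = fell_through(pinned, "MyMac", "youruser")
print(no_block, has_block)
```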
-
-Add an explicit `Host` block that **pins the IP** that `known_hosts` already trusts.
-This matches the convention every other host in a Tailscale fleet should follow —
-pin the `100.x` address, not the MagicDNS name:
-
-```sshconfig
-Host MyMac mymac
-  HostName 100.74.124.81
-  User youruser
-  IdentityFile ~/.ssh/id_ed25519
-```
-
-> [!note] When pinning the IP is the *wrong* call
-> Pinning the IP is right while the host is **stable**. If the box gets migrated or
-> rebuilt — new Tailscale IP *and* new host key — the pin rots and `known_hosts`
-> mismatches. At that point switch to **MagicDNS names** so the alias self-heals. See
-> *[MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)*.
-
-Now `ssh MyMac` resolves to `100.74.124.81`, whose key is in `known_hosts`, and the
-check passes with no prompt. Verify non-interactively:
-
-```
-$ ssh -o BatchMode=yes MyMac 'hostname'
-mymac.majorlan
-```
-
-`BatchMode=yes` disables every prompt — if it returns the hostname cleanly, the key
-is trusted and a real key authenticated.
-
-**Don't over-pin the identity.** Run `ssh -v user@<IP> true` and check the
-`Will attempt key` / accepted-key lines first. A workstation often authenticates
-with the *default* `id_ed25519`, not a fleet key — if `id_ed25519_fleet` isn't even
-offered, don't put it in the block.
-
-## Cleanup: Stale `known_hosts` Cruft
-
-Drive-by `ssh` attempts leave junk entries like `mymac-2` (auto-suffixed names from
-old keys). They never match anything once you pin the IP. Purge them:
-
-```
-$ ssh-keygen -R mymac-2
-```
-
-## How to Diagnose This
-
-1. `ssh -o BatchMode=yes <alias> true` — if it fails with `Host key verification
-   failed` (not `Permission denied`), it's a host-key problem, not auth.
-2. `ssh -G <alias> | grep -E '^(hostname|user|port) '` — if `hostname` is just your
-   typed arg and there's no real `HostName`, there's no `Host` block.
-3. `ssh-keygen -F <name>` vs `ssh-keygen -F <ip>` — find which name actually holds
-   the trusted key. Pin whichever one `known_hosts` has (usually the IP).
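
Step 1 hinges on telling the two error strings apart. A minimal sketch of that triage, assuming you capture the last line of `ssh` output yourself; `classify_ssh_failure` is a hypothetical name, and the matched strings are the ones quoted in this article plus the generic connect timeout.

```shell
# Hypothetical helper: map one line of ssh error output to a failure class.
classify_ssh_failure() {
  case "$1" in
    *"Host key verification failed"*) echo host-key ;;  # known_hosts problem
    *"Permission denied"*)            echo auth ;;      # key not accepted
    *"timed out"*)                    echo timeout ;;   # no TCP connection
    *)                                echo other ;;
  esac
}
```

Possible usage: `classify_ssh_failure "$(ssh -o BatchMode=yes MyMac true 2>&1 | tail -1)"`.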
-
-## Why This Gotcha Is Invisible
-
-It only surfaces on a host with **no askpass** (headless / WSL / cron). On a desktop,
-the first-connect prompt appears, you hit `yes`, an entry gets written under the
-MagicDNS name, and it "just works" — masking the fact that no `Host` block exists and
-the IP-keyed entry is the only durable trust. Move the same config to a headless box
-and the missing block becomes a hard failure. Related: SSH only applies `Host` blocks
-by **literal pattern match**, so connecting by IP also skips them — see *Ansible Fails
-with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)*.
--- a/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
+++ b/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
@ -1,160 +0,0 @@
---
-title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
-domain: selfhosting
-category: troubleshooting
-tags:
-  - ssh
-  - ssh-keys
-  - authorized-keys
-  - key-rotation
-  - publickey
-  - fleet
-  - troubleshooting
-status: published
-created: 2026-06-17
-updated: 2026-06-17
---
-
-# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`
-
-## The Problem
-
-A host you've SSH'd into for months suddenly rejects you — but **only some hosts**, not all:
-
-```
-$ ssh root@host-a
-root@host-a: Permission denied (publickey).
-
-$ ssh root@host-b      # same key, same workstation — works fine
-host-b $
-```
-
-Nothing changed on the servers. The thing that changed is on **your** side: at some
-point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
-clobbered by a botched copy, a routine rotation). The new public key was pushed to a
-few hosts but never fanned out to the rest. Every host still holding only the *old*
-public key now rejects the new private key with `Permission denied (publickey)`.
-
-> The tell: it's `Permission denied (publickey)`, **not** `Host key verification
-> failed`. The former is an **authentication** failure (the server doesn't trust your
-> key); the latter is the server's key not matching your `known_hosts`. Different
-> problem — see *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.
-
-## Why It Happens
-
-Public-key auth is **per-host**: the server only lets you in if your public key is a
-line in that host's `~/.ssh/authorized_keys`. There is no central directory — each
-host is its own island. So when you rotate a key, *every* host needs the new public
-key appended independently.
-
-It's easy to do this partially without noticing. You regenerate the key, then over the
-next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
-other work. Those three now trust the new key. The other six don't — and you won't
-find out until weeks later when you reach for one of them.
-
-Confirm it's an authentication (key) failure and see which key is being offered:
-
-```
-$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
-debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
-debug1: Authentications that can continue: publickey
-root@host-a: Permission denied (publickey).
-```
-
-The server offered you nothing but `publickey`, you offered your current key, and it
-was refused → your key isn't in that host's `authorized_keys`.
-
-## Scope It First — Don't Fix One Host at a Time
-
-The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
-touching anything, so you fix the real set, not just the squeaky wheel:
-
-```bash
-for h in host-a host-b host-c host-d host-e host-f; do
-  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
-  echo "$h: $r"
-done
-```
-
-`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
-of hanging. Anything that doesn't print `OK` needs the backfill.
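
If the fleet is large, it may help to have the sweep emit only the failing hosts so the list can feed the fix loop directly. A sketch under the same assumptions as the loop above (root login, `BatchMode`, 8 s timeout); the function names `probe` and `failing_hosts` are hypothetical.

```shell
# Probe one host; prints the last line of ssh's output ("OK" on success).
# -n keeps ssh from reading stdin, so the helper is safe inside other loops.
probe() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=8 "root@$1" 'echo OK' 2>&1 | tail -1
}

# Print only the hosts whose probe did not return OK, one per line.
failing_hosts() {
  for h in "$@"; do
    [ "$(probe "$h")" = "OK" ] || printf '%s\n' "$h"
  done
}
```

Possible usage: `failing_hosts host-a host-b host-c > failed.txt`, then loop over `failed.txt` for the backfill.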
-
-## The Fix
-
-You need a **second, still-trusted** way onto each failing host to append the new key.
-Common transit options, best first:
-
- **Another of your keys that still works** (e.g. a config-management / automation
-  user whose key is authorized fleet-wide, ideally with `sudo`).
- **Another workstation** whose key those hosts still trust.
- **The provider's web console / serial console** as a last resort.
-
-> [!warning] A jump host only helps if *it* can reach the target
-> "Bounce through a box that still trusts me" only works if that box's own key is in
-> the target's `authorized_keys`. A host can trust *your* key yet have no standing
-> trust to a third host (and hit its own `Host key verification failed` on the way).
-> Test the full two-hop path before relying on it.
-
-Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
-append the new key idempotently, with a backup, to every failing host:
-
-```bash
-PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
-STAMP=$(date +%Y%m%d-%H%M%S)
-for h in host-a host-c host-e; do          # only the hosts that failed the sweep
-  ssh deploy@"$h" "sudo bash -s" <<EOF
-set -e
-F=/root/.ssh/authorized_keys
-mkdir -p /root/.ssh && touch "\$F"
-cp "\$F" "\$F.bak-$STAMP"                   # backup before any change
-grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F"   # append only if absent
-chmod 600 "\$F"
-EOF
-done
-```
-
-Three things that keep this safe:
-
- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
-  one line and only if it's missing. Re-running is a no-op — never clobber an
-  `authorized_keys` with `>` or you'll lock out every *other* key on the box.
- **Back up first.** The `.bak-<stamp>` copy is your undo.
- **`chmod 600`.** With the default `StrictModes yes`, sshd silently ignores an
-  `authorized_keys` that's group/world writable, which looks exactly like "the key didn't take."
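
The append-only-if-absent guard can be tried locally against a scratch file before pointing it at a fleet. This is a sketch, not the exact remote script above: `append_key_once` is a hypothetical helper, and it uses `grep -x` (whole-line match) where the loop above uses a plain substring match.

```shell
# Hypothetical helper: append a public key line to a file only if it is absent.
append_key_once() {
  file=$1
  key=$2
  touch "$file"
  # -x matches the whole line, -F treats the key as a literal string.
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
  chmod 600 "$file"
}
```

Running it twice with the same key leaves a single line in the file, which is the re-run safety the first bullet describes.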
-
-Then verify directly — not through the transit user:
-
-```bash
-for h in host-a host-c host-e; do
-  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
-done
-```
-
-All `OK` means the new key authenticates on its own.
-
-## Prevention
-
- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
-  is to fan the new public key out to **every** host's `authorized_keys` in one pass —
-  not opportunistically as you happen to log in. A short `for` loop over the full host
-  list (or a config-management task — see below) closes the gap immediately.
- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
-  task (or equivalent) that lists the *current* set of keys makes "who can log in" a
-  reviewed, version-controlled fact instead of an append-only pile that drifts per host.
- **Keep the old key authorized until the new one is verified everywhere**, then remove
-  the stale line in a deliberate cleanup pass.
-
-## How to Diagnose This (Checklist)
-
-1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
-   `Host key verification failed` (host key). Confirms which problem you have.
-2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
-   fingerprint.
-3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
-   hosts before fixing.
-4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
-   transit path.
-5. Re-verify each host with a direct `BatchMode` login.
-
-Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
-and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.
--- a/05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md
+++ b/05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md
@ -1,133 +0,0 @@
---
-title: "Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save"
-domain: troubleshooting
-category: networking
-tags: [wifi, steam-deck, steamos, iwd, networkmanager, rtw88, rtl8822ce, power-save, supplicant-disconnect, flapping]
-status: published
-created: 2026-06-19
-updated: 2026-06-19
---
-
-# Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save
-
-## 🛑 Problem
-
-An OG Steam Deck (LCD model, Realtek **RTL8822CE** on the `rtw88_8822ce` driver) kept "losing" Wi-Fi — it would connect, hold for around a minute, drop, then reconnect a second later, over and over. From the router side the device looked like it was constantly coming and going; from the couch it felt like the network "wouldn't stay connected."
-
-Crucially, **this was not a router problem.** The AP config was correct, RF was clean (strong signal, zero tx retries / beacon loss), and every other client on the network was rock-solid. The fault was entirely on the Deck.
-
-## 🔍 Diagnosis
-
-SteamOS uses **NetworkManager with the `iwd` backend** (not `wpa_supplicant`). That detail is the whole ballgame.
-
-### Step 1 — Confirm the flap and its cadence
-
-```bash
-# how many disconnects this boot?
-journalctl -b -u NetworkManager --no-pager | grep -c supplicant-disconnect
-# 50
-
-# when did they happen?
-journalctl -b -u NetworkManager --no-pager | grep supplicant-disconnect \
-  | awk '{print $1,$2,$3}' | tail
-# 10:20:52 · 10:21:54 · 10:22:57 · 10:24:00 · 10:25:03 · 10:26:05 · 10:27:08 ...
-```
-
-**~63 seconds between every drop.** A fixed, metronome-like interval is the tell — this is a *timer*, not RF noise. The NetworkManager log shows the pattern plainly:
-
-```
-activated -> failed (reason 'supplicant-disconnect')
-... -> activated         # reconnects ~1s later
-```
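
Rather than eyeballing the timestamps, the gaps can be computed. A small sketch, assuming the input is one `HH:MM:SS` value per line (for example the third field of default `journalctl` output); `disconnect_gaps` is a hypothetical name, and a run that crosses midnight would produce one negative gap.

```shell
# Hypothetical helper: read HH:MM:SS timestamps on stdin and print the
# number of seconds between each consecutive pair.
disconnect_gaps() {
  awk -F: '{ t = $1 * 3600 + $2 * 60 + $3; if (NR > 1) print t - prev; prev = t }'
}
```

Possible usage: `journalctl -b -u NetworkManager --no-pager | grep supplicant-disconnect | awk '{print $3}' | disconnect_gaps`. A column of near-identical numbers points at a timer; scattered values point elsewhere.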
-
-### Step 2 — Prove the link is healthy *when it's up*
-
-```bash
-iw dev wlan0 station dump | grep -iE 'signal|bitrate|failed|retries|beacon loss'
-#   signal:    -65 dBm
-#   tx retries: 0
-#   tx failed:  0
-#   beacon loss: 0
-```
-
-Strong signal, zero retries, zero beacon loss — the association is clean while it lasts. So the drop is being *commanded*, not caused by a bad radio link.
-
-### Step 3 — Identify the chip and the backend
-
-```bash
-lspci -k | grep -A3 -iE 'network|wireless'
-#   Realtek RTL8822CE ... Kernel driver in use: rtw88_8822ce
-```
-
-The `~63s` interval is **IWD's default periodic background scan**. With no `/etc/iwd/main.conf` present, IWD scans on a timer even while connected, and on the `rtw88` driver that scan knocks the current association over — producing the `supplicant-disconnect` every minute.
-
-A secondary annoyance: `iw dev wlan0 get power_save` reported `on`, which showed up as wildly jittery LAN latency (8–69 ms to the gateway over Wi-Fi, where a healthy 5 GHz link is 2–10 ms).
-
-## ✅ Fix
-
-Two independent changes — the first stops the flap, the second smooths latency.
-
-### 1. Disable IWD's periodic scan (stops the flap)
-
-```bash
-sudo mkdir -p /etc/iwd
-printf '[Scan]\nDisablePeriodicScan=true\n' | sudo tee /etc/iwd/main.conf
-sudo systemctl restart iwd     # briefly drops Wi-Fi; NetworkManager auto-reconnects
-```
-
-Trade-off: with periodic scanning off, the Deck roams to a different/stronger AP (e.g. another AiMesh node) more lazily. Fine for a device that mostly sits in one spot.
-
-### 2. Disable Wi-Fi power save (kills the latency jitter)
-
-The obvious `nmcli connection modify <name> 802-11-wireless.powersave 2` **does not work under the IWD backend** — NetworkManager doesn't enforce that property when `iwd` is managing the radio. Use a dispatcher script instead, with a retry loop because `rtw88` won't accept the setting in the first instant after association on a cold boot:
-
-```bash
-sudo tee /etc/NetworkManager/dispatcher.d/90-wifi-powersave >/dev/null <<'SCRIPT'
-#!/bin/sh
-# Disable Wi-Fi power save on the wireless iface (retry: rtw88 may not accept it instantly on boot)
-case "$2" in
-  up|dhcp4-change|connectivity-change)
-    case "$1" in
-      wl*)
-        for n in 1 2 3 4 5; do
-          /usr/bin/iw dev "$1" set power_save off 2>/dev/null
-          [ "$(/usr/bin/iw dev "$1" get power_save 2>/dev/null)" = "Power save: off" ] && break
-          sleep 1
-        done
-      ;;
-    esac
-  ;;
-esac
-SCRIPT
-sudo chmod +x /etc/NetworkManager/dispatcher.d/90-wifi-powersave
-sudo iw dev wlan0 set power_save off    # apply now without waiting for a reconnect
-```
-
-> 💡 A single-shot dispatcher (no retry) **silently fails on a cold boot**: it fires before the interface is ready, the `iw` call no-ops, and power save stays on. Verify with `iw dev wlan0 get power_save` *after a real reboot*, not just after a service restart.
-
-## 🔁 Verification
-
-```bash
-# was 50/boot, ~once a minute:
-journalctl -b -u NetworkManager --no-pager | grep -c supplicant-disconnect
-# 0
-iw dev wlan0 get power_save
-# Power save: off
-```
-
-A 3-minute continuous `ping` showed **180/180 replies, 0 loss**, latency tightened to **6–11 ms**. Confirmed across a full cold reboot: the Deck auto-rejoins Wi-Fi, both settings persist, and the disconnect counter stays at 0.
-
-## 📌 Notes
-
- **Persistence:** `/etc/iwd/main.conf` and the dispatcher live in `/etc`, which survives reboots. A major SteamOS update *can* reset `/etc` — re-apply if the flapping returns after an OS update.
- **Fully reversible:**
-  ```bash
-  sudo rm /etc/iwd/main.conf /etc/NetworkManager/dispatcher.d/90-wifi-powersave
-  sudo systemctl restart iwd
-  ```
- **Interface name** is usually `wlan0`; confirm with `iw dev` if different.
- The same IWD-periodic-scan behavior can affect other `iwd`-based distros (Arch, some Fedora spins) on flaky/older Wi-Fi chips — the `DisablePeriodicScan` fix is general, not Deck-specific.
-
-## 🔗 Related
-
- [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](wifi-160mhz-airtime-saturation-game-streaming.md) — the *other* Steam Deck Wi-Fi issue (airtime contention, router-side), distinct from this client-side flap.
--- a/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
+++ b/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
@ -1,163 +0,0 @@
---
-title: "MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)"
-domain: troubleshooting
-category: networking
-tags:
-  - ssh
-  - ssh-config
-  - tailscale
-  - magicdns
-  - known-hosts
-  - host-key
-  - migration
-  - wsl2
-status: published
-created: 2026-06-12
-updated: 2026-06-12
---
-
-# MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)
-
-You have SSH aliases for a Tailscale fleet (`alias tttpod='ssh root@100.84.42.102'`).
-They worked for months. Then you migrate or rebuild some nodes — and now a third of
-them hang on connect or refuse the host key. This is the failure mode that hardcoded
-addresses hit, and why the durable answer is **MagicDNS names**, not pinned IPs.
-
-> This is the sequel to *[SSH Alias Falls Through to MagicDNS — Host-Key Verification
-> Failure (No `Host` Block)](ssh-missing-host-block-magicdns-host-key-failure.md)*.
-> That article says **pin the IP** `known_hosts` already trusts — correct when the
-> node is stable. This one covers what happens when a migration changes the IP *and*
-> the host key, which is exactly when IP-pinning stops paying off.
-
-## The Three Failure Modes
-
-A migration/rebuild can trigger any of these — often several at once across a fleet,
-which is what makes it confusing:
-
-### 1. Stale hardcoded IP → connection times out
-
-The node re-registered on the tailnet with a **new** Tailscale IP, but your alias
-still names the old one:
-
-```
-$ tttpod
-ssh: connect to host 100.84.42.102 port 22: Operation timed out
-```
-
-The old address is dead; SSH waits the full timeout and gives up. Confirm by asking
-the tailnet for the node's *current* IP by name:
-
-```
-$ tailscale status | grep tttpod
-100.95.137.38   tttpod   ...     # alias points at 100.84.42.102 — stale
-```
-
-### 2. Cold-path teardown → first connect after idle times out
-
-The IP is correct and the node is up (it answers `ping`), but TCP/22 still times out
-on the *first* try after a quiet period, then works on retry. Tailscale 1.98.x is more
-aggressive about tearing down **idle direct UDP paths**; the first SSH has to
-re-establish NAT traversal, which can overrun SSH's default connect timeout.
-
-```
-$ tailscale status | grep tttpod
-100.95.137.38   tttpod   ...   idle, tx 9360 rx 0      # cold path
-$ tailscale ping tttpod
-pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms   # warms instantly
-```
-
-### 3. Host-key verification failed → box was rebuilt
-
-The node was reinstalled, so it presents a **new** SSH host key. Your `known_hosts`
-still has the old one, so even `StrictHostKeyChecking=accept-new` aborts — `accept-new`
-only adds *genuinely new* hosts, it refuses a **mismatch**:
-
-```
-$ ssh root@tttpod hostname
-Host key verification failed.
-```
-
-## The Fix
-
-Three changes, applied on every **name-capable** machine (see the WSL2 caveat below):
-
-### a. Switch aliases from IPs to MagicDNS names
-
-```bash
-# before — rots on every migration
-alias tttpod='ssh root@100.84.42.102'
-# after — always resolves the node's current IP
-alias tttpod='ssh root@tttpod'
-```
-
-MagicDNS resolves the name to whatever IP the node currently has, so a future
-migration needs **zero** alias edits. This is the whole point: the tailnet already
-knows the mapping — stop duplicating (and stale-ing) it in your dotfiles.
-
-> **Exception:** if there's no tailnet device with that exact name (e.g. an alias
-> `teelia` pointing at a node actually named `temptedparadise`), MagicDNS can't
-> resolve it — keep the IP for that one.
-
-### b. Purge stale host keys, then re-accept
-
-After a rebuild, clear the old entries under **both** the name and the current IP,
-then reconnect with `accept-new` to record the fresh key. Over Tailscale's
-authenticated WireGuard tunnel, a key change from a known rebuild is safe to accept.
-
-```bash
-for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
-  n="${pair%%:*}"; ip="${pair##*:}"
-  ssh-keygen -R "$n"; ssh-keygen -R "$ip"
-done
-# repopulate
-ssh -o StrictHostKeyChecking=accept-new root@tttpod hostname
-```
-
-### c. Add a cold-path cushion to `~/.ssh/config`
-
-Give the first (cold) connection time to renegotiate instead of erroring:
-
-```sshconfig
-Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
-    ConnectTimeout 25
-    ServerAliveInterval 30
-    ServerAliveCountMax 4
-```
-
-`ConnectTimeout 25` gives the cold path up to 25 s to re-establish, so what used to be a
-timeout becomes a ~1–2 s pause. The keepalives hold the path open during an active
-session so it doesn't drop mid-command.
-
-## Caveat: WSL2 Can't Use MagicDNS
-
-A Linux box under **WSL2** typically has **no `tailscale` CLI and no MagicDNS
-resolver** — it rides the Windows host's networking, and name lookups for tailnet
-nodes fail:
-
-```
-$ getent hosts tttpod        # (inside WSL2)
-                             # nothing — no resolution
-$ command -v tailscale       # nothing — CLI lives on the Windows side
-```
-
-On those machines you **must** keep hardcoded IPs in `~/.ssh/config` (or use `Host`
-blocks with explicit `HostName <ip>`), and refresh them by hand when a node migrates.
-There's no self-healing option there — the trade is unavoidable.
-
-## Diagnosis Checklist
-
-1. `tailscale status | grep <host>` — does your alias's IP match the **current** one?
-   (Mode 1: stale IP.)
-2. `ping`/`tailscale ping <host>` works but TCP/22 times out on first try, succeeds on
-   retry? (Mode 2: cold path.)
-3. `ssh root@<host> true` → `Host key verification failed` (not `Permission denied`)?
-   (Mode 3: rebuilt box, stale `known_hosts`.)
-4. Is the client a WSL2 box? `getent hosts <name>` returns nothing → MagicDNS
-   unavailable, stay on IPs.
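
Check 1 can be automated on a machine that has the `tailscale` CLI. A sketch that assumes the `tailscale status` layout shown earlier (IP in the first column, node name in the second); `current_ip` and `alias_is_stale` are hypothetical names. Note that a node missing from the output also reports as stale.

```shell
# Reads `tailscale status` output on stdin; prints the IP for the named node.
current_ip() {
  awk -v name="$1" '$2 == name { print $1; exit }'
}

# Succeeds when the IP an alias uses differs from the tailnet's current one.
# Arguments: node name, IP hardcoded in the alias. Status output on stdin.
alias_is_stale() {
  [ "$(current_ip "$1")" != "$2" ]
}
```

Possible usage: `tailscale status | alias_is_stale tttpod 100.84.42.102 && echo "alias is stale"`.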
-
-## Takeaway
-
-Pin the IP when a host is **stable** and the IP-keyed `known_hosts` entry is your
-durable trust anchor. Switch to **MagicDNS names** when hosts **move** — migrations,
-rebuilds, provider changes — so the tailnet's own name→IP mapping does the work your
-dotfiles kept getting wrong. And on WSL2, you don't get the choice: hardcoded IPs,
-refreshed by hand.
--- a/05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md
+++ b/05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md
@ -1,115 +0,0 @@
---
-title: "Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio"
-domain: troubleshooting
-category: networking
-tags: [wifi, 5ghz, 160mhz, channel-width, dfs, steam-deck, game-streaming, asuswrt, airtime, chanim]
-status: published
-created: 2026-06-13
-updated: 2026-06-13
---
-
-# Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio
-
-## 🛑 Problem
-
-Streaming a game from a desktop (wired) to a Steam Deck over Wi-Fi was stuttering intermittently — fine for a while, then choppy, hard to reproduce on demand. Throughput tests "looked fine," which is exactly why it was hard to pin down: **game streaming fails on jitter and microbursts of contention, not on average bandwidth.**
-
-The Wi-Fi was an Asus RT-AX82U (AsusWRT, stock firmware) with the 5 GHz radio set to **Auto channel at 160 MHz width**.
-
-## 🔍 Diagnosis
-
-The key insight: **signal was excellent, but latency was not.** That combination means the airwaves are busy, not weak.
-
-### Step 1 — Measure jitter to the gateway from a Wi-Fi client
-
-```bash
-ping -c 20 -i 0.2 192.168.50.1
-# round-trip min/avg/max/stddev = 7.5/27.0/61.0/16.5 ms
-```
-
-27 ms **average** and 16 ms of jitter to your *own router* over Wi-Fi is pathological. A healthy 5 GHz link sits at 2–5 ms. Yet the client's signal was **-43 dBm** (excellent) with a clean **-92 dBm** noise floor. Strong signal + high jitter = **airtime contention**, not range or interference at the receiver.
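
To track these two numbers across repeated runs, the summary line can be reduced to just average and jitter. A sketch assuming the `min/avg/max/stddev` summary format shown above (Linux prints `mdev` in the last slot, which this treats the same way); `ping_avg_jitter` is a hypothetical name.

```shell
# Hypothetical helper: read ping output on stdin and print the average and
# the deviation figure from the summary line.
ping_avg_jitter() {
  sed -n 's#.*= \([0-9.]*\)/\([0-9.]*\)/\([0-9.]*\)/\([0-9.]*\) ms.*#avg=\2 jitter=\4#p'
}
```

Possible usage: `ping -c 20 -i 0.2 192.168.50.1 | ping_avg_jitter`.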
-
-### Step 2 — Confirm channel utilization at the router
-
-AsusWRT/Broadcom exposes per-channel airtime stats via `wl chanim_stats`. SSH into the router and run it against the 5 GHz interface:
-
-```bash
-# 5 GHz interface name varies (eth6/eth7); resolve it from nvram
-IF=$(nvram get wl1_ifname)
-wl -i "$IF" chanspec        # e.g. 36/160 (0xe832)  → channel 36, 160 MHz
-wl -i "$IF" assoclist | wc -l   # number of associated 5 GHz clients
-wl -i "$IF" chanim_stats
-```
-
-The smoking gun (`chanim_stats`, version 3):
-
-```
-chanspec  tx  inbss obss nocat nopkt doze txop goodtx badtx glitch ... idle
-0xe832    92    2    1    2     1     0    4    8      81    2          14
-```
-
-Read it as percentages of airtime:
-
-| Field | Value | Meaning |
-|-------|-------|---------|
-| `tx` | **92** | Channel busy transmitting 92% of the time |
-| `txop` | **4** | Transmit-opportunities available only 4% — the channel is starved |
-| `idle` | **14** | Channel idle only 14% |
-| `goodtx` / `badtx` | 8 / **81** | Failed/retried transmits vastly outnumber good ones |
-
-Seventeen clients were associated to that one 5 GHz radio.
-
-### Step 3 — Understand why 160 MHz makes it worse
-
-A 160 MHz channel on the lower 5 GHz band spans channels **36–64**, and the upper half of that block (52–64) is DFS spectrum. To stay clean it needs 160 MHz of *uncontended* spectrum, but in a dense RF environment (≈25 neighbor APs here, several on 5 GHz channels 48/52/100/132/153 that overlap or border the block), any one busy neighbor degrades the **entire** wide channel. 160 MHz also makes the radio **DFS-radar exposed**: a single radar detection forces a channel switch with a blackout of a second or more, which is a stream-killer.
-
-So 160 MHz buys a higher *peak* PHY rate that game streaming doesn't need, at the cost of the *stability* it absolutely does.
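
The channel arithmetic behind that trade-off is easy to check. A sketch that lists the 20 MHz channels a block covers and flags the DFS ones; both function names are hypothetical, the first argument is assumed to be the lowest channel of the block, and the DFS ranges are the FCC ones (other regulatory domains differ).

```shell
# List the 20 MHz channel numbers covered by a block that starts at channel $1
# and is $2 MHz wide. 5 GHz channel numbers step by 4 per 20 MHz.
subchannels() {
  start=$1
  count=$(( $2 / 20 ))
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    out="$out $(( start + 4 * i ))"
    i=$(( i + 1 ))
  done
  printf '%s\n' "${out# }"
}

# Succeeds for channels in the DFS ranges (52-64 and 100-144 under FCC rules).
is_dfs() {
  { [ "$1" -ge 52 ] && [ "$1" -le 64 ]; } || { [ "$1" -ge 100 ] && [ "$1" -le 144 ]; }
}
```

`subchannels 36 160` covers 36 through 64, half of which `is_dfs` flags, while `subchannels 36 80` stays entirely inside the non-DFS 36 to 48 range.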
-
-## ✅ Fix
-
-Drop the 5 GHz radio to **80 MHz** and pin it to a **non-DFS** channel (UNII-1: 36/40/44/48 — no radar, no DFS blackouts).
-
-GUI: **Wireless → 5 GHz → Channel Bandwidth = 80 MHz**, **Control Channel = 36**, turn off "Auto."
-
-Or over SSH (`nvram` + `restart_wireless`):
-
-```bash
-nvram set wl1_bw_cap=7        # cap at 80 MHz (bitmask: 1=20, 3=40, 7=80, 15=160)
-nvram set wl1_chanspec=36/80  # channel 36 @ 80 MHz
-nvram set wl1_channel=36
-nvram commit
-service restart_wireless      # ~15-20s radio bounce, drops all clients briefly
-```
-
-> [!warning] `restart_wireless` drops every Wi-Fi client for 15–20 seconds. `nvram commit` runs *before* the restart, so the config persists even if your own SSH/Wi-Fi session drops.
-
-## 📊 Result
-
-Verified from both the router and a client after the radio came back:
-
-| Metric | Before (36/160) | After (36/80) |
-|--------|-----------------|---------------|
-| Channel tx-busy | 92% | **9%** |
-| Transmit-opportunity available | 4% | **79%** |
-| Channel idle | 14% | **87%** |
-| Failed tx (`badtx` vs `goodtx`) | 81 vs 8 | **1 vs 3** |
-| Gateway ping (avg / floor) | 27 ms / 7.5 ms | **9 ms / 2.7 ms** |
-| PHY peak rate | 1729 Mbps | 1200 Mbps |
-
-The PHY peak dropped (narrower channel) but that is irrelevant — Steam Remote Play wants ~30–50 Mbps with *consistent* airtime, which it now has. The stutter resolved.
-
-## 🧠 Takeaways
-
- **Diagnose Wi-Fi streaming problems with jitter, not throughput.** A speed test can pass while a stream stutters. Ping your gateway and watch the stddev.
- **Strong signal + high latency = airtime congestion.** Don't chase signal strength when RSSI is already good; look at channel utilization (`chanim_stats`).
- **160 MHz is a trap in a dense RF environment.** Use 80 MHz for reliability; reserve 160 MHz for clean spectrum and short range.
- **Prefer non-DFS channels (36–48) for anything latency-sensitive** — DFS radar events cause silent multi-second dropouts.
- **Wire the *source*.** The streaming PC should be on Ethernet so the video only crosses the air once (AP → handheld). The handheld has to be Wi-Fi; the desktop doesn't.
- **Isolate IoT on 2.4 GHz** (separate SSID) so it never competes for 5 GHz airtime with latency-sensitive clients.
-
-## Related
-
- [Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save](steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md) — the *other* Steam Deck Wi-Fi issue (client-side flap), distinct from this router-side airtime problem.
- [Network Overview](../../02-selfhosting/dns-networking/network-overview.md)
- [Wake-on-LAN via Router SSH](../../02-selfhosting/dns-networking/wake-on-lan-router-ssh.md)
- [Pi-hole v6 Group Management — Per-Client DNS Rules](../../02-selfhosting/dns-networking/pihole-v6-group-management.md)
--- a/05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md
+++ b/05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md
@ -1,120 +0,0 @@
---
-title: "Time Machine: Orphaned APFS .previous Folder Blocks All Backups"
-domain: troubleshooting
-category: general
-tags: [macos, time-machine, apfs, backup, fsck, disk-utility]
-status: published
-created: 2026-06-18
-updated: 2026-06-18
---
-# Time Machine: Orphaned APFS `.previous` Folder Blocks All Backups
-
-## Overview
-On an APFS Time Machine destination, an interrupted backup can leave behind an orphaned staging folder named `<timestamp>.previous` (plus a matching, uncatalogued APFS snapshot). Every subsequent backup reads that folder during *FindingChanges*, hits a metadata-type mismatch, and aborts — so backups silently stop running. macOS shows only a generic "**Time Machine couldn't complete the backup … An unknown error occurred.**"
-
-The trap: because the orphan is **not in Time Machine's catalog** and the destination is OS-protected, every obvious removal tool (`rm`, `chmod`, `tmutil delete`, `diskutil deleteSnapshot`) refuses it. The clean fix is **First Aid (`fsck_apfs`)**, which has authority over the volume and clears the orphaned snapshot.
-
-## Symptoms
- "Time Machine couldn't complete the backup to '<disk>' — An unknown error occurred."
- Backups haven't run since around the time of an interrupted/cancelled backup.
- The destination disk is mounted and has plenty of free space (not full, not disconnected).
- `tmutil status` cycles through `Starting` / `FindingChanges` and never reaches `Copying`.
-
-## Root Cause
-`backupd` logs the real error on a loop (every ~15 s):
-
-```bash
-log show --predicate 'subsystem == "com.apple.TimeMachine"' --last 10m --style compact \
-  | grep -iE 'previous|error'
-```
-```
-[TMStructure] Expected SnapshotInProgressContainer metadata type but found APFSBackup
-  metadata type at URL '.../<disk>/2026-06-17-172230.previous/'
-```
-
-An earlier backup was interrupted mid-run. It left two orphans tied to that timestamp, **neither registered in Time Machine's backup catalog**:
-
-1. A staging directory `<timestamp>.previous` on the destination volume.
-2. A matching APFS snapshot `com.apple.TimeMachine.<timestamp>.backup`.
-
-Time Machine expects the staging folder to be a `SnapshotInProgressContainer` but finds completed-backup (`APFSBackup`) metadata, so it bails before copying anything.
-
-> **Ignore the surrounding log noise.** `com.apple.backupd.sandbox.xpc: connection invalid`, `Mountpoint '…' is still valid`, and `missingName` on `/System/Volumes/Data/home` are all normal on a healthy backup — flagged `E` but harmless. The only line that matters is the `SnapshotInProgressContainer` mismatch.
-
-## Diagnosis
-
-Confirm the disk is healthy (not the problem) and locate the orphan:
-
-```bash
-tmutil status                              # stuck in Starting/FindingChanges, never Copying
-df -h | grep -i "<disk-name>"              # mounted, plenty free
-diskutil apfs listSnapshots <diskNsN>      # note the highest/last snapshot timestamp
-```
-
-If `listSnapshots` shows a final snapshot whose timestamp matches the `.previous` folder in the error, that's the orphaned pair.
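
Matching the two by hand is error-prone, so the timestamp can be pulled out of the log line and grepped for in the snapshot list. A sketch assuming the error line format shown in Root Cause; `previous_timestamp` is a hypothetical name.

```shell
# Hypothetical helper: read backupd log lines on stdin and print the
# timestamp portion of any `<timestamp>.previous` path they mention.
previous_timestamp() {
  sed -n 's#.*/\([0-9][0-9-]*\)\.previous.*#\1#p'
}
```

Possible usage: save the result as `ts=$(log show ... | previous_timestamp | tail -1)` with the `log show` predicate from Root Cause, then run `diskutil apfs listSnapshots <diskNsN> | grep "$ts"`. A hit is the snapshot half of the orphaned pair.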
-
-## Why the Obvious Tools Fail
-
-Do **not** burn time trying to force the folder out — here's what each tool does and why it refuses:
-
-| Command | Result | Reason |
-|---|---|---|
-| `sudo rm -rf …/<ts>.previous` | `Operation not permitted` | TM applies a `group:everyone deny delete` ACL that overrides root. |
-| `sudo chmod -RN …/<ts>.previous` | runs for minutes, then fails | A `.previous` folder is a **full copy of the entire Mac filesystem**; `-R` walks the whole tree and can't clear ACLs on the SIP-`restricted` system files inside (`/usr/bin/sh`, frameworks, keymaps). `rm` then hits the same wall. |
-| `sudo tmutil delete -p …/<ts>.previous` | `Invalid deletion target (error 22)` | Not a registered backup. |
-| `sudo tmutil delete -t <timestamp>` | `error 2 (No such file)` | No catalog entry for that timestamp. |
-| `sudo diskutil apfs deleteSnapshot <diskNsN> -uuid <uuid>` | `Not a valid APFS Snapshot UUID` | TM-managed snapshot; diskutil won't remove it directly. |
-
-> **If you started a `chmod -R` and killed it:** the live system is unaffected — `chmod -R` does not follow symlinks out of the backup tree. Verify with `ls -lde ~/Desktop` (normal ACLs = untouched). Stop a runaway with `sudo pkill -f '<timestamp>.previous'`.
-
-## Fix — Run First Aid (`fsck_apfs`)
-
-First Aid runs with full authority over the volume and clears the orphaned snapshot, which defuses the `.previous` folder's metadata mismatch.
-
-```bash
-# 1. Stop the looping backup
-sudo tmutil stopbackup
-
-# 2. Verify the destination volume (live mode is fine; read-only check)
-sudo diskutil verifyVolume <diskNsN>
-#    or: Disk Utility → View → Show All Devices → select the TM volume → First Aid → Run
-```
-
-`verifyVolume` enumerates and validates every snapshot; in the originating incident, the verify/remount cycle was enough to purge the orphaned in-progress snapshot. Expected result:
-
-```
-The volume <name> appears to be OK
-File system check exit code is 0
-```
-
-Confirm the orphan snapshot is gone (count drops by one; the matching timestamp no longer appears):
-
-```bash
-diskutil apfs listSnapshots <diskNsN>
-```
-
-Then restart and watch it succeed:
-
-```bash
-sudo tmutil startbackup --auto
-tmutil status      # should reach BackupPhase = Copying with no SnapshotInProgressContainer errors
-```
-
-If `verifyVolume` reports problems rather than "appears to be OK", run the repair (it must unmount the volume):
-
-```bash
-sudo diskutil repairVolume <diskNsN>
-```
-
-## Notes
- The first backup after the fix is often a large catch-up (hundreds of GB) because the chain was broken — let it finish; it returns to quick hourly increments afterward.
- The inert `<timestamp>.previous` **folder** may still sit on the volume after the fix. Time Machine now ignores it, so it's not blocking — but it consumes space. Removing it cleanly requires booting to **Recovery Mode**, `csrutil disable`, `rm -rf` the folder, then `csrutil enable` — only worth it to reclaim the space.
- Time Machine identifies its destination by `DestinationID` (a UUID), not the volume name, so renaming the disk later is safe.
- Interrupted backups are more likely on flaky USB-SATA bridge enclosures (e.g. some WD My Passport units) whose slow sleep/wake transitions can drop the drive mid-backup.
-
-## Tags
-`macos` `time-machine` `apfs` `backup` `fsck-apfs` `disk-utility` `snapshot` `first-aid`
-
-## See Also
- [SnapRAID & MergerFS Storage Setup](../01-linux/storage/snapraid-mergerfs-setup.md)
- MajorMac Incident Log (2026-06-18) — the originating incident
--- a/05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md
+++ b/05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md
@ -1,193 +0,0 @@
---
-title: "WordPress 6.7 _load_textdomain_just_in_time Notice (Theme/Plugin Loads Translations Too Early)"
-domain: troubleshooting
-category: troubleshooting
-tags:
-  - wordpress
-  - wordpress-6.7
-  - php
-  - i18n
-  - textdomain
-  - theme
-  - mu-plugin
-  - deprecation
-  - troubleshooting
-status: published
-created: 2026-06-21
-updated: 2026-06-21
---
-
-# WordPress 6.7 `_load_textdomain_just_in_time` Notice
-
-> **TL;DR** — WordPress 6.7 added a `doing_it_wrong` notice that fires when a translation function (`__()`, `_e()`, `esc_html__()`, …) is called for a text domain **before the `init` action**. It's almost always a theme or plugin registering nav menus / sidebars / labels on `after_setup_theme` (which runs before `init`). The notice is **debug-only and harmless** — translations still load via the just-in-time fallback. If the offending code is in your own (or an updatable) theme/plugin, fix it at the source by deferring to `init`. If it's a **non-updating or third-party** theme you don't want to hand-edit, suppress *only this one notice* with a `doing_it_wrong_trigger_error` filter in a tiny mu-plugin.
-
---
-
-## Symptom
-
-With `WP_DEBUG` on (or in Query Monitor's PHP panel), you see:
-
-```
-Function _load_textdomain_just_in_time was called incorrectly.
-Translation loading for the <domain> domain was triggered too early.
-This is usually an indicator for some code in the plugin or theme running too early.
-Translations should be loaded at the init action or later.
-(This message was added in version 6.7.0.)
-
-_load_textdomain_just_in_time()  wp-includes/l10n.php
-get_translations_for_domain()    wp-includes/l10n.php
-translate()                      wp-includes/l10n.php
-__()                             wp-includes/l10n.php
-WordPress Core
-```
-
-The key fields are **the domain name** (e.g. `marstheme`, `woocommerce`, `astra`) and the fact that the stack bottoms out in **WordPress Core** via `__()` — that tells you *some* extension called a translation function, not that core is broken.
-
-## Why it happens (the WP 6.7 change)
-
-Before 6.7, WordPress silently "just-in-time" loaded a text domain the first time you translated a string in it. 6.7 kept the JIT loading but started **warning** when it's triggered before `init`, because:
-
- Translations loaded before `init` can run before the current user's locale is known, and can't be filtered/overridden by other plugins that hook `init`.
- It signals the extension is doing setup work earlier than the WordPress lifecycle intends.
-
-The usual culprit is code on **`after_setup_theme`** (which fires *before* `init`) that translates a label inline, e.g.:
-
-```php
-function mytheme_setup() {
-    register_nav_menus( array(
-        'primary' => __( 'Primary Menu', 'mytheme' ),   // <-- translate call before init
-    ) );
-}
-add_action( 'after_setup_theme', 'mytheme_setup' );
-```
-
-> **Important:** explicitly calling `load_theme_textdomain()` / `load_plugin_textdomain()` early does **not** fix the notice, and as of WP 4.6+ themes on wordpress.org don't even need to call it. The notice is about the *translate call*, not about whether the domain was loaded. Moving only the `load_*_textdomain()` call around is a common dead-end (see the gotcha below).
-
-## Diagnostic chain
-
-### 1. Identify the domain and what owns it
-
-The notice names the domain. Find which theme/plugin uses it:
-
-```bash
-WPROOT=/var/www/html
-grep -rlw '<domain>' "$WPROOT/wp-content/themes" "$WPROOT/wp-content/plugins" 2>/dev/null
-
-# Which extension has the most references (i.e. owns the domain)?
-grep -rl '<domain>' "$WPROOT/wp-content/" 2>/dev/null \
-  | sed -E "s#$WPROOT/wp-content/(themes|plugins|mu-plugins)/([^/]+)/.*#\1/\2#" \
-  | sort | uniq -c | sort -rn | head
-```
-
-> **Watch for renamed/forked themes.** The domain often does **not** match the theme's folder name. A theme bought as "Mars" and re-slugged to `kappa` keeps `marstheme` as its text domain in all 40+ template files. So `wp theme list` shows `kappa` active while the notice says `marstheme` — they're the same thing.
-
-### 2. Confirm it's active and whether it can be updated
-
-```bash
-sudo -u www-data wp --path="$WPROOT" theme list --fields=name,status,version,update
-sudo -u www-data wp --path="$WPROOT" plugin list --fields=name,status,version,update
-```
-
- `update available` → **update it first** (newest releases of most themes/plugins fixed this in late 2024/2025). That's the proper fix; the rest of this article is for when you can't.
- `update none` on a **renamed/custom fork** → no upstream exists, so updating is impossible. Go to the suppression fix.
-
-### 3. Pin down the early call (optional)
-
-```bash
-grep -rn "__(\s*['\"].*['\"]\s*,\s*['\"]<domain>['\"]" \
-  "$WPROOT/wp-content/themes/<theme>" | head
-```
-
-Look for translate calls inside functions hooked to `after_setup_theme`, `setup_theme`, `plugins_loaded`, or run at file scope in `functions.php`.
-
-## The fix
-
-### Option A — fix it at the source (own / updatable code)
-
-Defer the translation. Either register the raw string and translate at render time, or move the registration to `init`:
-
-```php
-// Before: translated on after_setup_theme (too early)
-add_action( 'after_setup_theme', function () {
-    register_nav_menus( array( 'primary' => __( 'Primary Menu', 'mytheme' ) ) );
-} );
-
-// After: register the menu location on init, where translation is allowed
-add_action( 'init', function () {
-    register_nav_menus( array( 'primary' => __( 'Primary Menu', 'mytheme' ) ) );
-} );
-```
-
-Only hand-edit code you maintain yourself. Edits to a theme/plugin that receives upstream updates are wiped on the next update, so update it instead, or use Option B if no fixed release exists.
-
-### Option B — suppress just this notice (third-party / non-updating code)
-
-When the early call lives in a theme you don't control and can't update (a renamed commercial fork, an abandoned plugin), the clean, update-safe move is to silence **only** the `_load_textdomain_just_in_time` notice — not all `doing_it_wrong` output — via a must-use plugin.
-
-Create `wp-content/mu-plugins/fix-textdomain.php`:
-
-```php
-<?php
-/**
- * Suppress the WP 6.7 "_load_textdomain_just_in_time was called incorrectly"
- * notice for a theme/plugin that translates before init.
- *
- * Scope is intentionally narrow: only this one function is silenced, so other
- * doing_it_wrong notices still surface. Translations still load via the JIT
- * fallback, so nothing visible changes for visitors.
- */
-add_filter( 'doing_it_wrong_trigger_error', function ( $trigger, $function_name ) {
-    return '_load_textdomain_just_in_time' === $function_name ? false : $trigger;
-}, 10, 2 );
-```
-
-`mu-plugins/` loads automatically (no activation, can't be deactivated from the admin), and runs early enough to register the filter before the notice fires.
-
-#### Verify
-
-```bash
-WPROOT=/var/www/html
-
-# 1. Syntax-check the mu-plugin
-php -l "$WPROOT/wp-content/mu-plugins/fix-textdomain.php"
-#    -> No syntax errors detected
-
-# 2. Confirm WP still boots and the filter is registered
-sudo -u www-data wp --path=$WPROOT eval \
-  'echo has_filter("doing_it_wrong_trigger_error") ? "filter set\n" : "MISSING\n";'
-
-# 3. Clear the debug log, boot WordPress (the early translate call runs during load), confirm 0 new notices
-DBG="$WPROOT/wp-content/debug.log"
-[ -f "$DBG" ] && : > "$DBG"
-sudo -u www-data wp --path=$WPROOT eval '__("Primary Menu","<domain>");' >/dev/null 2>&1
-n=$(grep -c "<domain>" "$DBG" 2>/dev/null); echo "${n:-0}"   # grep -c exits 1 on zero matches, so "|| echo 0" would print 0 twice
-#    -> 0
-```
-
-## Gotchas
-
-### The "load the textdomain earlier/later" dead-end
-
-A very common (wrong) first attempt is an mu-plugin that just calls `load_theme_textdomain()` on `plugins_loaded` or `after_setup_theme`:
-
-```php
-// DOES NOT FIX THE NOTICE
-add_action( 'plugins_loaded', function () {
-    load_theme_textdomain( 'mytheme', get_template_directory() . '/languages' );
-}, 0 );
-```
-
-`plugins_loaded` still runs **before `init`**, and — more importantly — the notice is triggered by the theme's own early `__()` call, not by whether you've loaded the domain. This code is dead weight. If you find one in place, replace it with the Option B filter rather than tweaking its hook/priority.
-
-### Don't blanket-suppress all deprecations
-
-Resist `error_reporting(E_ALL & ~E_DEPRECATED)` or returning `false` from `doing_it_wrong_trigger_error` unconditionally — that also hides genuinely useful warnings (a plugin breaking on a future PHP/WP bump). Scope the filter to the one `function_name`.
-
-### Renamed theme ⇒ domain ≠ folder
-
-Re-stating because it costs the most time: the domain in the notice can be the theme's *original* slug, not its current folder. Always `grep` for the domain to find the real owner before concluding "I don't even have that theme installed."
-
-## See also
-
- [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](php-84-vendor-implicit-nullable-patch.md) — the other "harmless deprecation that floods logs" pattern on the WordPress fleet
- [WordPress developer note: i18n improvements in 6.7](https://make.wordpress.org/core/2024/10/21/i18n-improvements-in-6-7/) — the canonical reference for this change
--- a/05-troubleshooting/yt-dlp-fedora-js-challenge.md
+++ b/05-troubleshooting/yt-dlp-fedora-js-challenge.md
@ -10,7 +10,7 @@ tags:
  - deno
 status: published
 created: 2026-04-02
-updated: 2026-06-16T18:35
+updated: 2026-04-30T05:21
 ---
 # yt-dlp YouTube JS Challenge Fix (Fedora)

@ -84,43 +84,12 @@ echo '--remote-components ejs:github' > ~/.config/yt-dlp/config

 ## Maintenance

-YouTube pushes extractor changes frequently. Keep yt-dlp current.
-
-### Updating: the `-U` trap + avoid duplicate installs
-
-`yt-dlp -U` **does not work** when yt-dlp was installed via pip/PyPI — the PyPI build deliberately disables the self-updater:
-
-```
-ERROR: You installed yt-dlp with pip or using the wheel from PyPi; Use that to update
-```
-
-Update through pip instead. **Pick one install method and stick to it** — running both a user install and a system install leaves two copies that drift out of sync (one updates, the other stays stale and shadows it depending on `$PATH` / sudo).
-
-**Recommended — single user install (no sudo):**
-
-```bash
-pip3 install -U --user yt-dlp
-```
-
-This installs to `~/.local/bin/yt-dlp`, which Fedora's default shell profile places ahead of the system directories on `$PATH`. Update it the same way; never use sudo.
-
-**Alternative — system-wide (Fedora, PEP 668):**
+YouTube pushes extractor changes frequently. Keep yt-dlp current:

 ```bash
 sudo pip install -U yt-dlp --break-system-packages
 ```

-> Only use `--break-system-packages` if you intentionally want a root-owned copy in `/usr/local`. Do **not** mix it with a `--user` install.
-
-**Check for and remove a duplicate install:**
-
-```bash
-which -a yt-dlp            # more than one path = duplicate installs
-sudo pip3 uninstall -y yt-dlp   # removes the /usr/local (system) copy + its wrapper
-```
-
-> If installed via the standalone binary (not pip), `yt-dlp -U` is the correct updater.
-
 ---

 ## Known Limitations
--- a/SUMMARY.md
+++ b/SUMMARY.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-06-21T11:46
+updated: 2026-05-15T09:00
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
@ -12,12 +12,10 @@ updated: 2026-06-21T11:46
    * [Bash Scripting Patterns](01-linux/shell-scripting/bash-scripting-patterns.md)
    * [SnapRAID & MergerFS Storage Setup](01-linux/storage/snapraid-mergerfs-setup.md)
    * [mdadm — Rebuilding a RAID Array After Reinstall](01-linux/storage/mdadm-raid-rebuild.md)
-    * [Growing an LVM Volume by Absorbing Another Disk](01-linux/storage/lvm-grow-volume-absorb-disk.md)
    * [Linux Distro Guide for Beginners](01-linux/distro-specific/linux-distro-guide-beginners.md)
    * [WSL2 Instance Migration to Fedora 43](01-linux/distro-specific/wsl2-instance-migration-fedora43.md)
    * [WSL2 Training Environment Rebuild](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md)
    * [WSL2 Backup via PowerShell](01-linux/distro-specific/wsl2-backup-powershell.md)
-    * [WSL2 In-Place Upgrade to Fedora 44](01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md)
 * [Self-Hosting & Homelab](02-selfhosting/index.md)
    * [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md)
    * [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
@ -32,7 +30,6 @@ updated: 2026-06-21T11:46
    * [AWS S3 Cost Management](02-selfhosting/cloud/aws-s3-cost-management.md)
    * [VPS Migration Baseline Checklist](02-selfhosting/cloud/vps-migration-baseline-checklist.md)
    * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
-    * [Fleet Backups with restic + B2](02-selfhosting/storage-backup/restic-b2-fleet-backups.md)
    * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
    * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
    * [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)
@ -44,7 +41,6 @@ updated: 2026-06-21T11:46
    * [Mastodon Post-Install Hardening (Permissions + Account)](02-selfhosting/services/mastodon-post-install-hardening.md)
    * [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
    * [Mastodon on S3 — Silent Upload Failures (BucketOwnerEnforced/ACLs)](02-selfhosting/services/mastodon-s3-acl-upload-failures.md)
-    * [Mastodon — Triaging Crowdfunding / Mention-Spam Accounts](02-selfhosting/services/mastodon-mention-spam-crowdfunding.md)
    * [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
    * [Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes](02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md)
    * [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)
@ -60,7 +56,6 @@ updated: 2026-06-21T11:46
    * [Fail2ban Custom Jail: Nginx Bad Request Detection](02-selfhosting/security/fail2ban-nginx-bad-request-jail.md)
    * [Fail2ban Custom Jail: Apache Bad Request Detection](02-selfhosting/security/fail2ban-apache-bad-request-jail.md)
    * [SSH Hardening Fleet-Wide with Ansible](02-selfhosting/security/ssh-hardening-ansible-fleet.md)
-    * [Migrating Flat Ansible Playbooks to Roles (Safely)](02-selfhosting/security/ansible-flat-playbooks-to-roles.md)
    * [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md)
    * [Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts](02-selfhosting/security/fail2ban-digest-mode-fleet.md)
    * [Apache CVE-2026-23918 — HTTP/2 Double Free Mitigation](02-selfhosting/security/apache-cve-2026-23918-http2-mitigation.md)
@ -81,8 +76,6 @@ updated: 2026-06-21T11:46
    * [HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)](04-streaming/plex/hevc-vaapi-batch-encode.md)
    * [Plex Transcoding Troubleshooting](04-streaming/plex/plex-transcoding-troubleshooting.md)
 * [Troubleshooting](05-troubleshooting/index.md)
-    * [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md)
-    * [Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save](05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md)
    * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
    * [Postfix + SendGrid: TLS Handshake Failure (Port 465 vs 587)](05-troubleshooting/networking/postfix-sendgrid-tls-handshake-failure.md)
    * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
@ -108,7 +101,6 @@ updated: 2026-06-21T11:46
    * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
    * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
    * [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md)
-    * [Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI](05-troubleshooting/forgejo-mailer-and-cli-recovery.md)
    * [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md)
    * [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md)
    * [SELinux: Wrong /etc/localtime Label Silently Breaks Timezone Changes](05-troubleshooting/selinux-localtime-label-breaks-timezone.md)
@ -119,17 +111,11 @@ updated: 2026-06-21T11:46
    * [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](05-troubleshooting/wsl-env-claude-desktop-mcp.md)
    * [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md)
    * [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](05-troubleshooting/php-84-vendor-implicit-nullable-patch.md)
-    * [WordPress 6.7 `_load_textdomain_just_in_time` Notice (Translations Loaded Too Early)](05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md)
    * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
    * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md)
-    * [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md)
-    * [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md)
-    * [iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)](05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md)
    * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
    * [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md)
    * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
-    * [Auditing & Cleaning macOS Background App Activity (sfltool dumpbtm)](05-troubleshooting/macos-background-app-activity-audit-sfltool.md)
-    * [Time Machine: Orphaned APFS `.previous` Folder Blocks All Backups](05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md)
    * [OBS Studio: Stale Script Paths After Windows Profile Rename](05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
    * [Logwatch Falsely Reports 'No freshclam updates' in ClamAV Daemon Mode](05-troubleshooting/security/freshclam-logwatch-false-no-updates.md)
@ -141,16 +127,10 @@ updated: 2026-06-21T11:46
    * [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md)
    * [Ansible: regex_search Capture-Group Argument Fails in set_fact](05-troubleshooting/ansible-regex-search-set-fact-capture-group.md)
    * [Ansible: Ubuntu Reboot Detection Misses Kernel Upgrades](05-troubleshooting/ansible-ubuntu-reboot-detection-kernel-mismatch.md)
-    * [Ansible: reboot.yml become Timeout on WSL2 Hosts (Exclude Them)](05-troubleshooting/ansible-reboot-become-timeout-wsl2.md)
    * [Fedora Networking & Kernel Troubleshooting](05-troubleshooting/fedora-networking-kernel-recovery.md)
    * [Systemd Session Scope Fails at Login](05-troubleshooting/systemd/session-scope-failure-at-login.md)
    * [wget/curl: URLs with Special Characters Fail in Bash](05-troubleshooting/wget-url-special-characters.md)
    * [Ansible: Check Mode False Positives in Verify/Assert Tasks](05-troubleshooting/ansible-check-mode-false-positives.md)
    * [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
-    * [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
-    * [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
-    * [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
-    * [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
-    * [Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
    * [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)
    * [claude-mem: --setting-sources Empty Arg Bug (Claude Code 2.1.x)](05-troubleshooting/claude-mem-setting-sources-empty-arg.md)