Merge branch 'code/majorrig/mastodon-mention-spam-wiki'

wiki: add Mastodon crowdfunding/mention-spam triage runbook
Runbook for telling broadcast fundraising solicitation from genuine mentions: signal checklist, SQL to investigate the account and its origin instance via nodeinfo, BlockService snippet, and a proportionate escalation ladder (mute -> block -> report -> domain-limit -> domain-block). Registered in SUMMARY.md and the self-hosting section index.
2026-06-22 13:50:21 -04:00 · 2026-06-22 13:49:35 -04:00 · 2026-06-21 13:01:34 -04:00 · 2026-06-21 13:00:35 -04:00 · 2026-06-21 12:34:06 -04:00 · 2026-06-21 12:33:56 -04:00
28 changed files with 2595 additions and 16 deletions
--- a/01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md
+++ b/01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md
@ -0,0 +1,119 @@
+---
+title: WSL2 In-Place Upgrade to Fedora 44 (with gcc14 Blocker + CUDA Repo Swap)
+domain: linux
+category: distro-specific
+tags:
+  - wsl2
+  - fedora
+  - windows
+  - upgrade
+  - dnf
+  - cuda
+  - majorrig
+status: published
+created: 2026-06-11
+updated: 2026-06-11
+---
+
+# WSL2 In-Place Upgrade to Fedora 44 (with gcc14 Blocker + CUDA Repo Swap)
+
+In-place upgrade of the FedoraLinux-43 WSL2 instance on MajorRig to Fedora 44 using `dnf system-upgrade` + `dnf5 offline reboot`. Hit one transaction blocker (`gcc14` compat package retired in F44) and swapped the stale `cuda-fedora39` repo to `cuda-fedora44` afterward. Performed 2026-06-11.
+
+## The Short Answer
+
+```powershell
+# PowerShell — backup first
+wsl --shutdown
+wsl --export FedoraLinux-43 D:\backups\fedora43.tar
+```
+
+```bash
+# Inside Fedora
+sudo dnf upgrade --refresh -y
+sudo shutdown -h now
+# relaunch, then:
+sudo dnf remove gcc14-c++ gcc14        # F44 dropped gcc14 — blocks the transaction
+sudo dnf system-upgrade download --releasever=44
+sudo dnf5 offline reboot               # applies offline upgrade, shuts distro down
+# wait a few minutes, relaunch:
+cat /etc/fedora-release                # → Fedora release 44 (Forty Four)
+```
+
+```powershell
+# PowerShell — keep WSL itself current
+wsl --update
+```
+
+## Steps
+
+1. **Back up the instance** (PowerShell). The export tar is roughly the size of the installed system — this one was 86 GB. The target directory must already exist or you get `Wsl/ERROR_PATH_NOT_FOUND`.
+
+```powershell
+wsl --shutdown
+mkdir D:\backups
+wsl --export FedoraLinux-43 D:\backups\fedora43.tar
+```
+
+2. **Fully update the current release, then restart the distro**
+
+```bash
+sudo dnf upgrade --refresh -y
+sudo shutdown -h now
+```
+
+3. **Remove upgrade blockers.** `gcc14`/`gcc14-c++` (compat packages) were retired in Fedora 44, so the transaction fails with "does not belong to a distupgrade repository". Remove them (or use `--allowerasing` and review the summary):
+
+```bash
+sudo dnf remove gcc14-c++ gcc14
+```
+
+4. **Download and apply the upgrade**
+
+```bash
+sudo dnf system-upgrade download --releasever=44
+sudo dnf5 offline reboot
+```
+
+The "reboot" applies the offline transaction and shuts the distro down — there's no real systemd reboot in WSL. Wait a couple of minutes, then relaunch. If it errors on `systemctl`, the fallback is:
+
+```bash
+export DNF_SYSTEM_UPGRADE_NO_REBOOT=1
+sudo -E dnf system-upgrade reboot
+```
+
+5. **Verify and tidy up**
+
+```bash
+cat /etc/fedora-release      # Fedora release 44 (Forty Four)
+sudo dnf upgrade --refresh   # catch post-upgrade updates
+gcc --version                # F44 ships gcc 16; reinstall with `dnf install gcc gcc-c++` if removed
+```
+
+```powershell
+wsl --update   # fixes the post-upgrade Wsl/Service/E_UNEXPECTED catastrophic failure some users hit
+```
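+
+The release check in step 5 can also be scripted. The following is a minimal sketch that reads `VERSION_ID` from an os-release file rather than eyeballing `/etc/fedora-release`. The function name `release_is` and its optional file argument are illustrative, not part of any Fedora tooling.
+
+```bash
+# release_is EXPECTED [FILE]: succeed if FILE (default /etc/os-release)
+# reports VERSION_ID equal to EXPECTED.
+release_is() {
+    local expected=$1 file=${2:-/etc/os-release}
+    # Source in a subshell so the os-release variables don't leak into the caller.
+    ( . "$file" && [ "${VERSION_ID:-}" = "$expected" ] )
+}
+
+if release_is 44; then echo "on Fedora 44"; else echo "not on Fedora 44 yet"; fi
+```
+
+Gating the post-upgrade `dnf upgrade --refresh` on this check avoids running it against a distro that silently stayed on the old release.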
+
+## CUDA Repo Swap
+
+`dnf repolist` still showed `cuda-fedora39-x86_64` — NVIDIA repos are pinned per Fedora release and don't follow distro upgrades. NVIDIA publishes a fedora44 repo:
+
+```bash
+sudo rm /etc/yum.repos.d/cuda-fedora39*.repo
+sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora44/x86_64/cuda-fedora44.repo
+sudo dnf upgrade --refresh
+sudo dnf repolist   # confirm cuda-fedora44-x86_64
+```
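+
+Because release-pinned repos survive the upgrade unnoticed, it can help to list any repo file whose name embeds a different Fedora release. This is a sketch under the assumption that such files follow the `*fedoraNN*.repo` naming seen above; `stale_release_repos` is a hypothetical helper name.
+
+```bash
+# stale_release_repos RELEASE [DIR]: list repo files in DIR (default
+# /etc/yum.repos.d) whose name embeds "fedoraNN" for an NN other than RELEASE.
+stale_release_repos() {
+    local release=$1 dir=${2:-/etc/yum.repos.d} f n
+    for f in "$dir"/*fedora[0-9]*.repo; do
+        [ -e "$f" ] || continue                 # glob matched nothing
+        n=$(basename "$f" | sed -E 's/.*fedora([0-9]+).*/\1/')
+        [ "$n" = "$release" ] || echo "$f"
+    done
+}
+
+stale_release_repos 44
+```
+
+Empty output means no file name points at another release. It only inspects file names, so a repo whose `baseurl` hard-codes an old release under a neutral file name would not be caught.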
+
+**WSL caveat:** never install the NVIDIA *driver* inside WSL — the Windows host driver provides the GPU. Only install toolkit packages (e.g. `cuda-toolkit`).
+
+## Gotchas & Notes
+
+- **Don't jump more than two releases at once.** Fedora supports upgrading across at most two releases; for a bigger gap, upgrade in stages.
+- **The WSL distro name is just a Windows label** — it still says "FedoraLinux-43" after the upgrade. Cosmetic fixes: Windows Terminal profile name, Start Menu shortcut, and `DistributionName`/`ShortcutPath` under `HKCU\Software\Microsoft\Windows\CurrentVersion\Lxss\{uuid}`.
+- **Keep the backup tar** until the upgraded instance has proven stable for a few days, then delete to reclaim the space.
+- **Restore path if needed:** `wsl --import FedoraRestore C:\WSL\FedoraRestore D:\backups\fedora43.tar` — remember imports default to root; fix via `/etc/wsl.conf` `[user] default=majorlinux`.
+
+## See Also
+
+- [WSL2 Instance Migration (Fedora 43)](wsl2-instance-migration-fedora43.md)
+- [WSL2 Backup via PowerShell](wsl2-backup-powershell.md)
--- a/01-linux/index.md
+++ b/01-linux/index.md
@ -23,7 +23,14 @@ A collection of guides covering Linux administration, shell scripting, networkin
 - [Ansible Getting Started](shell-scripting/ansible-getting-started.md)
 - [Bash Scripting Patterns](shell-scripting/bash-scripting-patterns.md)

+## Storage
+
+- [SnapRAID & MergerFS Storage Setup](storage/snapraid-mergerfs-setup.md)
+- [mdadm — Rebuilding a RAID Array After Reinstall](storage/mdadm-raid-rebuild.md)
+- [Growing an LVM Volume by Absorbing Another Disk](storage/lvm-grow-volume-absorb-disk.md)
+
 ## Distro-Specific

 - [Linux Distro Guide for Beginners](distro-specific/linux-distro-guide-beginners.md)
 - [WSL2 Instance Migration to Fedora 43](distro-specific/wsl2-instance-migration-fedora43.md)
+- [WSL2 In-Place Upgrade to Fedora 44](distro-specific/wsl2-fedora44-inplace-upgrade.md)
--- a/01-linux/storage/lvm-grow-volume-absorb-disk.md
+++ b/01-linux/storage/lvm-grow-volume-absorb-disk.md
@ -0,0 +1,159 @@
+---
+title: "Growing an LVM Volume by Absorbing Another Disk"
+domain: linux
+category: storage
+tags: [lvm, lvextend, vgextend, pvcreate, resize2fs, ext4, storage, disk, homelab]
+status: published
+created: 2026-06-17
+updated: 2026-06-17
+---
+
+# Growing an LVM Volume by Absorbing Another Disk
+
+When an LVM-backed filesystem fills up and its volume group (VG) has no free
+extents, you can grow it by adding a second physical disk as a new physical
+volume (PV), extending the VG onto it, then extending the logical volume (LV)
+and its filesystem. With ext4 this can be done **online** — no unmount, no
+downtime for the volume being grown.
+
+This guide covers the common case where the disk you want to absorb is currently
+in use by its own LVM volume (you must evacuate and tear that down first), and
+the precautions that keep it safe.
+
+> [!warning] This enlarges your failure domain
+> A single LV spanning two disks linearly (the default — no RAID/mirror) means
+> **losing either disk loses the entire volume.** ext4 has no parity. Only do
+> this for data you can rebuild, or layer redundancy (mdadm/LVM RAID) underneath.
+> Back up anything irreplaceable first.
+
+## The Short Answer
+
+If the target disk (`/dev/sdX`) is already empty and unused:
+
+```bash
+sudo pvcreate /dev/sdX
+sudo vgextend myvg /dev/sdX
+sudo lvextend -l +100%FREE /dev/myvg/mylv
+sudo resize2fs /dev/mapper/myvg-mylv      # ext4, online; use xfs_growfs for XFS
+```
+
+The rest of this article handles the harder case: the target disk is currently
+holding its own LVM volume with data on it.
+
+## Step-by-Step
+
+### 1. Survey the current layout
+
+```bash
+sudo pvs                       # physical volumes → which VG each belongs to
+sudo vgs                       # volume groups, free extents (VFree)
+sudo lvs                       # logical volumes and sizes
+lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
+df -h
+```
+
+Confirm:
+
+- The VG you want to grow (`myvg`) has `0` `VFree` (that's why you're here).
+- The disk you want to absorb (`/dev/sdX`) is a **standalone** PV — not a member
+  of an mdadm array, a mergerfs branch, or a SnapRAID parity disk. Repurposing a
+  disk that something else depends on will break that thing silently.
+
+### 2. Evacuate the disk you're about to absorb
+
+Anything on the target disk will be **destroyed**. Move it somewhere with room to
+spare, then prove the copy is intact before you trust it.
+
+```bash
+# Copy preserving permissions/timestamps
+sudo rsync -a /mnt/olddisk/important /destination/with/space/
+
+# Verify byte-for-byte — empty output + exit code 0 means identical
+sudo diff -rq /mnt/olddisk/important /destination/with/space/important && echo OK
+```
+
+For large trees the `diff -rq` (full byte comparison) is slow but is the
+authoritative check — don't skip it before the destructive phase. If an
+application tracks files by path (databases, media servers), update its path
+references to the new location *now*, while the old copy still exists as a
+fallback.
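+
+Where a second full read of both trees is acceptable but a record of what was checked is also wanted, a checksum manifest is one alternative. The sketch below assumes GNU coreutils and findutils; `verify_copy` is a hypothetical helper, not something the commands above depend on. Unlike `diff -rq`, it only proves that every file in the source arrived intact. It does not flag extra files at the destination.
+
+```bash
+# verify_copy SRC DST: checksum every regular file under SRC, then check the
+# same relative paths under DST. Non-zero exit if any file differs or is missing.
+verify_copy() {
+    local src=$1 dst=$2 manifest
+    manifest=$(mktemp) || return 1
+    ( cd "$src" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest" &&
+        ( cd "$dst" && sha256sum --check --quiet "$manifest" )
+    local rc=$?
+    rm -f "$manifest"
+    return "$rc"
+}
+
+# Usage with the paths from the rsync example (run as root if the tree needs it):
+# verify_copy /mnt/olddisk/important /destination/with/space/important && echo OK
+```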
+
+### 3. Unmount and remove the old disk from fstab
+
+```bash
+sudo fuser -m /mnt/olddisk          # confirm nothing holds it open
+sudo umount /mnt/olddisk
+mountpoint -q /mnt/olddisk && echo "STILL MOUNTED" || echo "unmounted"
+
+sudo cp /etc/fstab /etc/fstab.bak-$(date +%Y%m%d)   # always back up fstab
+sudo sed -i '/olddisk/d' /etc/fstab                 # remove the stale entry
+grep olddisk /etc/fstab || echo "fstab line gone"
+```
+
+> [!tip] Verify your `sed` pattern only matches the line you mean
+> A too-broad pattern can delete the wrong fstab entry. Check the file before and
+> after, and keep the backup until you've confirmed the system still boots.
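+
+One way to enforce that check is to refuse the delete unless the pattern matches exactly one line. This is a sketch assuming GNU `sed`; `remove_fstab_entry` is a hypothetical helper, and the pattern must not contain a `|` because that character is used as the sed delimiter here.
+
+```bash
+# remove_fstab_entry PATTERN [FILE]: delete the matching line only if PATTERN
+# matches exactly one line of FILE (default /etc/fstab); otherwise change nothing.
+remove_fstab_entry() {
+    local pattern=$1 file=${2:-/etc/fstab} hits
+    hits=$(grep -c -- "$pattern" "$file" || true)   # grep exits 1 on zero matches
+    if [ "$hits" != 1 ]; then
+        echo "refusing: pattern matched '${hits}' lines, expected 1" >&2
+        grep -n -- "$pattern" "$file" >&2
+        return 1
+    fi
+    sed -i "\\|$pattern|d" "$file"                  # custom delimiter tolerates '/'
+}
+```
+
+Against the real `/etc/fstab` this needs root, and the dated backup from the step above is still the fallback if the wrong line goes.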
+
+### 4. Tear down the old disk's LVM
+
+```bash
+sudo lvremove -y /dev/oldvg/oldlv
+sudo vgremove -y oldvg
+sudo pvremove -y /dev/sdX        # wipes the LVM label off the disk
+```
+
+This is the point of no return for the old disk's data, which is why step 2
+verified the copy before anything destructive ran.
+
+### 5. Add the disk to the target VG and extend
+
+```bash
+sudo pvcreate -y /dev/sdX
+sudo vgextend myvg /dev/sdX
+sudo lvextend -l +100%FREE /dev/myvg/mylv
+```
+
+`lvs`/`vgs` should now show the LV grown to span both disks and `0` free extents.
+
+### 6. Grow the filesystem (online)
+
+```bash
+# ext4 — works while mounted
+sudo resize2fs /dev/mapper/myvg-mylv
+
+# XFS — grows online too, but takes the mountpoint, not the device
+sudo xfs_growfs /mountpoint
+```
+
+`resize2fs` is idempotent — if it gets interrupted, just run it again; it reports
+"Nothing to do!" once the filesystem already fills the LV.
+
+### 7. Verify
+
+```bash
+df -h /mountpoint     # should reflect the new larger size
+sudo pvs              # /dev/sdX now listed under myvg
+sudo vgs myvg         # two PVs, larger VSize
+```
+
+## Notes & Gotchas
+
+- **Online resize works for the volume being grown, not the one being removed.**
+  The disk you absorb must be unmounted and torn down; the destination LV stays
+  mounted throughout.
+- **`resize2fs` interruption is safe.** ext4 online resize is journaled; re-run it.
+- **macOS cruft on evacuated disks.** Trees touched by macOS often carry
+  `._*` AppleDouble files and `.DS_Store` — harmless to drop, but they inflate
+  file counts in `diff`/`rsync` output. Don't mistake them for real data.
+- **Check SMART on a disk you're promoting into a bigger role.** A disk with a
+  pending-sector history is riskier once it's in the critical path for a whole
+  multi-disk volume than it was holding a small isolated one.
+- **Mountpoint cleanup.** After the old disk is gone, its former mountpoint
+  directory may reappear (it was shadowed by the mount). `rmdir` it if empty.
+  Note `ls -A` exits `0` on an empty directory, so don't gate cleanup on its exit
+  status — test contents explicitly.
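+
+The mountpoint cleanup note can be made concrete with a test on what `ls -A` prints rather than on how it exits. A minimal sketch; both function names are illustrative.
+
+```bash
+# dir_is_empty DIR: succeed only if DIR exists and has no entries at all
+# (dotfiles included). Relies on command output, not on the exit status of ls.
+dir_is_empty() {
+    [ -d "$1" ] && [ -z "$(ls -A -- "$1")" ]
+}
+
+# remove_if_empty DIR: rmdir the stale mountpoint only when nothing is inside.
+remove_if_empty() {
+    if dir_is_empty "$1"; then
+        rmdir -- "$1"
+    else
+        echo "leaving $1 in place (not an empty directory)" >&2
+    fi
+}
+```
+
+`rmdir` already refuses a non-empty directory, so the explicit test mainly buys a clear message and a clean exit status for scripts.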
+
+## Related
+
+- [SnapRAID & MergerFS Storage Setup](snapraid-mergerfs-setup.md) — add redundancy/parity instead of a linear span
+- [mdadm — Rebuilding a RAID Array After Reinstall](mdadm-raid-rebuild.md)
--- a/02-selfhosting/cloud/vps-migration-baseline-checklist.md
+++ b/02-selfhosting/cloud/vps-migration-baseline-checklist.md
@ -66,14 +66,15 @@ Every server in the fleet should have these. Check each one after migration:
 ### After Migration

 1. **Set the timezone** — `timedatectl set-timezone America/New_York` (US) or `Europe/London` (UK). Hetzner images default to UTC.
-2. **Verify CA bundle (Fedora)** — `ls /etc/pki/tls/certs/ca-bundle.crt`. If missing, Postfix TLS, curl, and dnf will all fail silently. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
-3. **Run `harden.yml` against the new host** — catches most gaps in one pass
-4. **Send a test email** — `echo test | mail -s "test" marcus@majorshouse.com` — if this fails, nothing else can alert you
-5. **Verify crond is running** — `systemctl is-active crond` (Fedora) or `systemctl is-active cron` (Ubuntu). cronie can be `enabled` but not `active` after provisioning.
-6. **Check Netdata Cloud** — verify the new node appears and alerts are flowing
-7. **Compare fail2ban jails** — `fail2ban-client status` on both old and new
-8. **Verify logwatch sends** — `sudo logwatch --output mail --range today`
-9. **Keep the old box powered off but not destroyed** for at least 7 days after remediation
+2. **Set the system hostname** — Hetzner provisions the box as `<host>-hetzner`. Run `hostnamectl set-hostname <host>` and fix the loopback line: `sed -i "s/127.0.1.1.*/127.0.1.1 <host> <host>/" /etc/hosts`. Skip this and **Logwatch emails arrive titled `Logwatch for <host>-hetzner`** weeks later. Do it alongside the Tailscale node rename and Postfix `myhostname` — all three read from the provisioning label. See [Logwatch wrong hostname after migration](../../05-troubleshooting/logwatch-wrong-hostname-after-migration.md).
+3. **Verify CA bundle (Fedora)** — `ls /etc/pki/tls/certs/ca-bundle.crt`. If missing, Postfix TLS, curl, and dnf will all fail silently. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
+4. **Run `harden.yml` against the new host** — catches most gaps in one pass
+5. **Send a test email** — `echo test | mail -s "test" marcus@majorshouse.com` — if this fails, nothing else can alert you
+6. **Verify crond is running** — `systemctl is-active crond` (Fedora) or `systemctl is-active cron` (Ubuntu). cronie can be `enabled` but not `active` after provisioning.
+7. **Check Netdata Cloud** — verify the new node appears and alerts are flowing
+8. **Compare fail2ban jails** — `fail2ban-client status` on both old and new
+9. **Verify logwatch sends** — `sudo logwatch --output mail --range today`
+10. **Keep the old box powered off but not destroyed** for at least 7 days after remediation

 ### Using doctl to Manage Old Droplets

--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@ -38,6 +38,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
 - [Mastodon Federation](services/mastodon-federation.md)
 - [Mastodon `--prune-profiles` Trap](services/mastodon-prune-profiles-trap.md)
 - [Mastodon on S3 — Silent Upload Failures](services/mastodon-s3-acl-upload-failures.md)
+- [Mastodon — Triaging Crowdfunding / Mention-Spam Accounts](services/mastodon-mention-spam-crowdfunding.md)
 - [Ghost SMTP via Mailgun](services/ghost-smtp-mailgun-setup.md)
 - [Updating n8n Docker](services/updating-n8n-docker.md)
 - [Claude Code Remote Control](services/claude-code-remote-control.md)
--- a/02-selfhosting/monitoring/logwatch-fleet-setup.md
+++ b/02-selfhosting/monitoring/logwatch-fleet-setup.md
@ -235,9 +235,12 @@ sed -i '/^127\.0\.1\.1/d' /etc/hosts && \
 systemctl reload postfix
 ```

+> [!tip] Same drift, different symptom: the Logwatch **title**
+> Hetzner provisions boxes with `<host>-hetzner` as the *system* hostname. When that's never corrected, Logwatch (which reads the live hostname at runtime) mails reports titled `Logwatch for <host>-hetzner` — no postfix involvement needed. Same `hostnamectl set-hostname` + `/etc/hosts` fix as above. See [Logwatch wrong hostname after migration](../../05-troubleshooting/logwatch-wrong-hostname-after-migration.md).
+
 ### 2. Empty `relayhost` quietly forces public-MX delivery

-If `postconf relayhost` returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 165.227.187.191:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.
+If `postconf relayhost` returns an empty value, postfix doesn't fail — it just does an MX lookup for the destination domain and tries to deliver directly. For mail to your own mail server, that means going via the **public MX** (the domain's external MX record, e.g., `mail.majorshouse.com → 203.0.113.10:25`) instead of the **internal/Tailscale relay path** the rest of the fleet uses.

 The public-MX path is subject to whatever spam filtering, content checks, and trust rules the receiving MX has configured for external traffic. Internal Tailscale-IP traffic typically gets a faster trust shortcut (e.g., bypass spamchk pipe). So this single configuration drift causes one host's mail to land in a different code path than its siblings — and then silently get filtered.

--- a/02-selfhosting/security/ansible-flat-playbooks-to-roles.md
+++ b/02-selfhosting/security/ansible-flat-playbooks-to-roles.md
@ -0,0 +1,130 @@
+---
+title: "Migrating Flat Ansible Playbooks to Roles (Safely)"
+domain: selfhosting
+category: security
+tags: [ansible, roles, refactor, fleet, migration, fail2ban, infrastructure]
+status: published
+created: 2026-06-18
+updated: 2026-06-18
+---
+# Migrating Flat Ansible Playbooks to Roles (Safely)
+
+## Overview
+
+A fleet repo tends to grow a sprawl of flat `configure_*.yml` playbooks — one per subsystem, plus near-duplicates for variants (e.g. ~10 `configure_fail2ban_*` playbooks), all sharing a single overloaded top-level `templates/` directory. It works, but it resists reuse: there is no clean `defaults/` precedence, no encapsulation, and no way to compose a host's full configuration in one place.
+
+Ansible **roles** fix this — but migrating a *live* fleet is where it gets dangerous. The risk is not the refactor itself; it's accidentally changing deployed behaviour while you "just reorganize." This article covers the incremental, regression-free approach used to migrate an 11-host fleet, including the two techniques that keep it safe: **byte-identical migration** and **capture-based reconciliation**.
+
+> This is a process/pattern article. For the specific roles in this fleet, see the internal runbook. The techniques here generalize to any flat-playbook → role migration.
+
+## Decide What Becomes a Role vs. What Stays a Playbook
+
+Not everything should be a role. Draw the line by purpose:
+
+| Becomes a role | Stays a playbook |
+|---|---|
+| Reusable host **configuration** (a subsystem you converge to a desired state) | **Ops / one-off** actions: `update`, `reboot`, `harden`, `bootstrap`, `provision`, `fix_*`, `verify_*` |
+| Has templates/files, defaults, handlers | Orchestrators that just `import_playbook` other things |
+| Applied repeatedly and idempotently | Run-once or run-as-needed remediation |
+
+Roles get the standard `roles/<name>/` layout (`tasks/`, `defaults/`, `handlers/`, `templates/`, `files/`, `meta/`). Name them after the **subsystem noun** (`fail2ban`, `clamav`, `firewall`) — drop the `configure_` verb prefix.
+
+## The Incremental Loop (one role per branch)
+
+Migrate **one subsystem per branch** and validate before merging. This keeps every change small enough to diff by eye and roll back cleanly:
+
+1. `git mv` the templates/files into `roles/<name>/` so **git tracks them as renames** (history preserved, 100% rename score).
+2. Move task bodies into `roles/<name>/tasks/` (split by lifecycle: install → service → config → verify).
+3. Lift tunables into `roles/<name>/defaults/main.yml`; keep per-host overrides in `group_vars`/`host_vars`.
+4. Add a thin entry playbook `<name>.yml` (`hosts: <group>` + `roles: [<name>]`).
+5. Validate with `--check --diff` against a single host **before** merging.
+6. Merge, then move to the next subsystem.
+
+## Technique 1: Byte-Identical Migration
+
+When the goal is "reorganize without changing behaviour," **prove** it. After moving a playbook into a role, the rendered task bodies should be identical to the original. Verify with a normalized diff against `main`:
+
+```bash
+# Compare the role's task body against the original flat playbook,
+# ignoring only comments/whitespace you intend to change.
+git show main:configure_clamav.yml > /tmp/old.yml
+# ...extract the task list from the old playbook, then normalize both sides
+# through yq so formatting differences don't show up as noise
+diff <(yq '.[] | .tasks' /tmp/old.yml) <(yq '.' roles/clamav/tasks/*.yml)
+```
+
+The acceptance bar: `--check --diff` against a real host returns **`changed=0`** (or only the diffs you explicitly intended, like a doc-comment line). If a "faithful" migration shows unexpected `changed=N`, you altered behaviour: stop and reconcile before merging. Templates moved via `git mv` show as **100% renames** (`rename ... (100%)` in `git show --stat --summary`), which is your proof the deployed content is unchanged.
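+
+The acceptance bar can be checked mechanically instead of by reading the recap. The sketch below assumes the standard `PLAY RECAP` line format (`host : ok=N changed=N unreachable=N failed=N ...`); `recap_is_clean` is a hypothetical helper name. It treats any non-zero `changed`, so intended diffs still need a human decision.
+
+```bash
+# recap_is_clean: read ansible-playbook output on stdin; succeed only if a
+# PLAY RECAP was seen and no host reports changed, failed, or unreachable > 0.
+recap_is_clean() {
+    awk '
+        /^PLAY RECAP/ { in_recap = 1; next }
+        in_recap && /changed=/ {
+            hosts++
+            if ($0 ~ /(changed|failed|unreachable)=[1-9]/) bad++
+        }
+        END { exit !(hosts > 0 && bad == 0) }
+    '
+}
+
+# Usage against a single host, with the role's entry playbook:
+# ansible-playbook <name>.yml --check --diff --limit <host> | recap_is_clean && echo "faithful"
+```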
+
+## Technique 2: Consolidating Near-Duplicates with Feature Flags
+
+The big win is collapsing a family of near-duplicate playbooks (the ~10 `configure_fail2ban_*`) into **one role with flag-gated task files**:
+
+```yaml
+# group_vars/<group>.yml — hosts self-select which jails/components they get
+fail2ban_jail_sshd: true
+fail2ban_jail_wordpress: true
+fail2ban_jail_nginx_bad_request: false
+```
+
+```yaml
+# roles/fail2ban/tasks/main.yml
+- import_tasks: jail_wordpress.yml
+  when: fail2ban_jail_wordpress | default(false)
+```
+
+> **Critical gotcha — key flags to inventory GROUPS, not `ansible_os_family`.** It is tempting to gate OS-specific task files on `ansible_os_family == 'Debian'`. Don't. Inventory groups frequently include hosts the *original playbooks deliberately excluded* (e.g. a LAN-only Debian box that should get the network-wait step but **not** the public SSH bind, or a WSL host in the `fedora` group that must be skipped). Keep the original curated host patterns and set the flag per play/group. Keying on `os_family` silently widens a play's host set and is exactly how a "refactor" pushes config to a host that never had it.
+
+## Technique 3: Capture-Based Reconciliation (the safety net)
+
+This is the one that prevents an outage. Sometimes a role gets written as a **fresh re-implementation** of a subsystem rather than a faithful move — a cleaner `jail.local`, new drop-ins, a different default set. It may even be merged into `site.yml`. The trap: that role has **never been rolled out**, and its config *diverges* from what's actually deployed.
+
+Running it would push divergent config to a live, security-sensitive subsystem (intrusion protection, firewall) across the whole fleet on the next `harden.yml`.
+
+The check that catches it:
+
+```bash
+ansible-playbook fail2ban.yml --check --diff --limit <host>
+# Divergent role => changed=8-12 per host + failures (missing filters/timers)
+# Faithful role  => changed=0, failed=0
+```
+
+**Capture-based reconciliation** is the fix: instead of pushing the role's idea of "correct," bring the **role into parity with the live, working config** first. Capture what's actually deployed, fold it into the role's templates/defaults until `--check` is clean fleet-wide, *then* switch the orchestrator over and retire the old playbooks. Order of operations:
+
+1. **Decide the source of truth** — the live config or the new role. For security subsystems, the live (working) config wins.
+2. **Reconcile** the role to match live until `--check` shows `changed=0, failed=0` on every host.
+3. **Roll out host-by-host** with real runs; verify the service restarts cleanly and (for fail2ban) jails are actually active.
+4. **Only then** delete the old playbooks, rewire `harden.yml`/`bootstrap.yml`, and remove the orphaned top-level templates.
+
+Never delete the old mechanism until the new one is proven converged everywhere. "It's in `site.yml`" is not the same as "it's been rolled out."
+
+## Composition: `site.yml`, `harden.yml`, `bootstrap.yml`
+
+Once subsystems are roles, compose them with thin orchestrators that `import_playbook` the role entry points — so each subsystem keeps a **single source of truth** for its host mapping:
+
+```yaml
+# site.yml — day-to-day fleet convergence, in dependency order
+- import_playbook: swap.yml
+- import_playbook: tailscale.yml
+- import_playbook: ssh_hardening.yml
+- import_playbook: firewall.yml
+- import_playbook: fail2ban.yml
+- import_playbook: clamav.yml
+```
+
+Order matters: base layer (swap) → networking (tailscale) → access (ssh_hardening) → perimeter (firewall) → intrusion protection (fail2ban) → malware scanning (clamav). Bootstrap-only roles (guest agent, root password, provisioning prerequisites) belong in `bootstrap.yml`, not `site.yml`.
+
+## Verification Checklist
+
+- [ ] Templates moved with `git mv` (show as 100% renames)
+- [ ] `--check --diff` on a real host = `changed=0` (or only intended diffs)
+- [ ] Consolidation flags keyed to **inventory groups**, not `ansible_os_family`
+- [ ] Re-implemented roles reconciled to live parity **before** rollout (no surprise `changed=N`)
+- [ ] Security subsystems rolled out host-by-host with service-active verification
+- [ ] Old playbooks/templates deleted **only after** the role is converged fleet-wide
+- [ ] Orchestrators (`site.yml`/`harden.yml`/`bootstrap.yml`) rewired; stale references swept
+
+## Related
+
+- [SSH Hardening Fleet-Wide with Ansible](ssh-hardening-ansible-fleet.md)
+- [ClamAV Fleet Deployment with Ansible](clamav-fleet-deployment.md)
+- [Firewall Hardening with firewalld on Fedora Fleet](firewalld-fleet-hardening.md)
+- [Standardizing unattended-upgrades with Ansible](ansible-unattended-upgrades-fleet.md)
--- a/02-selfhosting/services/mastodon-mention-spam-crowdfunding.md
+++ b/02-selfhosting/services/mastodon-mention-spam-crowdfunding.md
@ -0,0 +1,170 @@
+---
+title: "Mastodon — Triaging Crowdfunding / Mention-Spam Accounts"
+description: How to tell broadcast fundraising solicitation from genuine mentions, investigate the account and its origin instance with SQL + nodeinfo, and pick a proportionate moderation action.
+tags:
+  - mastodon
+  - moderation
+  - abuse
+  - federation
+  - self-hosting
+created: 2026-06-22
+updated: 2026-06-22
+---
+
+# Mastodon — Triaging Crowdfunding / Mention-Spam Accounts
+
+If you run a Mastodon instance, sooner or later you (or your users) start getting tagged by accounts you've never interacted with, posting donation appeals with a link and a wall of hashtags. Some are real people in desperate situations; some are recycled-link scams. Either way, when an account is **broadcasting a solicitation at you** rather than replying to you, it's a moderation question, not a conversation.
+
+This article is the runbook for telling the two apart, investigating both the **account** and its **origin instance**, and choosing an action that's proportionate instead of nuking eight years of legit federation over two bad actors.
+
+## TL;DR
+
+- A mention is **broadcast spam**, not engagement, when it's a *standalone post* (not a reply) that *tags a large fixed list* of accounts and carries a *donation link*, usually from a *throwaway profile* on an *open-registration instance*.
+- Investigate before acting: pull the account's age/stats/bio and check whether the post is a reply or a 40-way blast (SQL below). Profile the origin instance via its public `nodeinfo`.
+- **Default action is an account-level block**, which also federates and removes their follow of you. Escalate to domain-limit / domain-block only when *one instance* produces *repeat offenders*.
+- Keep a log so single incidents that are actually a pattern become visible.
+
+## Signals that a mention is broadcast solicitation
+
+Score it on how many of these hold:
+
+| Signal | Why it matters |
+|---|---|
+| **Standalone post, not a reply** (`in_reply_to_account_id IS NULL`) but still tags you | They're broadcasting, not responding |
+| **Tags a large fixed recipient list** (e.g. 40+) | Mass distribution; the same list reused across senders = coordination |
+| **Donation link** in post or bio (`chuffed.org`, `gofundme`, `paypal.me`, `ko-fi`) | The payload |
+| **Throwaway profile** — days old, few followers, follows you but you don't follow back | Disposable, baiting a profile view |
+| **Mass-follow ratio** — following thousands / few hundred followers | Engagement farming |
+| **"I am not a scammer" disclaimer** in bio | Known red-flag phrase |
+| **Origin instance: open registration, no approval** | Easy throwaway-account farm |
+
+> [!warning] Judgment, not a purity test
+> Many of these accounts are real people. The goal is not to adjudicate need — it's to stop *broadcast solicitation aimed at you* and track the *source instances*. Prefer the lightest action that stops it.
+
+## Investigate the account
+
+Connect to the DB on the instance:
+
+```bash
+ssh <your-mastodon-host>
+sudo -u postgres psql mastodon_production
+```
+
+**Profile + stats for a suspect** (age, post count, follower ratio, bio):
+
+```sql
+SELECT a.username||'@'||a.domain,
+       to_char(a.created_at,'YYYY-MM-DD') AS first_seen_locally,
+       st.statuses_count, st.followers_count, st.following_count,
+       left(regexp_replace(COALESCE(a.note,''),'<[^>]+>','','g'),200) AS bio
+FROM accounts a LEFT JOIN account_stats st ON st.account_id=a.id
+WHERE a.domain='<INSTANCE>' AND a.username='<HANDLE>';
+```
+
+**Is the mention a reply or a blast?** `standalone=t` with a high `num_tagged` is the tell:
+
+```sql
+SELECT a.username, to_char(s.created_at,'YYYY-MM-DD HH24:MI') AS posted,
+       s.in_reply_to_account_id IS NULL AS standalone,
+       (SELECT count(*) FROM mentions mm WHERE mm.status_id=s.id) AS num_tagged
+FROM mentions m JOIN statuses s ON s.id=m.status_id
+JOIN accounts a ON a.id=s.account_id
+JOIN accounts me ON me.id=m.account_id AND me.username='<YOU>' AND me.domain IS NULL
+WHERE a.username='<HANDLE>' AND a.domain='<INSTANCE>'
+ORDER BY s.created_at DESC;
+```
+
+**All recent direct mentions of you** (sweep for the wider pattern):
+
+```sql
+SELECT to_char(n.created_at,'YYYY-MM-DD HH24:MI') AS when,
+       a.username||COALESCE('@'||a.domain,'@local') AS who,
+       COALESCE(s.uri,'') AS uri,
+       left(regexp_replace(COALESCE(s.text,''),'<[^>]+>','','g'),200) AS body
+FROM notifications n
+JOIN accounts recip ON recip.id=n.account_id AND recip.username='<YOU>' AND recip.domain IS NULL
+JOIN accounts a ON a.id=n.from_account_id
+LEFT JOIN mentions m ON m.id=n.activity_id AND n.activity_type='Mention'
+LEFT JOIN statuses s ON s.id=m.status_id
+WHERE n.type='mention' ORDER BY n.created_at DESC LIMIT 40;
+```
+
+## Profile the origin instance
+
+Don't judge an instance by one bad account. Pull its public metadata — no auth needed:
+
+```bash
+# Software, version, user counts, registration policy
+NI=$(curl -s https://<INSTANCE>/.well-known/nodeinfo | python3 -c 'import sys,json;print(json.load(sys.stdin)["links"][-1]["href"])')
+curl -s "$NI" | python3 -m json.tool         # software, openRegistrations, usage.users
+
+# Title, contact/admin, rules, registration approval flag
+curl -s https://<INSTANCE>/api/v2/instance | python3 -m json.tool
+```
+
+What to read off it:
+
+- **`openRegistrations: true`** (nodeinfo) **+ `registrations.approval_required: false`** (`/api/v2/instance`) → throwaway-account farm; expect more of the same.
+- **`usage.users.total` vs `usage.users.activeMonth`** (nodeinfo) → a huge dormant base is typical of sign-up-and-leave farms.
+- **Federation age on your side** — how long you've known the instance, how many of its accounts you cache. A long, broad relationship argues *against* a domain block.
+- **The instance's own rules** — many ban "backlink accounts" / harassment, which the mass-tag fundraising violates. That makes **reporting to its admin a legitimate, in-policy path.**
+
+```sql
+-- What your instance already knows about the domain
+SELECT (SELECT count(*) FROM accounts WHERE domain='<INSTANCE>') AS known_accounts,
+       (SELECT count(*) FROM statuses s JOIN accounts a ON a.id=s.account_id WHERE a.domain='<INSTANCE>') AS cached_statuses,
+       (SELECT to_char(min(created_at),'YYYY-MM-DD') FROM accounts WHERE domain='<INSTANCE>') AS first_seen,
+       (SELECT count(*) FROM domain_blocks WHERE domain='<INSTANCE>') AS is_domain_blocked;
+```
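+
+The registration and dormancy signals listed above can be read off the two JSON documents mechanically. The sketch below assumes the nodeinfo 2.x layout (`openRegistrations`, `usage.users.total`, `usage.users.activeMonth`) and the `registrations.approval_required` field of `/api/v2/instance`. The helper name `instance_flags` and the 5% dormancy ratio are hypothetical.
+
+```python
+def instance_flags(nodeinfo: dict, instance_v2: dict, dormant_ratio: float = 0.05) -> list[str]:
+    """Return coarse warning flags for an origin instance."""
+    flags = []
+    registrations = instance_v2.get("registrations", {})
+    if nodeinfo.get("openRegistrations", False) and not registrations.get("approval_required", False):
+        flags.append("open-signup-no-approval")
+    users = nodeinfo.get("usage", {}).get("users", {})
+    total = users.get("total") or 0
+    active = users.get("activeMonth") or 0
+    # dormant_ratio is an assumed threshold: under 5% monthly-active counts as dormant.
+    if total > 0 and active / total < dormant_ratio:
+        flags.append("mostly-dormant")
+    return flags
+
+
+# Made-up sample documents in the shape the two curl calls return.
+nodeinfo = {"openRegistrations": True, "usage": {"users": {"total": 50000, "activeMonth": 400}}}
+instance_v2 = {"registrations": {"enabled": True, "approval_required": False}}
+flags = instance_flags(nodeinfo, instance_v2)
+print(flags)
+```
+
+Flags are a prompt to look closer, not a verdict: weigh them against how long and how broadly you have federated with the instance.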
+
+## The escalation ladder
+
+| Level | Action | Effect | When |
+|---|---|---|---|
+| 1 | **Mute** | You stop seeing them; silent | Borderline; you don't want to cut them off |
+| 2 | **Block (account)** | Cuts mentions, removes their follow, federates to their instance | **Default first action** |
+| 3 | **Report** to source admin | Forwards the offending posts to their moderators | Repeat or egregious; in-policy on most instances |
+| 4 | **Domain-limit (silence)** | Their posts show only if you follow that account | One instance, multiple offenders |
+| 5 | **Domain-block (suspend)** | Severs all known accounts + federation | Instance is predominantly abuse |
+
+### Blocking from a user account (federates + removes follow)
+
+There is no `tootctl accounts block`. Do it through Mastodon's `BlockService` (a service object, run here via `bin/rails runner`) so it tears down the relationship and federates correctly:
+
+```ruby
+# run as the mastodon user:
+#   sudo -u mastodon bash -c 'cd /home/mastodon/live && RAILS_ENV=production bin/rails runner /tmp/block.rb'
+me = Account.find_by(username: "<YOU>", domain: nil)
+%w[Handle1 Handle2].each do |u|
+  t = Account.find_by(username: u, domain: "<INSTANCE>")
+  next puts("NOTFOUND #{u}") if t.nil?
+  BlockService.new.call(me, t)
+  puts "BLOCKED #{u} blocking=#{me.blocking?(t)} they_follow_me=#{t.following?(me)}"
+end
+```
+
+`blocking=true` with `they_follow_me=false` confirms the block landed and the follow was severed.
+
+### Instance-level actions
+
+Domain-limit / domain-block live in the admin UI (**Moderation → Federation**) or via `tootctl`:
+
+```bash
+# Silence (limit) — posts hidden unless followed
+RAILS_ENV=production bin/tootctl domains ... # or set severity=silence in the admin UI
+# Suspend (block) the whole instance
+RAILS_ENV=production bin/tootctl ... # admin UI "Add domain block" is the safe path
+```
+
+> [!tip] Reach for the lightest hammer
+> A domain block is rarely the right first move against an established instance — you lose every legit account and years of federation to swat a couple of accounts. Block the accounts, report them to the source admin, and only escalate the *instance* when it demonstrates a sustained, multi-actor pattern.
+
+## Keep a log
+
+Track offenders and source instances over time so a "one-off" that's actually a campaign becomes visible, and so domain-level decisions are evidence-based. A simple table — date, account, instance, signals, action — plus an instance-watch table with each source's registration policy and offender count is enough.
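+
+One possible shape for that log, as a sketch: the `Offence` record mirrors the columns named above, while `instances_to_watch` and its threshold of two distinct accounts are assumptions for illustration.
+
+```python
+from collections import Counter
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Offence:
+    date: str
+    account: str
+    instance: str
+    signals: str
+    action: str
+
+
+def instances_to_watch(log, min_accounts=2):
+    """Instances with at least min_accounts distinct offending accounts."""
+    distinct = {(entry.instance, entry.account) for entry in log}
+    per_instance = Counter(instance for instance, _ in distinct)
+    return sorted(name for name, count in per_instance.items() if count >= min_accounts)
+
+
+# Made-up entries; example.social and other.example are placeholder domains.
+log = [
+    Offence("2026-06-20", "spam1", "example.social", "standalone, 12 tagged", "block"),
+    Offence("2026-06-21", "spam2", "example.social", "standalone, 9 tagged", "block+report"),
+    Offence("2026-06-21", "spam1", "example.social", "standalone, 15 tagged", "report"),
+    Offence("2026-06-22", "lonely", "other.example", "standalone, 6 tagged", "mute"),
+]
+watch = instances_to_watch(log)
+print(watch)
+```
+
+Counting distinct accounts rather than raw entries keeps one noisy account from making an instance look like a multi-actor campaign.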
+
+## Related
+
+- [Mastodon `--prune-profiles` Trap](mastodon-prune-profiles-trap.md)
+- [Mastodon DB Maintenance](mastodon-db-maintenance.md)
+- [Mastodon Federation](mastodon-federation.md)
--- a/02-selfhosting/storage-backup/restic-b2-fleet-backups.md
+++ b/02-selfhosting/storage-backup/restic-b2-fleet-backups.md
@ -0,0 +1,137 @@
+---
+title: "App-Consistent Fleet Backups with restic + Backblaze B2"
+domain: selfhosting
+category: storage-backup
+tags: [restic, backblaze, b2, backup, ansible, systemd, postgresql, mysql, sqlite, docker, disaster-recovery]
+status: published
+created: 2026-06-19
+updated: 2026-06-19
+---
+
+# App-Consistent Fleet Backups with restic + Backblaze B2
+
+A repeatable pattern for backing up a mixed fleet (Ubuntu + Fedora, VPS + homelab, bare services + Docker) to Backblaze B2 with [restic](https://restic.net) — encrypted, deduplicated, and **app-consistent** (databases are dumped before the snapshot, not copied live). Driven by Ansible and a per-host `systemd` timer.
+
+## The Short Answer
+
+Per host, nightly: **dump every database to a staging dir → `restic backup` that staging dir plus the data paths → apply retention → wipe staging.** A monthly timer runs `restic prune`. Anything that fails emails the admin. One B2 bucket holds a separate repo per host at `b2:<bucket>:<hostname>`.
+
+Retention is `--keep-daily 7 --keep-weekly 4 --keep-monthly 6` (~6 months of history).
+
+## Why dump databases first
+
+Copying a live database's files (`/var/lib/mysql`, a running SQLite file, a Postgres data dir) gives you a *crash-consistent* copy at best — restorable only if you're lucky. Logical dumps are guaranteed consistent:
+
+- **MySQL / MariaDB:** `mysqldump --single-transaction --routines --triggers --databases <db>`
+- **PostgreSQL:** `pg_dump -Fc <db>` (custom format) via the `postgres` system user (peer auth)
+- **SQLite:** `sqlite3 <file> ".backup '<out>'"` — uses the online backup API, safe against a running writer
+- **Dockerized DBs:** `docker exec <container> sh -c '<dump cmd>'`, letting the container's own shell expand its root-password env var
+
+restic then backs up the dump files (which dedupe beautifully — only the changed blocks upload each night).
+
+## Repository layout
+
+- **One private B2 bucket** (e.g. `majorshouse-backups`).
+- **One repo per host:** `b2:majorshouse-backups:<hostname>`.
+- The application key needs **read + write + delete** for the bucket. restic deletes objects during `forget`/`prune`, so a pure *append-only* key will break retention. (True append-only requires splitting `forget`/`prune` onto a separate maintenance key — a worthwhile hardening step, but not the default.)
+- Credentials live in an `EnvironmentFile` (`/etc/restic/restic-env`, mode `0600`, root): `RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, `B2_ACCOUNT_ID`, `B2_ACCOUNT_KEY`.
+
+## The backup script (shape)
+
+```bash
+set -uo pipefail
+STAGING=/var/backups/restic-staging
+rm -rf "$STAGING"; mkdir -p "$STAGING"; chmod 700 "$STAGING"
+
+# per-engine dumps into $STAGING ...
+mysqldump --single-transaction --routines --triggers --databases wordpress > "$STAGING/mysql-wordpress.sql"
+sudo -u postgres pg_dump -Fc mastodon_production            > "$STAGING/pg-mastodon_production.dump"
+sqlite3 /opt/phantombot/config/phantombot.db ".backup '$STAGING/sqlite-phantombot.db'"
+
+restic backup --tag fleet-backup --host "$(hostname -s)" \
+  "$STAGING" /var/www /etc/letsencrypt --exclude /path/to/already-offsite/media
+
+restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6
+rm -rf "$STAGING"
+```
+
+Wrap each step so a failure mails the admin and aborts (don't silently back up a half-state). On hosts where the `mail` CLI is absent, pipe a message to `/usr/sbin/sendmail -t` instead.
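+
+The wrap-each-step idea, sketched in Python rather than the script's own bash: `run_step` and the `notify` callback are hypothetical names, and a real script would hand `notify` something that calls `mail` or `/usr/sbin/sendmail -t`.
+
+```python
+import subprocess
+import sys
+
+
+def run_step(name, cmd, notify):
+    """Run one backup step; on failure tell the admin and abort the whole run."""
+    result = subprocess.run(cmd, capture_output=True, text=True)
+    if result.returncode != 0:
+        notify(f"backup step '{name}' failed (rc={result.returncode}): {result.stderr.strip()}")
+        raise SystemExit(1)  # abort: never snapshot a half-written staging dir
+    return result.stdout
+
+
+# Demo with a harmless command standing in for a dump; notes collects would-be emails.
+notes = []
+output = run_step("demo-dump", [sys.executable, "-c", "print('dumped')"], notes.append)
+print(output.strip(), notes)
+```
+
+Aborting on the first failed dump means `restic backup` is never reached with a partial staging directory.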
+
+## systemd units
+
+A oneshot service + a timer. Stagger `OnCalendar` per host to spread B2 load, and **always set `RESTIC_CACHE_DIR`** (see Gotchas):
+
+```ini
+# restic-backup.service
+[Service]
+Type=oneshot
+EnvironmentFile=/etc/restic/restic-env
+Environment=RESTIC_CACHE_DIR=/var/cache/restic
+ExecStart=/usr/local/sbin/restic-backup.sh
+Nice=10
+IOSchedulingClass=idle
+```
+
+```ini
+# restic-backup.timer
+[Timer]
+OnCalendar=*-*-* 02:30:00
+RandomizedDelaySec=20m
+Persistent=true
+[Install]
+WantedBy=timers.target
+```
+
+A second `restic-prune.timer` runs `restic prune` monthly (`OnCalendar=*-*-01 04:00:00`).
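+
+Staggering can be done by hand per host, or derived from the hostname. The sketch below is one hypothetical way to do the latter, not something the role is known to implement: `staggered_oncalendar`, the 02:00 base hour, and the 60-minute window are all assumptions.
+
+```python
+import hashlib
+import re
+
+
+def staggered_oncalendar(hostname: str, base_hour: int = 2, window_minutes: int = 60) -> str:
+    """Map a hostname to a stable start time inside a nightly window."""
+    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
+    offset = int.from_bytes(digest[:4], "big") % window_minutes
+    hour = (base_hour + offset // 60) % 24
+    return f"*-*-* {hour:02d}:{offset % 60:02d}:00"
+
+
+slots = {host: staggered_oncalendar(host) for host in ("majorhome", "majorlab", "majormail")}
+for host, slot in slots.items():
+    # Sanity-check the shape before it goes anywhere near a unit file.
+    assert re.fullmatch(r"\*-\*-\* \d{2}:\d{2}:00", slot)
+    print(f"{host}: OnCalendar={slot}")
+```
+
+The hash keeps each host's slot stable across runs, but two hosts can still land on the same minute; `RandomizedDelaySec` in the timer smooths that out.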
+
+## Restore procedure
+
+The whole point. From the target host (or any host with the repo creds):
+
+```bash
+# load repo + B2 creds without echoing them
+set -a; . /etc/restic/restic-env; set +a
+
+restic snapshots                      # list; note the snapshot ID or use 'latest'
+
+# restore specific paths to a scratch dir (never restore in place blindly)
+restic restore latest --target /tmp/restore \
+  --include /var/backups/restic-staging \
+  --include /var/www/html/wp-config.php
+
+# verify before doing anything with it
+ls -la /tmp/restore/var/backups/restic-staging/
+head -1 /tmp/restore/var/backups/restic-staging/mysql-wordpress.sql   # "-- MySQL dump 10.13 ..."
+```
+
+To recover a database, restore the dump then load it: `mysql <db> < mysql-<db>.sql`, `pg_restore -d <db> pg-<db>.dump`, or copy the SQLite file back. **Test restores periodically** — a backup you've never restored is a hope, not a backup. Restore the highest-stakes data (password manager, mail) first in any drill.
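+
+A restore drill can check that each restored dump at least starts the way its format should. This sketch relies on the `-- MySQL dump` header shown above, the `PGDMP` magic of `pg_dump -Fc` archives, and the `SQLite format 3` file header. `sniff_dump` is a hypothetical helper, and some MariaDB builds write a different first line, so treat a `None` result as "look closer", not "corrupt".
+
+```python
+import tempfile
+from pathlib import Path
+
+SIGNATURES = {
+    "mysqldump": b"-- MySQL dump",
+    "pg_dump custom": b"PGDMP",
+    "sqlite": b"SQLite format 3\x00",
+}
+
+
+def sniff_dump(path):
+    """Return the dump kind suggested by the file's first bytes, or None."""
+    with open(path, "rb") as handle:
+        head = handle.read(16)
+    for kind, magic in SIGNATURES.items():
+        if head.startswith(magic):
+            return kind
+    return None
+
+
+# Tiny stand-in files; real drills would point at /tmp/restore/... instead.
+with tempfile.TemporaryDirectory() as tmp:
+    samples = {
+        "mysql-wordpress.sql": b"-- MySQL dump 10.13\n",
+        "pg-mastodon_production.dump": b"PGDMP\x01\x0e\x00",
+        "truncated.sql": b"",
+    }
+    kinds = {}
+    for name, payload in samples.items():
+        target = Path(tmp) / name
+        target.write_bytes(payload)
+        kinds[name] = sniff_dump(target)
+print(kinds)
+```
+
+A header check only catches empty or wrong-format files. Loading the dump into a scratch database remains the real test.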
+
+## Adding a host
+
+1. Add it to the `backups` inventory group.
+2. Give it a `host_vars` scope — which DBs to dump and which paths to back up:
+
+   ```yaml
+   restic_backup_oncalendar: "*-*-* 02:40:00"   # stagger
+   restic_mysql_dbs: [castopod_db]
+   restic_paths: [/var/www/html/castopod]
+   restic_excludes: [/var/www/html/castopod/public/media]   # already offsite
+   ```
+3. Run the playbook against that host. The role installs restic, deploys the script + units, `restic init`s the repo if absent, and enables the timers.
+
+## Gotchas & Notes
+
+- **`RESTIC_CACHE_DIR` is mandatory under systemd.** systemd services run with no `$HOME`, so restic can't find its cache and warns *"unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined"* — and re-reads **every file** each run (no incremental). Point it at `/var/cache/restic` in the unit.
+- **`sqlite3` may not be installed.** A host that runs a SQLite-backed app (e.g. a bot) often lacks the `sqlite3`/`sqlite` CLI. Install it where `restic_sqlite_paths` is set, or the `.backup` step fails.
+- **Docker DB password env-var names vary.** Don't assume: the MariaDB image may use `MYSQL_ROOT_PASSWORD` (not `MARIADB_ROOT_PASSWORD`), and a Postgres container's superuser is whatever `POSTGRES_USER` is set to — reference `"$POSTGRES_USER"` rather than hardcoding `postgres`. Check with `docker exec <c> sh -c 'env | grep -oE "^(MYSQL|MARIADB|POSTGRES)_[A-Z_]*"'` (name only).
+- **B2 key needs delete capability.** Otherwise `forget`/`prune` fail. Scope the key to the bucket; reach for per-host `namePrefix`-restricted keys for blast-radius isolation.
+- **Exclude data that's already offsite.** Media already synced to object storage (S3/B2 via the app or `rclone`) should be `--exclude`d so you don't pay to store it twice.
+- **First upload is slow, the rest are fast.** The initial snapshot reads and uploads everything; subsequent runs only ship changed blocks. For a large first run, fire it detached and watch from a transient unit that emails you on completion.
+- **Keep secrets out of git.** The repo password and B2 key belong in an Ansible vault (committed encrypted), referenced into the role — never in plaintext vars.
+- **Changing a host's backup paths starts a new snapshot group.** `restic forget` groups snapshots by `host`+`paths` by default, so adding or removing a path on an existing host creates a *separate* lineage: the old path-set and the new one each retain their own 7d/4w/6m snapshots, and `restic snapshots` shows both. Expected, not a bug — but it means the old-path snapshots age out on their own schedule rather than being superseded. To collapse everything into one retention bucket, run `forget` with `--group-by host` (be deliberate: it then treats *any* path-set on that host as the same group).
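+
+The snapshot-grouping gotcha is easier to see with a toy model. The sketch below imitates grouping by `host`+`paths` versus `host` alone on simplified snapshot records. It is an illustration of the grouping rule, not restic's implementation, and the record fields are simplified assumptions.
+
+```python
+from collections import defaultdict
+
+
+def group_snapshots(snapshots, by=("host", "paths")):
+    """Group snapshot IDs by the given fields, treating paths as an unordered set."""
+    groups = defaultdict(list)
+    for snap in snapshots:
+        key = tuple(
+            tuple(sorted(snap[field])) if field == "paths" else snap[field]
+            for field in by
+        )
+        groups[key].append(snap["id"])
+    return dict(groups)
+
+
+# web1 gained /etc/letsencrypt between snapshots a2 and b1 (made-up data).
+snapshots = [
+    {"id": "a1", "host": "web1", "paths": ["/var/www"]},
+    {"id": "a2", "host": "web1", "paths": ["/var/www"]},
+    {"id": "b1", "host": "web1", "paths": ["/var/www", "/etc/letsencrypt"]},
+]
+default_groups = group_snapshots(snapshots)
+host_groups = group_snapshots(snapshots, by=("host",))
+print(len(default_groups), len(host_groups))
+```
+
+With the default grouping the retention policy is applied to each of the two lineages separately; grouping by host alone applies it once across all three snapshots.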
+
+## See Also
+
+- [rsync Backup Patterns](rsync-backup-patterns.md)
+- [SnapRAID & MergerFS Storage Setup](../../01-linux/storage/snapraid-mergerfs-setup.md)
+- [restic documentation](https://restic.readthedocs.io)
--- a/04-streaming/plex/hevc-vaapi-batch-encode.md
+++ b/04-streaming/plex/hevc-vaapi-batch-encode.md
@ -5,7 +5,7 @@ category: plex
 tags: [plex, ffmpeg, hevc, vaapi, amd, gpu, encode, storage, rx480]
 status: published
 created: 2026-05-15
-updated: 2026-05-22
+updated: 2026-06-05
 ---
 # HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)

@ -121,7 +121,7 @@ Each file logs:

 ### Space guard

-The script aborts if free space on the Plex volume drops below 20GB (`MIN_FREE_GB`). Worst-case headroom needed is `source_size + tmp_size` simultaneously — on a 4GB source file that's ~8GB peak.
+The script aborts if free space on the Plex volume drops below 10GB (`MIN_FREE_GB`). Worst-case headroom needed is `source_size + tmp_size` simultaneously — on a 4GB source file that's ~8GB peak. Note: the space check only runs at the **start** of each encode, not during — a large file can still consume significant disk mid-encode.

 ---

@ -278,3 +278,54 @@ local tmp="${dir}/${safe_stem}.hevc.tmp.${ext}"

 After patching, delete the affected entries from `hevc_failed.txt` (or leave them — they'll be re-queued on the next run since they're not in `hevc_done.txt`) and restart the batch.

+---
+
+### Many files failing: output larger than source (streaming content)
+
+**Symptom:** A large portion of the queue ends up in `hevc_failed.txt` with log lines like:
+
+```
+[2026-06-05 ...] Output: 4.7G  savings=0 (output larger than source)
+[2026-06-05 ...] WARN: output is larger than source — skipping swap, keeping original
+```
+
+**Cause:** These files are YouTube downloads or streaming archives (Giant Bomb, Twitch VODs, etc.) that were already encoded with an efficient H.264 encoder (typically YouTube's VP9-to-AVC pipeline or a broadcast H.264 encoder at a reasonable bitrate). VAAPI HEVC encoding at QP 28 on a Polaris GPU (RX 480/580) is a hardware encoder with limited rate control precision — it cannot beat a well-tuned software H.264 encode on already-compressed talking-head/gaming content. The output reliably comes out 15–25% *larger* than the source.
+
+The script handles this correctly: it detects output > source, deletes the tmp, keeps the original, and writes to `hevc_failed.txt`. The files are not corrupted. However, without the `already_failed()` guard, the script will re-attempt these files on every queue rebuild, wasting encode time and briefly consuming 4–8 GB of disk per failed attempt.
+
+**Fix — add `already_failed()` skip logic:**
+
+Patch `~/hevc_batch.sh` to skip files already in `hevc_failed.txt`:
+
+```bash
+# After the existing already_done() function, add:
+already_failed() {
+  [[ -f "$FAILED" ]] && grep -qF "$1" "$FAILED"
+}
+
+# In build_queue(), after the already_done "$f" && continue line:
+already_failed "$f" && continue
+
+# In the main loop, after the already_done "$file" check:
+already_failed "$file" && { log "SKIP (already failed): $file"; continue; }
+```
+
+After patching, the batch will skip all 132+ known-bad files on the next pass and only attempt fresh queue entries.
+
+**Tuning options to improve savings on dense content:**
+
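+- Raise QP: `--qp 30` or `--qp 32`. Coarser quantization gives smaller output and a better chance of beating source size. Trade-off: visibly lower quality. Lowering QP (`--qp 24` or `--qp 22`) does the opposite: higher quality but larger output, so more files fail the size check.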
+- Accept the failures: for streaming content archives, the source is already "good enough." Only files that are genuinely oversized H.264 (old stream captures at very high bitrate) will benefit from HEVC re-encode.
+
+**Identifying which files are worth encoding:**
+
+```bash
+# Show source bitrate for all queued files — high-bitrate sources are candidates
+while IFS= read -r f; do
+  bitrate=$(ffprobe -v quiet -show_entries format=bit_rate -of csv=p=0 "$f" 2>/dev/null)
+  echo "$bitrate $f"
+done < ~/hevc_queue.txt | sort -rn | head -20
+```
+
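+`ffprobe` reports `bit_rate` in bits per second, so divide by 1,000 to get kbit/s. Files above ~8,000 kbit/s (8000000 in the output above) are typically good encode candidates. Files at 3,000–5,000 kbit/s (typical YouTube/Twitch 1080p) will usually fail.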
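+
+The same rule of thumb as a small classifier, assuming the input is the raw bits-per-second figure that `ffprobe` prints. The function name and the `unclassified` bucket (for bitrates the thresholds above say nothing about) are illustrative choices.
+
+```python
+def encode_verdict(bit_rate_bps: int) -> str:
+    """Classify a source file by the bitrate thresholds quoted above."""
+    kbps = bit_rate_bps / 1000  # ffprobe prints bits/s
+    if kbps > 8000:
+        return "candidate"
+    if 3000 <= kbps <= 5000:
+        return "likely-fail"
+    return "unclassified"  # the rule of thumb makes no claim about these
+
+
+verdicts = {bps: encode_verdict(bps) for bps in (12_000_000, 4_200_000, 6_500_000)}
+print(verdicts)
+```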
+
--- a/05-troubleshooting/ansible-reboot-become-timeout-wsl2.md
+++ b/05-troubleshooting/ansible-reboot-become-timeout-wsl2.md
@ -0,0 +1,103 @@
+---
+title: "Ansible reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)"
+domain: troubleshooting
+category: ansible
+tags: [ansible, wsl, wsl2, windows, reboot, become, privilege-escalation, openssh, inventory]
+status: published
+created: 2026-06-12
+updated: 2026-06-12
+---
+
+# Ansible reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)
+
+## Problem
+
+Running a reboot play across a Fedora fleet that includes a WSL2 "host" fails on the WSL2 box at privilege escalation — before the reboot command ever runs:
+
+```console
+$ ansible-playbook reboot.yml --limit fedora
+
+TASK [Reboot the server] *******************************************************
+changed: [majorhome]
+changed: [majorlab]
+changed: [majormail]
+changed: [majordiscord]
+[ERROR]: Task failed: Action failed: Timeout (62s) waiting for privilege
+escalation prompt:
+fatal: [majorrig-wsl]: FAILED! => {"changed": false,
+  "msg": "Timeout (62s) waiting for privilege escalation prompt:",
+  "reboot": false}
+```
+
+Every real server reboots fine. Only the WSL2 host fails, and `"reboot": false` confirms the shutdown command never executed.
+
+## Cause
+
+Two independent problems, either of which is enough to break a reboot play against WSL2:
+
+1. **WSL2 has no real reboot semantics.** `ansible.builtin.reboot` issues a shutdown, then blocks up to `reboot_timeout` (e.g. 900s) waiting for SSH to come back. A WSL2 distro doesn't reboot — it just terminates, and nothing relaunches it automatically. The task would hang the full timeout and then fail.
+
+2. **`become` times out over the Windows OpenSSH → WSL2 bridge.** When a WSL2 box is reached as `majorlinux@host` through Windows' built-in OpenSSH Server (which forwards into WSL via the default shell), Ansible's privilege-escalation handshake watches the SSH stream for the sudo prompt/success marker. Across the Windows-intercept pty, that marker detection stalls until the 62s `timeout`. This happens **even with passwordless sudo** — `NOPASSWD` is configured and correct; Ansible simply never sees the handshake complete.
+
+The error surfaces as #2 (it fails at escalation first), but #1 is the deeper reason WSL2 doesn't belong in a reboot play at all.
+
+## Solution
+
+**Exclude the WSL group from the reboot play.** A WSL2 instance is a managed *workstation environment*, not a server — it belongs in package/update plays but not in server lifecycle operations like reboot.
+
+Scope the play to exclude the `wsl` group so even a broad `--limit` skips it:
+
+```yaml
+# reboot.yml
+- name: Reboot servers
+  hosts: all:!wsl     # was: hosts: all
+  become: true
+  tasks:
+    - name: Reboot the server
+      ansible.builtin.reboot:
+        msg: "Reboot initiated by Ansible"
+        reboot_timeout: 900
+```
+
+This assumes your WSL2 hosts are in a dedicated inventory group:
+
+```yaml
+wsl:
+  hosts:
+    majorrig-wsl:
+      ansible_host: 100.98.47.29
+```
+
+Verify the targeting before running — the WSL host should be gone:
+
+```console
+$ ansible-playbook reboot.yml --limit fedora --list-hosts
+  play #1 (all:!wsl): Reboot servers
+    hosts (4):
+      majorhome
+      majorlab
+      majordiscord
+      majormail
+```
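+
+Why `--limit fedora` no longer reaches the WSL host can be modelled with plain sets. This is a deliberately simplified model of host patterns (union and `!` exclusion only), not Ansible's actual resolver. The group membership is inferred from the output above, and the function names are hypothetical.
+
+```python
+def expand(inventory, pattern):
+    """Expand a colon-separated pattern: plain terms add groups, !terms remove them."""
+    everything = set().union(*inventory.values())
+    included, excluded = set(), set()
+    for term in pattern.split(":"):
+        if term.startswith("!"):
+            excluded |= inventory.get(term[1:], set())
+        elif term == "all":
+            included |= everything
+        else:
+            included |= inventory.get(term, set())
+    return included - excluded
+
+
+def resolve_hosts(inventory, play_pattern, limit=None):
+    """Hosts the play would target: the play's pattern, narrowed by --limit."""
+    hosts = expand(inventory, play_pattern)
+    if limit is not None:
+        hosts &= expand(inventory, limit)
+    return sorted(hosts)
+
+
+inventory = {
+    "fedora": {"majorhome", "majorlab", "majormail", "majordiscord", "majorrig-wsl"},
+    "wsl": {"majorrig-wsl"},
+}
+before = resolve_hosts(inventory, "all", limit="fedora")
+after = resolve_hosts(inventory, "all:!wsl", limit="fedora")
+print(before, after)
+```
+
+Because the exclusion lives in the play's `hosts:` line, a `--limit` can only narrow the set further; it cannot bring the WSL host back.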
+
+### Rebooting the WSL2 instance itself
+
+When you genuinely need to "reboot" WSL2, do it from the Windows side — not Ansible:
+
+```powershell
+wsl --shutdown
+```
+
+The distro relaunches on next access (next SSH login or `wsl` invocation). WSL2 stays in `update.yml` (dnf upgrades) and other package plays; it's only excluded from reboot and other server-specific roles.
+
+## Why not just fix the become timeout?
+
+You *could* raise `timeout` or tweak the become flow, but it doesn't address problem #1 — even a successful escalation would leave the reboot task hanging the full `reboot_timeout` because WSL2 never comes back the way the module expects. Excluding WSL from server lifecycle plays is the correct fix, not a workaround.
+
+## Related
+
+- [Ansible: ansible.cfg Ignored on WSL2 Windows Mounts](ansible-wsl2-world-writable-mount-ignores-cfg.md)
+- [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
+- [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](ansible-ssh-timeout-dnf-upgrade.md)
--- a/05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md
+++ b/05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md
@ -0,0 +1,73 @@
+---
+title: "Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)"
+domain: troubleshooting
+category: claude-code
+tags: [claude-code, authentication, oauth, keychain, macos, acl, security]
+status: published
+created: 2026-06-15
+updated: 2026-06-15
+---
+
+# Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)
+
+## Symptom
+A macOS dialog repeatedly pops up:
+
+> **security wants to access key "Claude Code-credentials" in your keychain.**
+> To allow this, enter the "login" keychain password. — `[Always Allow] [Deny] [Allow]`
+
+The tell-tale sign: it **comes back even after clicking "Always Allow"** — the usual "trust forever" button doesn't make it stop. Login still works; it's the *permission prompt* that won't quiet down. This is **distinct** from [Claude Code won't log in](claude-code-warp-login-corrupt-keychain-credential.md), where the stored credential is corrupt and login itself fails.
+
+## Cause
+Claude Code stores its OAuth token in the macOS **login keychain** as `Claude Code-credentials`, read via `/usr/bin/security`. macOS binds an "Always Allow" grant (the keychain item's ACL) to the **code-signing identity** of the requesting binary. That grant is silently invalidated when:
+
+- **Claude Code updates** — the new binary's signature no longer matches the saved ACL. This is the most common trigger (see claude-code issues #48162, #9403).
+- **The credential item is recreated on token refresh** — wipes the ACL.
+- **Post-reboot keychain churn** — right after boot, the just-unlocked login keychain plus a concurrent token refresh can race ahead of the ACL settling, producing a *burst* of prompts that stops once a clean refresh completes.
+
+It is **not** a lock-timeout issue if `security show-keychain-info` reports `no-timeout` (below).
+
+## Triage (non-destructive — these do not trigger a prompt)
+```bash
+# Confirm the item exists (metadata only; no secret read)
+security find-generic-password -l "Claude Code-credentials" | grep -E "svce|acct"
+
+# Confirm the login keychain isn't auto-locking
+security show-keychain-info ~/Library/Keychains/login.keychain-db
+# -> "no-timeout" means it won't relock; so recurring prompts = ACL invalidation, not locking
+```
+
+## Fixes
+
+### One-off burst (e.g. right after a reboot)
+Click **Always Allow** (not Allow) once a clean token refresh has completed. With a `no-timeout` keychain the grant then holds, and the post-boot prompt storm usually self-clears within a minute. *Observed exactly this on MajorAir 2026-06-15 — a reboot triggered a burst that stopped on its own.*
+
+### Keeps returning after updates (durable) — reset the credential
+Deleting and re-creating the item rebinds a fresh ACL to the current binary. Costs one re-login.
+```bash
+security delete-generic-password -s "Claude Code-credentials"
+# then re-authenticate inside Claude Code: /login   (or relaunch `claude`)
+```
+
+### Bypass the keychain entirely (workaround)
+Claude Code falls back to `~/.claude/.credentials.json` in non-GUI contexts (SSH, tmux). On a local Mac this can be repurposed to stop keychain prompts for good:
+```bash
+# pipe straight to the file — never echo the token into a shared terminal
+security find-generic-password -s "Claude Code-credentials" -w > ~/.claude/.credentials.json
+chmod 600 ~/.claude/.credentials.json
+security delete-generic-password -s "Claude Code-credentials"
+```
+**Caveats:**
+- Token is then **plaintext at rest** (mode 600) instead of encrypted in the keychain.
+- A future Claude Code update may rewrite the keychain item.
+- GUI-session behaviour for the file fallback is **less documented** than the SSH/tmux case — **verify it holds for your setup before relying on it.**
+- Do **not** substitute `CLAUDE_CODE_OAUTH_TOKEN` — it is known to delete credentials on exit (issue #37512).
+
+## Notes
+- Same keychain item as the corrupt-credential login failure; if login itself breaks, see the related article.
+- Always redirect `-w` output straight to a file — never into a terminal whose scrollback feeds shared context.
+
+## Related
+- [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](claude-code-warp-login-corrupt-keychain-credential.md)
+- Config: `~/.claude.json`, login keychain item `Claude Code-credentials`
+- First observed: MajorAir, 2026-06-15 (post-reboot prompt burst; self-cleared)
--- a/05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
+++ b/05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md
@ -61,5 +61,6 @@ Resolved on step 1+2 — login succeeded after deleting the corrupt Keychain ite
  If that errors with "Expecting value", the stored secret is empty/corrupt — delete and re-login.

 ## Related
+- [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](claude-code-keychain-prompt-recurring-macos.md) — different symptom: login works but the permission prompt won't stop
 - Config: `~/.claude.json` (oauthAccount, userID), login Keychain item `Claude Code-credentials`
 - Other Claude Code note: `claude-mem-setting-sources-empty-arg.md`
--- a/05-troubleshooting/forgejo-mailer-and-cli-recovery.md
+++ b/05-troubleshooting/forgejo-mailer-and-cli-recovery.md
@ -0,0 +1,105 @@
+---
+title: "Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI"
+domain: troubleshooting
+category: general
+tags: [forgejo, gitea, smtp, docker, account-recovery, self-hosting]
+status: published
+created: 2026-06-12
+updated: 2026-06-12
+---
+# Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI
+
+Two related problems on a single-admin self-hosted **Forgejo** (or Gitea): the GUI *"Forgot password"* is disabled, and you can't log in to fix it. Here's how to (1) enable account recovery properly, and (2) recover from the command line when you're already locked out.
+
+## Symptoms
+
+- The *Forgot password* page shows: **"Account recovery is only available when email is set up. Please set up email to enable account recovery."**
+- You can't log in (wrong/forgotten password), so you can't add an SSH key or change settings in the GUI either.
+
+## Part 1 — Enable account recovery (configure the mailer)
+
+Account recovery needs SMTP. If you already run a mail server on your tailnet, relay through it — **no app password needed** when the Forgejo host is `mynetworks`-trusted by that mail server.
+
+Edit `app.ini` (in the data volume, e.g. `/data/gitea/conf/app.ini`):
+
+```ini
+[mailer]
+ENABLED = true
+PROTOCOL = smtp+starttls
+SMTP_ADDR = 100.x.y.z           ; mail server's tailnet IP
+SMTP_PORT = 587
+FROM = forgejo@example.com
+FORCE_TRUST_SERVER_CERT = true  ; required when connecting by IP (cert CN won't match)
+```
+
+Notes:
+
+- `FORCE_TRUST_SERVER_CERT = true` is needed when you target the relay by **IP** — the TLS cert is issued for a hostname, not the IP, so verification would otherwise fail. Acceptable on a trusted internal hop.
+- Omit `USER`/`PASSWD` if the relay accepts your host via `mynetworks` (no SASL). Otherwise add SMTP auth.
+- `app.ini` lives in the persistent volume, so the change **survives container re-creation** (e.g. Watchtower's nightly pull).
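+
+The IP-versus-certificate note above can be checked before restarting the container. This is a minimal sketch: `mailer_warnings` is a hypothetical helper, the sample address `100.64.0.10` is a placeholder, and the dummy `[__top__]` section is only there because `app.ini` may carry keys before its first section header, which Python's `configparser` rejects.
+
+```python
+import configparser
+import ipaddress
+
+
+def mailer_warnings(app_ini_text: str) -> list[str]:
+    """Flag the mailer misconfigurations discussed above."""
+    parser = configparser.ConfigParser(inline_comment_prefixes=(";",), interpolation=None)
+    parser.read_string("[__top__]\n" + app_ini_text)
+    if not parser.has_section("mailer") or not parser["mailer"].getboolean("ENABLED", fallback=False):
+        return ["mailer disabled: account recovery unavailable"]
+    mailer = parser["mailer"]
+    warnings = []
+    try:
+        ipaddress.ip_address(mailer.get("SMTP_ADDR", ""))
+        by_ip = True
+    except ValueError:
+        by_ip = False  # a hostname (or nothing) rather than an IP literal
+    if by_ip and not mailer.getboolean("FORCE_TRUST_SERVER_CERT", fallback=False):
+        warnings.append("SMTP_ADDR is an IP but FORCE_TRUST_SERVER_CERT is not true")
+    return warnings
+
+
+sample = """
+APP_NAME = Forgejo
+[mailer]
+ENABLED = true
+PROTOCOL = smtp+starttls
+SMTP_ADDR = 100.64.0.10           ; placeholder tailnet IP
+SMTP_PORT = 587
+"""
+problems = mailer_warnings(sample)
+print(problems)
+```
+
+An empty list means neither of the two conditions tripped. It says nothing about whether the relay actually accepts mail.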
+
+Apply and verify:
+
+```bash
+docker restart forgejo
+docker logs forgejo 2>&1 | grep -i "Mail Service Enabled"   # confirms the mailer loaded
+```
+
+Test the SMTP path **before** trusting it (run from the host, mimicking Forgejo's connection):
+
+```bash
+python3 - <<'EOF'
+import smtplib, ssl
+ctx = ssl.create_default_context(); ctx.check_hostname = False; ctx.verify_mode = ssl.CERT_NONE
+s = smtplib.SMTP("100.x.y.z", 587, timeout=15)
+s.ehlo(); s.starttls(context=ctx); s.ehlo()
+s.sendmail("forgejo@example.com", ["you@example.com"],
+           "Subject: test\r\n\r\nForgejo relay path test")
+s.quit(); print("SENT_OK")
+EOF
+```
+
+`SENT_OK` means the relay accepted the message. `/user/forgot_password` should now show the reset form instead of the email error.
+
+> **Container can't reach the tailnet IP?** Docker bridge networks usually route to Tailscale via the host (SNAT to the host's tailnet IP). Confirm with:
+> `docker exec forgejo nc -w5 100.x.y.z 587 </dev/null && echo REACHABLE`
+
+## Part 2 — Recover from the CLI (already locked out)
+
+Forgejo's admin CLI runs inside the container as the git user (UID 1000) and needs no login.
+
+**Reset a password:**
+
+```bash
+docker exec -u 1000 forgejo forgejo admin user change-password -u <user> -p '<newpass>'
+```
+
+> ⚠️ **Gotcha:** `change-password` sets `must_change_password=true` by default. That **forces a change on next GUI login _and_ returns HTTP 403 on the API** (`"You must change your password"`). Clear it:
+> ```bash
+> docker exec -u 1000 forgejo forgejo admin user must-change-password --unset <user>
+> ```
+
+**Add an SSH key without the GUI** (basic-auth API — works only if 2FA is off):
+
+```bash
+curl -u <user>:'<pass>' -X POST -H 'Content-Type: application/json' \
+  -d '{"title":"laptop","key":"ssh-ed25519 AAAA... you@host"}' \
+  http://localhost:3004/api/v1/user/keys
+# HTTP 201 = created
+```
+
+Forgejo regenerates the git user's `authorized_keys` from the database, so `ssh -p <port> git@host` authenticates immediately afterward — no restart needed.
+
+## "The password keeps changing" — it (probably) isn't
+
+A stock Forgejo container does **not** reset admin passwords. If yours *seems* to reset itself, rule out the server first by confirming that:
+
+- the compose has **no** admin/password env and no custom entrypoint;
+- **no** cron, systemd timer, or script runs `forgejo admin user change-password`;
+- the data volume is persistent (re-creation keeps the DB, password included).
+
+If all three hold, nothing server-side is changing it — the "changing" password is a **client-side** artifact: a duplicate or stale entry in your password manager autofilling different values. Delete the duplicates and keep one.
+
+## See also
+
+- Forgejo — [Config Cheat Sheet → mailer](https://forgejo.org/docs/latest/admin/config-cheat-sheet/)
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@ -11,6 +11,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [LoRA adapter — GGUF conversion fails with 'config.json not found'](gpu-display/lora-adapter-gguf-conversion-fails.md)

 ## 🌐 Networking & Web
+- [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](networking/wifi-160mhz-airtime-saturation-game-streaming.md)
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
@ -18,6 +19,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Postfix header_checks Can't Act on Milter-Added Headers (Use Sieve)](networking/postfix-header-checks-vs-milter-headers.md)
 - [Dovecot Phantom Mailboxes from .dovecot.lda-dupes (mail_home Overlapping the Maildir Root)](networking/dovecot-mail-home-maildir-root-phantom-mailboxes.md)
 - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
+- [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](networking/ssh-missing-host-block-magicdns-host-key-failure.md)
 - [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](networking/tailscale-status-json-hostname-localhost-ios.md)
 - [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)
 - [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
@ -31,6 +33,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Vault Password File Missing](ansible-vault-password-file-missing.md)
 - [ansible.cfg Ignored on WSL2 Windows Mounts](ansible-wsl2-world-writable-mount-ignores-cfg.md)
 - [regex_search — capture-group argument doesn't work in set_fact](ansible-regex-search-set-fact-capture-group.md)
+- [reboot.yml: become Timeout on WSL2 Hosts (Exclude Them)](ansible-reboot-become-timeout-wsl2.md)

 ## 📦 Docker & Systems
 - [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
@ -49,9 +52,12 @@ Practical fixes for common Linux, networking, and application problems.
 ## 📝 Application Specific
 - [Obsidian Vault Recovery — Loading Cache Hang](obsidian-cache-hang-recovery.md)
 - [Gemini CLI Manual Update](gemini-cli-manual-update.md)
+- [iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)](iphone-mirroring-connecting-hang-awdl-stall-beta.md)

 ## 🤖 AI / Local LLM
 - [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)
 - [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](ollama-chat-template-pipe-stdin-bypass.md)
 - [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
 - [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md)
+- [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](claude-code-warp-login-corrupt-keychain-credential.md)
+- [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](claude-code-keychain-prompt-recurring-macos.md)
--- a/05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
+++ b/05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md
@ -2,14 +2,61 @@
 title: "iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)"
 domain: troubleshooting
 category: macos
-tags: [macos, iphone-mirroring, continuity, awdl, rapport, quic, tailscale, mullvad, beta]
+tags: [macos, iphone-mirroring, continuity, awdl, rapport, quic, tailscale, mullvad, beta, channel-validation, aimesh, quicktime, usb]
 status: published
 created: 2026-06-09
-updated: 2026-06-09
+updated: 2026-06-15
 ---

 # iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)

+## Update 2026‑06‑15 — REGRESSED; reproducibly stuck on "Connecting", and Tailscale was **not** the cure
+
+> **Correction to the 2026‑06‑14 "it WORKS" update below.** On 2026‑06‑15 iPhone Mirroring is **reproducibly stuck on "Connecting to iPhone 16 Pro"** on MajorAir again — with Tailscale `accept-routes` *still* `false`. So the accept‑routes change was **correlation, not the fix**: this is an **intermittent macOS 27.0 beta AWDL bug, independent of Tailscale**.
+>
+> **Tried this round — all failed to establish a session:** Tailscale `accept-routes=false` (already in place) · `sudo ifconfig awdl0 down/up` · **full Mac reboot** · cycling the iPhone's Wi‑Fi + Bluetooth.
+>
+> **Log signature:** `rapportd` resolves the phone's `_asquic._udp.local` endpoint and `_companion-link` registers (discovery *succeeds*), but the QUIC‑over‑AWDL **datapath never completes into a live session** — `wifip2pd` loops on `AWDLDiscoveryTimeout (hasAdvertises=false)`. Each reset advanced the handshake one stage further (no‑advertises → resolve‑started → endpoint‑resolved) yet none reached a streaming session. **`llw0` never went active (0 bytes)** — confirming no A/V ever flowed, regardless of what the 06‑14 note measured.
+>
+> **Stance:** beta OS bug, **no reliable user‑side fix**. Use the **QuickTime USB mirror** workaround (below) when you actually need the phone on screen. The 06‑14 "it works on `llw0`" measurements were real *for that one session* but are **not reproducible** across seeds/sessions — treat mirroring as intermittently broken on the 27.0 betas. This re‑confirms the original **Root cause (conclusion)** section further down (a beta bug, "nothing in local config wrong"), which the 06‑14 update had prematurely overridden.
+
+## Update 2026‑06‑14 (evening) — it WORKS; the "AWDL starvation" finding was the wrong interface
+
+> iPhone Mirroring is now **working** on MajorAir — stable session, clean video, no missing icons — on **ch44/80** with Tailscale `accept-routes=false`. An earlier pass the same day blamed an "AWDL bulk‑path starving at ~90 B/s"; that was **measuring the wrong interface** and is corrected here.
+
+**The video transport is `llw0` (low‑latency WLAN), not `awdl0`.**
+Measured during an active session: **`llw0` ≈ 800 KB/s** (≈6 Mbps of real video), `en0` ~60 KB/s, **`awdl0` ~1 KB/s**. `awdl0` only ever carries AWDL *discovery/control* (~90 B/s) — whether mirroring works or not. So "90 B/s on `awdl0` = starved bulk path" was a **red herring**: the A/V stream rides `llw0`, which the earlier pass never measured.
+
+**What was actually broken was session *stability*.** The `XPC_ERROR_CONNECTION_INTERRUPTED` / `MediaContinuityKit.TaskTimeoutError` teardown loop kept the `llw0` stream from ever sustaining (→ glitchy / missing icons). When the session holds, `llw0` streams clean.
+
+**What changed (not cleanly isolated):** three things differed between the broken and working states — (1) the network fully **settled on ch44** over ~15 h (the failing ch44 test was minutes after a chaotic AiMesh re‑sync + reconnect scramble), (2) Tailscale **`accept-routes` was turned off** (it had been polluting IPv4 routing + the Continuity control plane), and (3) both devices slept/woke. Which one mattered is not yet proven.
+
+**Open test — isolates Tailscale's role:** repro on **MajorMac** with *unaltered* Tailscale (`accept-routes` still **ON**). If mirroring breaks there but works on MajorAir (accept‑routes OFF), that pins Tailscale's accepted routes as the trigger. See [[MajorAir#Known Issues]] for the `accept-routes=false` fix.
+
+**Still valid from earlier today:** congestion ruled out (router `chanim_stats` ch36 = 90 % idle, 86 % txop); the AiMesh / router infra notes below; and iPhone Mirroring is **wireless‑only — no USB transport** (for a wired screen view, use QuickTime, below).
+
+> ⚠️ The iPhone‑radio `isValidChannel`/`awdl0` evidence cited in the original 2026‑06‑09 write‑up below describes AWDL *discovery* health, **not** the video path — read it in light of this correction.
+
+**Wired workaround (works today, no AWDL):**
+iPhone Mirroring is **wireless‑only — there is no USB transport** (confirmed: cable connected throughout, every attempt still used `awdl0`). For a wired view of the screen:
+> **QuickTime Player → File → New Movie Recording → ⌄ next to record → select the iPhone** = full‑rate USB‑C screen mirror (view + record). Does **not** give remote control (tap/type) — that's unique to iPhone Mirroring.
+
+**Infra notes (RT‑AX82U, AiMesh controller):**
+- Router SSH is on **port 1025** (not 22); creds in Ansible vault (`router_username` / `router_password`).
+- The 5 GHz channel is **AiMesh‑coordinated** and **resists CLI changes** — `wl chanspec` / nvram `wl1_chanspec` get re‑asserted by `acsd2` + AiMesh within seconds, even after `restart_wireless`. Only setting Control Channel to an **explicit value in the Web UI** holds mesh‑wide. Left "Auto" → acsd2 picks **36** (the cleanest channel).
+- Any channel change triggers a **mesh re‑sync (~1 min) that drops all Wi‑Fi**; during it MajorAir falls back to the iPhone's **USB Personal Hotspot** (`en7` / `172.20.10.x`) and won't auto‑rejoin home Wi‑Fi while the hotspot feeds it internet (manual Wi‑Fi‑menu join needed).
+- **Current state: 5 GHz on ch44/80** (same clean UNII‑1 spectrum as 36; left here to avoid another re‑sync — the Deck streams identically on 44).
+
+**If it breaks again — troubleshooting checklist:**
+1. **It's session stability, not bandwidth.** Look for teardown loops: `log show --last 3m --predicate 'process == "iPhone Mirroring"' | grep -iE "interrupt|timeout|endpoint"`.
+2. **Measure the right interface** — video rides **`llw0`** (hundreds of KB/s when the screen is active), *not* `awdl0` (~90 B/s control is normal): `netstat -ib | awk '/<Link#/{print $1, $7}'` before/after a few seconds.
+3. **Tailscale:** confirm `accept-routes=false` on the Mac (`tailscale debug prefs | grep RouteAll`) — see [[MajorAir#Known Issues]].
+4. **Let the network settle** after any Wi‑Fi/channel change — an AiMesh re‑sync churns AWDL/Continuity state for a minute+; retry once stable.
+5. iPhone: on home Wi‑Fi, near the Mac, **Personal Hotspot off**, not in Low Power Mode.
+6. **Wired fallback that always works:** QuickTime → New Movie Recording → select the iPhone (USB‑C; view/record only, no control).
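+
+Step 2 of the checklist can be turned into a number with a small sketch. It assumes `netstat -ib` puts the interface name in field 1 and input bytes in field 7 on `<Link#` rows (the same fields the `awk` one-liner reads); rows without a link-layer address may shift those columns. The function names and the sample snapshots are illustrative, not captured output.
+
+```python
+def link_bytes(netstat_output):
+    """Map interface name -> input-byte counter from `netstat -ib` text."""
+    counters = {}
+    for line in netstat_output.splitlines():
+        fields = line.split()
+        # Only the <Link#N> rows carry the per-interface totals.
+        if len(fields) >= 7 and "<Link#" in line and fields[6].isdigit():
+            counters[fields[0]] = int(fields[6])
+    return counters
+
+
+def rates(before, after, seconds):
+    """Bytes per second per interface between two snapshots."""
+    return {
+        name: (after[name] - before[name]) / seconds
+        for name in after
+        if name in before
+    }
+
+
+# Hypothetical snapshots taken 5 s apart (made-up numbers, not a real capture).
+sample_before = "llw0 1500 <Link#12> aa:bb:cc:dd:ee:ff 100 0 1000000 50 0 2000 0\n"
+sample_after = "llw0 1500 <Link#12> aa:bb:cc:dd:ee:ff 900 0 5000000 60 0 2500 0\n"
+print(rates(link_bytes(sample_before), link_bytes(sample_after), 5))
+```
+
+A rate in the hundreds of KB/s on `llw0` would match a live video session; a rate near zero there points back at session stability.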
+
+---
+
 ## Symptom
 iPhone Mirroring on the Mac sits on **"Connecting…"** forever and never shows the iPhone screen.
 - Mac: **macOS 27.0 dev beta** (build 26A5353q), MajorAir
--- a/05-troubleshooting/logwatch-wrong-hostname-after-migration.md
+++ b/05-troubleshooting/logwatch-wrong-hostname-after-migration.md
@ -0,0 +1,150 @@
+---
+title: "Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration"
+domain: troubleshooting
+category: monitoring
+tags: [logwatch, hostname, hetzner, migration, monitoring, provisioning, fail2ban]
+status: published
+created: 2026-06-12
+updated: 2026-06-14
+---
+
+# Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration
+
+## Symptom
+
+Daily Logwatch emails from a recently migrated server arrive titled with the
+provisioning label instead of the real hostname:
+
+```
+Logwatch for tttpod-hetzner (Linux)
+Logwatch for dcaprod-hetzner (Linux)
+```
+
+Everything else works — the report is generated, mailed, and delivered. Only the
+**name in the title is wrong**, which makes reports harder to scan and breaks any
+filter or rule that keys on the expected hostname.
+
+## Cause
+
+Logwatch titles each report with the box's **live system hostname**
+(`hostnamectl --static` / `/etc/hostname`) read at runtime — it does *not* keep
+its own copy of the name.
+
+Hetzner Cloud servers are provisioned with a temporary node label as the system
+hostname — `<host>-hetzner` (e.g. `tttpod-hetzner`). The migration runbook renames
+the **Tailscale node** back to the bare name and sets Postfix `myhostname`, but the
+**OS hostname** itself is easy to miss because nothing surfaces it day to day. It
+stays `<host>-hetzner`, unnoticed, until something reports it: Logwatch is usually
+the first thing to do so, weeks later.
+
+Confirm the box is actually mislabelled:
+
+```bash
+ssh root@<host> 'hostnamectl --static; cat /etc/hostname; grep 127.0.1.1 /etc/hosts'
+# static: tttpod-hetzner
+# /etc/hostname: tttpod-hetzner
+# 127.0.1.1 tttpod-hetzner tttpod-hetzner
+```
+
+## Fix
+
+Set the real hostname and fix the matching `/etc/hosts` loopback line:
+
+```bash
+ssh root@<host> '
+  hostnamectl set-hostname <host>
+  sed -i "s/^127\.0\.1\.1.*/127.0.1.1 <host> <host>/" /etc/hosts
+  hostnamectl --static          # verify -> <host>
+'
+```
+
+That's it. **Logwatch has no hardcoded hostname override** — verify with:
+
+```bash
+grep -riE 'hostname|<host>-hetzner' /etc/logwatch/ /etc/cron.daily/0logwatch /etc/cron.daily/logwatch 2>/dev/null
+cat /etc/mailname 2>/dev/null
+```
+
+If those are empty (the normal case), Logwatch reads the live hostname on its next
+run, so the **next daily report self-corrects** — no service restart, no logwatch
+config change needed.
+
+> [!note] If `grep` *does* find a hostname pinned in `/etc/logwatch/conf/logwatch.conf`
+> (e.g. a `HostLimit`/`MailFrom` line baked in by Ansible), update it there too —
+> the override file wins over the live hostname.
+
+## Sweep the whole fleet
+
+This is a per-box provisioning leftover, so check every migrated host at once —
+more than one is usually affected:
+
+```bash
+for ip in 100.98.223.93 100.95.137.38 100.64.169.62 100.112.127.0 100.73.85.46; do
+  echo -n "$ip -> "
+  ssh -o ConnectTimeout=8 -o BatchMode=yes root@$ip 'hostnamectl --static' 2>/dev/null \
+    || echo '(unreachable)'
+done
+```
+
+Any value ending in `-hetzner` (or your provider's build label) needs the fix above.
+In the 2026-06 sweep, `tttpod` and `dcaprod` were still `*-hetzner` at the OS
+level; `majortoot`, `majormail`, and `majorlinux` had the correct system hostname
+— but see the variant below: `majormail`'s *configs* were still stale even though
+its hostname wasn't.
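+
+The sweep output can be triaged mechanically. The sketch below is a minimal example, assuming each line has the `ip -> hostname` shape the loop prints; the function name `stale_hosts` and the sample addresses are hypothetical and not taken from the fleet.
+
+```python
+def stale_hosts(sweep_lines, suffix="-hetzner"):
+    """Return (ip, hostname) pairs whose hostname still ends in the build label."""
+    stale = []
+    for line in sweep_lines:
+        ip, sep, name = line.partition(" -> ")
+        name = name.strip()
+        if sep and name.endswith(suffix):
+            stale.append((ip.strip(), name))
+    return stale
+
+
+# Illustrative input in the loop's "ip -> hostname" format (pairings are made up).
+example = [
+    "192.0.2.10 -> tttpod-hetzner",
+    "192.0.2.11 -> majormail",
+    "192.0.2.12 -> (unreachable)",
+]
+print(stale_hosts(example))
+```
+
+Unreachable hosts are deliberately not flagged: they need a retry, not a rename. Pass a different `suffix` if your provider uses another build label.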
+
+## Variant: hostname is correct, but a config has the old name baked in
+
+A second, sneakier form of this drift: the **system hostname is already right**, so
+the sweep above passes and the Logwatch report *title* is correct — yet mail still
+arrives **from** `<host>-hetzner` because the old label is hardcoded in a service's
+`From`/`sender` field. These fields are static text, not derived from the live
+hostname, so fixing `hostnamectl` does nothing for them.
+
+Seen on `majormail` (2026-06-14): system hostname was `majormail`, but
+`Logwatch@majormail-hetzner...` was still the sender. Two configs held it:
+
+```bash
+# sweep a box for the old provisioning label in any send-related config
+ssh root@<host> 'grep -rsn "<host>-hetzner" /etc/logwatch/ /etc/fail2ban/ \
+  /etc/postfix/ /etc/aliases /etc/mailname 2>/dev/null'
+# /etc/logwatch/conf/logwatch.conf:MailFrom = Logwatch@<host>-hetzner.majorshouse.com
+# /etc/fail2ban/jail.local:sender         = fail2ban@<host>-hetzner.majorshouse.com
+```
+
+Fix in place (no restart needed for Logwatch; reload fail2ban for its change):
+
+```bash
+ssh root@<host> '
+  sed -i "s/<host>-hetzner/<host>/g" /etc/logwatch/conf/logwatch.conf /etc/fail2ban/jail.local
+  systemctl reload fail2ban
+'
+```
+
+> [!warning] Check the Ansible source, or it comes back
+> A live `sed` is undone by the next playbook run if the repo still carries the old
+> value. Distinguish two cases:
+> - **Templated** (safe): e.g. `logwatch.yml` sets `MailFrom = Logwatch@{{ inventory_hostname }}...`. If the inventory host is named correctly, a run *regenerates* the right value — it even self-heals a stale box.
+> - **Static file** (will regress): e.g. `roles/fail2ban/files/hosts/<host>/jail.local` with the literal `sender = ...@<host>-hetzner...`. Grep the repo (`grep -rn "<host>-hetzner" .`) and fix the file too, or every deploy re-pushes the stale sender.
+
+Inert backups (`jail.local.bak*`, `*~`) may still contain the old string — they
+don't send mail, so leave them.
+
+## Prevention
+
+Fold "set the system hostname" into the migration bootstrap so it never drifts:
+
+```bash
+hostnamectl set-hostname <host>
+sed -i "s/^127\.0\.1\.1.*/127.0.1.1 <host> <host>/" /etc/hosts
+```
+
+Do this in the **same step** that renames the Tailscale node and sets Postfix
+`myhostname`: all three default to the provisioning label and all three must be
+corrected together. See the
+[VPS Migration Baseline Checklist](../02-selfhosting/cloud/vps-migration-baseline-checklist.md).
+
+## Related
+
+- [Logwatch Fleet Setup — Surviving Package Upgrades](../02-selfhosting/monitoring/logwatch-fleet-setup.md) — the broader "logwatch went silent / wrong-source" class, including the Packer `myhostname` variant of this same drift
+- [VPS Migration Baseline Checklist](../02-selfhosting/cloud/vps-migration-baseline-checklist.md) — the full post-migration verification list
+- [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](networking/ansible-host-key-verification-failed-rebuilt-host.md) — another IP/identity-drift gotcha from the same Hetzner migration
--- a/05-troubleshooting/macos-background-app-activity-audit-sfltool.md
+++ b/05-troubleshooting/macos-background-app-activity-audit-sfltool.md
@ -0,0 +1,154 @@
+---
+title: "Auditing & Cleaning macOS Background App Activity (sfltool dumpbtm)"
+domain: troubleshooting
+category: general
+tags: [macos, background-tasks, btm, sfltool, login-items, system-extensions, uninstall, little-snitch]
+status: published
+created: 2026-06-21
+updated: 2026-06-21
+---
+# Auditing & Cleaning macOS Background App Activity (`sfltool dumpbtm`)
+
+## Overview
+macOS tracks every login item, agent, daemon, helper, and extension that may run in the background in its **Background Task Management (BTM)** database. The GUI shows this under **System Settings → General → Login Items & Extensions** ("Allow in the Background"), but the GUI is summarised and hides paths, identifiers, and orphans.
+
+`sfltool dumpbtm` prints the full BTM database from the command line — and the per-user records need **no `sudo`**. This is the fastest way to answer "what is allowed to run in the background, and does each entry still map to an installed app?"
+
+## List what's registered
+
+```bash
+sfltool dumpbtm        # per-user records, no sudo required
+```
+
+Each record looks like:
+
+```
+Name: CleanMyMac Menu
+Type: login item (0x4)
+Disposition: [enabled, allowed, notified] (0xb)
+Identifier: 4.com.macpaw.CleanMyMac-mas.Menu
+URL: Contents/Library/LoginItems/CleanMyMac_5_MAS_Menu.app
+Bundle Identifier: com.macpaw.CleanMyMac-mas.Menu
+Parent Identifier: 2.com.macpaw.CleanMyMac-mas
+```
+
+### Reading the fields
+- **Disposition** — `enabled` = actively allowed to run in the background. `disabled` = present but off.
+- **Type** — what kind of item it is:
+
+| Type | Meaning |
+|---|---|
+| `app (0x2)` | A normal application entry |
+| `login item (0x4)` | Launches at login (menu-bar apps, helpers) |
+| `agent (0x8)` / `legacy agent` | Per-user background agent |
+| `legacy daemon (0x10010)` | System-wide background daemon |
+| `background tasks (0x2000)` | Abstract background-task registration owned by a parent app — **has no file path of its own** |
+| `developer (0x20)` | A per-developer grouping header (the collapsible row in Settings), **not an app** |
+| `quicklook` / `spotlight` / `dock tile` | Plugins/extensions — not really "background apps" |
+
+## Map entries to installed apps (find orphans)
+
+Two gotchas make naïve path-checking fail:
+
+1. **Absolute paths are stored as `file://` URLs**, not plain `/…`. Strip the `file://` prefix and URL-decode (`%20` → space).
+2. **Child items store a *relative* `URL`** (e.g. `Contents/Library/LoginItems/…`) that must be joined to the **parent record's** absolute path, found via `Parent Identifier`.
+
+A small parser that resolves each record to a real path and flags true orphans:
+
+```python
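+# btm_check.py: read `sfltool dumpbtm` output on stdin, print orphaned entries.
+import sys, re, os, urllib.parse
+items, cur = [], None
+def push():
+    # Store the record being built (if any) before the next one starts.
+    global cur
+    if cur is not None: items.append(cur)
+for line in sys.stdin:
+    s = line.strip()
+    # A line like "#3:" opens a new record.
+    if re.match(r"^#\d+:$", s): push(); cur = {}; continue
+    if cur is None: continue
+    # "Key: value" fields; keys may contain spaces (e.g. "Parent Identifier").
+    m = re.match(r"^([A-Za-z][A-Za-z /]+):\s*(.*)$", s)
+    if m: cur[m.group(1).strip()] = m.group(2).strip()
+push()  # flush the final record
+# Index records by Identifier so children can look up their parent.
+byid = {it["Identifier"]: it for it in items if it.get("Identifier")}
+def abspath(it, d=0):
+    # Resolve a record to an absolute path, or None if it stores no path.
+    if d > 8: return None  # guard against parent loops
+    u = it.get("URL", "")
+    if u and u != "(null)":
+        # Gotcha 1: absolute paths are file:// URLs; strip the scheme and URL-decode.
+        if u.startswith("file://"): return urllib.parse.unquote(u[7:]).rstrip("/")
+        if u.startswith("/"): return u.rstrip("/")
+        # Gotcha 2: a relative URL is joined to the parent record's resolved path.
+        par = byid.get(it.get("Parent Identifier", ""))
+        if par:
+            b = abspath(par, d + 1)
+            if b: return os.path.join(b, urllib.parse.unquote(u)).rstrip("/")
+    return None
+for it in items:
+    if not it.get("Name"): continue
+    p = abspath(it)
+    # An orphan resolves to a path that no longer exists on disk.
+    if p and not os.path.exists(p):
+        print("ORPHAN:", it["Name"], "->", p)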
+
+```bash
+sfltool dumpbtm | python3 btm_check.py
+```
+
+> **Expected non-orphans:** `background tasks (0x2000)` and `developer (0x20)` rows legitimately store no path — they are not missing apps. Helpers/daemons that resolve *inside* a parent bundle (e.g. `/Applications/Foo.app/Contents/Library/LoginItems/…`) or in `/Library/…` are also fine; they just don't appear as a top-level `.app`. That is usually why an entry "has no application you can find."
+
+## Disable background for an app
+
+This **cannot be scripted** — Apple deliberately gates the toggle behind the GUI:
+
+**System Settings → General → Login Items & Extensions → "Allow in the Background"** → switch the app off.
+
+Disabling a `developer (0x20)` grouping header turns off all of that developer's sub-items at once.
+
+## Uninstall cleanly — the system-extension trap
+
+**Dragging an app to the Trash is not a full uninstall.** Apps that install a **network/system extension** plus a privileged daemon (firewalls and VPNs especially — Little Snitch, Mullvad, etc.) leave their `/Library` daemon **still loaded and running** after the app is trashed. The BTM entry persists and the background service keeps working.
+
+### 1. Prefer the app's own uninstaller
+- **Bundled uninstall script** (Mullvad): runs cleanly, deactivates the system extension, resets the firewall.
+  ```bash
+  sudo "/Applications/Mullvad VPN.app/Contents/Resources/uninstall.sh"
+  ```
+- Some apps ship an uninstaller in their DMG or a CLI tool. **Note:** Little Snitch 6.x has **no DMG uninstaller and no `littlesnitch uninstall` subcommand** — manual removal is the supported route there.
+
+### 2. Check whether a system extension is still active
+```bash
+systemextensionsctl list
+```
+If the app's extension is **not** listed (only unrelated ones like Tailscale/Canon remain), the extension is already deactivated, so removing the remaining files by hand is safe and finishes the uninstall.
+
+### 3. Manual removal (when no uninstaller exists)
+Find every component first:
+```bash
+ls /Library/LaunchDaemons/<id>* /Library/LaunchAgents/<id>* 2>/dev/null
+ls -d "/Library/Application Support/<Vendor>" 2>/dev/null
+ls ~/Library/Preferences/<id>* 2>/dev/null
+```
+Then boot out the daemon and remove the files:
+```bash
+sudo launchctl bootout system /Library/LaunchDaemons/<id>.daemon.plist 2>/dev/null
+sudo rm -f /Library/LaunchDaemons/<id>.daemon.plist /Library/LaunchAgents/<id>.agent.plist
+sudo rm -rf "/Library/Application Support/<Vendor>" "$HOME/.Trash/<App>.app"
+rm -f ~/Library/Preferences/<id>*.plist     # user-owned, no sudo
+```
+
+> **Shared-container caution:** before deleting `~/Library/Group Containers/*`, check it isn't shared. Microsoft apps share `UBF8T346G9.com.microsoft.oneauth`, `…entrabroker`, and `…teams` across Office/Teams/RDP — delete only the app-specific container (e.g. `…com.microsoft.rdc`), never the shared auth ones.
+
+## Stale BTM "ghost" entries
+
+After a manual uninstall, `sfltool dumpbtm` may still list the removed app, pointing at now-deleted paths. These are harmless orphans (nothing left to load). **BTM reconciles them on the next reboot / login cycle** — a reboot also finalises any system-extension teardown.
+
+## Quick reference
+
+```bash
+sfltool dumpbtm                       # full per-user BTM dump (no sudo)
+sfltool dumpbtm | grep -A6 'Name:'    # browse records
+systemextensionsctl list              # active network/system extensions
+# Verify a removal:
+sfltool dumpbtm | grep -i <vendor>    # should be empty after a reboot
+```
+
+## See also
+- Apple gates "Allow in the Background" behind System Settings — there is no supported CLI toggle for BTM dispositions.
+- For VPN/firewall apps, always reach for the vendor uninstaller first; manual `rm` alone can leave a registered system extension behind.
--- a/05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md
+++ b/05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md
@ -0,0 +1,94 @@
+---
+title: "Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration"
+domain: troubleshooting
+category: networking
+tags: [ansible, ssh, known-hosts, tailscale, host-key, migration]
+status: published
+created: 2026-06-12
+updated: 2026-06-12
+---
+
+# Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration
+
+## Symptom
+
+A subset of hosts in an Ansible run fail at **Gathering Facts** while the rest succeed:
+
+```
+[ERROR]: Task failed: Data could not be sent to remote host "100.112.127.0".
+Make sure this host can be reached over ssh: Host key verification failed.
+fatal: [majormail]: UNREACHABLE! => {"unreachable": true, ...}
+```
+
+The failing hosts are exactly the ones that were recently **rebuilt or migrated** (new server, new OS install, or a cloud move that issued a new Tailscale IP). Hosts that were never rebuilt connect fine.
+
+Confusingly, **interactive `ssh root@<host>` works perfectly** for the same boxes — only Ansible fails.
+
+## Cause
+
+SSH stores each accepted host key in `~/.ssh/known_hosts` keyed by the **exact address you connected with**. A key accepted for `ssh root@tttpod` is saved under the hostname `tttpod`; it is *not* indexed under that node's IP.
+
+Ansible inventories almost always set `ansible_host` to a **literal IP** (here, the Tailscale `100.x.x.x` address). So Ansible's SSH lookup is by IP, finds no matching entry, and with `StrictHostKeyChecking=yes` (or `accept-new` already exhausted) it refuses the connection:
+
+```
+No ED25519 host key is known for 100.112.127.0 and you have requested strict checking.
+Host key verification failed.
+```
+
+The hostname-form and IP-form entries are independent. Fixing interactive SSH (e.g. converting aliases to MagicDNS names and re-accepting keys) does **nothing** for Ansible, because Ansible never uses the hostname.
+
+A rebuilt host also generates **brand-new host keys**, so any old IP-form entry would additionally be a mismatch — but the common case after a migration to a *new* IP is simply that no IP entry exists at all.
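+
+The "keyed by the exact address" behaviour can be illustrated with a rough sketch that lists which names have a key in a `known_hosts` file. It assumes unhashed entries (`HashKnownHosts` off) and ignores wildcard patterns; `ssh-keygen -F` remains the authoritative check. The function name and the sample key data are made up for illustration.
+
+```python
+def known_hosts_names(text):
+    """Collect the plain host names/addresses that have a key in known_hosts text."""
+    names = set()
+    for line in text.splitlines():
+        line = line.strip()
+        if not line or line.startswith("#"):
+            continue
+        fields = line.split()
+        # Skip a leading marker such as @cert-authority or @revoked.
+        if fields[0].startswith("@"):
+            fields = fields[1:]
+        if len(fields) < 3:
+            continue
+        # The first field is a comma-separated list of names; hashed entries
+        # (starting with "|1|") cannot be read back and are ignored here.
+        for name in fields[0].split(","):
+            if not name.startswith("|"):
+                names.add(name)
+    return names
+
+
+# Made-up entry: the key was accepted under the hostname only.
+sample = "majormail ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIexamplekeydata\n"
+names = known_hosts_names(sample)
+print("majormail" in names, "100.112.127.0" in names)
+```
+
+With only a hostname-form entry present, the IP lookup that Ansible performs finds nothing, which is the gap described above.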
+
+## Diagnosis
+
+```bash
+# 1. Is there any known_hosts entry for the failing IP? (no output = none)
+ssh-keygen -F 100.112.127.0
+
+# 2. Reproduce the exact failure without an interactive prompt:
+ssh -o BatchMode=yes -o StrictHostKeyChecking=yes root@100.112.127.0 true
+# -> "Host key verification failed."  confirms the gap
+
+# 3. Confirm the inventory IP is actually the host's CURRENT address
+#    (guards against stale-IP drift, a separate problem):
+tailscale status | grep majormail
+ssh-keyscan -t ed25519 100.112.127.0 | ssh-keygen -lf -   # fingerprint it
+```
+
+If step 3 shows the inventory IP matches the live Tailscale node and the box answers `ssh-keyscan`, the only problem is the missing IP-form key.
+
+## Fix
+
+Add the **IP-form** host keys to the `known_hosts` of the user that runs Ansible. Back up first, scan over the tailnet, de-dup:
+
+```bash
+cp ~/.ssh/known_hosts ~/.ssh/known_hosts.bak.$(date +%Y%m%d)
+
+for ip in 100.98.223.93 100.112.127.0 100.73.85.46 100.95.137.38 100.76.51.16 100.64.169.62; do
+  ssh-keyscan -T 5 -t rsa,ecdsa,ed25519 "$ip" >> ~/.ssh/known_hosts
+done
+sort -u ~/.ssh/known_hosts -o ~/.ssh/known_hosts
+```
+
+Verify before re-running the playbook:
+
+```bash
+ansible <hosts> -m ping        # expect "pong" from each
+```
+
+### Why `ssh-keyscan` is safe here
+
+`ssh-keyscan` trusts whatever answers on the wire — normally a MITM risk. Over **Tailscale**, the connection rides WireGuard, which cryptographically authenticates the peer by its tailnet identity: reaching `100.x.x.x` *guarantees* you are talking to the node that owns that tailnet address. Scanning and trusting the key over the tailnet is therefore as trustworthy as the tailnet itself. Always cross-check the IP against `tailscale status` first (step 3) so you scan the right node.
+
+## Prevention
+
+- **Per-workstation, not fleet-wide.** `known_hosts` is local to each machine + user. After a migration, *every* host that runs Ansible (each workstation, plus any control node like `majorlab`) needs the IP keys added independently. Adding them on one Mac does not help the others.
+- **Sweep on every migration phase.** A rolling migration changes one node's IP at a time; fold the keyscan above into the post-cutover checklist so Ansible never breaks mid-rollout.
+- **Alternative: disable host-key checking.** Setting `host_key_checking = False` in `ansible.cfg` (or `ANSIBLE_HOST_KEY_CHECKING=False`) sidesteps the prompt but trades away host-key verification entirely. Prefer the explicit keyscan: it keeps strict checking on for every *future* run while accepting the new key exactly once, under your control.
+
+## Related
+
+- SSH-Aliases — Fleet SSH access; the MagicDNS-vs-pinned-IP strategy and the Ansible-by-IP `known_hosts` note
+- Network Overview — Tailscale fleet inventory and current IPs
+- Hetzner-Migration-Status — the migration that triggered the fleet-wide IP churn
+- [[ssh-socket-tailscale-race-condition]] — a different "SSH unreachable after reboot" failure mode
--- a/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
+++ b/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
@ -0,0 +1,133 @@
+---
+title: "SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)"
+domain: selfhosting
+category: troubleshooting
+tags:
+  - ssh
+  - ssh-config
+  - tailscale
+  - magicdns
+  - known-hosts
+  - host-key
+  - troubleshooting
+status: published
+created: 2026-06-11
+updated: 2026-06-12
+---
+
+# SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)
+
+## The Problem
+
+You `ssh` to a host you've reached many times before, but now it dies before any
+auth happens:
+
+```
+$ ssh MyMac
+ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
+Host key verification failed.
+```
+
+On a headless box (WSL, a server, a CI runner) there's no askpass binary, so the
+prompt can't even be shown — SSH just aborts. Connecting **by Tailscale IP** works
+fine:
+
+```
+$ ssh user@100.74.124.81      # works
+$ ssh MyMac                   # Host key verification failed
+```
+
+## Why It Happens
+
+There is **no `Host MyMac` block in `~/.ssh/config` at all** — and there never was.
+The connection only ever worked by IP, or interactively (where you clicked through
+the first-connect `yes` prompt without noticing).
+
+When no `Host` block matches, SSH uses the literal argument as the hostname. With
+Tailscale MagicDNS, `MyMac` (or `mymac`) resolves to the node — so the *connection*
+succeeds — but the host key it presents is checked against `known_hosts` under the
+name **`mymac`**, which has no entry. Meanwhile the key you actually trust is stored
+under the **IP**:
+
+```
+$ ssh-keygen -F 100.74.124.81      # found — line 67
+$ ssh-keygen -F mymac              # nothing
+```
+
+So strict host-key checking has nothing to match, tries to prompt to accept the
+"new" key, and on a headless host that prompt fails → `Host key verification failed`.
+
+Confirm there's no block (and that `ssh -G` is just echoing defaults):
+
+```
+$ ssh -G MyMac | grep -E '^(hostname|user|port) '
+hostname mymac          # lowercased literal — NOT an explicit HostName
+user youruser           # your local username default — not from a block
+port 22                 # default
+```
+
+If `hostname` equals the arg you typed (just lowercased) and `user` is your local
+login name, there is no matching `Host` block.
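+
+That rule of thumb can be written as a small heuristic. The sketch below takes `ssh -G` output as text and reports whether it looks like pure defaults; the function name `looks_unmatched` is hypothetical, and `youruser` stands in for the local login as in the example above.
+
+```python
+def looks_unmatched(alias, ssh_g_output, local_user):
+    """Heuristic: True when `ssh -G` output shows only defaults for this alias."""
+    settings = {}
+    for line in ssh_g_output.splitlines():
+        key, _, value = line.strip().partition(" ")
+        if key:
+            # Keep the first value seen for each option.
+            settings.setdefault(key.lower(), value.strip())
+    return (
+        settings.get("hostname") == alias.lower()
+        and settings.get("user") == local_user
+        and settings.get("port") == "22"
+    )
+
+
+# Output copied from the example above; "youruser" stands in for the local login.
+sample = "hostname mymac\nuser youruser\nport 22\n"
+print(looks_unmatched("MyMac", sample, "youruser"))
+```
+
+A block that sets only other options, such as `IdentityFile`, would still pass this check, so treat a `True` result as a hint rather than proof.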
+
+## The Fix
+
+Add an explicit `Host` block that **pins the IP** that `known_hosts` already trusts.
+This matches the convention every other host in a Tailscale fleet should follow —
+pin the `100.x` address, not the MagicDNS name:
+
+```sshconfig
+Host MyMac mymac
+  HostName 100.74.124.81
+  User youruser
+  IdentityFile ~/.ssh/id_ed25519
+```
+
+> [!note] When pinning the IP is the *wrong* call
+> Pinning the IP is right while the host is **stable**. If the box gets migrated or
+> rebuilt — new Tailscale IP *and* new host key — the pin rots and `known_hosts`
+> mismatches. At that point switch to **MagicDNS names** so the alias self-heals. See
+> *[MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)*.
+
+Now `ssh MyMac` resolves to `100.74.124.81`, whose key is in `known_hosts`, and the
+check passes with no prompt. Verify non-interactively:
+
+```
+$ ssh -o BatchMode=yes MyMac 'hostname'
+mymac.majorlan
+```
+
+`BatchMode=yes` disables every prompt, so a clean hostname in the output means the
+host key is trusted and public-key authentication succeeded without interaction.
+
+**Don't over-pin the identity.** Run `ssh -v user@<IP> true` and check the
+`Will attempt key` / accepted-key lines first. A workstation often authenticates
+with the *default* `id_ed25519`, not a fleet key — if `id_ed25519_fleet` isn't even
+offered, don't put it in the block.
+
+## Cleanup: Stale `known_hosts` Cruft
+
+Drive-by `ssh` attempts leave junk entries like `mymac-2` (auto-suffixed names from
+old keys). They never match anything once you pin the IP. Purge them:
+
+```
+$ ssh-keygen -R mymac-2
+```
+
+## How to Diagnose This
+
+1. `ssh -o BatchMode=yes <alias> true` — if it fails with `Host key verification
+   failed` (not `Permission denied`), it's a host-key problem, not auth.
+2. `ssh -G <alias> | grep -E '^(hostname|user|port) '` — if `hostname` is just your
+   typed arg and there's no real `HostName`, there's no `Host` block.
+3. `ssh-keygen -F <name>` vs `ssh-keygen -F <ip>` — find which name actually holds
+   the trusted key. Pin whichever one `known_hosts` has (usually the IP).
+
+## Why This Gotcha Is Invisible
+
+It only surfaces on a host with **no askpass** (headless / WSL / cron). On a desktop,
+the first-connect prompt appears, you hit `yes`, an entry gets written under the
+MagicDNS name, and it "just works" — masking the fact that no `Host` block exists and
+the IP-keyed entry is the only durable trust. Move the same config to a headless box
+and the missing block becomes a hard failure. Related: SSH only applies `Host` blocks
+by **literal pattern match**, so connecting by IP also skips them — see *Ansible Fails
+with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)*.
--- a/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
+++ b/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
@ -0,0 +1,160 @@
+---
+title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
+domain: troubleshooting
+category: networking
+tags:
+  - ssh
+  - ssh-keys
+  - authorized-keys
+  - key-rotation
+  - publickey
+  - fleet
+  - troubleshooting
+status: published
+created: 2026-06-17
+updated: 2026-06-17
+---
+
+# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`
+
+## The Problem
+
+A host you've SSH'd into for months suddenly rejects you — but **only some hosts**, not all:
+
+```
+$ ssh root@host-a
+root@host-a: Permission denied (publickey).
+
+$ ssh root@host-b      # same key, same workstation — works fine
+host-b $
+```
+
+Nothing changed on the servers. The thing that changed is on **your** side: at some
+point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
+clobbered by a botched copy, a routine rotation). The new public key was pushed to a
+few hosts but never fanned out to the rest. Every host still holding only the *old*
+public key now rejects the new private key with `Permission denied (publickey)`.
+
+> The tell: it's `Permission denied (publickey)`, **not** `Host key verification failed`.
+> The former is an **authentication** failure (the server doesn't accept your key
+> because it isn't in `authorized_keys`); the latter means the server's host key is
+> missing from, or doesn't match, your `known_hosts`. Different problem; see
+> *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.
+
+## Why It Happens
+
+Public-key auth is **per-host**: the server only lets you in if your public key is a
+line in that host's `~/.ssh/authorized_keys`. There is no central directory — each
+host is its own island. So when you rotate a key, *every* host needs the new public
+key appended independently.
+
+It's easy to do this partially without noticing. You regenerate the key, then over the
+next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
+other work. Those three now trust the new key. The other six don't — and you won't
+find out until weeks later when you reach for one of them.
+
+Confirm it's an authentication (key) failure and see which key is being offered:
+
+```
+$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
+debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
+debug1: Authentications that can continue: publickey
+root@host-a: Permission denied (publickey).
+```
+
+The server offered you nothing but `publickey`, you offered your current key, and it
+was refused → your key isn't in that host's `authorized_keys`.
+
+## Scope It First — Don't Fix One Host at a Time
+
+The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
+touching anything, so you fix the real set, not just the squeaky wheel:
+
+```bash
+for h in host-a host-b host-c host-d host-e host-f; do
+  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
+  echo "$h: $r"
+done
+```
+
+`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
+of hanging. Anything that doesn't print `OK` needs the backfill.
+
+## The Fix
+
+You need a **second, still-trusted** way onto each failing host to append the new key.
+Common transit options, best first:
+
+- **Another of your keys that still works** (e.g. a config-management / automation
+  user whose key is authorized fleet-wide, ideally with `sudo`).
+- **Another workstation** whose key those hosts still trust.
+- **The provider's web console / serial console** as a last resort.
+
+> [!warning] A jump host only helps if *it* can reach the target
+> "Bounce through a box that still trusts me" only works if that box's own key is in
+> the target's `authorized_keys`. A host can trust *your* key yet have no standing
+> trust to a third host (and hit its own `Host key verification failed` on the way).
+> Test the full two-hop path before relying on it.
+
+Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
+append the new key idempotently, with a backup, to every failing host:
+
+```bash
+PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
+STAMP=$(date +%Y%m%d-%H%M%S)
+for h in host-a host-c host-e; do          # only the hosts that failed the sweep
+  ssh deploy@"$h" "sudo bash -s" <<EOF
+set -e
+F=/root/.ssh/authorized_keys
+mkdir -p /root/.ssh && touch "\$F"
+cp "\$F" "\$F.bak-$STAMP"                   # backup before any change
+grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F"   # append only if absent
+chmod 600 "\$F"
+EOF
+done
+```
+
+Three things that keep this safe:
+
+- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
+  one line, and only if it's missing. Re-running leaves `authorized_keys` unchanged
+  (it only writes a fresh backup). Never clobber an `authorized_keys` with `>`, or
+  you'll lock out every *other* key on the box.
+- **Back up first.** The `.bak-<stamp>` copy is your undo.
+- **`chmod 600`.** With the default `StrictModes yes`, `sshd` ignores an
+  `authorized_keys` that's group- or world-writable, which looks exactly like "the key
+  didn't take."
+
+Then verify directly — not through the transit user:
+
+```bash
+for h in host-a host-c host-e; do
+  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
+done
+```
+
+All `OK` means the new key authenticates on its own.
+
+## Prevention
+
+- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
+  is to fan the new public key out to **every** host's `authorized_keys` in one pass —
+  not opportunistically as you happen to log in. A short `for` loop over the full host
+  list (or a config-management task — see below) closes the gap immediately.
+- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
+  task (or equivalent) that lists the *current* set of keys makes "who can log in" a
+  reviewed, version-controlled fact instead of an append-only pile that drifts per host.
+- **Keep the old key authorized until the new one is verified everywhere**, then remove
+  the stale line in a deliberate cleanup pass.
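+
+On the client side, the overlap window from the last bullet can be sketched as a
+temporary `Host` block that offers both keys. This is a sketch under assumptions: it
+only applies if the old private key still exists, `~/.ssh/id_ed25519.old` is a
+hypothetical path for it, and `host-*` stands in for your real host names.
+
+```sshconfig
+# Transition window: offer the new key first, fall back to the old one
+Host host-*
+    IdentityFile ~/.ssh/id_ed25519
+    # hypothetical path for the retired private key
+    IdentityFile ~/.ssh/id_ed25519.old
+    IdentitiesOnly yes
+```
+
+Drop the second `IdentityFile` line in the same cleanup pass that removes the stale
+`authorized_keys` entries.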
+
+## How to Diagnose This (Checklist)
+
+1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
+   `Host key verification failed` (host key). Confirms which problem you have.
+2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
+   fingerprint.
+3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
+   hosts before fixing.
+4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
+   transit path.
+5. Re-verify each host with a direct `BatchMode` login.
+
+Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
+and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.
--- a/05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md
+++ b/05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md
@ -0,0 +1,133 @@
+---
+title: "Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save"
+domain: troubleshooting
+category: networking
+tags: [wifi, steam-deck, steamos, iwd, networkmanager, rtw88, rtl8822ce, power-save, supplicant-disconnect, flapping]
+status: published
+created: 2026-06-19
+updated: 2026-06-19
+---
+
+# Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save
+
+## 🛑 Problem
+
+An OG Steam Deck (LCD model, Realtek **RTL8822CE** on the `rtw88_8822ce` driver) kept "losing" Wi-Fi — it would connect, hold for around a minute, drop, then reconnect a second later, over and over. From the router side the device looked like it was constantly coming and going; from the couch it felt like the network "wouldn't stay connected."
+
+Crucially, **this was not a router problem.** The AP config was correct, RF was clean (strong signal, zero tx retries / beacon loss), and every other client on the network was rock-solid. The fault was entirely on the Deck.
+
+## 🔍 Diagnosis
+
+SteamOS uses **NetworkManager with the `iwd` backend** (not `wpa_supplicant`). That detail is the whole ballgame.
+
+### Step 1 — Confirm the flap and its cadence
+
+```bash
+# how many disconnects this boot?
+journalctl -b -u NetworkManager --no-pager | grep -c supplicant-disconnect
+# 50
+
+# when did they happen?
+journalctl -b -u NetworkManager --no-pager | grep supplicant-disconnect \
+  | awk '{print $1,$2,$3}' | tail
+# 10:20:52 · 10:21:54 · 10:22:57 · 10:24:00 · 10:25:03 · 10:26:05 · 10:27:08 ...
+```
+
+**~63 seconds between every drop.** A fixed, metronome-like interval is the tell — this is a *timer*, not RF noise. The NetworkManager log shows the pattern plainly:
+
+```
+activated -> failed (reason 'supplicant-disconnect')
+... -> activated         # reconnects ~1s later
+```
+
+### Step 2 — Prove the link is healthy *when it's up*
+
+```bash
+iw dev wlan0 station dump | grep -iE 'signal|bitrate|failed|retries|beacon loss'
+#   signal:    -65 dBm
+#   tx retries: 0
+#   tx failed:  0
+#   beacon loss: 0
+```
+
+Strong signal, zero retries, zero beacon loss — the association is clean while it lasts. So the drop is being *commanded*, not caused by a bad radio link.
+
+### Step 3 — Identify the chip and the backend
+
+```bash
+lspci -k | grep -A3 -iE 'network|wireless'
+#   Realtek RTL8822CE ... Kernel driver in use: rtw88_8822ce
+```
+
+The `~63s` interval is **IWD's default periodic background scan**. With no `/etc/iwd/main.conf` present, IWD scans on a timer even while connected, and on the `rtw88` driver that scan knocks the current association over — producing the `supplicant-disconnect` every minute.
+
+A secondary annoyance: `iw dev wlan0 get power_save` reported `on`, which showed up as wildly jittery LAN latency (8–69 ms to the gateway over Wi-Fi, where a healthy 5 GHz link is 2–10 ms).
+
+## ✅ Fix
+
+Two independent changes — the first stops the flap, the second smooths latency.
+
+### 1. Disable IWD's periodic scan (stops the flap)
+
+```bash
+sudo mkdir -p /etc/iwd
+printf '[Scan]\nDisablePeriodicScan=true\n' | sudo tee /etc/iwd/main.conf
+sudo systemctl restart iwd     # briefly drops Wi-Fi; NetworkManager auto-reconnects
+```
+
+Trade-off: with periodic scanning off, the Deck roams to a different/stronger AP (e.g. another AiMesh node) more lazily. Fine for a device that mostly sits in one spot.
+
+### 2. Disable Wi-Fi power save (kills the latency jitter)
+
+The obvious `nmcli connection modify <name> 802-11-wireless.powersave 2` **does not work under the IWD backend** — NetworkManager doesn't enforce that property when `iwd` is managing the radio. Use a dispatcher script instead, with a retry loop because `rtw88` won't accept the setting in the first instant after association on a cold boot:
+
+```bash
+sudo tee /etc/NetworkManager/dispatcher.d/90-wifi-powersave >/dev/null <<'SCRIPT'
+#!/bin/sh
+# Disable Wi-Fi power save on the wireless iface (retry: rtw88 may not accept it instantly on boot)
+case "$2" in
+  up|dhcp4-change|connectivity-change)
+    case "$1" in
+      wl*)
+        for n in 1 2 3 4 5; do
+          /usr/bin/iw dev "$1" set power_save off 2>/dev/null
+          [ "$(/usr/bin/iw dev "$1" get power_save 2>/dev/null)" = "Power save: off" ] && break
+          sleep 1
+        done
+      ;;
+    esac
+  ;;
+esac
+SCRIPT
+sudo chmod +x /etc/NetworkManager/dispatcher.d/90-wifi-powersave
+sudo iw dev wlan0 set power_save off    # apply now without waiting for a reconnect
+```
+
+> 💡 A single-shot dispatcher (no retry) **silently fails on a cold boot**: it fires before the interface is ready, the `iw` call no-ops, and power save stays on. Verify with `iw dev wlan0 get power_save` *after a real reboot*, not just after a service restart.
+
+## 🔁 Verification
+
+```bash
+# was 50/boot, ~once a minute:
+journalctl -b -u NetworkManager --no-pager | grep -c supplicant-disconnect
+# 0
+iw dev wlan0 get power_save
+# Power save: off
+```
+
+A 3-minute continuous `ping` showed **180/180 replies, 0 loss**, latency tightened to **6–11 ms**. Confirmed across a full cold reboot: the Deck auto-rejoins Wi-Fi, both settings persist, and the disconnect counter stays at 0.
+
+## 📌 Notes
+
+- **Persistence:** `/etc/iwd/main.conf` and the dispatcher live in `/etc`, which survives reboots. A major SteamOS update *can* reset `/etc` — re-apply if the flapping returns after an OS update.
+- **Fully reversible:**
+  ```bash
+  sudo rm /etc/iwd/main.conf /etc/NetworkManager/dispatcher.d/90-wifi-powersave
+  sudo systemctl restart iwd
+  ```
+- **Interface name** is usually `wlan0`; confirm with `iw dev` if different.
+- The same IWD-periodic-scan behavior can affect other `iwd`-based distros (Arch, some Fedora spins) on flaky/older Wi-Fi chips — the `DisablePeriodicScan` fix is general, not Deck-specific.
+
+## 🔗 Related
+
+- [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](wifi-160mhz-airtime-saturation-game-streaming.md) — the *other* Steam Deck Wi-Fi issue (airtime contention, router-side), distinct from this client-side flap.
--- a/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
+++ b/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
@ -0,0 +1,163 @@
+---
+title: "MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)"
+domain: troubleshooting
+category: networking
+tags:
+  - ssh
+  - ssh-config
+  - tailscale
+  - magicdns
+  - known-hosts
+  - host-key
+  - migration
+  - wsl2
+status: published
+created: 2026-06-12
+updated: 2026-06-12
+---
+
+# MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)
+
+You have SSH aliases for a Tailscale fleet (`alias tttpod='ssh root@100.84.42.102'`).
+They worked for months. Then you migrate or rebuild some nodes — and now a third of
+them hang on connect or refuse the host key. This is the failure mode that hardcoded
+addresses hit, and why the durable answer is **MagicDNS names**, not pinned IPs.
+
+> This is the sequel to *[SSH Alias Falls Through to MagicDNS — Host-Key Verification
+> Failure (No `Host` Block)](ssh-missing-host-block-magicdns-host-key-failure.md)*.
+> That article says **pin the IP** `known_hosts` already trusts — correct when the
+> node is stable. This one covers what happens when a migration changes the IP *and*
+> the host key, which is exactly when IP-pinning stops paying off.
+
+## The Three Failure Modes
+
+A migration/rebuild can trigger any of these — often several at once across a fleet,
+which is what makes it confusing:
+
+### 1. Stale hardcoded IP → connection times out
+
+The node re-registered on the tailnet with a **new** Tailscale IP, but your alias
+still names the old one:
+
+```
+$ tttpod
+ssh: connect to host 100.84.42.102 port 22: Operation timed out
+```
+
+The old address is dead; SSH waits the full timeout and gives up. Confirm by asking
+the tailnet for the node's *current* IP by name:
+
+```
+$ tailscale status | grep tttpod
+100.95.137.38   tttpod   ...     # alias points at 100.84.42.102 — stale
+```
+
+### 2. Cold-path teardown → first connect after idle times out
+
+The IP is correct and the node is up (it answers `ping`), but TCP/22 still times out
+on the *first* try after a quiet period, then works on retry. Tailscale 1.98.x is more
+aggressive about tearing down **idle direct UDP paths**; the first SSH has to
+re-establish NAT traversal, which can overrun SSH's default connect timeout.
+
+```
+$ tailscale status | grep tttpod
+100.95.137.38   tttpod   ...   idle, tx 9360 rx 0      # cold path
+$ tailscale ping tttpod
+pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms   # warms instantly
+```
+
+### 3. Host-key verification failed → box was rebuilt
+
+The node was reinstalled, so it presents a **new** SSH host key. Your `known_hosts`
+still has the old one, so even `StrictHostKeyChecking=accept-new` aborts, because
+`accept-new` only adds *genuinely new* hosts and refuses a **mismatch**:
+
+```
+$ ssh root@tttpod hostname
+Host key verification failed.
+```
+
+## The Fix
+
+Three changes, applied on every **name-capable** machine (see the WSL2 caveat below):
+
+### a. Switch aliases from IPs to MagicDNS names
+
+```bash
+# before — rots on every migration
+alias tttpod='ssh root@100.84.42.102'
+# after — always resolves the node's current IP
+alias tttpod='ssh root@tttpod'
+```
+
+MagicDNS resolves the name to whatever IP the node currently has, so a future
+migration needs **zero** alias edits. This is the whole point: the tailnet already
+knows the mapping, so stop duplicating it (and letting it go stale) in your dotfiles.
+
+> **Exception:** if there's no tailnet device with that exact name (e.g. an alias
+> `teelia` pointing at a node actually named `temptedparadise`), MagicDNS can't
+> resolve it — keep the IP for that one.
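+
+One possible alternative, not part of the setup described here, is to let
+`~/.ssh/config` map the alias to the node's real MagicDNS name. A sketch, assuming
+`temptedparadise` resolves on this machine and the login user is `root` as in the
+other aliases:
+
+```sshconfig
+# Alias differs from the tailnet device name, so map it explicitly
+Host teelia
+    HostName temptedparadise
+    User root
+```
+
+The host key is then recorded under `temptedparadise`, so `known_hosts` follows the
+name rather than an IP.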
+
+### b. Purge stale host keys, then re-accept
+
+After a rebuild, clear the old entries under **both** the name and the current IP,
+then reconnect with `accept-new` to record the fresh key. Over Tailscale's
+authenticated WireGuard tunnel, a key change from a known rebuild is safe to accept.
+
+```bash
+for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
+  n="${pair%%:*}"; ip="${pair##*:}"
+  ssh-keygen -R "$n"; ssh-keygen -R "$ip"
+done
+# repopulate
+ssh -o StrictHostKeyChecking=accept-new root@tttpod hostname
+```
+
+### c. Add a cold-path cushion to `~/.ssh/config`
+
+Give the first (cold) connection time to renegotiate instead of erroring:
+
+```sshconfig
+Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
+    ConnectTimeout 25
+    ServerAliveInterval 30
+    ServerAliveCountMax 4
+```
+
+`ConnectTimeout 25` gives the cold path up to 25 s to re-establish, so the first
+connect shows up as a ~1–2 s pause instead of an error. The keepalives hold the path
+open during an active session so it doesn't drop mid-command.
+
+## Caveat: WSL2 Can't Use MagicDNS
+
+A Linux box under **WSL2** typically has **no `tailscale` CLI and no MagicDNS
+resolver** — it rides the Windows host's networking, and name lookups for tailnet
+nodes fail:
+
+```
+$ getent hosts tttpod        # (inside WSL2)
+                             # nothing — no resolution
+$ command -v tailscale       # nothing — CLI lives on the Windows side
+```
+
+On those machines you **must** keep hardcoded IPs, either in the aliases themselves or
+in `~/.ssh/config` `Host` blocks with an explicit `HostName <ip>`, and refresh them by
+hand when a node migrates. There's no self-healing option there; the trade-off is
+unavoidable.
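+
+A minimal sketch of such a block, reusing the `tttpod` address and `root` user from the
+examples above (substitute your own node's current IP; the `ConnectTimeout` value is
+simply carried over from the cold-path cushion):
+
+```sshconfig
+# WSL2 client: no MagicDNS, so pin the node's current Tailscale IP by hand
+Host tttpod
+    # update this address manually after a migration
+    HostName 100.95.137.38
+    User root
+    ConnectTimeout 25
+```
+
+With this in place, `ssh tttpod` behaves the same as on a name-capable machine; only
+the `HostName` line needs editing when the node moves.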
+
+## Diagnosis Checklist
+
+1. `tailscale status | grep <host>` — does your alias's IP match the **current** one?
+   (Mode 1: stale IP.)
+2. `ping`/`tailscale ping <host>` works but TCP/22 times out on first try, succeeds on
+   retry? (Mode 2: cold path.)
+3. `ssh root@<host> true` → `Host key verification failed` (not `Permission denied`)?
+   (Mode 3: rebuilt box, stale `known_hosts`.)
+4. Is the client a WSL2 box? `getent hosts <name>` returns nothing → MagicDNS
+   unavailable, stay on IPs.
+
+## Takeaway
+
+Pin the IP when a host is **stable** and the IP-keyed `known_hosts` entry is your
+durable trust anchor. Switch to **MagicDNS names** when hosts **move** — migrations,
+rebuilds, provider changes — so the tailnet's own name→IP mapping does the work your
+dotfiles kept getting wrong. And on WSL2, you don't get the choice: hardcoded IPs,
+refreshed by hand.
--- a/05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md
+++ b/05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md
@ -0,0 +1,115 @@
+---
+title: "Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio"
+domain: troubleshooting
+category: networking
+tags: [wifi, 5ghz, 160mhz, channel-width, dfs, steam-deck, game-streaming, asuswrt, airtime, chanim]
+status: published
+created: 2026-06-13
+updated: 2026-06-13
+---
+
+# Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio
+
+## 🛑 Problem
+
+Streaming a game from a desktop (wired) to a Steam Deck over Wi-Fi was stuttering intermittently — fine for a while, then choppy, hard to reproduce on demand. Throughput tests "looked fine," which is exactly why it was hard to pin down: **game streaming fails on jitter and microbursts of contention, not on average bandwidth.**
+
+The Wi-Fi was an Asus RT-AX82U (AsusWRT, stock firmware) with the 5 GHz radio set to **Auto channel at 160 MHz width**.
+
+## 🔍 Diagnosis
+
+The key insight: **signal was excellent, but latency was not.** That combination means the airwaves are busy, not weak.
+
+### Step 1 — Measure jitter to the gateway from a Wi-Fi client
+
+```bash
+ping -c 20 -i 0.2 192.168.50.1
+# round-trip min/avg/max/stddev = 7.5/27.0/61.0/16.5 ms
+```
+
+27 ms **average** and 16 ms of jitter to your *own router* over Wi-Fi is pathological. A healthy 5 GHz link sits at 2–5 ms. Yet the client's signal was **-43 dBm** (excellent) with a clean **-92 dBm** noise floor. Strong signal + high jitter = **airtime contention**, not range or interference at the receiver.
+
+### Step 2 — Confirm channel utilization at the router
+
+AsusWRT/Broadcom exposes per-channel airtime stats via `wl chanim_stats`. SSH into the router and run it against the 5 GHz interface:
+
+```bash
+# 5 GHz interface name varies (eth6/eth7); resolve it from nvram
+IF=$(nvram get wl1_ifname)
+wl -i "$IF" chanspec        # e.g. 36/160 (0xe832)  → channel 36, 160 MHz
+wl -i "$IF" assoclist | wc -l   # number of associated 5 GHz clients
+wl -i "$IF" chanim_stats
+```
+
+The smoking gun (`chanim_stats`, version 3):
+
+```
+chanspec  tx  inbss obss nocat nopkt doze txop goodtx badtx glitch ... idle
+0xe832    92    2    1    2     1     0    4    8      81    2          14
+```
+
+Read it as percentages of airtime:
+
+| Field | Value | Meaning |
+|-------|-------|---------|
+| `tx` | **92** | Channel busy transmitting 92% of the time |
+| `txop` | **4** | Transmit-opportunities available only 4% — the channel is starved |
+| `idle` | **14** | Channel idle only 14% |
+| `goodtx` / `badtx` | 8 / **81** | Failed/retried transmits vastly outnumber good ones |
+
+Seventeen clients were associated to that one 5 GHz radio.
+
+### Step 3 — Understand why 160 MHz makes it worse
+
+A 160 MHz channel on the lower 5 GHz band spans channels **36–64**, which overlaps DFS sub-blocks. To stay clean it needs 160 MHz of *uncontended* spectrum — but in a dense RF environment (≈25 neighbor APs here, several on 5 GHz channels 48/52/100/132/153 that overlap or border the block), any one busy neighbor degrades the **entire** wide channel. 160 MHz also makes the radio **DFS-radar exposed**: a single radar detection forces a channel-switch with a 1 s+ blackout — a stream-killer.
+
+So 160 MHz buys a higher *peak* PHY rate that game streaming doesn't need, at the cost of the *stability* it absolutely does.
+
+## ✅ Fix
+
+Drop the 5 GHz radio to **80 MHz** and pin it to a **non-DFS** channel (UNII-1: 36/40/44/48 — no radar, no DFS blackouts).
+
+GUI: **Wireless → 5 GHz → Channel Bandwidth = 80 MHz**, **Control Channel = 36**, turn off "Auto."
+
+Or over SSH (`nvram` + `restart_wireless`):
+
+```bash
+nvram set wl1_bw_cap=7        # cap at 80 MHz (bitmask: 1=20, 3=40, 7=80, 15=160)
+nvram set wl1_chanspec=36/80  # channel 36 @ 80 MHz
+nvram set wl1_channel=36
+nvram commit
+service restart_wireless      # ~15-20s radio bounce, drops all clients briefly
+```
+
+> [!warning] `restart_wireless` drops every Wi-Fi client for 15–20 seconds. `nvram commit` runs *before* the restart, so the config persists even if your own SSH/Wi-Fi session drops.
+
+## 📊 Result
+
+Verified from both the router and a client after the radio came back:
+
+| Metric | Before (36/160) | After (36/80) |
+|--------|-----------------|---------------|
+| Channel tx-busy | 92% | **9%** |
+| Transmit-opportunity available | 4% | **79%** |
+| Channel idle | 14% | **87%** |
+| Failed tx (`badtx` vs `goodtx`) | 81 vs 8 | **1 vs 3** |
+| Gateway ping (avg / floor) | 27 ms / 7.5 ms | **9 ms / 2.7 ms** |
+| PHY peak rate | 1729 Mbps | 1200 Mbps |
+
+The PHY peak dropped (narrower channel) but that is irrelevant — Steam Remote Play wants ~30–50 Mbps with *consistent* airtime, which it now has. The stutter resolved.
+
+## 🧠 Takeaways
+
+- **Diagnose Wi-Fi streaming problems with jitter, not throughput.** A speed test can pass while a stream stutters. Ping your gateway and watch the stddev.
+- **Strong signal + high latency = airtime congestion.** Don't chase signal strength when RSSI is already good; look at channel utilization (`chanim_stats`).
+- **160 MHz is a trap in a dense RF environment.** Use 80 MHz for reliability; reserve 160 MHz for clean spectrum and short range.
+- **Prefer non-DFS channels (36–48) for anything latency-sensitive** — DFS radar events cause silent multi-second dropouts.
+- **Wire the *source*.** The streaming PC should be on Ethernet so the video only crosses the air once (AP → handheld). The handheld has to be Wi-Fi; the desktop doesn't.
+- **Isolate IoT on 2.4 GHz** (separate SSID) so it never competes for 5 GHz airtime with latency-sensitive clients.
+
+## Related
+
+- [Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save](steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md) — the *other* Steam Deck Wi-Fi issue (client-side flap), distinct from this router-side airtime problem.
+- [Network Overview](../../02-selfhosting/dns-networking/network-overview.md)
+- [Wake-on-LAN via Router SSH](../../02-selfhosting/dns-networking/wake-on-lan-router-ssh.md)
+- [Pi-hole v6 Group Management — Per-Client DNS Rules](../../02-selfhosting/dns-networking/pihole-v6-group-management.md)
--- a/05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md
+++ b/05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md
@ -0,0 +1,120 @@
+---
+title: "Time Machine: Orphaned APFS .previous Folder Blocks All Backups"
+domain: troubleshooting
+category: general
+tags: [macos, time-machine, apfs, backup, fsck, disk-utility]
+status: published
+created: 2026-06-18
+updated: 2026-06-18
+---
+# Time Machine: Orphaned APFS `.previous` Folder Blocks All Backups
+
+## Overview
+On an APFS Time Machine destination, an interrupted backup can leave behind an orphaned staging folder named `<timestamp>.previous` (plus a matching, uncatalogued APFS snapshot). Every subsequent backup reads that folder during *FindingChanges*, hits a metadata-type mismatch, and aborts — so backups silently stop running. macOS shows only a generic "**Time Machine couldn't complete the backup … An unknown error occurred.**"
+
+The trap: because the orphan is **not in Time Machine's catalog** and the destination is OS-protected, every obvious removal tool (`rm`, `chmod`, `tmutil delete`, `diskutil apfs deleteSnapshot`) refuses it. The clean fix is **First Aid (`fsck_apfs`)**, which has authority over the volume and clears the orphaned snapshot.
+
+## Symptoms
+- "Time Machine couldn't complete the backup to '<disk>' — An unknown error occurred."
+- Backups haven't run since around the time of an interrupted/cancelled backup.
+- The destination disk is mounted and has plenty of free space (not full, not disconnected).
+- `tmutil status` cycles through `Starting` / `FindingChanges` and never reaches `Copying`.
+
+## Root Cause
+`backupd` logs the real error on a loop (every ~15 s):
+
+```bash
+log show --predicate 'subsystem == "com.apple.TimeMachine"' --last 10m --style compact \
+  | grep -iE 'previous|error'
+```
+```
+[TMStructure] Expected SnapshotInProgressContainer metadata type but found APFSBackup
+  metadata type at URL '.../<disk>/2026-06-17-172230.previous/'
+```
+
+An earlier backup was interrupted mid-run. It left two orphans tied to that timestamp, **neither registered in Time Machine's backup catalog**:
+
+1. A staging directory `<timestamp>.previous` on the destination volume.
+2. A matching APFS snapshot `com.apple.TimeMachine.<timestamp>.backup`.
+
+Time Machine expects the staging folder to be a `SnapshotInProgressContainer` but finds completed-backup (`APFSBackup`) metadata, so it bails before copying anything.
+
+> **Ignore the surrounding log noise.** `com.apple.backupd.sandbox.xpc: connection invalid`, `Mountpoint '…' is still valid`, and `missingName` on `/System/Volumes/Data/home` are all normal on a healthy backup — flagged `E` but harmless. The only line that matters is the `SnapshotInProgressContainer` mismatch.
+
+## Diagnosis
+
+Confirm the disk is healthy (not the problem) and locate the orphan:
+
+```bash
+tmutil status                              # stuck in Starting/FindingChanges, never Copying
+df -h | grep -i "<disk-name>"              # mounted, plenty free
+diskutil apfs listSnapshots <diskNsN>      # note the highest/last snapshot timestamp
+```
+
+If `listSnapshots` shows a final snapshot whose timestamp matches the `.previous` folder in the error, that's the orphaned pair.
+
+## Why the Obvious Tools Fail
+
+Do **not** burn time trying to force the folder out — here's what each tool does and why it refuses:
+
+| Command | Result | Reason |
+|---|---|---|
+| `sudo rm -rf …/<ts>.previous` | `Operation not permitted` | TM applies a `group:everyone deny delete` ACL that overrides root. |
+| `sudo chmod -RN …/<ts>.previous` | runs for minutes, then fails | A `.previous` folder is a **full copy of the entire Mac filesystem**; `-R` walks the whole tree and can't clear ACLs on the SIP-`restricted` system files inside (`/usr/bin/sh`, frameworks, keymaps). `rm` then hits the same wall. |
+| `sudo tmutil delete -p …/<ts>.previous` | `Invalid deletion target (error 22)` | Not a registered backup. |
+| `sudo tmutil delete -t <timestamp>` | `error 2 (No such file)` | No catalog entry for that timestamp. |
+| `sudo diskutil apfs deleteSnapshot <diskNsN> -uuid <uuid>` | `Not a valid APFS Snapshot UUID` | TM-managed snapshot; diskutil won't remove it directly. |
+
+> **If you started a `chmod -R` and killed it:** the live system is unaffected — `chmod -R` does not follow symlinks out of the backup tree. Verify with `ls -lde ~/Desktop` (normal ACLs = untouched). Stop a runaway with `sudo pkill -f '<timestamp>.previous'`.
+
+## Fix — Run First Aid (`fsck_apfs`)
+
+First Aid runs with full authority over the volume and clears the orphaned snapshot, which defuses the `.previous` folder's metadata mismatch.
+
+```bash
+# 1. Stop the looping backup
+sudo tmutil stopbackup
+
+# 2. Verify the destination volume (live mode is fine; read-only check)
+sudo diskutil verifyVolume <diskNsN>
+#    or: Disk Utility → View → Show All Devices → select the TM volume → First Aid → Run
+```
+
+`verifyVolume` enumerates and validates every snapshot; the verify/remount cycle purges the orphaned in-progress snapshot. Expected result:
+
+```
+The volume <name> appears to be OK
+File system check exit code is 0
+```
+
+Confirm the orphan snapshot is gone (count drops by one; the matching timestamp no longer appears):
+
+```bash
+diskutil apfs listSnapshots <diskNsN>
+```
+
+Then restart and watch it succeed:
+
+```bash
+sudo tmutil startbackup --auto
+tmutil status      # should reach BackupPhase = Copying with no SnapshotInProgressContainer errors
+```
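+
To script the check instead of eyeballing it, the phase can be pulled out of the status text. This is a sketch, assuming `tmutil status` prints a `BackupPhase = <value>;` line. The function name `backup_phase` and the sample text are illustrative, not captured output.

```bash
# Print the BackupPhase value found in tmutil-status-style text on stdin.
backup_phase() {
    sed -n 's/^[[:space:]]*BackupPhase = \(.*\);[[:space:]]*$/\1/p'
}

# Illustrative sample; on a real system use: tmutil status | backup_phase
sample='Backup session status:
{
    BackupPhase = Copying;
    Running = 1;
}'

phase=$(printf '%s\n' "$sample" | backup_phase)
if [ "$phase" = "Copying" ]; then
    echo "backup is copying"
else
    echo "backup phase: ${phase:-unknown}"
fi
```

Polling this in a loop with a `sleep` between calls would show whether the backup ever leaves the early phases.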
+
+If `verifyVolume` reports problems rather than "appears to be OK", run the repair (it must unmount the volume):
+
+```bash
+sudo diskutil repairVolume <diskNsN>
+```
+
+## Notes
+- The first backup after the fix is often a large catch-up (hundreds of GB) because the chain was broken — let it finish; it returns to quick hourly increments afterward.
+- The inert `<timestamp>.previous` **folder** may still sit on the volume after the fix. Time Machine now ignores it, so it's not blocking — but it consumes space. Removing it cleanly requires booting to **Recovery Mode**, `csrutil disable`, `rm -rf` the folder, then `csrutil enable` — only worth it to reclaim the space.
+- Time Machine identifies its destination by `DestinationID` (a UUID), not the volume name, so renaming the disk later is safe.
+- Interrupted backups are more likely on flaky USB-SATA bridge enclosures (e.g. some WD My Passport units) whose slow sleep/wake transitions can drop the drive mid-backup.
+
+## Tags
+`macos` `time-machine` `apfs` `backup` `fsck-apfs` `disk-utility` `snapshot` `first-aid`
+
+## See Also
+- [SnapRAID & MergerFS Storage Setup](../01-linux/storage/snapraid-mergerfs-setup.md)
+- MajorMac Incident Log (2026-06-18) — the originating incident
--- a/05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md
+++ b/05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md
@ -0,0 +1,193 @@
+---
+title: "WordPress 6.7 _load_textdomain_just_in_time Notice (Theme/Plugin Loads Translations Too Early)"
+domain: troubleshooting
+category: troubleshooting
+tags:
+  - wordpress
+  - wordpress-6.7
+  - php
+  - i18n
+  - textdomain
+  - theme
+  - mu-plugin
+  - deprecation
+  - troubleshooting
+status: published
+created: 2026-06-21
+updated: 2026-06-21
+---
+
+# WordPress 6.7 `_load_textdomain_just_in_time` Notice
+
+> **TL;DR** — WordPress 6.7 added a `doing_it_wrong` notice that fires when a translation function (`__()`, `_e()`, `esc_html__()`, …) is called for a text domain **before the `init` action**. It's almost always a theme or plugin registering nav menus / sidebars / labels on `after_setup_theme` (which runs before `init`). The notice is **debug-only and harmless** — translations still load via the just-in-time fallback. If the offending code is in your own (or an updatable) theme/plugin, fix it at the source by deferring to `init`. If it's a **non-updating or third-party** theme you don't want to hand-edit, suppress *only this one notice* with a `doing_it_wrong_trigger_error` filter in a tiny mu-plugin.
+
+---
+
+## Symptom
+
+With `WP_DEBUG` on (or in Query Monitor's PHP panel), you see:
+
+```
+Function _load_textdomain_just_in_time was called incorrectly.
+Translation loading for the <domain> domain was triggered too early.
+This is usually an indicator for some code in the plugin or theme running too early.
+Translations should be loaded at the init action or later.
+(This message was added in version 6.7.0.)
+
+_load_textdomain_just_in_time()  wp-includes/l10n.php
+get_translations_for_domain()    wp-includes/l10n.php
+translate()                      wp-includes/l10n.php
+__()                             wp-includes/l10n.php
+WordPress Core
+```
+
+The key fields are **the domain name** (e.g. `marstheme`, `woocommerce`, `astra`) and the fact that the stack bottoms out in **WordPress Core** via `__()` — that tells you *some* extension called a translation function, not that core is broken.
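+
When a log holds many of these notices, listing the distinct domains is a quick first step. This is a sketch, assuming each notice keeps the wording shown above. The logged copy may wrap the domain in `<code>` tags, which the pattern tolerates. `list_jit_domains` and the sample lines are hypothetical.

```bash
# Print each distinct text domain named in JIT notices read from stdin.
list_jit_domains() {
    sed -nE 's#.*Translation loading for the (<code>)?([^ <]+)(</code>)? domain was triggered too early.*#\2#p' | sort -u
}

# Hypothetical log lines; on a real site, feed wp-content/debug.log instead.
sample='PHP Notice: Translation loading for the <code>marstheme</code> domain was triggered too early.
PHP Notice: Translation loading for the woocommerce domain was triggered too early.
PHP Notice: Translation loading for the <code>marstheme</code> domain was triggered too early.'

printf '%s\n' "$sample" | list_jit_domains
```

Each domain printed is one owner to track down in the diagnostic steps.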
+
+## Why it happens (the WP 6.7 change)
+
+Before 6.7, WordPress silently "just-in-time" loaded a text domain the first time you translated a string in it. WordPress 6.7 kept the JIT loading but started **warning** when it is triggered before `init`, because:
+
+- Translations loaded before `init` can't be filtered/overridden by other plugins that hook `init`.
+- It signals the extension is doing setup work earlier than the WordPress lifecycle intends.
+
+The usual culprit is code on **`after_setup_theme`** (which fires *before* `init`) that translates a label inline, e.g.:
+
+```php
+function mytheme_setup() {
+    register_nav_menus( array(
+        'primary' => __( 'Primary Menu', 'mytheme' ),   // <-- translate call before init
+    ) );
+}
+add_action( 'after_setup_theme', 'mytheme_setup' );
+```
+
+> **Important:** explicitly calling `load_theme_textdomain()` / `load_plugin_textdomain()` early does **not** fix the notice, and as of WP 4.6+ themes on wordpress.org don't even need to call it. The notice is about the *translate call*, not about whether the domain was loaded. Moving only the `load_*_textdomain()` call around is a common dead-end (see the gotcha below).
+
+## Diagnostic chain
+
+### 1. Identify the domain and what owns it
+
+The notice names the domain. Find which theme/plugin uses it:
+
+```bash
+WPROOT=/var/www/html
+grep -rlw '<domain>' "$WPROOT/wp-content/themes" "$WPROOT/wp-content/plugins" 2>/dev/null
+
+# Which extension has the most references (i.e. owns the domain)?
+grep -rl '<domain>' "$WPROOT/wp-content/" 2>/dev/null \
+  | sed -E "s#$WPROOT/wp-content/(themes|plugins|mu-plugins)/([^/]+)/.*#\1/\2#" \
+  | sort | uniq -c | sort -rn | head
+```
+
+> **Watch for renamed/forked themes.** The domain often does **not** match the theme's folder name. A theme bought as "Mars" and re-slugged to `kappa` keeps `marstheme` as its text domain in all 40+ template files. So `wp theme list` shows `kappa` active while the notice says `marstheme` — they're the same thing.
+
+### 2. Confirm it's active and whether it can be updated
+
+```bash
+sudo -u www-data wp --path=$WPROOT theme list --fields=name,status,version,update
+sudo -u www-data wp --path=$WPROOT plugin list --fields=name,status,version,update
+```
+
+- `update available` → **update it first** (many actively maintained themes/plugins shipped a fix after 6.7 landed in late 2024). That's the proper fix; the rest of this article is for when you can't.
+- `update none` on a **renamed/custom fork** → no upstream exists, so updating is impossible. Go to the suppression fix.
+
+### 3. Pin down the early call (optional)
+
+```bash
+grep -rn "__(\s*['\"].*['\"]\s*,\s*['\"]<domain>['\"]" \
+  "$WPROOT/wp-content/themes/<theme>" | head
+```
+
+Look for translate calls inside functions hooked to `after_setup_theme`, `setup_theme`, `plugins_loaded`, or run at file scope in `functions.php`.
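+
The hook names listed above can be searched for directly. This is a minimal sketch, assuming hooks are registered with a literal `add_action( '<hook>', ... )` call. `find_early_hooks` and the throwaway `functions.php` are hypothetical. Hooks registered with computed names will be missed, and so will translate calls at file scope.

```bash
# Print file:line:text for add_action() calls on hooks that fire before init.
find_early_hooks() {
    grep -rnE "add_action\([[:space:]]*['\"](after_setup_theme|setup_theme|plugins_loaded)['\"]" "$1"
}

# Demo on a throwaway theme directory (hypothetical content).
tmp=$(mktemp -d)
cat > "$tmp/functions.php" <<'EOF'
<?php
add_action( 'after_setup_theme', 'mytheme_setup' );
add_action( 'init', 'mytheme_late' );
EOF

find_early_hooks "$tmp"   # lists only the after_setup_theme line
rm -rf "$tmp"
```

Each hit names a callback. Open that callback and look for `__()`, `_e()`, or `esc_html__()` calls using the domain from the notice.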
+
+## The fix
+
+### Option A — fix it at the source (own / updatable code)
+
+Defer the translation. Either register the raw string and translate at render time, or move the registration to `init`:
+
+```php
+// Before: translated on after_setup_theme (too early)
+add_action( 'after_setup_theme', function () {
+    register_nav_menus( array( 'primary' => __( 'Primary Menu', 'mytheme' ) ) );
+} );
+
+// After: register the menu location on init, where translation is allowed
+add_action( 'init', function () {
+    register_nav_menus( array( 'primary' => __( 'Primary Menu', 'mytheme' ) ) );
+} );
+```
+
+Only do this in code you own. Don't hand-edit a third-party theme/plugin that still receives updates: your change is wiped on the next update. Update it instead, or use Option B if no fixed release exists.
+
+### Option B — suppress just this notice (third-party / non-updating code)
+
+When the early call lives in a theme you don't control and can't update (a renamed commercial fork, an abandoned plugin), the clean, update-safe move is to silence **only** the `_load_textdomain_just_in_time` notice — not all `doing_it_wrong` output — via a must-use plugin.
+
+Create `wp-content/mu-plugins/fix-textdomain.php`:
+
+```php
+<?php
+/**
+ * Suppress the WP 6.7 "_load_textdomain_just_in_time was called incorrectly"
+ * notice for a theme/plugin that translates before init.
+ *
+ * Scope is intentionally narrow: only this one function is silenced, so other
+ * doing_it_wrong notices still surface. Translations still load via the JIT
+ * fallback, so nothing visible changes for visitors.
+ */
+add_filter( 'doing_it_wrong_trigger_error', function ( $trigger, $function_name ) {
+    return '_load_textdomain_just_in_time' === $function_name ? false : $trigger;
+}, 10, 2 );
+```
+
+`mu-plugins/` loads automatically (no activation, can't be deactivated from the admin), and runs early enough to register the filter before the notice fires.
+
+#### Verify
+
+```bash
+WPROOT=/var/www/html
+
+# 1. Syntax-check the mu-plugin
+php -l "$WPROOT/wp-content/mu-plugins/fix-textdomain.php"
+#    -> No syntax errors detected
+
+# 2. Confirm WP still boots and the filter is registered
+sudo -u www-data wp --path=$WPROOT eval \
+  'echo has_filter("doing_it_wrong_trigger_error") ? "filter set\n" : "MISSING\n";'
+
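+# 3. Clear the debug log, boot WP via WP-CLI, confirm 0 new notices
+#    (needs WP_DEBUG + WP_DEBUG_LOG; the theme's early call runs during boot,
+#    while the eval'd __() itself runs after init)
+DBG="$WPROOT/wp-content/debug.log"
+[ -f "$DBG" ] && : > "$DBG"
+sudo -u www-data wp --path=$WPROOT eval '__("Primary Menu","<domain>");' >/dev/null 2>&1
+# grep -c already prints 0 (and exits 1) on no match, so only fall back when the log is missing
+if [ -f "$DBG" ]; then grep -c "<domain>" "$DBG"; else echo 0; fi
+#    -> 0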
+```
+
+## Gotchas
+
+### The "load the textdomain earlier/later" dead-end
+
+A very common (wrong) first attempt is an mu-plugin that just calls `load_theme_textdomain()` on `plugins_loaded` or `after_setup_theme`:
+
+```php
+// DOES NOT FIX THE NOTICE
+add_action( 'plugins_loaded', function () {
+    load_theme_textdomain( 'mytheme', get_template_directory() . '/languages' );
+}, 0 );
+```
+
+`plugins_loaded` still runs **before `init`**, and — more importantly — the notice is triggered by the theme's own early `__()` call, not by whether you've loaded the domain. This code is dead weight. If you find one in place, replace it with the Option B filter rather than tweaking its hook/priority.
+
+### Don't blanket-suppress all deprecations
+
+Resist `error_reporting(E_ALL & ~E_DEPRECATED)` or returning `false` from `doing_it_wrong_trigger_error` unconditionally: both hide genuinely useful warnings (a plugin breaking on a future PHP/WP bump), and the `error_reporting` mask would not even catch this notice, which WordPress raises as `E_USER_NOTICE`. Scope the filter to the one `function_name`.
+
+### Renamed theme ⇒ domain ≠ folder
+
+Re-stating because it costs the most time: the domain in the notice can be the theme's *original* slug, not its current folder. Always `grep` for the domain to find the real owner before concluding "I don't even have that theme installed."
+
+## See also
+
+- [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](php-84-vendor-implicit-nullable-patch.md) — the other "harmless deprecation that floods logs" pattern on the WordPress fleet
+- [WordPress developer note: i18n improvements in 6.7](https://make.wordpress.org/core/2024/10/21/i18n-improvements-in-6-7/) — the canonical reference for this change
--- a/05-troubleshooting/yt-dlp-fedora-js-challenge.md
+++ b/05-troubleshooting/yt-dlp-fedora-js-challenge.md
@ -10,7 +10,7 @@ tags:
  - deno
 status: published
 created: 2026-04-02
-updated: 2026-04-30T05:21
+updated: 2026-06-16T18:35
 ---
 # yt-dlp YouTube JS Challenge Fix (Fedora)

@ -84,12 +84,43 @@ echo '--remote-components ejs:github' > ~/.config/yt-dlp/config

 ## Maintenance

-YouTube pushes extractor changes frequently. Keep yt-dlp current:
+YouTube pushes extractor changes frequently. Keep yt-dlp current.
+
+### Updating: the `-U` trap + avoid duplicate installs
+
+`yt-dlp -U` **does not work** when yt-dlp was installed via pip/PyPI — the PyPI build deliberately disables the self-updater:
+
+```
+ERROR: You installed yt-dlp with pip or using the wheel from PyPi; Use that to update
+```
+
+Update through pip instead. **Pick one install method and stick to it** — running both a user install and a system install leaves two copies that drift out of sync (one updates, the other stays stale and shadows it depending on `$PATH` / sudo).
+
+**Recommended — single user install (no sudo):**
+
+```bash
+pip3 install -U --user yt-dlp
+```
+
+This installs to `~/.local/bin/yt-dlp`, which Fedora's default shell profile places ahead of the system directories on a normal user's `$PATH`. Update it the same way; never use sudo.
+
+**Alternative — system-wide (Fedora, PEP 668):**

 ```bash
 sudo pip install -U yt-dlp --break-system-packages
 ```

+> Only use `--break-system-packages` if you intentionally want a root-owned copy in `/usr/local`. Do **not** mix it with a `--user` install.
+
+**Check for and remove a duplicate install:**
+
+```bash
+which -a yt-dlp            # more than one path = duplicate installs
+sudo pip3 uninstall -y yt-dlp   # removes the /usr/local (system) copy + its wrapper
+```
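+
The duplicate check can be turned into a pass/fail test. This is a sketch, assuming one path per line as `which -a` prints them. `count_installs` and the sample paths are illustrative.

```bash
# Count distinct, non-empty paths read from stdin (one per line).
count_installs() {
    sort -u | grep -c .
}

# Illustrative paths; on a real system use: which -a yt-dlp | count_installs
sample='/home/user/.local/bin/yt-dlp
/usr/local/bin/yt-dlp'

n=$(printf '%s\n' "$sample" | count_installs)
if [ "$n" -gt 1 ]; then
    echo "duplicate installs: $n copies on PATH"
else
    echo "single install"
fi
```

`sort -u` collapses repeated entries, so a directory listed twice on `$PATH` does not count as a second install.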
+
+> If installed via the standalone binary (not pip), `yt-dlp -U` is the correct updater.
+
 ---

 ## Known Limitations
--- a/SUMMARY.md
+++ b/SUMMARY.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-05-15T09:00
+updated: 2026-06-21T11:46
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
@ -12,10 +12,12 @@ updated: 2026-05-15T09:00
    * [Bash Scripting Patterns](01-linux/shell-scripting/bash-scripting-patterns.md)
    * [SnapRAID & MergerFS Storage Setup](01-linux/storage/snapraid-mergerfs-setup.md)
    * [mdadm — Rebuilding a RAID Array After Reinstall](01-linux/storage/mdadm-raid-rebuild.md)
+    * [Growing an LVM Volume by Absorbing Another Disk](01-linux/storage/lvm-grow-volume-absorb-disk.md)
    * [Linux Distro Guide for Beginners](01-linux/distro-specific/linux-distro-guide-beginners.md)
    * [WSL2 Instance Migration to Fedora 43](01-linux/distro-specific/wsl2-instance-migration-fedora43.md)
    * [WSL2 Training Environment Rebuild](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md)
    * [WSL2 Backup via PowerShell](01-linux/distro-specific/wsl2-backup-powershell.md)
+    * [WSL2 In-Place Upgrade to Fedora 44](01-linux/distro-specific/wsl2-fedora44-inplace-upgrade.md)
 * [Self-Hosting & Homelab](02-selfhosting/index.md)
    * [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md)
    * [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
@ -30,6 +32,7 @@ updated: 2026-05-15T09:00
    * [AWS S3 Cost Management](02-selfhosting/cloud/aws-s3-cost-management.md)
    * [VPS Migration Baseline Checklist](02-selfhosting/cloud/vps-migration-baseline-checklist.md)
    * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
+    * [Fleet Backups with restic + B2](02-selfhosting/storage-backup/restic-b2-fleet-backups.md)
    * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
    * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
    * [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)
@ -41,6 +44,7 @@ updated: 2026-05-15T09:00
    * [Mastodon Post-Install Hardening (Permissions + Account)](02-selfhosting/services/mastodon-post-install-hardening.md)
    * [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
    * [Mastodon on S3 — Silent Upload Failures (BucketOwnerEnforced/ACLs)](02-selfhosting/services/mastodon-s3-acl-upload-failures.md)
+    * [Mastodon — Triaging Crowdfunding / Mention-Spam Accounts](02-selfhosting/services/mastodon-mention-spam-crowdfunding.md)
    * [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
    * [Inbound Spam Filtering: spamass-milter + SpamAssassin Bayes](02-selfhosting/services/postfix-spamassassin-bayes-spam-filtering.md)
    * [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)
@ -56,6 +60,7 @@ updated: 2026-05-15T09:00
    * [Fail2ban Custom Jail: Nginx Bad Request Detection](02-selfhosting/security/fail2ban-nginx-bad-request-jail.md)
    * [Fail2ban Custom Jail: Apache Bad Request Detection](02-selfhosting/security/fail2ban-apache-bad-request-jail.md)
    * [SSH Hardening Fleet-Wide with Ansible](02-selfhosting/security/ssh-hardening-ansible-fleet.md)
+    * [Migrating Flat Ansible Playbooks to Roles (Safely)](02-selfhosting/security/ansible-flat-playbooks-to-roles.md)
    * [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md)
    * [Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts](02-selfhosting/security/fail2ban-digest-mode-fleet.md)
    * [Apache CVE-2026-23918 — HTTP/2 Double Free Mitigation](02-selfhosting/security/apache-cve-2026-23918-http2-mitigation.md)
@ -76,6 +81,8 @@ updated: 2026-05-15T09:00
    * [HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)](04-streaming/plex/hevc-vaapi-batch-encode.md)
    * [Plex Transcoding Troubleshooting](04-streaming/plex/plex-transcoding-troubleshooting.md)
 * [Troubleshooting](05-troubleshooting/index.md)
+    * [Wi-Fi Game Streaming Stutter: 160 MHz Channel Width Saturating the 5 GHz Radio](05-troubleshooting/networking/wifi-160mhz-airtime-saturation-game-streaming.md)
+    * [Steam Deck Wi-Fi Flapping: IWD Periodic Scan + rtw88 Power Save](05-troubleshooting/networking/steam-deck-wifi-flapping-iwd-periodic-scan-rtw88.md)
    * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
    * [Postfix + SendGrid: TLS Handshake Failure (Port 465 vs 587)](05-troubleshooting/networking/postfix-sendgrid-tls-handshake-failure.md)
    * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
@ -101,6 +108,7 @@ updated: 2026-05-15T09:00
    * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
    * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
    * [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md)
+    * [Forgejo: Account Recovery & CLI Admin When Locked Out of the GUI](05-troubleshooting/forgejo-mailer-and-cli-recovery.md)
    * [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md)
    * [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md)
    * [SELinux: Wrong /etc/localtime Label Silently Breaks Timezone Changes](05-troubleshooting/selinux-localtime-label-breaks-timezone.md)
@ -111,11 +119,17 @@ updated: 2026-05-15T09:00
    * [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](05-troubleshooting/wsl-env-claude-desktop-mcp.md)
    * [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md)
    * [Patching PHP 8.4 Implicit-Nullable Deprecations in Vendor Packages](05-troubleshooting/php-84-vendor-implicit-nullable-patch.md)
+    * [WordPress 6.7 `_load_textdomain_just_in_time` Notice (Translations Loaded Too Early)](05-troubleshooting/wordpress-67-textdomain-just-in-time-notice.md)
    * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
    * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md)
+    * [Claude Code Won't Log In (Warp & iTerm2) — Corrupt Keychain Credential](05-troubleshooting/claude-code-warp-login-corrupt-keychain-credential.md)
+    * [Claude Code Keychain Prompt Keeps Reappearing on macOS (ACL Invalidation)](05-troubleshooting/claude-code-keychain-prompt-recurring-macos.md)
+    * [iPhone Mirroring Hangs on 'Connecting…' — AWDL Data Stall (27.0 Beta)](05-troubleshooting/iphone-mirroring-connecting-hang-awdl-stall-beta.md)
    * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
    * [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md)
    * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
+    * [Auditing & Cleaning macOS Background App Activity (sfltool dumpbtm)](05-troubleshooting/macos-background-app-activity-audit-sfltool.md)
+    * [Time Machine: Orphaned APFS `.previous` Folder Blocks All Backups](05-troubleshooting/time-machine-apfs-orphaned-previous-blocks-backup.md)
    * [OBS Studio: Stale Script Paths After Windows Profile Rename](05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
    * [Logwatch Falsely Reports 'No freshclam updates' in ClamAV Daemon Mode](05-troubleshooting/security/freshclam-logwatch-false-no-updates.md)
@ -127,10 +141,16 @@ updated: 2026-05-15T09:00
    * [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md)
    * [Ansible: regex_search Capture-Group Argument Fails in set_fact](05-troubleshooting/ansible-regex-search-set-fact-capture-group.md)
    * [Ansible: Ubuntu Reboot Detection Misses Kernel Upgrades](05-troubleshooting/ansible-ubuntu-reboot-detection-kernel-mismatch.md)
+    * [Ansible: reboot.yml become Timeout on WSL2 Hosts (Exclude Them)](05-troubleshooting/ansible-reboot-become-timeout-wsl2.md)
    * [Fedora Networking & Kernel Troubleshooting](05-troubleshooting/fedora-networking-kernel-recovery.md)
    * [Systemd Session Scope Fails at Login](05-troubleshooting/systemd/session-scope-failure-at-login.md)
    * [wget/curl: URLs with Special Characters Fail in Bash](05-troubleshooting/wget-url-special-characters.md)
    * [Ansible: Check Mode False Positives in Verify/Assert Tasks](05-troubleshooting/ansible-check-mode-false-positives.md)
    * [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
+    * [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
+    * [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
+    * [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
+    * [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
+    * [Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
    * [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)
    * [claude-mem: --setting-sources Empty Arg Bug (Claude Code 2.1.x)](05-troubleshooting/claude-mem-setting-sources-empty-arg.md)
Author	SHA1	Message	Date
MajorLinux	8d9bd34118	Merge branch 'code/majorrig/mastodon-mention-spam-wiki'	2026-06-22 13:50:21 -04:00
MajorLinux	2def4c6f30	wiki: add Mastodon crowdfunding/mention-spam triage runbook Runbook for telling broadcast fundraising solicitation from genuine mentions: signal checklist, SQL to investigate the account and its origin instance via nodeinfo, BlockService snippet, and a proportionate escalation ladder (mute -> block -> report -> domain-limit -> domain-block). Registered in SUMMARY.md and the self-hosting section index.	2026-06-22 13:49:35 -04:00
majorlinux	44c9d38b9f	Merge branch 'code/MajorAir/macos-btm-audit-wiki'	2026-06-21 13:01:34 -04:00
majorlinux	623f04720c	Add macOS guide: auditing & cleaning Background App Activity (sfltool dumpbtm)	2026-06-21 13:00:35 -04:00
majorlinux	69d60b7753	Merge branch 'code/MajorAir/restic-snapshot-group-gotcha'	2026-06-21 12:34:06 -04:00
majorlinux	c358e0dfea	restic runbook: document the snapshot-group-per-path-set gotcha Changing a host's restic_paths spawns a new snapshot group (restic groups by host+paths), so old and new path-sets each keep their own retention lineage. Surfaced while extending majorlab's backup scope.	2026-06-21 12:33:56 -04:00
majorlinux	a45ef55862	Merge branch 'code/MajorAir/wp-textdomain-wiki'	2026-06-21 11:44:52 -04:00
majorlinux	e767ebffcb	Add runbook: WordPress 6.7 _load_textdomain_just_in_time notice Covers the WP 6.7 doing_it_wrong notice fired when a theme/plugin translates before init (e.g. nav-menu labels on after_setup_theme). Documents source fix (defer to init) and the update-safe mu-plugin suppression via doing_it_wrong_trigger_error, plus the renamed-theme domain gotcha. Derived from the majorlinux.com kappa/marstheme triage.	2026-06-21 11:44:48 -04:00
MajorLinux	96db073b78	Add LVM volume-grow guide; publish iPhone Mirroring + Claude Code login fixes	2026-06-19 15:00:19 -04:00
majorlinux	cf5e35da1d	Merge branch 'code/majorair/steam-deck-wifi-flap-article'	2026-06-19 11:36:09 -04:00
majorlinux	cb90bb69a2	wiki: add Steam Deck Wi-Fi flapping runbook (IWD periodic scan + rtw88 power save) Client-side fix for OG Steam Deck (RTL8822CE/rtw88) flapping ~once a minute on SteamOS: disable IWD periodic scan + disable Wi-Fi power save via NM dispatcher. Cross-linked with the 160MHz airtime article; registered in SUMMARY.md nav.	2026-06-19 11:36:06 -04:00
majorlinux	4599ed607c	wiki: add restic + B2 fleet backups runbook Architecture, per-engine DB dump patterns, restore procedure, add-a-host, and gotchas (RESTIC_CACHE_DIR/$HOME, missing sqlite3, docker dump env vars, delete-capable B2 key). Linked in SUMMARY under storage-backup.	2026-06-19 10:05:16 -04:00
Marcus Summers	2bed2cbae3	Merge branch 'code/majormac/ansible-roles-migration-article'	2026-06-18 14:32:02 -04:00
Marcus Summers	ebdb28e9e2	Add wiki article: migrating flat Ansible playbooks to roles (capture-based reconciliation)	2026-06-18 14:31:46 -04:00
Marcus Summers	4fa5e33d93	Merge branch 'code/majormac/tm-orphaned-previous-article'	2026-06-18 10:09:45 -04:00
Marcus Summers	cfff75af1c	Add troubleshooting article: Time Machine orphaned APFS .previous blocks backups	2026-06-18 10:09:45 -04:00
Marcus Summers	06162273f7	Merge branch 'code/majormac/ssh-key-backfill-article'	2026-06-17 13:15:19 -04:00
Marcus Summers	e1767bc19e	Add troubleshooting article: Permission denied (publickey) after key rotation New 05-troubleshooting/networking article covering the per-host nature of authorized_keys: rotating a workstation SSH key requires backfilling the new pubkey to every host, or hosts holding only the old key reject it with Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent backed-up backfill via a still-trusted transit user, and prevention. Wired into SUMMARY.md nav.	2026-06-17 13:14:41 -04:00
majorlinux	0d08e21ee4	Merge branch 'code/majorair/yt-dlp-update-docs'	2026-06-16 19:12:21 -04:00
majorlinux	2121d3ff1b	yt-dlp: document -U trap and avoid duplicate pip installs Add a Maintenance subsection covering why 'yt-dlp -U' fails on PyPI builds and how to update via pip, plus how to detect/remove a duplicate user+system install (the issue hit on majorhome 2026-06-16).	2026-06-16 19:12:06 -04:00
majorlinux	1d73b2defa	Merge branch 'code/majorair/keychain-prompt-wiki'	2026-06-15 20:12:21 -04:00
majorlinux	34d9ee42b1	Add wiki: Claude Code keychain prompt keeps reappearing on macOS New troubleshooting article for the recurring 'security wants to access Claude Code-credentials' prompt that persists even after Always Allow (ACL invalidation on binary-signature change / token refresh / post-boot churn). Covers triage, the reset-and-relogin fix, and the file-based credentials workaround with its plaintext tradeoff. Registered in SUMMARY + troubleshooting index; cross-linked with the corrupt-credential login-failure article (distinct symptom).	2026-06-15 20:12:11 -04:00
majorlinux	700ca95158	Merge branch 'code/majorair/iphone-mirroring-regression'	2026-06-15 19:58:24 -04:00
majorlinux	a5df9e4873	Correct iPhone Mirroring article: regressed on 27.0 beta, not a Tailscale fix 2026-06-15: mirroring is reproducibly stuck on Connecting again with Tailscale accept-routes still off, so the 06-14 it-works conclusion was wrong. _asquic endpoint resolves but the QUIC/AWDL datapath never completes; awdl0 bounce, full reboot, and phone radio cycle all failed. Reframed as an intermittent macOS 27.0 beta AWDL bug; QuickTime USB remains the workaround.	2026-06-15 19:58:20 -04:00
majorlinux	7703b963e1	Merge branch 'code/majorair/wiki-dummy-ip'	2026-06-15 19:26:58 -04:00
majorlinux	5050001909	Replace real majormail IP with documentation IP in logwatch example The postfix MX-lookup example hard-coded majormail's real public IP (stale DO address). Swap in an RFC 5737 documentation IP (203.0.113.10) so the published wiki doesn't expose a real fleet IP.	2026-06-15 19:26:49 -04:00
majorlinux	9085740fa3	Merge branch 'code/majorair/iphone-mirroring-llw0-correction'	2026-06-14 19:10:33 -04:00
majorlinux	75154ff80c	iPhone Mirroring: correct transport finding (video on llw0 not awdl0), it works on ch44, what-changed + MajorMac open test (2026-06-14 evening)	2026-06-14 19:10:06 -04:00
majorlinux	4c95f8a88a	Merge branch 'code/majorair/iphone-mirroring-doc-update'	2026-06-14 04:31:55 -04:00
majorlinux	805c0f0a8f	iPhone Mirroring AWDL article: refined root cause, Tailscale/congestion ruled out, ch36+ch44 both fail, QuickTime USB workaround, revisit checklist (2026-06-14)	2026-06-14 04:30:22 -04:00
majorlinux	e5d1e39af9	Merge branch 'code/majorair/wiki-stale-hostname-config-variant'	2026-06-14 04:00:25 -04:00
majorlinux	852375ddf0	logwatch-hostname wiki: add hostname-correct-but-config-baked variant majormail (2026-06-14) had the correct system hostname but still mailed from majormail-hetzner — the old provisioning label was hardcoded in logwatch.conf MailFrom and fail2ban jail.local sender. Add a variant section covering the config grep sweep and the templated-vs-static Ansible regression caveat.	2026-06-14 04:00:18 -04:00
majorlinux	9dd730fc29	Add nav entries for Warp keychain login + iPhone Mirroring AWDL articles	2026-06-13 09:58:26 -04:00
majorlinux	e0595c04fd	Publish drafts: Warp keychain login + iPhone Mirroring AWDL stall	2026-06-13 09:57:37 -04:00
MajorLinux	27ea2dc62b	Add troubleshooting article: Wi-Fi 160 MHz airtime saturation breaking game streaming	2026-06-13 09:48:43 -04:00
Marcus Summers	3f94ebb963	Merge branch 'code/majormac/wiki-forgejo-recovery'	2026-06-12 17:36:55 -04:00
Marcus Summers	14cc1ba4b8	wiki: Forgejo account recovery & CLI admin when locked out of the GUI Covers enabling the [mailer] for password recovery (relay via a tailnet mail server, no-auth/mynetworks, FORCE_TRUST_SERVER_CERT for IP targets), CLI password reset + the must-change-password=true gotcha, adding an SSH key via the basic-auth API when locked out, and ruling out a server-side cause for a 'changing' password.	2026-06-12 17:36:54 -04:00
Marcus Summers	fecae727d1	Merge branch 'code/majormac/logwatch-hostname-wiki'	2026-06-12 10:58:17 -04:00
Marcus Summers	0d1697c0d6	wiki: Logwatch wrong hostname (<host>-hetzner) after migration. New troubleshooting runbook for Logwatch reports titled with the Hetzner provisioning label instead of the real hostname; cross-linked from the logwatch fleet-setup and VPS migration baseline articles, plus a new 'set system hostname' step in the post-migration checklist.	2026-06-12 10:58:17 -04:00
Marcus Summers	4f6898eb6c	Merge branch 'code/majormac/ansible-hostkey-wiki'	2026-06-12 09:32:00 -04:00
Marcus Summers	11b455a0e2	Add runbook: Ansible host-key verification failed after host rebuild/migration. Documents the Ansible-by-IP known_hosts gap: interactive ssh works (key stored under hostname) but Ansible connects by inventory IP and fails with UNREACHABLE/Host key verification failed. Includes tailnet-safe ssh-keyscan fix and prevention notes. Surfaced by the Hetzner migration IP churn.	2026-06-12 09:30:09 -04:00
majorlinux	bc4ff144df	wiki: add Ansible reboot.yml become-timeout-on-WSL2 troubleshooting article. Documents why WSL2 hosts fail an Ansible reboot play at privilege escalation (Timeout waiting for privilege escalation prompt) — WSL2 has no real reboot semantics + become stalls over the Windows OpenSSH->WSL2 bridge — and the fix: scope reboot.yml to hosts: all:!wsl. Registered in SUMMARY.md and 05-troubleshooting/index.md.	2026-06-12 03:57:17 -04:00
majorlinux	950759da52	wiki: add MagicDNS-names-vs-pinned-IPs Tailscale SSH article. New troubleshooting/networking article covering the three SSH failure modes after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names + known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat. Cross-links the existing host-key article (adds a 'when pinning the IP is wrong' callout) and adds the SUMMARY nav entry.	2026-06-12 01:33:31 -04:00
MajorLinux	877c4b815f	wiki: add WSL2 Fedora 44 in-place upgrade article (gcc14 blocker + CUDA repo swap)	2026-06-11 22:48:55 -04:00
MajorLinux	27b1ae244c	Merge branch 'code/majorrig/wiki-hevc-already-failed-skip'	2026-06-11 20:16:21 -04:00
MajorLinux	ce2e761d33	hevc-vaapi-batch-encode: add already_failed() skip for streaming content. Document that VAAPI HEVC on Polaris can't beat already-efficient H.264 (YouTube/Twitch/stream archives), so output comes out larger and lands in hevc_failed.txt. Add already_failed() guard so the batch skips known-bad files on queue rebuilds instead of re-attempting them. Also: MIN_FREE_GB note (start-only check) and a source-bitrate triage snippet for picking real encode candidates.	2026-06-11 20:16:19 -04:00
MajorLinux	513d94aa84	Merge branch 'code/majorrig/wiki-ssh-magicdns-article'	2026-06-11 20:12:34 -04:00
MajorLinux	9b066d0e54	Add troubleshooting article: SSH alias MagicDNS fall-through host-key failure. New 05-troubleshooting/networking article covering the case where ssh <alias> fails host-key verification because no Host block exists and the alias resolves via Tailscale MagicDNS to a name with no known_hosts entry (key stored under the IP). Registered in SUMMARY.md and the troubleshooting index.	2026-06-11 20:12:22 -04:00