wiki: expand Tailscale race condition article with network-online race

Added Race 2: tailscaled starts before network-online.target, causing Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu ssh.socket and cross-platform tailscaled ordering issues. Updated references to include majordiscord incident and new Ansible playbook.
Merge cowork/majorair/ssh-socket-wiki: ssh.socket Tailscale race condition article
2026-05-19 20:39:18 -04:00 · 2026-05-19 19:36:19 -04:00 · 2026-05-19 19:35:16 -04:00 · 2026-05-15 09:02:24 -04:00 · 2026-05-15 03:22:12 -04:00 · 2026-05-13 10:36:06 -04:00
10 changed files with 834 additions and 7 deletions
--- a/02-selfhosting/cloud/vps-migration-baseline-checklist.md
+++ b/02-selfhosting/cloud/vps-migration-baseline-checklist.md
@ -0,0 +1,97 @@
 ---
 title: VPS Migration Baseline Checklist
 description: What to verify after migrating a server to a new provider — the packages, services, and configs that must match the old box
 tags:
  - migration
  - vps
  - hetzner
  - digitalocean
  - ansible
  - checklist
 status: published
 created: 2026-05-09
 updated: 2026-05-13T10:35
 ---
 # VPS Migration Baseline Checklist
 When migrating a server from one VPS provider to another, it's easy to focus on the application (bots, web services, databases) and forget the infrastructure baseline. This checklist covers the common components that make a server operational beyond just running the app.
 ## Background
 During the Hetzner migration (2026-05), `majordiscord` was migrated with only the application layer (PhantomBot, Red-DiscordBot) and core infrastructure (Netdata, Tailscale, fail2ban). Missing from the new box: Postfix (email relay), logwatch, ClamAV, and dnf-automatic. The gap went unnoticed for a week because all monitoring email depended on the missing Postfix.
 ## The Checklist
 ### Before Migration
 Power on both old and new boxes. Run this comparison to find gaps:
 ```bash
 # Fedora — list baseline packages on both hosts
 ssh root@OLD_HOST 'rpm -qa --qf "%{NAME}\n" | sort | grep -iE "fail2ban|logwatch|postfix|netdata|clamav|dnf-auto|tailscale|cronie|firewalld"'
 ssh root@NEW_HOST 'rpm -qa --qf "%{NAME}\n" | sort | grep -iE "fail2ban|logwatch|postfix|netdata|clamav|dnf-auto|tailscale|cronie|firewalld"'
 # Ubuntu — list baseline packages on both hosts
 ssh root@OLD_HOST 'dpkg -l | grep -iE "fail2ban|logwatch|postfix|netdata|clamav|unattended|tailscale" | awk "{print \$2}" | sort'
 ssh root@NEW_HOST 'dpkg -l | grep -iE "fail2ban|logwatch|postfix|netdata|clamav|unattended|tailscale" | awk "{print \$2}" | sort'
 ```
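 Comparing the two outputs by eye is error-prone. As a minimal sketch, the gap can be listed mechanically. The `missing_on_new` helper and the `old-packages.txt` / `new-packages.txt` files are hypothetical; the files are assumed to hold the saved output of the commands above, one package name per line.
 ```bash
 # Print package names present in the old host's list but absent from the new one.
 # $1 = old host's package list, $2 = new host's package list (one name per line).
 missing_on_new() {
   # -F fixed strings, -x whole-line match, -v invert:
   # keep only the old host's lines that have no exact match in $2.
   sort -u "$1" | grep -Fxv -f "$2"
 }
 ```
 For example, `missing_on_new old-packages.txt new-packages.txt` prints nothing when the new box has every package the old one had.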
 Compare enabled services:
 ```bash
 ssh root@HOST 'systemctl list-unit-files --state=enabled --no-pager | grep -iE "fail2ban|logwatch|postfix|netdata|clamav|dnf-auto|tailscale|cronie|firewalld|sshd"'
 ```
 ### Baseline Components
 Every server in the fleet should have these. Check each one after migration:
 | Component | Package (Fedora) | Package (Ubuntu) | Ansible Playbook | Notes |
 |-----------|-----------------|------------------|------------------|-------|
 | Monitoring | `netdata` | `netdata` | `netdata.yml` | Claim to Netdata Cloud if applicable |
 | VPN | `tailscale` | `tailscale` | — (manual join) | Rename node in Tailscale admin |
 | Intrusion prevention | `fail2ban` | `fail2ban` | `harden.yml` | Check jail.local, banaction matches firewall |
 | Email relay | `postfix` | `postfix` | `configure_postfix_relay.yml` | Required by logwatch, Netdata, fail2ban |
 | Log summaries | `logwatch` | `logwatch` | `logwatch.yml` | Override file, not defaults — see [logwatch fleet setup](../monitoring/logwatch-fleet-setup.md) |
 | Firewall | `firewalld` | `ufw` | `configure_firewall_*.yml` | Verify fail2ban banaction matches |
 | Cron | `cronie` | `cron` | — (usually pre-installed) | Required by logwatch |
 | Auto-updates | `dnf-automatic` | `unattended-upgrades` | `ansible-unattended-upgrades-fleet` | Security patches only |
 | Antivirus | `clamav` | `clamav` | `configure_clamav.yml` | Internet-facing hosts only |
 | SSH hardening | `openssh-server` | `openssh-server` | `configure_ssh_hardening.yml` | Key-only, no root password |
 | Timezone | — | — | — | US servers: `America/New_York`; UK: `Europe/London`. Hetzner defaults to UTC. |
 | CA bundle (Fedora) | `ca-certificates` | `ca-certificates` | — | Verify `/etc/pki/tls/certs/ca-bundle.crt` symlink exists — see [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md) |
 | Syslog (Fedora) | `rsyslog` | — (pre-installed) | — | Fedora 44 Hetzner images have journald only. Logwatch needs `/var/log/messages` + `/var/log/secure`. |
 ### After Migration
 1. **Set the timezone** — `timedatectl set-timezone America/New_York` (US) or `Europe/London` (UK). Hetzner images default to UTC.
 2. **Verify CA bundle (Fedora)** — `ls /etc/pki/tls/certs/ca-bundle.crt`. If missing, Postfix TLS, curl, and dnf will all fail silently. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
 3. **Run `harden.yml` against the new host** — catches most gaps in one pass
 4. **Send a test email** — `echo test | mail -s "test" marcus@majorshouse.com` — if this fails, nothing else can alert you
 5. **Verify crond is running** — `systemctl is-active crond` (Fedora) or `systemctl is-active cron` (Ubuntu). cronie can be `enabled` but not `active` after provisioning.
 6. **Check Netdata Cloud** — verify the new node appears and alerts are flowing
 7. **Compare fail2ban jails** — `fail2ban-client status` on both old and new
 8. **Verify logwatch sends** — `sudo logwatch --output mail --range today`
 9. **Keep the old box powered off but not destroyed** for at least 7 days after remediation
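 Step 5 is easy to skip because `enabled` and `active` are separate states in systemd. A small sketch of that distinction follows; the `classify_unit` helper and its labels are illustrative and not part of any playbook.
 ```bash
 # Classify a unit from the output of `systemctl is-enabled` ($1)
 # and `systemctl is-active` ($2).
 classify_unit() {
   case "$1/$2" in
     enabled/active) echo "ok" ;;
     enabled/*)      echo "enabled-but-not-running" ;;
     */active)       echo "running-but-not-enabled" ;;
     *)              echo "not-running" ;;
   esac
 }
 ```
 On a Fedora host this could be called as `classify_unit "$(systemctl is-enabled crond 2>/dev/null)" "$(systemctl is-active crond 2>/dev/null)"`; substitute `cron` on Ubuntu.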
 ### Using doctl to Manage Old Droplets
 ```bash
 # Authenticate (token from Ansible vault)
 cd ~/MajorAnsible
 ansible-vault view group_vars/all/vault.yml | grep vault_do_oauth_token | awk '{print $2}' | xargs doctl auth init --access-token
 # List droplets
 doctl compute droplet list --format Name,ID,Status,PublicIPv4
 # Power on for comparison
 doctl compute droplet-action power-on DROPLET_ID
 # Power off when done
 doctl compute droplet-action power-off DROPLET_ID
 ```
 ## Lesson Learned
 Application migration is not server migration. The app can work perfectly while the monitoring, alerting, and email infrastructure is completely broken. Always compare the full package baseline between old and new boxes before calling a migration complete.
--- a/02-selfhosting/monitoring/logwatch-fleet-setup.md
+++ b/02-selfhosting/monitoring/logwatch-fleet-setup.md
@ -9,7 +9,7 @@ tags:
  - ubuntu
 status: published
 created: 2026-05-09
-updated: 2026-05-10T13:00
+updated: 2026-05-13T10:35
 ---
 # Logwatch Fleet Setup — Surviving Package Upgrades
@ -91,10 +91,22 @@ Include it in `harden.yml` so every new server gets logwatch as part of the base
 After deploying, test immediately:
 ```bash
 # Verify crond is actually running — cronie can be "enabled" but not "active"
 systemctl is-active crond   # Fedora
 systemctl is-active cron    # Ubuntu
 # If inactive, start it
 sudo systemctl start crond   # Fedora; on Ubuntu the unit is "cron"
 # Then test logwatch manually
 sudo logwatch --output mail --range today
 ```
-Check that the email arrives. If it doesn't, verify Postfix is installed and relaying correctly — logwatch depends on a working local MTA.
+Check that the email arrives. If it doesn't, verify:
 1. **crond is running** — if `inactive`, cron.daily never fires and logwatch never runs. No errors anywhere.
 2. **Postfix is installed and relaying** — logwatch depends on a working local MTA.
 3. **CA bundle exists (Fedora)** — missing `/etc/pki/tls/certs/ca-bundle.crt` breaks Postfix TLS relay. See [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md).
 ## Diagnosing Silent Failures
@ -105,6 +117,32 @@ dpkg -V logwatch  # Debian
 # Look for S.5....T. on the defaults file — means it was replaced
 # S = size, 5 = md5, T = timestamp changed
 # Check if logwatch produces any output at all
 logwatch --output stdout --range yesterday | wc -l
 # If 0 lines — logwatch has no log data to report (see rsyslog section below)
 ```
 ## Fedora: rsyslog Missing — Logwatch Produces Zero Output
 Fedora 44 cloud images (Hetzner, possibly others) ship with **journald only** — no rsyslog. This means `/var/log/messages`, `/var/log/secure`, and `/var/log/cron` do not exist. Logwatch scans those files, finds nothing, produces empty output, and sends no email. Exit code is still 0 — no error anywhere.
 This is particularly insidious because everything else can be correct (crond running, postfix relaying, logwatch config pointing to the right recipient) and you'll still get silence.
 ```bash
 # Diagnose
 rpm -q rsyslog          # "package rsyslog is not installed"
 ls /var/log/messages    # "No such file or directory"
 # Fix
 dnf install -y rsyslog
 systemctl enable --now rsyslog
 # Verify log files appear
 ls /var/log/messages /var/log/secure /var/log/cron
 # Test logwatch
 logwatch --output stdout --range today | wc -l   # should be >0
 ```
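 The same check can be scripted for use across hosts. This is a sketch only: the `missing_logs` name is made up here, and the file list is taken from the three paths mentioned above.
 ```bash
 # Print each expected syslog file that does not exist under the given directory.
 # $1 = log directory (defaults to /var/log).
 missing_logs() {
   log_dir=${1:-/var/log}
   for name in messages secure cron; do
     [ -f "$log_dir/$name" ] || echo "$log_dir/$name"
   done
 }
 ```
 Empty output means all three files are present and logwatch has something to read.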
 ## Fedora CA Bundle Missing — Postfix TLS Engine Unavailable
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@ -11,7 +11,7 @@ tags:
  - cron
 status: published
 created: 2026-04-18
-updated: 2026-05-10T01:50
+updated: 2026-05-15T03:00
 ---
 # ClamAV Fleet Deployment with Ansible
@ -226,6 +226,41 @@ The "polite CPU is invisible to DO" trick stops working once the box is small en
 **Alternative considered: switch to `clamdscan`** — uses a resident `clamd` daemon, signatures stay loaded, scan finishes ~10× faster with much less CPU/RAM. Better long-term answer, but requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident vs the cron approach which only holds RAM during scan). Trade-off, not strictly better.
 ## Daemonless Mode on Memory-Constrained Hosts
 On hosts with ≤2 GB RAM, running `clamd` continuously is often counterproductive. The daemon loads its full signature database (~950 MB RSS) into memory and keeps it resident. On small VMs this crowds out MySQL, PHP-FPM, and other services, and often pushes the whole system into swap, so the scanner ends up degrading the host rather than protecting it.
 **Affected hosts (fleet history):**
 | Host | RAM | Incident | Resolution |
 |------|-----|----------|------------|
 | teelia | 1.9 GB | 2026-04-27 — clamd 728 MB RSS, 94% RAM alert | daemonless |
 | dcaprod | 3.8 GB | 2026-04-30 — clamd OOM thrash after 512M cgroup cap | daemonless |
 | majorlinux | 2.0 GB | 2026-05-15 — clamd 980 MB swap, mysqld swapping 293 MB | daemonless |
 **The fix: `clamav_use_daemon: false` host_var**
 `configure_clamav.yml` supports a per-host override. Add to the host's `host_vars/<hostname>/vars.yml`:
 ```yaml
 clamav_use_daemon: false
 ```
 Then re-run the playbook:
 ```bash
 ansible-playbook configure_clamav.yml --limit <hostname>
 ```
 This will:
 - Stop and disable `clamav-daemon.service` and `clamav-daemon.socket`
 - Deploy the weekly scan template using `clamscan` (daemonless, loads DB per run)
 - Leave `clamav-freshclam` active so definitions stay current
 **Trade-off:** Each weekly scan loads the signature DB fresh (~950 MB peak RAM for the scan duration, then freed). The scan takes roughly 3–5× longer than `clamdscan` against a warm daemon, which is acceptable for a weekly background job. The `systemd-run MemoryMax` cgroup wrapper in the scan template caps peak usage so the scan can't OOM the host.
 **Rule of thumb:** Use daemon mode (`clamav_use_daemon: true` or unset) on hosts with ≥4 GB RAM where scan speed matters (mail servers, upload handlers). Use daemonless on webservers and small VMs where continuous memory residency is the bigger risk.
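 The rule of thumb can be expressed as a small helper when deciding which hosts get the override. This is a sketch, assuming the 4 GB cut-off above is applied to `MemTotal` as reported by the kernel. The function name is hypothetical and is not something `configure_clamav.yml` is known to provide.
 ```bash
 # Suggest a clamav_use_daemon value from total RAM in kB ($1).
 recommend_clamav_mode() {
   threshold_kb=$((4 * 1024 * 1024))   # 4 GB cut-off from the rule of thumb (assumption)
   if [ "$1" -ge "$threshold_kb" ]; then
     echo "clamav_use_daemon: true"
   else
     echo "clamav_use_daemon: false"
   fi
 }
 ```
 On a live host: `recommend_clamav_mode "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`. A nominal 4 GB VM typically reports slightly less than 4 GB and so lands on the daemonless side, consistent with dcaprod in the table above.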
 ## See Also
 - [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) — troubleshooting CPU spikes from unthrottled scans
--- a/04-streaming/plex/hevc-vaapi-batch-encode.md
+++ b/04-streaming/plex/hevc-vaapi-batch-encode.md
@ -0,0 +1,168 @@
 ---
 title: "HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)"
 domain: streaming
 category: plex
 tags: [plex, ffmpeg, hevc, vaapi, amd, gpu, encode, storage, rx480]
 status: published
 created: 2026-05-15
 updated: 2026-05-15
 ---
 # HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)
 ## Problem
 Plex NVMe storage is filling up from a large library of H.264-encoded video files (YouTube downloads, stream archives, etc.). Re-encoding to HEVC (H.265) reclaims 30–50% of disk space. The catch: Plex tracks each file's "date added" in a SQLite database, and that order matters for playback queues. Naive re-encode-and-replace approaches can corrupt or reset that metadata.
 ## Solution
 Use `ffmpeg` with `hevc_vaapi` (AMD GPU hardware encoder) to batch re-encode files in-place using an atomic rename swap that preserves the Plex database record — including `added_at` — without any Plex downtime or database editing.
 ---
 ## How Plex Stores "Date Added"
 Plex does **not** use file modification time (`mtime`) for "date added." It stores a Unix timestamp in its SQLite database:
 ```sql
 -- Plex DB location (override via systemd unit may differ — check):
 -- /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/
 --   Plug-in Support/Databases/com.plexapp.plugins.library.db
 -- (or wherever PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR points)
 SELECT mi.added_at, datetime(mi.added_at, 'unixepoch'), mp.file
 FROM metadata_items mi
 JOIN media_items me ON me.metadata_item_id = mi.id
 JOIN media_parts mp ON mp.media_item_id = me.id
 WHERE mp.file LIKE '%your-file%';
 ```
 > **Note:** If the default path returns 0 rows, check your actual data directory:
 > ```bash
 > systemctl cat plexmediaserver | grep APPLICATION_SUPPORT
 > ```
 The `added_at` field is keyed to the **file path** in `media_parts`. As long as the file path doesn't change, the database record — including `added_at` — is untouched even after the file's content is replaced.
 ---
 ## Why VAAPI Instead of libx265
 On a host with an AMD RX 480/580 (or similar Polaris GPU), hardware HEVC encoding via VAAPI is roughly **9× faster** than software libx265 at comparable quality:
 | Encoder | Speed (1080p) | Notes |
 |---|---|---|
 | libx265 -preset medium | ~21 fps / 0.35× | Best quality/size ratio |
 | hevc_vaapi QP 28 | ~186 fps / 3.1× | Sufficient for streaming content |
 For 1080p streaming content (game streams, podcasts, YouTube archival), the quality difference is imperceptible. libx265 is preferable only for archival encodes where absolute quality matters.
 ### Verify VAAPI is working
 ```bash
 vainfo 2>&1 | grep -E "vaapi|HEVC|hevc|Driver"
 ls /dev/dri/renderD128
 ```
 You need `VAProfileHEVCMain : VAEntrypointEncSlice` in the output. If missing, install `mesa-va-drivers-freeworld` (RPM Fusion) for AMD hardware.
 ---
 ## The Atomic Swap Strategy
 The key insight: `mv file.tmp file` on the **same filesystem** is an atomic inode rename at the kernel level. Plex sees the same path still present — it never fires a "file removed" event, so the `metadata_items` record (including `added_at`) is preserved.
 **Safe sequence:**
 1. Encode source → `.hevc.tmp.mp4` alongside the original
 2. Verify the output with `ffprobe`
 3. `touch -r original.mp4 temp.mp4` — copy mtime (cosmetic, not required)
 4. `mv temp.mp4 original.mp4` — atomic replace
 **The one pitfall:** if the original file is deleted *before* the `mv`, Plex orphans the DB record (removes `metadata_items` entry on next scan) and re-indexes the new file with a fresh `added_at`. The original must still exist at swap time.
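 The safe sequence and its one pitfall can be sketched as a guard around the final `mv`. This is an illustration under assumptions, not the contents of `hevc_batch.sh`: the `atomic_swap` name is hypothetical, and `stat -c %d` (device number) assumes GNU coreutils.
 ```bash
 # Replace "$1" (original) with "$2" (encoded temp file) via a same-filesystem rename.
 atomic_swap() {
   original=$1
   encoded=$2
   # The original must still exist, otherwise Plex re-indexes with a fresh added_at.
   [ -f "$original" ] || { echo "original missing: $original" >&2; return 1; }
   # Refuse to swap in a missing or zero-byte encode.
   [ -s "$encoded" ] || { echo "encoded output missing or empty: $encoded" >&2; return 1; }
   # rename is only atomic when both paths are on the same filesystem.
   [ "$(stat -c %d "$original")" = "$(stat -c %d "$encoded")" ] || {
     echo "different filesystems; rename would not be atomic" >&2; return 1; }
   touch -r "$original" "$encoded"   # copy mtime (cosmetic)
   mv -f "$encoded" "$original"      # atomic replace
 }
 ```
 A failed guard leaves both files untouched, so the run can be retried after the cause is fixed.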
 ---
 ## The Batch Script
 Script lives at `~/hevc_batch.sh` on majorhome.
 ```bash
 # Dry run — scan and report what would be encoded, no changes
 bash ~/hevc_batch.sh --dry-run
 # Full run (default: files >1GB, QP 28)
 tmux new-session -d -s hevc_batch 'bash ~/hevc_batch.sh'
 # Custom options
 bash ~/hevc_batch.sh --min-size-gb 2 --qp 26
 ```
 ### Queue and resume
 The script writes a queue file at `~/hevc_queue.txt` on first run (scanning all files with ffprobe — takes ~10 min for a large library). On subsequent runs it resumes from where it left off. Completed files are logged to `~/hevc_done.txt`. Failed files go to `~/hevc_failed.txt`.
 To restart from scratch: `rm ~/hevc_queue.txt ~/hevc_done.txt`
 ### Log output
 ```bash
 # Structured log lines only (skip ffmpeg progress noise)
 grep '^\[20' ~/hevc_batch.log
 # Watch live progress
 tail -f ~/hevc_batch.log | grep '^\[20'
 ```
 Each file logs:
 - Source size and codec
 - `Plex added_at before: <unix timestamp>`
 - ffmpeg exit code and elapsed time
 - Output size and savings
 - `DB check: added_at PRESERVED ✓` (or WARN if changed)
 ### Space guard
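 The script aborts if free space on the Plex volume drops below 20 GB (`MIN_FREE_GB`). During an encode the source and the temp output exist side by side, so the pair can occupy up to `source_size + tmp_size`: for a 4 GB source that is ~8 GB at peak, of which up to ~4 GB is newly consumed free space.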
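 A guard of this kind can be sketched as follows. The actual check in `hevc_batch.sh` is not shown in this article, so the function name, the arguments, and the decision to reserve room for a temp file as large as the source are all assumptions.
 ```bash
 # Succeed if encoding a source of $2 kB would still leave the minimum free.
 # $1 = free space on the volume in kB
 # $2 = source file size in kB
 # $3 = minimum free space in GB (defaults to 20)
 enough_space() {
   min_free_kb=$(( ${3:-20} * 1024 * 1024 ))
   # Worst case assumed here: the temp output is as large as the source.
   [ $(( $1 - $2 )) -ge "$min_free_kb" ]
 }
 ```
 Free space in kB can be read with `df -Pk <plex-volume> | awk 'NR==2 {print $4}'`.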
 ---
 ## ffmpeg Command
 ```bash
 ffmpeg \
  -vaapi_device /dev/dri/renderD128 \
  -i "input.mp4" \
  -vf 'format=nv12,hwupload' \
  -c:v hevc_vaapi -rc_mode CQP -qp 28 \
  -c:a copy \
  -movflags +faststart \
  -y "output.tmp.mp4"
 ```
 - `-rc_mode CQP -qp 28` — constant quantizer; higher value = smaller file / lower quality. QP 24 is high quality, QP 28 is good for streaming content.
 - `-vf 'format=nv12,hwupload'` — required to move frames to GPU memory for VAAPI encoding.
 - `-c:a copy` — passes audio through untouched.
 - `hevc_vaapi` does not support 10-bit output on Polaris (RX 480/580). For 10-bit HDR sources, fall back to `libx265` with color signaling flags.
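 Before a file is swapped into place, the output can be checked with `ffprobe` (step 2 of the safe sequence). A minimal sketch, assuming a check on the first video stream's codec is sufficient; the `verify_hevc` name is hypothetical and the real script may verify more, such as duration or stream count.
 ```bash
 # Succeed only if the first video stream of "$1" is HEVC.
 verify_hevc() {
   codec=$(ffprobe -v error -select_streams v:0 \
     -show_entries stream=codec_name \
     -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
   [ "$codec" = "hevc" ]
 }
 ```
 A typical call would be `verify_hevc "output.tmp.mp4" || echo "encode failed verification"`.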
 ---
 ## Plex Data Directory Override
 On majorhome, the Plex data directory is overridden in the systemd unit — the default path `/var/lib/plexmediaserver/` is empty:
 ```bash
 systemctl cat plexmediaserver | grep APPLICATION_SUPPORT
 # Environment=PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR=/plex/plexdata/Library/Application Support
 ```
 The actual DB path is therefore:
 ```
 /plex/plexdata/Library/Application Support/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db
 ```
 ---
 ## Related
 - [[plex-4k-codec-compatibility]] — Apple TV Direct Play compatibility, HEVC HDR notes
 - [[snapraid-mergerfs-setup]] — MajorRAID storage pool setup
 - [[SnapRAID-Majorhome]] — majorhome SnapRAID project
--- a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
+++ b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
@ -0,0 +1,119 @@
 # Tailscale Boot Race Conditions (SSH Unreachable After Reboot)
 Two related race conditions can make a host unreachable via Tailscale after reboot. Both stem from systemd services starting before Tailscale or the network is ready.
 ---
 ## Race 1: ssh.socket Binds Before Tailscale Is Up (Ubuntu)
 ### Symptom
 SSH to a host via its Tailscale IP times out. `tailscale ping` works and `tailscale status` shows `active; direct`, but sshd never answers on port 22. If the root password is unset, the Hetzner console cannot be used as a fallback either.
 ### Cause
 Ubuntu 24.04 uses systemd **socket activation** for SSH (`ssh.socket` instead of persistent `ssh.service`). When the socket override binds to a Tailscale IP, it can start *before* `tailscaled.service` is ready. The bind may succeed initially (Tailscale state file caches the IP), but a later Tailscale reconnect or interface reset invalidates the bound address silently — SSH dies with no recovery path.
 ### Diagnosis
 ```bash
 # From another host:
 tailscale ping <IP>          # succeeds — host is up
 ssh root@<IP>                # times out — sshd not listening
 # After gaining console access or reboot:
 systemctl status ssh.socket  # check Listen: address
 journalctl -b -1 -u ssh     # likely empty — sshd never spawned
 journalctl -b -1 -u ssh.socket  # socket started before tailscaled
 ```
 ### Fix
 Add Tailscale dependency to the socket override:
 ```ini
 # /etc/systemd/system/ssh.socket.d/override.conf
 [Unit]
 After=tailscaled.service
 BindsTo=tailscaled.service
 [Socket]
 ListenStream=
 ListenStream=<TAILSCALE_IP>:22
 ```
 Then reload and restart:
 ```bash
 systemctl daemon-reload
 systemctl restart ssh.socket
 systemctl status ssh.socket   # verify Listen: shows correct IP
 ```
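 - `After=` orders the socket after `tailscaled.service`, so it is not started until Tailscale has started
 - `BindsTo=` ties the socket's lifetime to `tailscaled.service`: stopping or restarting Tailscale also stops or restarts the socket, so it is never left bound to a stale address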
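 Whether the drop-in took effect can be confirmed from systemd's own view of the unit. A small sketch: the `unit_listed` helper is hypothetical, and it relies on `systemctl show -p After ssh.socket` printing a single `After=...` line of space-separated unit names.
 ```bash
 # Succeed if unit "$2" appears in a "Key=unit1 unit2 ..." property line "$1".
 unit_listed() {
   # Strip everything up to the first "=" and pad with spaces
   # so only whole unit names match.
   case " ${1#*=} " in
     *" $2 "*) return 0 ;;
     *)        return 1 ;;
   esac
 }
 ```
 For example: `unit_listed "$(systemctl show -p After ssh.socket)" tailscaled.service && echo "ordered after tailscaled"`. The same call with `-p BindsTo` checks the second directive.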
 ### Affected Hosts
 Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
 ---
 ## Race 2: tailscaled Starts Before Network Is Online (All Hosts)
 ### Symptom
 Host reboots but never appears on Tailscale. `tailscale ping` times out entirely. SSH is dead because Tailscale never connects. The host is up (accessible via provider console) but isolated from the Tailscale network.
 ### Cause
 `tailscaled.service` ships with `After=network-pre.target`, which fires *before* the network interface has an IP. On VPS hosts (especially Hetzner), the interface can take several seconds to come online. Tailscale starts, sees no network (`SetNetworkUp(false)`, `link state: defaultRoute= ifs={} v4=false v6=false`), fails DNS bootstrap and DERP relay connections, and gets stuck — never retrying.
 ### Diagnosis
 ```bash
 # From Hetzner console or another access method:
 journalctl -b -u tailscaled | grep -E "SetNetworkUp|link state|error|DERP"
 # Look for:
 #   magicsock: SetNetworkUp(false)
 #   link state: interfaces.State{defaultRoute= ifs={} v4=false v6=false}
 #   health: Tailscale could not connect to any relay server
 ```
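 For a fleet-wide audit the same log check can be wrapped in a helper that reads the journal on stdin. This is a sketch; the function name is made up, and the match string is the `SetNetworkUp(false)` line quoted above.
 ```bash
 # Succeed if the tailscaled log on stdin shows it came up with no network.
 tailscaled_started_offline() {
   grep -qF 'SetNetworkUp(false)'
 }
 ```
 Usage: `journalctl -b -u tailscaled --no-pager | tailscaled_started_offline && echo "possible Race 2 on this boot"`. A match only shows that tailscaled saw no network at some point during the boot; `tailscale status` tells whether it recovered.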
 ### Fix
 Deploy a systemd drop-in to wait for full network connectivity:
 ```ini
 # /etc/systemd/system/tailscaled.service.d/override.conf
 [Unit]
 After=network-online.target
 Wants=network-online.target
 ```
 Then reload and restart:
 ```bash
 systemctl daemon-reload
 systemctl restart tailscaled
 ```
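 Ordering after `network-online.target` only delays anything if a wait-online service is enabled to hold that target back, typically `NetworkManager-wait-online.service` or `systemd-networkd-wait-online.service`. A sketch of a check for that precondition (the `any_enabled` helper is hypothetical):
 ```bash
 # Read `systemctl is-enabled` output (one state per line) on stdin;
 # succeed if at least one of the listed units is enabled.
 any_enabled() {
   grep -qx 'enabled'
 }
 ```
 Usage: `systemctl is-enabled NetworkManager-wait-online.service systemd-networkd-wait-online.service 2>/dev/null | any_enabled || echo "no wait-online service enabled"`.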
 ### Affected Hosts
 All hosts where Tailscale is the primary access path. Particularly impactful on VPS hosts with slow interface bringup. Both Fedora and Ubuntu hosts are affected.
 ---
 ## Prevention
 - Set root passwords on all VPS hosts for emergency console access
 - Ansible playbooks deploy both fixes automatically:
  - `configure_tailscale_network_wait.yml` — tailscaled network-online dependency (all hosts)
  - `configure_tailscale_ssh_only.yml` — ssh.socket Tailscale dependency (Ubuntu only)
 ## References
 - [[dcaprod#2026-05-19 — SSH unreachable due to ssh.socket race condition with Tailscale]]
 - [[majordiscord#2026-05-19 — Tailscale boot race: unreachable after Ansible reboot]]
 - [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
 - Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`
--- a/05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md
+++ b/05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md
@ -0,0 +1,129 @@
 ---
 title: "OBS Studio — \"Error opening file: (null)\" After Windows Profile Rename"
 domain: troubleshooting
 category: streaming
 tags: [obs, streaming, windows, lua, profile-migration]
 status: published
 created: 2026-05-14
 updated: 2026-05-14
 ---
 # OBS Studio — "Error opening file: (null)" After Windows Profile Rename
 ## Symptom
 Loading a scene collection in OBS Studio triggers a popup like:
 ```
 [<ScriptName>.lua] Error opening file: (null)
 ```
 The `(null)` is the giveaway: OBS resolved the registered script path to nothing — the file doesn't exist where the scene collection says it does. Most commonly this happens after a Windows profile was renamed or migrated and `C:\Users\<old>\...` paths were not updated.
 ## Why it happens
 OBS stores per-scene-collection Lua/Python script registrations inside the scene collection JSON at:
 ```
 %APPDATA%\obs-studio\basic\scenes\<Collection>.json
 ```
 Each entry under `modules.scripts-tool[]` is an absolute Windows path. Renaming the Windows profile does not rewrite these — the JSON keeps pointing at the old `C:\Users\<old>\...` location, and OBS surfaces the resolution failure as a `(null)` popup on collection load.
 ## Diagnose
 From WSL (or any shell with access to `%APPDATA%`):
 ```bash
 OBS_DIR="/mnt/c/Users/<current-windows-user>/AppData/Roaming/obs-studio"
 # 1. List scene collections
 ls "$OBS_DIR/basic/scenes/"
 # 2. Find collections referencing the missing script
 grep -l -i "<script-name-substring>" "$OBS_DIR/basic/scenes/"*.json
 # 3. Dump the scripts-tool paths from each suspect collection
 python3 -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 for s in d.get('modules', {}).get('scripts-tool', []):
    print(s.get('path'))
 " "$OBS_DIR/basic/scenes/<Collection>.json"
 ```
 If a printed path contains `C:/Users/<old-username>/...` and the file doesn't exist on disk, you've found it.
 ## Fix
 > [!warning] Close OBS first
 > OBS rewrites the scene collection JSON when it exits. Any edit made while OBS is running will be overwritten. Confirm with `tasklist.exe | grep obs64` (WSL) or Task Manager.
 ### 1. Make the missing script reachable
 Either:
 - **Re-extract / restore the script** to a path under the new profile (recommended — gives you a clean canonical home), or
 - **Leave it in the rescue/migration folder** and point OBS there (fragile if the rescue folder is later deleted).
 ### 2. Back up the scene collection JSON
 ```bash
 SCENES="/mnt/c/Users/<current-windows-user>/AppData/Roaming/obs-studio/basic/scenes"
 STAMP="$(date +%Y%m%d-%H%M%S)"
 cp -p "$SCENES/<Collection>.json" "$SCENES/<Collection>.json.$STAMP.bak"
 ```
 ### 3. Rewrite the paths atomically
 Edit the JSON in place by parsing it, replacing the matched path strings, and writing through a temp file (so a crash mid-write can't corrupt the collection):
 ```bash
 python3 <<'PY'
 import json, os
 scenes  = "/mnt/c/Users/<current-windows-user>/AppData/Roaming/obs-studio/basic/scenes"
 mapping = {
    "C:/Users/<old>/Pictures/.../<script>.lua":
    "C:/Users/<new>/Pictures/.../<script>.lua",
 }
 for fn in ("<Collection>.json",):
    path = os.path.join(scenes, fn)
    d = json.load(open(path))
    for entry in d.get("modules", {}).get("scripts-tool", []):
        if entry.get("path") in mapping:
            entry["path"] = mapping[entry["path"]]
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(d, fh, indent=4)  # file is flushed and closed before the rename
    os.replace(tmp, path)
 PY
 ```
 OBS scene JSONs use forward slashes in Windows paths — preserve that style.
 ### 4. Verify
 Re-run the diagnostic Python snippet and confirm every printed path resolves to a real file (translate `C:/` → `/mnt/c/` from WSL).
 ### 5. Reopen OBS
 Load the scene collection. The popup should be gone.
 ## Why not just remove the script?
 If the script is part of a third-party overlay pack (Twitch Pimpage, OWN3D, etc.), removing the registration also removes the overlay's source presets — fixing the path keeps the imported scenes intact. If you don't actually use the overlay anymore, removing the `scripts-tool` entry is fine; OBS will silently drop the broken reference on next save.
 ## Generalization
 This same pattern applies to any OBS asset path stored in a scene collection or profile:
 - Browser source local files
 - Image / media source files
 - Lua / Python script paths
 - VST plugin paths
 All of them are absolute, all of them survive a Windows profile rename in stale form, and all of them can be batch-rewritten with the same JSON-edit pattern above. Search for the old username substring across `%APPDATA%\obs-studio\` to catch them all in one pass.
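 That one-pass search can be sketched from WSL as follows. The `find_stale_refs` name is hypothetical, and the sketch assumes the forward-slash path style noted earlier (`C:/Users/<old>/...`); paths stored with escaped backslashes would need a second pattern.
 ```bash
 # List files under an OBS config directory ($1) that still mention the old profile ($2).
 find_stale_refs() {
   # -r recurse, -l print file names only, -F treat the pattern as a fixed string.
   grep -rlF "Users/$2/" "$1"
 }
 ```
 For example: `find_stale_refs "/mnt/c/Users/<current-windows-user>/AppData/Roaming/obs-studio" "<old>"`.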
 ## Related
 - [[../../MajorInfrastructure/Devices/MajorRig|MajorRig device note]] — Incident Log 2026-05-14 (TTT/MLS scene popups) and 2026-05-07 (`majli` profile retirement that left these references stranded)
 - [[../04-streaming/obs/obs-studio-setup-encoding|OBS Studio Setup and Encoding Settings]]
--- a/05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md
+++ b/05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md
@ -1,11 +1,17 @@
 ---
-title: "ClamAV Safe Scheduling on Live Servers"
+title: ClamAV Safe Scheduling on Live Servers
 domain: troubleshooting
 category: security
-tags: [clamav, cpu, nice, ionice, cron, vps]
+tags:
  - clamav
  - cpu
  - nice
  - ionice
  - cron
  - vps
 status: published
 created: 2026-04-02
-updated: 2026-04-02
+updated: 2026-05-11T18:31
 ---
 # ClamAV Safe Scheduling on Live Servers
@ -75,6 +81,7 @@ kill <PID>
 - `ionice -c 3` (Idle) requires Linux kernel ≥ 2.6.13 and CFQ/BFQ I/O scheduler. Works on most Ubuntu/Debian/Fedora systems.
 - On multi-core servers, consider also using `cpulimit` for a hard cap: `cpulimit -l 30 -- clamscan ...`
 - Always keep `--exclude=/sys` (and optionally `--exclude=/proc`, `--exclude=/dev`) to avoid scanning virtual filesystems.
 - **1 vCPU limitation:** `nice` and `ionice` only help when other processes compete for resources. On a single-core VPS, clamscan will still saturate the CPU at 57-100% even with `nice -n 19 ionice -c 3` — there's nothing to yield to. Accept the weekly spike as benign, or reduce scan scope to shorten the window.
 ## Related
--- a/05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md
+++ b/05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md
@ -0,0 +1,116 @@
 ---
 title: "Fedora CA Bundle Missing Symlink — TLS Breaks Fleet-Wide"
 description: Hetzner-provisioned Fedora images may be missing the /etc/pki/tls/certs/ca-bundle.crt symlink, silently breaking Postfix TLS relay, curl, and dnf
 tags:
  - fedora
  - tls
  - postfix
  - ca-certificates
  - hetzner
  - troubleshooting
 status: published
 created: 2026-05-11
 updated: 2026-05-11
 ---
 # Fedora CA Bundle Missing Symlink
 On Fedora, many TLS clients (Postfix, curl, dnf) look for the CA bundle at `/etc/pki/tls/certs/ca-bundle.crt`. This path is normally a symlink to `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`, shipped by the `ca-certificates` package.
 On Hetzner Cloud Fedora images (observed on Fedora 44, May 2026), this symlink can be missing despite `ca-certificates` being installed. The extracted bundle exists, but the consumer-facing symlink does not.
 ## Symptoms
 Postfix relay to a TLS-required upstream fails:
 ```
 postfix/smtp: cannot load Certification Authority data,
  CAfile="/etc/pki/tls/certs/ca-bundle.crt",
  CApath="/etc/pki/tls/certs": disabling TLS support
 ```
 If your relay requires TLS (port 465 with `smtp_tls_wrappermode = yes`, or `smtp_tls_security_level = encrypt`), mail silently queues as deferred. No bounce, no alert — just silence.
 Other symptoms on the same box:
 ```bash
 # curl fails
 curl https://example.com
 # error: Problem with the SSL CA cert (path? access rights?)
 # dnf fails
 dnf list --installed
 # Curl error (77): Problem with the SSL CA cert
 ```
 ## Diagnosis
 ```bash
 # Check the symlink
 ls -la /etc/pki/tls/certs/ca-bundle.crt
 # Expected: symlink -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
 # Broken: "No such file or directory"
 # Verify the extracted bundle exists
 ls -la /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
 # Should exist (~220 KB, ~140-150 certs)
 # Confirm the package is installed
 rpm -q ca-certificates
 # Should return a version string
 ```
 If the extracted bundle exists but the symlink at `/etc/pki/tls/certs/ca-bundle.crt` is missing, that's the problem.
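 The three checks collapse into a single classification, which is convenient when auditing several hosts. A sketch under assumptions: the `ca_bundle_state` helper and its labels are invented for illustration, and the optional root prefix exists only so the logic can be exercised against a test directory.
 ```bash
 # Classify the CA bundle layout under an optional root prefix ($1, default: real /).
 ca_bundle_state() {
   root=${1:-}
   link="$root/etc/pki/tls/certs/ca-bundle.crt"
   bundle="$root/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"
   if [ -s "$link" ]; then
     echo "ok"                # consumer path resolves to a non-empty file
   elif [ -s "$bundle" ]; then
     echo "symlink-missing"   # the case described in this article
   else
     echo "bundle-missing"    # the extracted bundle itself is absent or empty
   fi
 }
 ```
 Run without arguments on a host, `ca_bundle_state` prints `symlink-missing` for the failure described here.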
 ## Fix
 ```bash
 sudo ln -sf /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem \
            /etc/pki/tls/certs/ca-bundle.crt
 sudo systemctl restart postfix
 sudo postqueue -f   # flush any deferred mail
 ```
 Verify:
 ```bash
 # Symlink exists
 ls -la /etc/pki/tls/certs/ca-bundle.crt
 # Postfix can relay
 echo "Subject: TLS test" | sendmail -v marcus@majorshouse.com
 # curl works
 curl -sI https://example.com | head -1
 ```
 ## Fleet Audit
 If one Hetzner-provisioned Fedora host has this issue, check the others:
 ```bash
 for host in majordiscord majorlab majorhome majormail; do
  echo "$host: $(ssh root@$host 'ls /etc/pki/tls/certs/ca-bundle.crt 2>&1' | tail -1)"
 done
 ```
 Hosts returning "No such file or directory" are silently broken for every TLS client that reads the bundle from that path (Postfix, curl, and dnf among them).
 ## Why This Happens
 `update-ca-trust extract` regenerates the files under `/etc/pki/ca-trust/extracted/` but does not create the legacy consumer-path symlink at `/etc/pki/tls/certs/ca-bundle.crt`. That symlink is shipped by the `ca-certificates` RPM. On cloud images built from minimal installs or snapshot-based provisioning, the symlink can be lost during image creation or a partial upgrade.
 ## Prevention
 Add to your provisioning checklist (see [VPS Migration Baseline Checklist](../../02-selfhosting/cloud/vps-migration-baseline-checklist.md)):
 ```bash
 # Fedora provisioning — verify CA bundle symlink
 # -e follows the symlink, so a dangling link is repaired as well
 [ -e /etc/pki/tls/certs/ca-bundle.crt ] || \
  ln -sf /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem /etc/pki/tls/certs/ca-bundle.crt
 ```
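 For a provisioning script that reports what it did, the same step can be wrapped in an idempotent function. A minimal sketch; the name `ensure_ca_link` and its status strings are hypothetical:
 ```bash
 # ensure_ca_link BUNDLE LINK
 # Creates LINK -> BUNDLE only when LINK does not resolve to a file.
 # Prints "created", "unchanged", or "no-bundle" (nothing to link to).
 ensure_ca_link() {
   if [ ! -s "$1" ]; then
     echo "no-bundle"
     return 1
   fi
   if [ -e "$2" ]; then
     echo "unchanged"
     return 0
   fi
   ln -sfn "$1" "$2" && echo "created"
 }
 ```
 Usage: `ensure_ca_link /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem /etc/pki/tls/certs/ca-bundle.crt`. Refusing to link when the bundle is missing avoids leaving a dangling symlink that looks healthy to a plain `ls`.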
 ## Related
 - [Logwatch Fleet Setup](../../02-selfhosting/monitoring/logwatch-fleet-setup.md) — logwatch depends on a working Postfix relay, which depends on TLS, which depends on this symlink
 - [VPS Migration Baseline Checklist](../../02-selfhosting/cloud/vps-migration-baseline-checklist.md) — includes CA bundle verification step
--- a/05-troubleshooting/security/netdata-apps-fds-group-false-positive.md
+++ b/05-troubleshooting/security/netdata-apps-fds-group-false-positive.md
@ -0,0 +1,112 @@
 ---
 title: Netdata apps-group FD-utilisation false 100% (silenced fleet-wide)
 domain: troubleshooting
 category: security
 tags:
  - netdata
  - apps.plugin
  - file-descriptors
  - tailscale
  - false-positive
  - ansible
  - fleet
 status: published
 created: 2026-05-15
 updated: 2026-05-15T02:40
 ---
 # Netdata apps-group FD-utilisation false 100%
 The Netdata stock alarm **`apps_group_file_descriptors_utilization`** (from
 `/usr/lib/netdata/conf.d/health.d/file_descriptors.conf`) fires
 `Raised to Warning — App group <X> file descriptors utilization = 100%`
 emails for application groups that are perfectly healthy. First hit on
 **MajorToot** (the `tailscaled` app group), 2026-05-15.
 ## The Problem
 A Netdata email arrives: *"App group tailscaled file descriptors utilization
 = 100% on MajorToot"*. The process is fine. On the host:
 ```
 PID 1047    tailscaled (daemon)   fds=35  soft_limit=524287  util=0.01%
 PID 1984541 tailscaled (child)    fds=10  soft_limit=524287  util=0.00%
 PID 1984548 bash (tailscale hook) fds=5   soft_limit=1024    util=0.49%
 ```
 No PID exceeds **0.5%**, yet `app.fds_open_limit` reads ~100%. Over 1h the raw
 chart was min 0 / **mean 36.7** / max 100, with sustained multi-minute 100%
 plateaus (not isolated spikes).
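 The per-PID figures above can be reproduced from `/proc`. A minimal sketch,
 assuming a Linux host; the helper names `fd_util_pct` and `pid_fd_util` are
 hypothetical and not part of Netdata:
 ```bash
 # fd_util_pct OPEN SOFT_LIMIT: print utilisation as a percentage, 2 decimals.
 fd_util_pct() {
   awk -v n="$1" -v lim="$2" 'BEGIN { printf "%.2f\n", n * 100 / lim }'
 }
 # pid_fd_util PID: read live values from /proc. Listing another user's
 # /proc/PID/fd needs root (or equivalent capabilities).
 pid_fd_util() {
   n=$(ls "/proc/$1/fd" | wc -l)
   # Field 4 of the "Max open files" row is the soft limit.
   lim=$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")
   fd_util_pct "$n" "$lim"
 }
 ```
 `fd_util_pct 5 1024` prints `0.49`, matching the bash hook row in the table.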
 > This is **not** an `apps.plugin` privilege problem. apps.plugin already has
 > `cap_dac_read_search,cap_sys_ptrace` and `sudo -u netdata cat
 > /proc/<pid>/limits` succeeds. Verify before "fixing" privileges — it's a
 > no-op.
 ## Root Cause
 The stock alarm does `lookup: max -10s` over **every PID in the app group**.
 App groups whose processes fork short-lived children (tailscaled spawns
 route/DNS helpers and bash hooks; `bash` children inherit the systemd default
 soft limit of 1024) trip a false 100%: apps.plugin's per-PID FD-limit read
 **races on transient/just-forked PIDs**, and because the group lookup uses
 `max`, a single bad 10-second sample pegs the entire group to ~100%. The
 signal carries no usable information for any forking/root app group.
 A `lookup: average -5m` does **not** rescue it — the bogus reading sits at
 ~100% for sustained multi-minute stretches, so the 5-minute rolling average
 itself still reaches 100.0% (empirically verified on MajorToot).
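 The difference between the two lookups is plain arithmetic. A small
 illustration with made-up sample values (not data from MajorToot); the helper
 name `window_stats` is hypothetical:
 ```bash
 # window_stats: read one utilisation sample per line on stdin and print
 # "max=<m> avg=<a>" over the whole window.
 window_stats() {
   awk 'NR == 1 || $1 > max { max = $1 }
        { sum += $1 }
        END { if (NR) printf "max=%d avg=%.1f\n", max, sum / NR }'
 }
 ```
 Feeding it `0 0 100 0` (one value per line) prints `max=100 avg=25.0`: one bad
 sample pegs `max` while the average stays low. A window made up entirely of
 `100` prints `max=100 avg=100.0`, which is the sustained case observed here,
 so averaging offers no relief.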
 ## The Fix
 Silence this template fleet-wide and keep the reliable system-wide FD alarm.
 - **Codified in Ansible** (do not hand-edit hosts): `MajorAnsible/netdata.yml`
  ships `templates/health_apps_fds_group.conf.j2` to
  `/etc/netdata/health.d/apps_fds_group_override.conf` and reloads via
  `netdatacli reload-health`.
 - The override redefines `apps_group_file_descriptors_utilization` with
  `to: silent`. Netdata loads `/etc/netdata/health.d/` *after* the stock
  `conf.d` dir, so a same-name template deterministically supersedes the stock
  one (same mechanism as the manual `tcp_resets.conf` override, 2026-04-30).
 - **Safety net retained:** the companion stock template
  `system_file_descriptors_utilization` (on `system.file_nr_utilization`,
  `crit > 90`, `to: sysadmin`) is untouched and still catches genuine
  system-wide FD exhaustion regardless of app grouping.
 - The reload handler is restart-tolerant (`retries`/`until` + `failed_when`
  ignoring a `netdata.pipe` socket-absent error) because on hosts where the
  notify-config also drifts, `Restart Netdata` and `Reload Netdata health`
  can race during the ~5s restart window.
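 The restart-tolerant reload is implemented in the playbook with Ansible's
 `retries`/`until`. For a one-off manual reload, the same idea can be sketched
 in shell; the `retry` helper below is hypothetical and is not what the
 playbook runs:
 ```bash
 # retry ATTEMPTS DELAY_SECONDS COMMAND...
 # Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
 retry() {
   attempts=$1
   delay=$2
   shift 2
   n=1
   until "$@"; do
     [ "$n" -ge "$attempts" ] && return 1
     n=$((n + 1))
     sleep "$delay"
   done
 }
 ```
 Usage: `retry 5 2 netdatacli reload-health` tries up to five times, two
 seconds apart, which comfortably spans the ~5s restart window.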
 ## Verification
 ```bash
 ssh <host> 'curl -s "http://localhost:19999/api/v1/alarms?all=true" \
 | python3 -c "import sys,json;A=json.load(sys.stdin)[\"alarms\"]; \
 print(A[\"app.tailscaled_fds_open_limit.apps_group_file_descriptors_utilization\"][\"recipient\"])"'
 # expect: silent
 ```
 After the fix the alarm still shows `status=WARNING` in the dashboard
 (cosmetic — silencing suppresses the *notification*, not the computed state);
 `recipient=silent` confirms no more emails. The system-wide alarm should read
 `CLEAR recipient=sysadmin`.
 ## Notes
 - Silenced fleet-wide on all 10 servers 2026-05-15 (workstations majorrig/
  majormac were asleep — irrelevant, they are not fleet servers).
 - Any future host running a forking/root daemon in a named app group would
  hit the same false positive, so the silence is applied fleet-wide and
  pre-emptively.
 - **Follow-up debt:** the manual `/etc/netdata/health.d/tcp_resets.conf`
  override on MajorToot (2026-04-30) is still **not codified in
  `netdata.yml`** — a per-host divergence the fleet play does not manage.
  Worth folding into Ansible the same way.
 ## Related
 - [[clamscan-cpu-spike-nice-ionice]]
 - [[netdata-web-log-successful-redirect-heavy-tuning]]
 - Server doc: `30-Areas/MajorInfrastructure/Servers/majortoot.md` (incident
  2026-05-15)
 - Playbook: `MajorAnsible/netdata.yml` +
  `templates/health_apps_fds_group.conf.j2`
--- a/SUMMARY.md
+++ b/SUMMARY.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-05-10T00:10
+updated: 2026-05-15T09:00
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
@ -28,6 +28,7 @@ updated: 2026-05-10T00:10
    * [Wake-on-LAN via Router SSH](02-selfhosting/dns-networking/wake-on-lan-router-ssh.md)
    * [Pi-hole v6 Group Management — Per-Client DNS Rules](02-selfhosting/dns-networking/pihole-v6-group-management.md)
    * [AWS S3 Cost Management](02-selfhosting/cloud/aws-s3-cost-management.md)
    * [VPS Migration Baseline Checklist](02-selfhosting/cloud/vps-migration-baseline-checklist.md)
    * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
    * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
    * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
@ -69,11 +70,13 @@ updated: 2026-05-10T00:10
 * [Streaming & Podcasting](04-streaming/index.md)
    * [OBS Studio Setup & Encoding](04-streaming/obs/obs-studio-setup-encoding.md)
    * [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md)
    * [HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)](04-streaming/plex/hevc-vaapi-batch-encode.md)
 * [Troubleshooting](05-troubleshooting/index.md)
    * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
    * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
    * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
    * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
    * [ssh.socket Unreachable After Reboot (Tailscale Race Condition)](05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md)
    * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
    * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
    * [Tuning Netdata `web_log_1m_successful` for Redirect-Heavy WordPress Sites](05-troubleshooting/security/netdata-web-log-successful-redirect-heavy-tuning.md)
@ -104,7 +107,10 @@ updated: 2026-05-10T00:10
    * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
    * [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md)
    * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
    * [OBS Studio: Stale Script Paths After Windows Profile Rename](05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
    * [Fedora CA Bundle Missing Symlink — TLS Breaks Fleet-Wide](05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md)
    * [Netdata apps-group FD-utilisation false 100% (silenced fleet-wide)](05-troubleshooting/security/netdata-apps-fds-group-false-positive.md)
    * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
    * [Ansible: ansible.cfg Ignored on WSL2 Windows Mounts](05-troubleshooting/ansible-wsl2-world-writable-mount-ignores-cfg.md)
    * [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md)
Author	SHA1	Message	Date
majorlinux	65b0aa4567	wiki: expand Tailscale race condition article with network-online race Added Race 2: tailscaled starts before network-online.target, causing Tailscale to get stuck with SetNetworkUp(false). Covers both Ubuntu ssh.socket and cross-platform tailscaled ordering issues. Updated references to include majordiscord incident and new Ansible playbook.	2026-05-19 20:39:18 -04:00
majorlinux	eb39da9a26	Merge cowork/majorair/ssh-socket-wiki: ssh.socket Tailscale race condition article	2026-05-19 19:36:19 -04:00
majorlinux	7dc591d257	wiki: add ssh.socket Tailscale race condition troubleshooting article Documents the systemd socket activation race where ssh.socket binds to the Tailscale IP before tailscaled is ready, causing SSH to become unreachable after a Tailscale reconnect. Includes diagnosis steps and the After=/BindsTo= fix.	2026-05-19 19:35:16 -04:00
MajorLinux	64ac418a36	wiki: add ClamAV daemonless mode section + HEVC VAAPI article link	2026-05-15 09:02:24 -04:00
Marcus (via Claude Code)	28518e403e	Add troubleshooting articles: Netdata apps-group FD false-positive + OBS stale script paths - netdata-apps-fds-group-false-positive: the apps_group_file_descriptors_utilization false 100% on forking/root app groups (tailscaled on MajorToot 2026-05-15), the not-a-privilege gotcha, fleet-wide silence fix in MajorAnsible. - obs-stale-script-paths: pending from prior session (not on remote). - SUMMARY.md: link both (re-applied onto upstream after concurrent rebase). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 03:22:12 -04:00
majorlinux	a785e85821	Merge branch 'code/majorair/rsyslog-logwatch-fix'	2026-05-13 10:36:06 -04:00
majorlinux	4ec481c584	wiki: add rsyslog requirement to migration checklist and logwatch docs Fedora 44 Hetzner images ship without rsyslog — logwatch produces zero output because /var/log/messages doesn't exist. Added rsyslog to baseline table and new diagnostic section to logwatch article. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 10:36:00 -04:00
majorlinux	c22457f1aa	Merge branch 'code/majorair/teelia-cpu-docs'	2026-05-11 18:32:18 -04:00
majorlinux	ac84610380	wiki: add 1 vCPU nice/ionice limitation note to ClamAV article nice -n 19 only yields when other processes compete; on single-core VPS boxes the scan still saturates CPU. Document the expectation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 18:32:01 -04:00
majorlinux	3df0979786	Merge branch 'code/majorair/logwatch-ca-bundle-docs' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 07:37:48 -04:00
majorlinux	de9b661b9d	wiki: add Fedora CA bundle article, update migration checklist and logwatch docs New article documenting missing /etc/pki/tls/certs/ca-bundle.crt symlink on Hetzner Fedora images breaking Postfix TLS, curl, and dnf. Updated VPS migration baseline checklist with timezone, CA bundle, and crond verification steps. Updated logwatch fleet setup with crond check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 07:35:42 -04:00