diff --git a/01-linux/distro-specific/wsl2-backup-powershell.md b/01-linux/distro-specific/wsl2-backup-powershell.md index cc0286b..a5b61fd 100644 --- a/01-linux/distro-specific/wsl2-backup-powershell.md +++ b/01-linux/distro-specific/wsl2-backup-powershell.md @@ -9,8 +9,8 @@ tags: - powershell - majorrig status: published -created: '2026-03-16' -updated: '2026-03-16' +created: 2026-03-16 +updated: 2026-04-23T10:57 --- # WSL2 Backup via PowerShell Scheduled Task @@ -19,7 +19,7 @@ WSL2 distributions are stored as a VHDX file on disk. Unlike traditional VMs, th ## The Short Answer -Save this as `C:\Users\majli\Scripts\backup-wsl.ps1` and register it as a weekly scheduled task. +Save this as `C:\Users\majorlinux\Scripts\backup-wsl.ps1` and register it as a weekly scheduled task. ## Backup Script @@ -57,7 +57,7 @@ Run in PowerShell as Administrator: ```powershell $Action = New-ScheduledTaskAction -Execute "PowerShell.exe" ` - -Argument "-NonInteractive -File C:\Users\majli\Scripts\backup-wsl.ps1" + -Argument "-NonInteractive -File C:\Users\majorlinux\Scripts\backup-wsl.ps1" $Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2am $Settings = New-ScheduledTaskSettingsSet -StartWhenAvailable -RunOnlyIfNetworkAvailable:$false Register-ScheduledTask -TaskName "WSL2 Backup - FedoraLinux43" ` diff --git a/01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md b/01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md index 950c1bf..6d3676e 100644 --- a/01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md +++ b/01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md @@ -11,8 +11,8 @@ tags: - majorrig - majortwin status: published -created: '2026-03-16' -updated: '2026-03-16' +created: 2026-03-16 +updated: 2026-04-23T10:57 --- # WSL2 Fedora 43 Training Environment Rebuild @@ -133,7 +133,7 @@ ls ~/majortwin/llama.cpp/build/bin/llama-cli && echo "OK" ```bash cat >> ~/.bashrc << 'EOF' # MajorInfrastructure Paths -export 
VAULT="/mnt/c/Users/majli/Documents/MajorVault" +export VAULT="/mnt/c/Users/majorlinux/Documents/MajorVault" export MAJORANSIBLE="/mnt/d/MajorAnsible" export MAJORTWIN_D="/mnt/d/MajorTwin" export MAJORTWIN_WSL="$HOME/majortwin" diff --git a/01-linux/networking/ssh-config-key-management.md b/01-linux/networking/ssh-config-key-management.md index cf446e5..2bfedbf 100644 --- a/01-linux/networking/ssh-config-key-management.md +++ b/01-linux/networking/ssh-config-key-management.md @@ -10,11 +10,7 @@ tags: - remote-access status: published created: 2026-03-08 -<<<<<<< Updated upstream -updated: 2026-04-14T14:27 -======= -updated: 2026-04-18T11:13 ->>>>>>> Stashed changes +updated: 2026-04-22T09:20 --- # SSH Config and Key Management diff --git a/02-selfhosting/dns-networking/pihole-doh-dot-bypass-defense.md b/02-selfhosting/dns-networking/pihole-doh-dot-bypass-defense.md new file mode 100644 index 0000000..c8ac1d8 --- /dev/null +++ b/02-selfhosting/dns-networking/pihole-doh-dot-bypass-defense.md @@ -0,0 +1,180 @@ +--- +title: Pi-hole DoH / DoT Bypass Defense +domain: selfhosting +category: dns-networking +tags: + - pihole + - dns + - doh + - dot + - privacy + - adblock + - bypass + - hagezi +status: published +created: 2026-04-22 +updated: 2026-04-23T09:09 +--- + +# Pi-hole DoH / DoT Bypass Defense + +## The Problem + +A LAN-wide ad/tracker/threat-intel blocklist at the DNS layer is only effective if clients actually use the DNS server doing the blocking. Three classes of client routinely bypass LAN DNS: + +1. **Modern browsers with built-in DNS-over-HTTPS (DoH).** Chrome, Firefox, Safari, Edge all ship with DoH either on by default or a one-toggle opt-in. When enabled, the browser sends DNS queries over HTTPS directly to Cloudflare / Google / Quad9 / NextDNS, bypassing the OS resolver and every DNS-layer blocklist on the network. +2. 
**IoT / smart devices with hardcoded public DNS.** Chromecast, Google Home, Nest, many Samsung TVs, some Amazon devices include hardcoded `8.8.8.8` or `1.1.1.1`. They ignore DHCP-pushed DNS entirely. +3. **Applications using DNS-over-TLS (DoT).** Rarer than DoH but used by some privacy-focused apps and occasional malware C2 — hits Cloudflare / Quad9 on port 853 instead of 53. + +Without defense, a compromised IoT or a telemetry-hungry app can exfil DNS traffic freely even though Pi-hole is "running." + +## What This Guide Covers + +- How Pi-hole's `blocking.mode = NULL` structurally prevents the most common fallback-resolver bypass. +- Why the `HaGeZi doh-vpn-proxy-bypass` adlist is the single highest-leverage defense against browser DoH. +- What still leaks and how to assess whether the router-level firewall is worth the effort for your threat model. + +## Pi-hole's block mode matters + +Pi-hole v6's default `dns.blocking.mode` is `NULL`. A blocked domain resolves to `0.0.0.0` — a **valid** DNS answer, not an NXDOMAIN. Verify on your host: + +```bash +dig +short @ +# → 0.0.0.0 +``` + +Why this matters: multi-resolver OSes (macOS, iOS, Windows) only consult fallback resolvers on a **failure** (timeout, SERVFAIL). A valid NULL answer short-circuits that — the client accepts the 0.0.0.0, tries to connect, fails at TCP, and never retries DNS. Even if `/etc/resolv.conf` has `1.1.1.1` as a secondary, it's never queried. + +If you've set blocking mode to `NXDOMAIN`, clients **will** fall back — and every telemetry domain on every adlist becomes bypassable through whatever secondary resolver the OS is configured with. 
**Leave it at NULL.** + +Check: +```bash +pihole-FTL --config dns.blocking.mode +# → NULL +``` + +## HaGeZi DoH/VPN/Proxy Bypass — the biggest single win + +HaGeZi maintains `adblock/doh-vpn-proxy-bypass.txt` — ~18,000 DoH resolver hostnames, including the bootstrap domains used by every major browser: + +| Browser | DoH bootstrap | +|---|---| +| Firefox | `mozilla.cloudflare-dns.com` | +| Chrome | `chrome.cloudflare-dns.com`, `dns.google` | +| Safari (iCloud Private Relay bootstrap) | Apple-specific, *not* in this list — Apple uses QUIC | +| Edge | `dns.google`, other public resolvers | + +When the bootstrap hostname can't be resolved (Pi-hole answers `0.0.0.0`), the browser's DoH setup fails and it falls back to the system resolver — which is Pi-hole. This flips the default behavior from "browsers can bypass" to "browsers respect LAN DNS." + +### Adding it + +```bash +NOW=$(date +%s) +sudo pihole-FTL sqlite3 /etc/pihole/gravity.db < +done +# All should return 0.0.0.0 +``` + +### Known false positives + +The list is aggressive. Expect occasional pushback: + +- **`claude.ai`** — gets caught by the broader `pro.txt` or TIF list in some combinations; DoH bypass list itself is usually clean. If you use Claude on LAN and see blocks, allowlist `claude.ai` — note that `api.anthropic.com` is typically **not** on any of these lists, so Claude Code / API traffic is unaffected. +- **Zscaler ZPA / Zscaler Internet Access** — **this will break work-from-home auth if you don't allowlist it.** The DoH/VPN bypass list classifies Zscaler's ZTNA backbone as a "VPN proxy" and blocks it. Symptom: users see a blank / failed page at `https://samlsp.private.zscaler.com/...` during SAML sign-in, and the Zscaler Client Connector fails to authenticate. + + The critical piece is that Zscaler's SAML SP hostname is a **CNAME chain**: + + ``` + samlsp.private.zscaler.com. CNAME samlsp.prod.zpath.net. + samlsp.prod.zpath.net. CNAME zapp2saml.gslb.prod.zpath.net. + zapp2saml.gslb.prod.zpath.net. 
CNAME snico2br.gslb.prod.zpath.net. + snico2br.gslb.prod.zpath.net. A + ``` + + Pi-hole walks the CNAME chain and blocks on the target (status 9 = `blocked_gravity_cname`), so **an exact-hostname allowlist for `samlsp.private.zscaler.com` will NOT fix it** — you have to allowlist the CNAME target domain. The GSLB subdomains rotate, so use a regex allowlist for the whole `zpath.net` zone: + + ```sql + INSERT OR IGNORE INTO domainlist (type, domain, enabled, comment) + VALUES (2, '(\.|^)zpath\.net$', 1, 'Zscaler ZPA CNAME backbone — do not block'); + ``` + + Don't forget `pihole reloaddns` after. Expect to also need regex allowlists for `zscaler.net`, `zscalertwo.net`, `zscalerthree.net`, `zscalerone.net`, `zscloud.net` if any are gravity-blocked — HaGeZi's lists may cover different combinations over time. +- **iCloud Private Relay** — if you want iCPR to keep working on your Apple devices, allowlist its mask ingresses. The DoH/VPN bypass list blocks `mask.icloud.com`, `mask-h2.icloud.com`, and `mask-api.icloud.com` (Apple's iCPR entrance points). Without them, iCPR silently falls back to standard DNS — which means **Pi-hole is covering the bypass whether you want it to or not**. For hosts where iCPR is desired: + + ```sql + INSERT OR IGNORE INTO domainlist (type, domain, enabled, comment) + VALUES (2, '(\.|^)mask[a-z0-9-]*\.icloud\.com$', 1, 'iCloud Private Relay ingress'); + ``` + + Keep this surgical — do **not** allowlist all of `icloud.com`. Other subdomains (`metrics.icloud.com`, `init.gc.apple.com` family) are Apple telemetry that the adlists correctly block. After allowlist + `pihole reloaddns`, toggle Wi-Fi or flip iCPR off/on in Settings on each Apple device to force DNS re-resolution — iOS/macOS caches DNS aggressively and won't pick up the change otherwise. +- **`dot.txt` companion adlist** — as of April 2026, HaGeZi's separate `adblock/dot.txt` URL returns 403. DoT resolver hostnames are folded into `doh-vpn-proxy-bypass.txt` already. 
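The two regex allowlist entries above can be sanity-checked offline before they are inserted into `gravity.db`. The following is a minimal sketch, not Pi-hole's own matcher: it assumes that for patterns this simple, Python's `re` module agrees with the regex engine FTL uses. The helper name `is_allowlisted` and the sample hostnames not mentioned in this guide (`notzpath.net`) are made up for illustration.

```python
import re

# Patterns copied verbatim from the allowlist INSERT statements in this guide.
ZPATH_ALLOW = r"(\.|^)zpath\.net$"
ICPR_ALLOW = r"(\.|^)mask[a-z0-9-]*\.icloud\.com$"


def is_allowlisted(pattern, hostname):
    """Return True if the pattern matches anywhere in the hostname.

    Hypothetical helper: uses an unanchored search, so the pattern's own
    (\\.|^) and $ are what pin the match to a label boundary and the end.
    """
    return re.search(pattern, hostname) is not None


if __name__ == "__main__":
    checks = [
        (ZPATH_ALLOW, "snico2br.gslb.prod.zpath.net"),  # rotating GSLB target
        (ZPATH_ALLOW, "notzpath.net"),                  # made-up lookalike
        (ICPR_ALLOW, "mask-h2.icloud.com"),             # iCPR ingress
        (ICPR_ALLOW, "metrics.icloud.com"),             # telemetry, should stay blocked
    ]
    for pattern, host in checks:
        verdict = "allow" if is_allowlisted(pattern, host) else "no match"
        print(f"{host:35s} {verdict}")
```

The point of the `(\.|^)` prefix is that the match has to start at a label boundary, so a lookalike domain that merely ends in the same characters is not allowlisted by accident.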
+ +## What still leaks + +The DoH adlist does not defend against: + +1. **IoT devices with hardcoded public DNS.** Chromecast et al. send UDP/53 queries directly to `8.8.8.8`. Pi-hole never sees them. +2. **Apps that hardcode a DoH or DoT endpoint by IP.** If an app has `1.1.1.1` baked in rather than `cloudflare-dns.com`, the hostname block can't help. +3. **Apple iCloud Private Relay.** Uses QUIC (UDP/443) to Cloudflare with oblivious DNS. Safari + Apple services route around Pi-hole entirely. Acceptable tradeoff for most users; mostly a privacy win even if it weakens your LAN-side visibility. + +Estimated residual gap after the DoH adlist: **~3%** of tracker/telemetry traffic, mostly from hardcoded-DNS IoT. + +## Router-level enforcement (optional, higher effort) + +To close the remaining 3%, block outbound `udp/53`, `tcp/53`, `tcp/853` at the router for everything except the Pi-hole's IP. Two rules: + +```bash +# Transparently redirect all LAN :53 traffic to Pi-hole, except Pi-hole itself +iptables -t nat -I PREROUTING -i br0 -p udp --dport 53 ! -s -j DNAT --to :53 +iptables -t nat -I PREROUTING -i br0 -p tcp --dport 53 ! -s -j DNAT --to :53 + +# Reject DoT so apps fall back to classic DNS (→ Pi-hole via above) +iptables -I FORWARD -i br0 -p tcp --dport 853 ! -s -j REJECT --reject-with tcp-reset +iptables -I FORWARD -i br0 -p udp --dport 853 ! -s -j REJECT +``` + +Design choices: +- **REDIRECT (DNAT), not DROP, for port 53** — devices with hardcoded `8.8.8.8` receive transparent answers from Pi-hole instead of silently breaking. +- **REJECT, not DROP, for port 853** — DoT clients see a fast error and fall back to classic DNS immediately instead of timing out. +- **Exempt the Pi-hole** — it needs to reach upstream resolvers (`1.1.1.1` etc.) unimpeded. +- **`-i br0` only** — LAN ingress, not WAN. + +### Persistence depends on router firmware + +- **Asuswrt-Merlin:** add rules to `/jffs/scripts/firewall-start` — runs on every firewall init. 
+- **Stock AsusWRT 388+:** `/jffs/scripts/firewall-start` is **not** honored. Rules added live persist until the next `restart_firewall` event (reboot, WAN flap, GUI config change). Workarounds: flash to Merlin, use the GUI's "LAN ▸ Network Services Filter" (DROP-only, less flexible), or schedule a cron re-apply in `/jffs/configs/crontab`. +- **OpenWrt / pfSense / OPNsense:** their respective firewall config persistence works out of the box. + +## Summary — minimum viable DoH defense + +1. Pi-hole block mode = `NULL` (default — verify). +2. Install HaGeZi `doh-vpn-proxy-bypass` adlist. +3. Run `pihole -g`. +4. Verify major DoH bootstraps return `0.0.0.0`. +5. Optional: add router iptables rules to close the IoT/hardcoded-DNS gap. + +Steps 1–4 give you ~97% effectiveness with zero client-side changes and no broken devices. Step 5 is polish for threat models where LAN-wide DNS visibility matters. + +## Related + +- [[MajorPi]] — local Pi-hole deployment +- [[pihole-v6-adlist-management]] — adlist CRUD via SQL (v5 CLI commands don't work in v6) +- [[Network Overview]] — fleet network context diff --git a/02-selfhosting/dns-networking/pihole-v6-adlist-management.md b/02-selfhosting/dns-networking/pihole-v6-adlist-management.md new file mode 100644 index 0000000..027481b --- /dev/null +++ b/02-selfhosting/dns-networking/pihole-v6-adlist-management.md @@ -0,0 +1,180 @@ +--- +title: "Pi-hole v6 Adlist Management via SQL" +domain: selfhosting +category: dns-networking +tags: [pihole, pihole-v6, adlist, dns, sql, sqlite, gravity, runbook] +status: published +created: 2026-04-22 +updated: 2026-04-22 +--- + +# Pi-hole v6 Adlist Management via SQL + +## The Problem + +Pi-hole v6 removed the `pihole -a adlist` CLI subcommands. The old muscle-memory commands (`pihole -a adlist add `, `pihole -a adlist remove `, `pihole -a adlist list`) all return errors or are no-ops on v6. The Web UI works, but for scripting, Ansible, or SSH-only hosts, you need a CLI-level method. 
+ +The answer is to hit the `gravity.db` SQLite database directly. It's simple, idempotent, and scriptable. + +## Prerequisites + +- Pi-hole v6 installed (`pihole -v` → Core version v6.x). +- `sudo` access — `gravity.db` is owned `pihole:pihole` mode 660. +- `sqlite3` binary is **not** required. Pi-hole ships `pihole-FTL` with a built-in `sqlite3` subcommand that you can use instead: + ```bash + sudo pihole-FTL sqlite3 /etc/pihole/gravity.db "SELECT 1;" + ``` + Use this on any host where you don't want to install the standalone `sqlite3` package (e.g., Raspberry Pi OS minimal). + +## Listing adlists + +```bash +sudo pihole-FTL sqlite3 -column -header /etc/pihole/gravity.db \ + "SELECT id, enabled, address, comment FROM adlist ORDER BY id;" +``` + +| Column | Meaning | +|---|---| +| `id` | Internal ID (autoincrement, **does not match `queries.list_id`** — see note below) | +| `enabled` | `1` = active, `0` = disabled (still in DB but not compiled into gravity) | +| `address` | The URL fetched by `pihole -g` | +| `comment` | Human-readable label shown in the Web UI | + +## Adding an adlist + +```bash +NOW=$(date +%s) +sudo pihole-FTL sqlite3 /etc/pihole/gravity.db < @192.168.50.238 +# Expected: 0.0.0.0 (when dns.blocking.mode = NULL) +``` + +If you get a real answer, either the adlist fetch failed (check `pihole -g` output for 403/404), or the entry isn't in the list you added. + +## Common gotchas + +### `pihole -g` fails with "Forbidden" + +The adlist URL returned HTTP 403 or 404. HaGeZi and OISD in particular reorganize file paths occasionally. Remove the broken entry and either substitute the new URL or drop it: + +```bash +sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \ + "DELETE FROM adlist WHERE address = '<404-url>';" +``` + +### `queries.list_id` doesn't match `adlist.id` + +In Pi-hole v6's FTL query log, the `list_id` column on `queries`/`query_storage` does **not** reliably point back at the `adlist.id`. 
For `status=4` (regex), it references a `domainlist.id`. For `status=1` (gravity), it can reference a `gravity` table rowid, not the adlist. Do not assume a bidirectional mapping — treat `list_id` as an opaque debug hint. + +### Stale regex after editing `domainlist` + +FTL compiles regex rules into memory at process start and on explicit reload. Editing `domainlist` via SQL without calling `pihole reloaddns` afterwards leaves the old compiled regex active. Symptom: `queries.status=4` blocks firing for domains whose `list_id` points at deleted entries. + +Fix: always follow `domainlist` edits with: +```bash +sudo pihole reloaddns +``` + +Verify via the FTL log: +```bash +sudo grep "Compiled .* regex" /var/log/pihole/FTL.log | tail +# → "Compiled N allow and M deny regex for X clients" +``` + +The numbers should match the count of `enabled=1` entries in `domainlist` by `type`. + +### No standalone `sqlite3` on the host + +Use `pihole-FTL sqlite3` — ships with every Pi-hole install, behaves identically to the standalone binary for the commands shown here. Do not install the `sqlite3` package just to manage Pi-hole. 
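The count comparison described under "Stale regex after editing `domainlist`" can be sketched in Python with the standard-library `sqlite3` module. This is a hedged illustration only: the in-memory table is a simplified stand-in, not the real `gravity.db` schema, the sample domains are invented, and the type codes other than `2` (regex allow, as used in the INSERT examples) are an assumption about how `domainlist.type` is numbered.

```python
import sqlite3

# Assumed meaning of domainlist.type; only 2 (regex allow) is confirmed
# by the INSERT examples in these guides.
TYPE_LABELS = {0: "exact allow", 1: "exact deny", 2: "regex allow", 3: "regex deny"}


def make_demo_db():
    """Build an in-memory stand-in with a simplified domainlist table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domainlist ("
        "id INTEGER PRIMARY KEY, type INTEGER, domain TEXT, enabled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO domainlist (type, domain, enabled) VALUES (?, ?, ?)",
        [
            (2, r"(\.|^)zpath\.net$", 1),
            (2, r"(\.|^)mask[a-z0-9-]*\.icloud\.com$", 1),
            (3, r"(\.|^)ads\.example$", 1),   # invented deny rule
            (3, r"(\.|^)old\.example$", 0),   # disabled, must not be counted
        ],
    )
    return conn


def enabled_counts(conn):
    """Count enabled domainlist rows per type, keyed by a readable label."""
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM domainlist "
        "WHERE enabled = 1 GROUP BY type ORDER BY type"
    ).fetchall()
    return {TYPE_LABELS.get(t, f"type {t}"): n for t, n in rows}


if __name__ == "__main__":
    print(enabled_counts(make_demo_db()))
```

Against a real install, the same `SELECT` can be run through `pihole-FTL sqlite3` as shown earlier; the regex allow and regex deny counts are the ones to compare with the "Compiled N allow and M deny regex" log line.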
+ +## Useful introspection queries + +**Total gravity domains by adlist:** +```sql +SELECT a.id, a.comment, COUNT(g.domain) AS domains +FROM gravity g +JOIN adlist a ON a.id = g.adlist_id +GROUP BY a.id +ORDER BY domains DESC; +``` + +**Active regex rules (what FTL SHOULD be running):** +```sql +SELECT * FROM vw_regex_denylist; +SELECT * FROM vw_regex_allowlist; +``` + +**Blocked queries in the last hour by adlist source:** +```sql +SELECT + CASE status + WHEN 1 THEN 'gravity' + WHEN 4 THEN 'regex_deny' + WHEN 5 THEN 'exact_deny' + WHEN 9 THEN 'gravity_cname' + WHEN 10 THEN 'regex_cname' + WHEN 11 THEN 'exact_cname' + END AS source, + COUNT(*) AS n +FROM queries +WHERE timestamp > strftime('%s','now','-1 hour') + AND status IN (1,4,5,9,10,11) +GROUP BY status; +``` + +## Related + +- [[MajorPi]] — the host running this +- [[pihole-doh-dot-bypass-defense]] — DoH/DoT bypass defense (reasons to add specific adlists) diff --git a/02-selfhosting/index.md b/02-selfhosting/index.md index 40a08c3..4b012ad 100644 --- a/02-selfhosting/index.md +++ b/02-selfhosting/index.md @@ -1,6 +1,6 @@ --- created: 2026-04-13T10:15 -updated: 2026-04-13T10:15 +updated: 2026-04-22T19:16 --- # 🏠 Self-Hosting & Homelab @@ -19,6 +19,8 @@ Guides for running your own services at home, including Docker, reverse proxies, ## DNS & Networking - [Tailscale for Homelab Remote Access](dns-networking/tailscale-homelab-remote-access.md) +- [Pi-hole v6 Adlist Management via SQL](dns-networking/pihole-v6-adlist-management.md) +- [Pi-hole DoH / DoT Bypass Defense](dns-networking/pihole-doh-dot-bypass-defense.md) ## Storage & Backup @@ -39,3 +41,8 @@ Guides for running your own services at home, including Docker, reverse proxies, - [Fail2ban Custom Jail: WordPress Login Brute Force](security/fail2ban-wordpress-login-jail.md) - [SELinux: Fixing Fail2ban grep execmem Denial](security/selinux-fail2ban-execmem-fix.md) - [UFW Firewall Management](security/ufw-firewall-management.md) + +## Services + +- [Mastodon: 
Database Maintenance](services/mastodon-db-maintenance.md) +- [Mastodon: Federation & Domain Blocks](services/mastodon-federation.md) diff --git a/02-selfhosting/services/mastodon-db-maintenance.md b/02-selfhosting/services/mastodon-db-maintenance.md new file mode 100644 index 0000000..4db9334 --- /dev/null +++ b/02-selfhosting/services/mastodon-db-maintenance.md @@ -0,0 +1,143 @@ +--- +title: Mastodon DB Maintenance — Statuses, Accounts, and VACUUM +domain: selfhosting +category: services +tags: + - mastodon + - database + - postgresql + - maintenance + - tootctl + - majortoot +status: published +created: 2026-04-22 +updated: 2026-04-22 +--- + +# Mastodon DB Maintenance + +Mastodon aggressively caches remote content — avatars, statuses, follow graphs — from every instance it federates with. On an active instance, this causes substantial PostgreSQL bloat over time. Without periodic maintenance, the database grows unbounded even if S3 handles media. + +## The Problem — majortoot at ~3.5 years + +| Table | Size | Rows | +|-------|------|------| +| `statuses` | 3.5 GB | 3.6M rows (3.6M remote cached, 37K local) | +| `accounts` | 499 MB | 214,770 remote cached, 18 local | +| `preview_cards` | 837 MB | remote link previews | +| `statuses_tags` | 506 MB | cascades from statuses | +| `conversations` | 436 MB | cascades from statuses | +| `mentions` | 305 MB | cascades from statuses | + +The `statuses remove` and `accounts cull` commands address most of this. + +--- + +## Maintenance Tasks + +### 1. Cache Clear + +Clears in-memory Rails caches. Fast (<5 seconds), safe to run anytime. + +```bash +tootctl cache clear +``` + +### 2. Statuses Remove + +Removes cached remote statuses (and their cascaded rows in `statuses_tags`, `mentions`, `conversations`, `status_stats`) older than N days. Does **not** touch local statuses. 
+ +```bash +tootctl statuses remove --days=90 +``` + +> [!warning] This is the slowest step +> On a 3.6M-row statuses table, the extraction phase alone can take 20–40 minutes. PostgreSQL will be under heavy load. Run during off-peak hours. + +**What gets removed:** Remote statuses not pinned, not boosted by local users, and not replied to by local users, older than the threshold. + +### 3. Accounts Cull + +Contacts each remote account's home instance via WebFinger to check if it still exists. Removes accounts that return 404 or `Gone`. Catches dead instances, deleted accounts, and renamed handles. + +```bash +tootctl accounts cull +``` + +> [!note] Network-bound +> Cull makes HTTP requests to remote servers. It's slower on flaky network conditions and will skip accounts it can't reach (to avoid false deletions). + +### 4. VACUUM ANALYZE + +After large deletions, PostgreSQL does not immediately return space to the OS: the deleted rows remain on disk as dead tuples. `VACUUM ANALYZE` marks that space as reusable for new rows and updates query planner statistics. + +```bash +sudo -u postgres psql mastodon_production -c "VACUUM ANALYZE;" +``` + +To return disk space to the OS (not just mark pages free), `VACUUM FULL` rewrites each table but holds an exclusive lock while it runs. Stick with plain `VACUUM ANALYZE` for routine maintenance. 
+ +--- + +## The Maintenance Script + +**Location:** `/home/mastodon/maintenance.sh` +**Cron:** `0 2 * * 0` — Sunday 2 AM (runs before media prune at 3 AM) +**Log:** `/var/log/mastodon/maintenance.log` +**Notifications:** Email to `marcus@majorshouse.com` at each step via Postfix → MajorMail + +The script runs all four tasks in sequence and sends a notification email: + +- **On start** — lists steps and current DB size +- **After cache clear** — confirms done, warns statuses remove will take a while +- **After statuses remove** — summary output + current DB size +- **After accounts cull** — accounts removed + current DB size +- **On completion** — full timing breakdown and final DB size + +### Running Manually + +```bash +ssh root@100.110.197.17 +bash /home/mastodon/maintenance.sh +``` + +### Monitoring Progress + +```bash +ssh root@100.110.197.17 "tail -f /var/log/mastodon/maintenance.log" +``` + +### tootctl Wrapper (one-off commands) + +The `mastodon` user's rbenv is not on PATH in a login shell. Always use the wrapper: + +```bash +su - mastodon -c 'export PATH="/home/mastodon/.rbenv/bin:/home/mastodon/.rbenv/shims:$PATH" && eval "$(rbenv init -)" && cd /home/mastodon/live && RAILS_ENV=production bin/tootctl ' +``` + +--- + +## Full Cron Schedule on majortoot + +| Time | Job | Script | +|------|-----|--------| +| Sun 2 AM | DB maintenance | `/home/mastodon/maintenance.sh` | +| Sun 3 AM | Media prune (S3) | `/home/mastodon/media-prune.sh` | +| Daily 8 AM | Fail2Ban digest | `/usr/local/bin/fail2ban-digest.sh` | +| Monthly | Fail2Ban nginx-botsearch prune | `/usr/local/bin/f2b-prune.sh` | +| Daily | Certbot renewal | `service nginx stop; certbot renew; service nginx start` | + +--- + +## First Run Results (2026-04-22) + +First maintenance run ever on majortoot after ~3.5 years of operation. Results pending (job running in background at time of writing). Check `/var/log/mastodon/maintenance.log` for final numbers. 
+ +--- + +## See Also + +- [[Mastodon]] — service doc (deployment, access, S3 config) +- [[majortoot]] — server doc (incident log, specs) +- [[mastodon-federation]] — domain blocks, silencing, FediSeer +- [[mastodon-instance-tuning]] — character limits, media cache diff --git a/02-selfhosting/services/mastodon-federation.md b/02-selfhosting/services/mastodon-federation.md new file mode 100644 index 0000000..31e5fb9 --- /dev/null +++ b/02-selfhosting/services/mastodon-federation.md @@ -0,0 +1,168 @@ +--- +title: Mastodon Federation — Domain Blocks, Silencing, and FediSeer +domain: selfhosting +category: services +tags: + - mastodon + - federation + - fediverse + - domain-blocks + - fediseer + - majortoot +status: published +created: 2026-04-22 +updated: 2026-04-22 +--- + +# Mastodon Federation — Domain Blocks, Silencing, and FediSeer + +## Domain Block Severity — Critical Gotcha + +The Mastodon admin UI labels severities as **Silence** and **Suspend**, but the integer values stored in the database are **not** in alphabetical order. The Rails enum is: + +```ruby +# app/models/domain_block.rb +enum :severity, { silence: 0, suspend: 1, noop: 2 }, validate: true +``` + +| DB value | Meaning | Effect | +|----------|---------|--------| +| `0` | **silence** | Instance limited — posts hidden from public timelines; follows require manual approval | +| `1` | **suspend** | Full defederation — all content removed, all follows severed | +| `2` | **noop** | No effect — entry tracked but no federation action taken | + +> [!warning] Don't trust raw integer queries +> If you query `domain_blocks` directly via psql, severity `0` looks like "the lowest level" but it's actually **silence** — a meaningful restriction. Always map through the enum. This tripped up a defederation investigation on 2026-04-22 where 13 silenced instances (including mastodon.social) were initially misread as noop. 
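The advice to always map raw integers through the enum can be sketched as a small lookup. This is a minimal illustration, assuming the rows come from a query such as `SELECT domain, severity FROM domain_blocks`. The function names and the sample domains are hypothetical; the integer-to-name mapping mirrors the Rails enum quoted above.

```python
# Mirrors the enum quoted from app/models/domain_block.rb in this guide.
SEVERITY_NAMES = {0: "silence", 1: "suspend", 2: "noop"}


def label_severity(value):
    """Translate a raw domain_blocks.severity integer into its enum name."""
    try:
        return SEVERITY_NAMES[value]
    except KeyError:
        raise ValueError(f"unknown domain_blocks.severity value: {value!r}") from None


def summarize(rows):
    """Group (domain, severity_int) rows by enum name (hypothetical helper)."""
    grouped = {name: [] for name in SEVERITY_NAMES.values()}
    for domain, severity in rows:
        grouped[label_severity(severity)].append(domain)
    return grouped


if __name__ == "__main__":
    # Invented sample rows standing in for psql output.
    sample = [("limited.example", 0), ("defederated.example", 1)]
    for name, domains in summarize(sample).items():
        print(f"{name:8s} {len(domains)} {domains}")
```

Raising on an unknown integer, rather than defaulting to `noop`, is a deliberate choice in this sketch: a silent default is exactly the kind of misreading the warning above describes.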
+ +### majortoot block inventory (as of 2026-04-22) + +| Severity | Count | Notable entries | +|----------|-------|-----------------| +| silence (0) | 13 | mastodon.social, mastodon.world, chaos.social, fosstodon.org, tech.lgbt, threads.net | +| suspend (1) | 413 | Full defederation list | +| noop (2) | 0 | — | + +--- + +## How Silencing Affects Follows + +When your instance silences a remote domain, **every follow request from that domain requires manual approval** — even if your account has `locked = false`. + +This is enforced in `app/lib/activitypub/activity/follow.rb`: + +```ruby +if target_account.locked? || @account.silenced? + LocalNotificationWorker.perform_async(target_account.id, follow_request.id, 'FollowRequest', 'follow_request') +``` + +`@account.silenced?` returns true when the sending account's domain is in your `domain_blocks` at severity=0. The follow goes to the follow_requests queue instead of being automatically accepted. + +**Practical effect on majortoot:** mastodon.social is silenced (added 2026-12-11, same day as a FluentInFinance follow-spam report). All follows from mastodon.social accounts appear as pending follow requests requiring manual approval. This is intentional — it's the expected behavior of a silence block. + +--- + +## Checking Defederation Status + +### Are major instances blocking you? + +Check if your domain appears in another instance's public block list: + +```bash +# Check mastodon.social's public block list (397 entries as of 2026-04-22) +curl -s "https://mastodon.social/api/v1/instance/domain_blocks" | \ + python3 -c "import sys,json; data=json.load(sys.stdin); \ + found=[b for b in data if b['domain']=='toot.majorshouse.com']; \ + print('BLOCKED' if found else 'Not in public block list')" +``` + +Note: instances can mark blocks as private, so absence from the public list is not a guarantee. + +### Are you in their peer list? 
+ +If you're in an instance's peer list, they've federated with you at some point: + +```bash +curl -s "https://mastodon.social/api/v1/instance/peers" | \ + python3 -c "import sys,json; data=json.load(sys.stdin); print('toot.majorshouse.com' in data)" +``` + +### Is the account visible from a remote instance? + +```bash +curl -s "https://mastodon.social/api/v1/accounts/lookup?acct=majorlinux@toot.majorshouse.com" | \ + python3 -c "import sys,json; d=json.load(sys.stdin); print('limited:', d.get('limited'), 'suspended:', d.get('suspended'))" +``` + +`limited: true` means the remote instance has silenced toot.majorshouse.com. + +### Check federation delivery health (Sidekiq) + +```bash +ssh root@100.110.197.17 "redis-cli llen sidekiq:dead; redis-cli llen sidekiq:retry" +# Both should be 0 for a healthy instance +``` + +### Check unavailable domains (delivery consistently failing) + +```bash +ssh root@100.110.197.17 " +sudo -u postgres psql mastodon_production -c \ + 'SELECT domain, updated_at FROM unavailable_domains ORDER BY updated_at DESC LIMIT 20;'" +``` + +These are domains where ActivityPub delivery has repeatedly failed. Most are dead instances, not active blocks. + +--- + +## FediSeer Registration + +[FediSeer](https://fediseer.com) is a community service that tracks censures (formal complaints) against fediverse instances. Registering lets you monitor if any instance formally censures toot.majorshouse.com. 
+ +### majortoot status (registered 2026-04-22) + +| Field | Value | +|-------|-------| +| Domain | toot.majorshouse.com | +| ID | 5575 | +| State | UP | +| Censures received | 0 | +| Endorsements | 0 | +| Tags | mastodon, selfhosted, leftist, foss | +| Guarantor | none | +| API key | Bitwarden — "FediSeer — toot.majorshouse.com" | + +### Claiming / re-claiming your instance + +```bash +# Claim (sends API key via DM from @fediseer@fediseer.com) +curl -s -X PUT "https://fediseer.com/api/v1/whitelist/toot.majorshouse.com" \ + -H "Content-Type: application/json" \ + -d '{"admin": "majorlinux", "pm_proxy": "MASTODON"}' + +# The API key arrives as a DM — delete the DM after saving to Bitwarden +``` + +### Check censures + +```bash +curl -s "https://fediseer.com/api/v1/censures/toot.majorshouse.com" | python3 -c \ + "import sys,json; d=json.load(sys.stdin); print('Censures:', d.get('total',0))" +``` + +### Update tags + +```bash +curl -s -X PUT "https://fediseer.com/api/v1/tags" \ + -H "Content-Type: application/json" \ + -H "apikey: " \ + -d '{"tags_csv": "mastodon,selfhosted,leftist,foss"}' +``` + +--- + +## See Also + +- [[Mastodon]] — service doc +- [[majortoot]] — server doc +- [[mastodon-db-maintenance]] — statuses remove, accounts cull, vacuum +- [[mastodon-instance-tuning]] — character limits, media cache diff --git a/02-selfhosting/services/mastodon-instance-tuning.md b/02-selfhosting/services/mastodon-instance-tuning.md index 800dfeb..e50f15c 100644 --- a/02-selfhosting/services/mastodon-instance-tuning.md +++ b/02-selfhosting/services/mastodon-instance-tuning.md @@ -10,7 +10,7 @@ tags: - docker status: published created: 2026-04-02 -updated: 2026-04-19T04:55 +updated: 2026-04-25T12:56 --- # Mastodon Instance Tuning diff --git a/05-troubleshooting/ansible-check-mode-false-positives.md b/05-troubleshooting/ansible-check-mode-false-positives.md index 06b75f4..cfe7857 100644 --- a/05-troubleshooting/ansible-check-mode-false-positives.md +++ 
b/05-troubleshooting/ansible-check-mode-false-positives.md @@ -11,7 +11,11 @@ tags: - troubleshooting status: published created: 2026-04-18 +<<<<<<< Updated upstream updated: 2026-04-19T04:57 +======= +updated: 2026-04-22T09:20 +>>>>>>> Stashed changes --- # Ansible Check Mode False Positives in Verify/Assert Tasks diff --git a/05-troubleshooting/fantastical-google-phantom-calendar-syncselect.md b/05-troubleshooting/fantastical-google-phantom-calendar-syncselect.md new file mode 100644 index 0000000..9b1ba63 --- /dev/null +++ b/05-troubleshooting/fantastical-google-phantom-calendar-syncselect.md @@ -0,0 +1,111 @@ +--- +title: "Fantastical Google Sync Error Flood — Phantom Calendars Fixed via syncselect" +domain: troubleshooting +category: productivity +tags: [fantastical, google-calendar, caldav, sync, macos, syncselect] +status: published +created: 2026-04-24 +updated: 2026-04-24 +--- + +# Fantastical Google Sync Error Flood — Phantom Calendars Fixed via syncselect + +Fantastical floods its macOS unified log with Google Calendar sync errors, the app shows persistent sync failures in the UI, and re-adding the Google account inside Fantastical doesn't fix it. The cause is usually orphan calendar references — calendars that were deleted from Google Calendar but still enabled in Google's per-account CalDAV sync whitelist. + +## The Short Answer + +Visit **`https://www.google.com/calendar/syncselect`**, uncheck any calendars that no longer exist or you don't want Fantastical / Apple Calendar to try syncing, click Save. Fantastical's error flood stops within one sync cycle. + +This is a per-Google-account page — completely independent of Fantastical's settings, and independent of the calendar list inside Google Calendar's main web UI. 
+ +## Background + +Google Calendar has **three** separate notions of calendar "visibility" for a given account: + +| Surface | What it controls | +|---|---| +| `calendar.google.com` main UI — calendar list in the left sidebar | What you see in Google's own web interface | +| `calendar.google.com/calendar/u/0/r/settings/calendar/...` — per-calendar settings | Sharing, notifications, access control | +| **`google.com/calendar/syncselect`** — sync selection | **What Google exposes to third-party CalDAV/Exchange clients** (Apple Calendar, Fantastical, Outlook, Thunderbird, etc.) | + +Fantastical talks to Google via CalDAV. It asks Google for the list of calendars enabled for CalDAV sync. If `syncselect` still has a calendar flagged for sync but the calendar has been deleted from Google (or unshared from you), Google returns an inconsistent response — the CalDAV principal lists the calendar ID but any request for its data returns 404. Fantastical dutifully logs an error and retries next sync cycle. Multiply by the number of orphans and you get a flood. + +Deleting a calendar from Google Calendar's main UI does **not** automatically remove it from `syncselect`. That's the gotcha. 
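The mismatch described in this section can be pictured as two separate sets that Google does not keep in step. A toy sketch, assuming hypothetical calendar IDs and a simplified model of one error per orphan per sync cycle (this is not Google's or Fantastical's actual code):

```python
def find_orphans(sync_enabled_ids, existing_ids):
    """Return IDs still flagged in syncselect whose calendars no longer exist."""
    return sorted(set(sync_enabled_ids) - set(existing_ids))


def errors_logged(orphans, sync_cycles):
    """Simplified model: each orphan fails once per sync cycle."""
    return len(orphans) * sync_cycles


# Hypothetical IDs, for illustration only.
sync_enabled = [
    "work@example.com",
    "old-lights@group.calendar.google.com",
    "ex-coworker@group.calendar.google.com",
]
existing = ["work@example.com"]

orphans = find_orphans(sync_enabled, existing)
print(orphans)
print(errors_logged(orphans, sync_cycles=10))
```

Under this model, two orphans over ten sync cycles produce 20 log lines. The real count per orphan may be higher, since the client can log more than one line per failed lookup.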
+ +## Symptoms + +- Fantastical UI shows "Sync Error" or a red badge on the account +- macOS unified log filling with lines like: + ``` + [FBGooglePrincipalSyncSession.m] Unable to find Google Calendar information: + @group.calendar.google.com in () + ``` +- `dataaccessd` logs `Error Domain=kEKAccountErrorDomain Code=0` with `lastSyncStartDate = (null)` +- Fantastical's helper `85C27NK92C.com.flexibits.fantastical2.mac.helper` spams XPC / CoreData token errors every 3 seconds (secondary symptom when the token store gets wedged in the retry loop) + +## Diagnosis + +### Step 1 — Spot the phantom calendar IDs in the log + +```bash +log show --last 5m --style compact \ + --predicate 'eventMessage CONTAINS "Unable to find Google Calendar"' 2>/dev/null \ + | grep -oE 'information: [a-zA-Z0-9._%@-]+' | sort -u +``` + +Each line returned is a calendar ID Fantastical is asking Google for that Google can't find. + +### Step 2 — Get calendar names from Fantastical's local DB + +The orphan IDs alone look random. To match them to what the calendars were called (so you know what to uncheck in syncselect), query Fantastical's SQLite DB: + +```bash +DB="$HOME/Library/Group Containers/85C27NK92C.com.flexibits.fantastical2.mac/Database/Fantastical-8.fcdata" + +for id in ; do + echo "--- $id ---" + strings "$DB" 2>/dev/null | grep "$id" | head -5 +done +``` + +Fantastical stores the calendar's display name near each ID in the binary form. You may see names like `Kitchen Lights`, `Major7`, or other labels that remind you what the calendar was used for — often a deleted smart-home automation trigger, an old device's dedicated calendar, a former coworker's shared calendar, a subscribed sports or holiday calendar that moved. + +### Step 3 — Visit syncselect + +Open `https://www.google.com/calendar/syncselect` in the same browser you're signed in with. 
You'll see every calendar Google knows about for this account, with a checkbox per entry: + +- ✅ Live calendars you want on devices — leave checked +- ❌ Orphans, former smart-home triggers, deleted shared calendars — **uncheck** +- Unsure? Cross-reference against the names from Step 2 + +Click **Save**. + +## Fix + +1. Uncheck orphans at `https://www.google.com/calendar/syncselect`, click Save. +2. Let Fantastical complete one more sync cycle (or quit + relaunch for faster turnaround). +3. Verify the log is clean: + ```bash + log show --last 2m --style compact \ + --predicate 'eventMessage CONTAINS "Unable to find Google Calendar"' 2>/dev/null \ + | wc -l + ``` + Should return 0. + +**What you should NOT do as a first attempt:** + +- Remove and re-add the Google account inside Fantastical. This fixes some orphans but not all — Fantastical's local event cache keeps references to calendars that have associated cached events, so orphans with historical data survive a standard account re-add. Hit `syncselect` first. +- Delete Fantastical's `.fcdata` SQLite. Nuclear, loses local cache, unnecessary for this specific issue. + +## Gotchas & Notes + +- **syncselect is per-Google-account**, so if you have multiple Google accounts in Fantastical, each needs its own visit. The URL will use whichever account you're currently signed in with in the browser. +- **Calendar deletion from `calendar.google.com` doesn't propagate to syncselect.** This is a Google quirk, not a Fantastical bug. +- **The same fix applies to Apple Calendar.app** if it's showing the same sync errors — Fantastical and Apple Calendar use identical CalDAV plumbing via macOS's `dataaccessd`. +- The phantom calendar IDs will remain in Fantastical's `.fcdata` for a while even after the fix — Fantastical doesn't aggressively garbage-collect cached event data. This is cosmetic and doesn't re-trigger sync errors as long as syncselect no longer lists them. 
+- The XPC `Unable to create token NSXPCConnection` loop is downstream of the sync error flood — when Fantastical's helper gets wedged on repeated failed syncs, its CoreData-backed OAuth token store can't initialize cleanly. Fixing syncselect + a full Fantastical quit (menubar → Quit Fantastical, not just `Cmd+Q`) + relaunch clears this too. + +## Related + +- [[Recap]] skill — uses Google Calendar MCPs that are unaffected by this issue (MCPs go through Google's API directly, not CalDAV) +- Google's syncselect URL: https://www.google.com/calendar/syncselect diff --git a/05-troubleshooting/fedora-usrmerge-ebtables-blocker.md b/05-troubleshooting/fedora-usrmerge-ebtables-blocker.md deleted file mode 100644 index 63748af..0000000 --- a/05-troubleshooting/fedora-usrmerge-ebtables-blocker.md +++ /dev/null @@ -1,126 +0,0 @@ ---- -title: "Fedora usrmerge: ebtables Symlink Blocks Directory Consolidation" -domain: troubleshooting -category: fedora -tags: [fedora, usrmerge, ebtables, update-alternatives, ansible, dnf] -status: published -created: 2026-04-19 -updated: 2026-04-19 ---- - -# Fedora usrmerge: ebtables Symlink Blocks Directory Consolidation - -## Symptom - -Every `dnf upgrade` on Fedora 43 (and some earlier Fedora releases) emits a warning partway through the transaction: - -``` -/usr/sbin cannot be merged yet, /usr/sbin/ebtables points to /etc/alternatives/ebtables -``` - -When the upgrade is driven by Ansible, the warning contaminates the module's JSON output and surfaces in a play run as: - -``` -TASK [Upgrade all packages on CentOS/Fedora servers] *** -changed: [majorlab] -[WARNING]: Module invocation had junk after the JSON data: - /usr/sbin cannot be merged yet, /usr/sbin/ebtables points to /etc/alternatives/ebtables -changed: [majordiscord] -``` - -The upgrade succeeds — the warning is cosmetic — but it keeps firing on every run until the underlying state is cleaned up. 
- -## Why It Happens - -Fedora's `usrmerge` transition turns `/usr/sbin` into a symlink to `/usr/bin`. The `filesystem` package's post-install scriptlet enforces that at every transaction: it walks `/usr/sbin` looking for any entity still pinned to the old path and refuses to consolidate until they're removed. - -`ebtables` triggers this because `update-alternatives` can create registrations at `/usr/sbin/` with targets in `/etc/alternatives/`. Those symlinks: - -- Are **not owned by any rpm** (confirmable with `rpm -qf /usr/sbin/ebtables` → "not owned") -- Predate the usrmerge — they were created when `/usr/sbin` was still a real directory -- Point to a target (`/etc/alternatives/ebtables`) that in turn points back into `/usr/sbin/ebtables-legacy` or `/usr/bin/ebtables-nft` - -Because these live outside rpm, no package upgrade can clean them up. The filesystem scriptlet detects the blocker and backs off. - -## Investigation - -1. Confirm which hosts are affected: - ```bash - ansible fedora -m shell -a '[ -e /usr/sbin/ebtables ] && ls -la /usr/sbin/ebtables' - ``` -2. Inspect the alternatives registration: - ```bash - update-alternatives --display ebtables - ``` - Note whether the link points at `/usr/bin/ebtables-nft` (nft backend) or `/usr/sbin/ebtables-legacy` (legacy backend). Different Fedora images ship with different defaults. -3. Confirm ownership: - ```bash - rpm -qf /usr/sbin/ebtables /etc/alternatives/ebtables - ``` - Both should report "not owned by any package." That's the signal. - -## Fix - -Tear down the alternative, delete the blocker symlinks, then re-register with **`/usr/bin` paths on both sides of the registration** so the scriptlet has nothing left to complain about. 
- -```bash -# Capture current provider first (nft or legacy) -update-alternatives --display ebtables - -# Remove the stale registration -update-alternatives --remove-all ebtables - -# Clear the blocking symlinks (not rpm-owned) -rm -f /usr/sbin/ebtables /etc/alternatives/ebtables - -# Re-register with /usr/bin paths — example for nft backend -update-alternatives --install /usr/bin/ebtables ebtables /usr/bin/ebtables-nft 10 \ - --slave /usr/bin/ebtables-restore ebtables-restore /usr/bin/ebtables-nft-restore \ - --slave /usr/bin/ebtables-save ebtables-save /usr/bin/ebtables-nft-save \ - --slave /usr/share/man/man8/ebtables.8.gz ebtables.8.gz /usr/share/man/man8/ebtables-nft.8.gz - -# For legacy backend, swap -nft suffixes for -legacy -``` - -Verify: - -```bash -which ebtables # should resolve to /usr/bin/ebtables -ebtables -V # should print the version without error -test -e /usr/sbin/ebtables && echo BLOCKER || echo clean -``` - -Next `dnf upgrade` will consolidate `/usr/sbin` cleanly with no warning. - -## Ansible Playbook - -`MajorAnsible/fix_ebtables_usrmerge.yml` handles this fleet-wide: - -- Detects the backend (nft vs legacy) per host via `update-alternatives --display` -- Uses `check_mode: false` on the detection query — otherwise `ansible.builtin.command` is skipped in `--check`, the detection fact defaults, and downstream conditionals misfire (see [Ansible Check Mode False Positives](ansible-check-mode-false-positives.md) for the broader pattern) -- Safety check: bails out if `/usr/bin/ebtables-` is missing before touching anything -- Idempotent on re-run — no alternative registered → `end_host` - -Applied 2026-04-19 across the four Fedora hosts: - -| Host | Backend | -|---|---| -| majorlab | nft (`ebtables v1.8.11 nf_tables`) | -| majorhome | nft | -| majormail | legacy (`ebtables v2.0.11 (legacy)`) | -| majordiscord | legacy | - -## Why not just remove ebtables? - -Tempting, since nothing on the fleet currently writes L2 bridge firewall rules. 
But: - -- `ebtables` is a transitive dependency of iptables/libvirt/networking packages on Fedora — removing it fights the package manager -- The package itself isn't the problem; the **stale alternatives state** is - -Cleaning up the registration is cheaper than untangling the dependency graph. - -## Related - -- [Ansible Check Mode False Positives in Verify/Assert Tasks](ansible-check-mode-false-positives.md) -- Playbook: `MajorAnsible/fix_ebtables_usrmerge.yml` -- Fedora usrmerge background: `man file-hierarchy`, Fedora Change page "UsrMove" diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md index d16d37e..15fb483 100644 --- a/05-troubleshooting/index.md +++ b/05-troubleshooting/index.md @@ -1,6 +1,6 @@ --- created: 2026-03-15T06:37 -updated: 2026-04-22T18:11 +updated: 2026-04-25T12:57 --- # 🔧 General Troubleshooting @@ -14,6 +14,7 @@ Practical fixes for common Linux, networking, and application problems. - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md) - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md) - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md) +- [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md) - [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) - [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md) - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md) @@ -44,5 +45,6 @@ Practical fixes for common Linux, networking, and application problems. 
## 🤖 AI / Local LLM - [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md) +- [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](ollama-chat-template-pipe-stdin-bypass.md) - [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md) - [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md) diff --git a/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md b/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md new file mode 100644 index 0000000..d984760 --- /dev/null +++ b/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md @@ -0,0 +1,89 @@ +--- +title: "rsync over Tailscale: Hung in TCP Teardown After Transfer Completes" +domain: troubleshooting +category: networking +tags: [rsync, ssh, tailscale, hang, tcp-fin, hash-mismatch] +status: published +created: 2026-04-25 +updated: 2026-04-25 +--- + +# rsync over Tailscale: Hung in TCP Teardown After Transfer Completes + +A long rsync transfer over Tailscale finishes — the destination file is at full size, rsync's own summary line is in the log — but the rsync, ssh client, and parent bash processes never exit. The `&&` chain that should run after rsync (e.g. `&& echo DONE`) never fires. Watcher scripts polling for completion can stall indefinitely. + +## The Short Answer + +The data is fine. Verify with `md5sum` (or `md5 -q` on macOS) against the source, then kill the hung pipeline. + +```bash +# 1. confirm size matches rsync's reported total_size +ls -lh ~/your-file.gguf +tail ~/rsync.log # look for "total size is N" line + +# 2. checksum end-to-end +md5 -q ~/your-file.gguf # macOS +ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf' # Linux source + +# 3. 
if hashes match, kill the hung pipeline by name +pkill -f 'rsync.*your-file' || true +pkill -f 'ssh .*rsync --server' || true +``` + +## How to Notice + +`ps aux | grep rsync` shows the rsync client, the spawned ssh, and the wrapping bash all in `S` state with **0 CPU activity** and timestamps from minutes-to-hours ago. The destination file already exists at the final (non-`.partial` / non-dotfile) path at full size. The trailing summary in the rsync log reads: + +``` +sent N bytes received M bytes ... bytes/sec +total size is X speedup is Y +``` + +…but the bash `&&` followup that depends on rsync's exit code never runs. + +## Why This Happens + +rsync's exit waits for the underlying ssh transport to close cleanly. Over Tailscale (especially after a long-running connection that bridged a sleep, reconnect, or NAT shuffle), the TCP FIN/ACK handshake from the remote sshd can be lost or delayed indefinitely. The local end has all the data, has finalized the file, has printed its summary — but it's still blocked in `read()` on a socket that will never close on its own. + +This is amplified when: +- The transfer hits a hash-mismatch retry mid-flight (rsync re-pulls the temp file). Each retry re-establishes connection state that's more vulnerable to teardown weirdness. +- The link briefly drops and reconnects via DERP relay during the transfer. +- The source machine is on WSL2 — Windows network stack rewrites can defer FINs. + +The upshot: the data was transferred correctly long before the pipeline reports done. Don't wait — verify and move on. + +## Don't Just Kill — Verify First + +Killing a hung rsync **before the file is complete** can leave a partial file that looks complete by size alone. Always: + +1. Compare the on-disk size to the `total size is N` line in the rsync log +2. md5 (or sha256) against the source to confirm bit-for-bit equality +3. 
Only then kill the hung processes + +Skipping the checksum step risks silently corrupting downstream consumers of the file (Ollama blobs, archive pipelines, etc.). + +## Watcher Threshold Gotcha + +If you have a polling watcher script that fires a notification when the file reaches some threshold size, **set the threshold below the actual file size**, not above it. Example: a 4.68 GB GGUF transferred fine but the watcher's threshold was set to 4.7 GB (`4_700_000_000` bytes), so the threshold never triggered even though the transfer completed. + +```bash +# bad — threshold above true size +TARGET=4700000000 # 4.7 GB + +# good — threshold below true size +TARGET=4600000000 # 4.6 GB, fires at ~98% complete +``` + +Or better: trust the rsync exit code / the `RSYNC_DONE` marker line your wrapper writes after `&&`, not file size. + +## Prevention + +- Wrap rsync in a watchdog. If rsync hasn't exited within `expected_runtime + 2 minutes`, snapshot status, md5-verify, and kill. +- For very large files, use `rsync --partial-dir` so a fresh re-run resumes from the temp file instead of redoing the transfer. +- Consider `rsync --inplace` for files that consumers will copy out of the destination anyway (Ollama blob copy step). +- Add `ServerAliveInterval=30` / `ServerAliveCountMax=3` to your ssh config for the source host — kills the ssh transport if the remote stops responding to keepalives. 
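The watchdog idea from the first bullet might look roughly like the sketch below. It is a minimal illustration, not a tested production wrapper: the function names, the return strings, and the assumption that the source MD5 was fetched beforehand (for example with `md5sum` over ssh) are all hypothetical.

```python
import hashlib
import os
import subprocess


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_with_watchdog(cmd, dest_path, expected_md5, expected_runtime_s, grace_s=120):
    """Run cmd and stop waiting once the deadline passes.

    Returns "exited" if cmd finished on its own with status 0, or
    "killed-verified" if it overran but dest_path matched expected_md5.
    Raises RuntimeError in every other case.
    """
    proc = subprocess.Popen(cmd)
    try:
        status = proc.wait(timeout=expected_runtime_s + grace_s)
    except subprocess.TimeoutExpired:
        # Overran the deadline: verify the file first, then kill.
        verified = os.path.isfile(dest_path) and file_md5(dest_path) == expected_md5
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        if verified:
            return "killed-verified"
        raise RuntimeError(f"{dest_path} failed verification; re-run the transfer")
    if status != 0:
        raise RuntimeError(f"command exited with status {status}")
    return "exited"
```

A call could pass the rsync argument list as `cmd`, the final destination path as `dest_path`, and the expected transfer time in seconds. Note that `proc.kill()` only signals the direct child, so the ssh process rsync spawned may still need the `pkill` step from The Short Answer.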
+ +## Related + +- [[tailscale-ssh-reauth-prompt]] — different Tailscale-over-ssh gotcha +- [[../../02-selfhosting/storage-backup/rsync-backup-patterns|rsync backup patterns]] — general rsync usage in MajorInfrastructure diff --git a/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md b/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md index df52304..cd78463 100644 --- a/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md +++ b/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md @@ -11,11 +11,7 @@ tags: - powershell status: published created: 2026-04-03 -<<<<<<< Updated upstream -updated: 2026-04-14T14:27 -======= -updated: 2026-04-18T11:13 ->>>>>>> Stashed changes +updated: 2026-04-22T09:20 --- # Windows OpenSSH: WSL as Default Shell Breaks Remote Commands diff --git a/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md b/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md index 4b2e155..9f299fb 100644 --- a/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md +++ b/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md @@ -10,11 +10,7 @@ tags: - majorrig status: published created: 2026-04-02 -<<<<<<< Updated upstream -updated: 2026-04-14T14:27 -======= -updated: 2026-04-18T11:13 ->>>>>>> Stashed changes +updated: 2026-04-22T09:20 --- # Windows OpenSSH Server (sshd) Stops After Reboot diff --git a/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md b/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md new file mode 100644 index 0000000..4096378 --- /dev/null +++ b/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md @@ -0,0 +1,88 @@ +--- +title: "Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt" +domain: troubleshooting +category: ai-inference +tags: [ollama, eval, chat-template, system-prompt, majortwin, gotcha] 
+status: published +created: 2026-04-25 +updated: 2026-04-25 +--- + +# Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt + +When eval'ing or smoke-testing an Ollama model, piping a prompt via stdin to `ollama run` skips the model's chat template **and** the SYSTEM prompt baked into the Modelfile. Output looks like raw base-model completion (often Mastodon-shaped or training-data-shaped), and you'll think the model is broken when it isn't. + +## The Short Answer + +For evals and any test where you want the model's actual chat behavior, **use the HTTP API at `/api/chat`** — never `ollama run` with `echo "..." | ollama run model`. + +```python +import json, urllib.request +body = json.dumps({ + "model": "majortwin-v8", + "messages": [{"role": "user", "content": "What's your name?"}], + "stream": False, +}).encode() +req = urllib.request.Request( + "http://localhost:11434/api/chat", + data=body, headers={"Content-Type": "application/json"}, method="POST", +) +r = json.loads(urllib.request.urlopen(req).read()) +print(r["message"]["content"]) +``` + +Or with curl piped through jq: + +```bash +curl -s http://localhost:11434/api/chat -d '{ + "model": "majortwin-v8", + "messages": [{"role": "user", "content": "What is your name?"}], + "stream": false +}' | jq -r .message.content +``` + +## How to Notice + +Symptom: model responses are weirdly raw — Mastodon-style hashtag rants, news headlines, multiple unrelated thoughts strung together — even though the same model behaves normally in Open WebUI or via the chat API. This is the canonical fingerprint of a chat-template-bypassed call. + +## Why This Happens + +`ollama run` is the CLI's interactive REPL. When stdin is a TTY, it reads input as user turns and applies the chat template. When stdin is a **pipe** (`echo "..." | ollama run model`), the CLI treats stdin as raw text and forwards it to `/api/generate` (the completion endpoint), not `/api/chat`. 
`/api/generate` does **not** apply the chat template, and the SYSTEM prompt only takes effect when the chat template wraps it. + +The two endpoints serve different purposes: +- `/api/generate` — raw completion, good for fill-in-the-blank or non-instruct base models +- `/api/chat` — applies the model's chat template, includes SYSTEM, handles multi-turn message arrays + +For an instruct-tuned model (Qwen2.5-Instruct, Llama-3.1-Instruct, etc.), bypassing the chat template means the model never sees the `<|im_start|>system ... <|im_end|>` framing it was trained to expect, and its responses regress toward base-model behavior. + +## When You Actually Want `/api/generate` + +Almost never, for instruct models. The legitimate use case is base models without a chat template, or specific completion-style prompts where you want the model to continue a string verbatim. For evals of a fine-tuned Modelfile, always use `/api/chat`. + +## Reusable Eval Pattern + +A minimal stdlib-only eval harness used for MajorTwin evals lives at `~/MajorTwin/scripts/eval_v8.py`. The key call is the `chat()` helper: + +```python +def chat(host, model, prompt, timeout=180): + body = json.dumps({ + "model": model, + "messages": [{"role": "user", "content": prompt}], + "stream": False, + }).encode() + req = urllib.request.Request( + f"{host}/api/chat", + data=body, + headers={"Content-Type": "application/json"}, + method="POST", + ) + with urllib.request.urlopen(req, timeout=timeout) as r: + return json.loads(r.read())["message"]["content"].strip() +``` + +This applies the chat template and the SYSTEM prompt baked into the Modelfile. No need to re-specify SYSTEM per-call. 
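To turn the `chat()` helper into a pass/fail check, one option is a small substring-matching loop. This is a hypothetical sketch, not the contents of `eval_v8.py`: the names `run_eval` and `pass_rate` and the `(prompt, expected_substring)` case format are assumptions made for illustration.

```python
def run_eval(cases, chat_fn):
    """Send each prompt through chat_fn and look for an expected substring.

    cases: iterable of (prompt, expected_substring) pairs (assumed format).
    chat_fn: callable that takes a prompt string and returns the reply text.
    Returns a list of (prompt, passed, reply) tuples.
    """
    results = []
    for prompt, expected in cases:
        reply = chat_fn(prompt)
        # Case-insensitive match keeps the check tolerant of capitalization.
        passed = expected.lower() in reply.lower()
        results.append((prompt, passed, reply))
    return results


def pass_rate(results):
    """Fraction of cases that passed; 0.0 when there are no results."""
    if not results:
        return 0.0
    return sum(1 for _, passed, _ in results if passed) / len(results)
```

With the helper above, `chat_fn` could be `functools.partial(chat, "http://localhost:11434", "majortwin-v8")`, so every case goes through `/api/chat` and picks up the chat template and SYSTEM prompt.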
+ +## Related + +- [[ollama-macos-sleep-tailscale-disconnect]] — different Ollama gotcha (sleep + Tailscale) +- [[20-Projects/MajorTwin/majortwin-v8-eval-report|MajorTwin v8 eval report]] — caught this issue during initial smoke test on 2026-04-25 diff --git a/05-troubleshooting/yt-dlp-fedora-js-challenge.md b/05-troubleshooting/yt-dlp-fedora-js-challenge.md index 21d4cb5..9fdd2e4 100644 --- a/05-troubleshooting/yt-dlp-fedora-js-challenge.md +++ b/05-troubleshooting/yt-dlp-fedora-js-challenge.md @@ -10,11 +10,7 @@ tags: - deno status: published created: 2026-04-02 -<<<<<<< Updated upstream -updated: 2026-04-14T14:27 -======= -updated: 2026-04-18T11:13 ->>>>>>> Stashed changes +updated: 2026-04-22T11:33 --- # yt-dlp YouTube JS Challenge Fix (Fedora) @@ -140,15 +136,43 @@ but no impersonate target is available. ERROR: Unable to download video subtitles for 'en-en-US': HTTP Error 429: Too Many Requests ``` -**Cause:** yt-dlp needs `curl_cffi` to impersonate a real browser's TLS fingerprint. Without it, YouTube detects the non-browser client and rate-limits with 429s. Subtitle downloads are usually the first to fail. +**Cause:** yt-dlp needs `curl_cffi` to impersonate a real browser's TLS fingerprint. Without it, YouTube detects the non-browser client and rate-limits with 429s. Subtitle downloads are usually the first to fail (YouTube's `timedtext` endpoint has its own, stricter per-IP bucket). -**Fix:** +**Fix (pin `curl_cffi` to the supported range):** ```bash -pip3 install --upgrade yt-dlp curl_cffi +pip3 install --user -U "curl_cffi>=0.10,<0.15" "yt-dlp-ejs>=0.8" ``` -Once `curl_cffi` is installed, yt-dlp automatically uses browser impersonation and the 429s stop. No config changes needed. +> ⚠️ Do **not** run a bare `pip install -U curl_cffi`. 
As of yt-dlp **2026.03.17**, the backend in `yt_dlp/networking/_curlcffi.py` hard-caps at `0.14.x`: +> +> ``` +> ImportError: Only curl_cffi versions 0.5.10 and 0.10.x through 0.14.x are supported +> ``` +> +> Installing `curl_cffi 0.15.0` silently disables impersonation — `yt-dlp --list-impersonate-targets` will show every source as `(unavailable)` even though `import curl_cffi` works fine. Always pin to `<0.15` until yt-dlp widens the range. + +**Verify:** + +```bash +yt-dlp --list-impersonate-targets | head -5 +``` + +Should show real entries (`Chrome-133 Macos-15 curl_cffi`), not the `(unavailable)` table. + +**If the 429 persists on subtitles only:** the `timedtext` bucket is already hot from prior retries. Either wait 15–60 min, skip subs for this download (`--no-write-subs --no-write-auto-subs`), or throttle with `--sleep-subtitles 5` on retry. The video/audio path is not affected. + +**Companion gotcha — `yt-dlp-ejs` version drift:** if `yt-dlp -U` reports yt-dlp is current but you still see: + +``` +WARNING: Challenge solver lib script version 0.3.2 is not supported ... supported version: 0.8.0 +``` + +…then the EJS solver helper is stale. `yt-dlp -U` does **not** update it. 
Upgrade explicitly: + +```bash +pip3 install --user -U yt-dlp-ejs +``` ### SABR-Only Streaming Warning diff --git a/README.md b/README.md index efa7298..becef6c 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ --- created: 2026-04-06T09:52 -updated: 2026-04-14T14:12 +updated: 2026-04-22T09:20 --- # MajorLinux Tech Wiki — Index diff --git a/SUMMARY.md b/SUMMARY.md index 98f16af..17c8f71 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -1,6 +1,6 @@ --- created: 2026-04-02T16:03 -updated: 2026-04-22T19:58 +updated: 2026-04-25T12:57 --- * [Home](index.md) * [Linux & Sysadmin](01-linux/index.md) @@ -88,6 +88,8 @@ updated: 2026-04-22T19:58 * [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) * [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](05-troubleshooting/networking/pihole-blocks-claude-desktop.md) * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) + * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md) + * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md) * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md) * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) diff --git a/index.md b/index.md index c8b7110..0dc877c 100644 --- a/index.md +++ b/index.md @@ -1,6 +1,10 @@ --- created: 2026-04-06T09:52 +<<<<<<< Updated upstream updated: 2026-04-19T21:46 +======= +updated: 2026-04-22T09:20 +>>>>>>> Stashed changes --- # MajorLinux Tech Wiki — Index