wiki: update fail2ban digest + netdata docker health + 3 new articles

- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent, defaults-debian.conf gotcha added - netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table - New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails, tailscale-status-json-hostname-localhost-ios - Various article updates and nav index refreshes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 14:58:07 -04:00 · 2026-05-02 14:58:07 -04:00 · 4126656c05
commit 4126656c05
parent f40f497b46
21 changed files with 567 additions and 35 deletions
--- a/01-linux/distro-specific/wsl2-backup-powershell.md
+++ b/01-linux/distro-specific/wsl2-backup-powershell.md
@ -10,7 +10,7 @@ tags:
  - majorrig
 status: published
 created: 2026-03-16
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # WSL2 Backup via PowerShell Scheduled Task
--- a/01-linux/networking/ssh-config-key-management.md
+++ b/01-linux/networking/ssh-config-key-management.md
@ -10,7 +10,7 @@ tags:
  - remote-access
 status: published
 created: 2026-03-08
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---

 # SSH Config and Key Management
--- a/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md
+++ b/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md
@ -7,7 +7,7 @@ tags:
  - asus
  - ssh
 created: 2026-04-19
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # Wake-on-LAN via Router SSH
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-13T10:15
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # 🏠 Self-Hosting & Homelab

--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@ -1,11 +1,17 @@
 ---
-title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
+title: Tuning Netdata Docker Health Alarms to Prevent Update Flapping
 domain: selfhosting
 category: monitoring
-tags: [netdata, docker, nextcloud, alarms, health, monitoring]
+tags:
+  - netdata
+  - docker
+  - nextcloud
+  - alarms
+  - health
+  - monitoring
 status: published
 created: 2026-03-18
-updated: 2026-03-28
+updated: 2026-05-02T11:04
 ---

 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@ -61,9 +67,9 @@ chart labels: container_name=!nextcloud-aio-nextcloud *

 ### Dedicated Nextcloud AIO Alarm

-Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
+Added 2026-03-23, updated 2026-05-02. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.

-The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures:
+The dedicated alarm uses a 30-minute lookup window and 10-minute delay to absorb normal startup and update cycles (~40 minutes total grace), while still catching sustained failures:

 ```ini
 # Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle
@ -76,15 +82,23 @@ template: docker_nextcloud_unhealthy
 component: Docker
    units: status
    every: 30s
-   lookup: average -10m of unhealthy
+   lookup: average -30m of unhealthy
 chart labels: container_name=nextcloud-aio-nextcloud
-     warn: $this > 0
+     warn: $this >= 1
    delay: up 10m down 5m multiplier 1.5 max 30m
  summary: Nextcloud container health sustained
-     info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
+     info: nextcloud-aio-nextcloud has been continuously unhealthy for 30+ minutes — not a transient update blip
       to: sysadmin
 ```

+**Tuning history:**
+
+| Date | Lookup | Delay | Trigger | Notes |
+|---|---|---|---|---|
+| 2026-03-23 | 35m | 35m | Initial split from general alarm | Absorbed PHP-FPM warm-up |
+| 2026-04-29 | 15m | 5m | Backup blip (~6m) never triggered | Tightened after stability |
+| 2026-05-02 | 30m | 10m | 15m still too aggressive for update cycles | ~40m total grace; catches real outages |
+
 ## Watchdog Cron: Auto-Restart on Sustained Unhealthy

 If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@ -11,7 +11,7 @@ tags:
  - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-18T11:13
+updated: 2026-04-30T05:21
 ---
 # ClamAV Fleet Deployment with Ansible

--- a/02-selfhosting/security/fail2ban-digest-mode-fleet.md
+++ b/02-selfhosting/security/fail2ban-digest-mode-fleet.md
@ -1,11 +1,18 @@
 ---
-title: "Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts"
+title: Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
 domain: selfhosting
 category: security
-tags: [fail2ban, security, email, ansible, fleet, cron, digest]
+tags:
+  - fail2ban
+  - security
+  - email
+  - ansible
+  - fleet
+  - cron
+  - digest
 status: published
 created: 2026-04-22
-updated: 2026-04-22
+updated: 2026-05-02T14:56
 ---
 # Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts

@ -21,11 +28,11 @@ Three tiers replace the firehose:

 | Tier | Jails | Action | Why |
 |------|-------|--------|-----|
-| **Immediate email** | `sshd`, `recidive` | `action_mwl` | Security-critical — someone is actively targeting auth or is a repeat offender |
+| **Immediate email** | `recidive` | `action_mwl` | Repeat offenders only — someone has been banned multiple times across jails |
 | **Silent ban** | Everything else | `action_` (default) | Ban happens, firewall rule applied, no email sent |
 | **Daily digest** | All jails | Cron script at 08:00 UTC | One summary email per host with ban counts across all jails |

-This reduces email volume from hundreds per day to ~10 (one digest per host + occasional sshd/recidive alerts).
+This reduces email volume from hundreds per day to ~10 (one digest per host + occasional recidive alerts).

 ## jail.local Configuration

@ -40,18 +47,20 @@ action = %(action_)s

 This overrides the stock `action_mwl` for all jails. Bans still happen — the firewall rule is applied — but no email is sent.

-### Keep immediate alerts for critical jails
+### Keep immediate alerts for recidive only

 ```ini
 [sshd]
 enabled = true
-action = %(action_mwl)s
+action = %(action_)s

 [recidive]
 enabled = true
 action = %(action_mwl)s
 ```

+> **Updated 2026-05-02:** sshd was moved to silent (`action_`). Only recidive (repeat offenders) now triggers immediate email. sshd bans are captured in the daily digest.
+
 ### Clean up email subjects with fq-hostname

 By default, fail2ban uses the system FQDN in email subjects. On Tailscale hosts, this produces ugly subjects like `[Fail2Ban] sshd: banned 1.2.3.4 on MajorToot.tail7f2d9.ts.net`. Override it in `[DEFAULT]`:
@ -91,8 +100,9 @@ The playbook `configure_fail2ban_digest.yml` deploys the full digest model fleet
 ### What it does

 1. Deploys a Python helper script that performs **section-aware editing** of `jail.local` (see gotchas below)
-2. Sets `action = %(action_)s` in `[DEFAULT]`
-3. Sets `action = %(action_mwl)s` in `[sshd]` and `[recidive]`
+2. Sets `action = %(action_)s` in `[DEFAULT]` and `[sshd]`
+3. Sets `action = %(action_mwl)s` in `[recidive]`
+4. Removes stale `action = %(action_mwl)s` from `defaults-debian.conf` if present
 4. Sets `fq-hostname` per host using an override dict
 5. Deploys the digest script from a Jinja2 template
 6. Creates the cron job via `ansible.builtin.cron`
@ -143,6 +153,14 @@ option 'action' in section 'DEFAULT' already exists

 The Python editor script handles this by replacing existing keys rather than appending.

+### defaults-debian.conf overrides jail.local
+
+On Debian/Ubuntu, `/etc/fail2ban/jail.d/defaults-debian.conf` is loaded **after** `jail.local`. If it contains `action = %(action_mwl)s`, it silently overrides your silent default — every jail sends email on every ban. The Ansible playbook now removes this line automatically. If you see per-ban emails after deploying digest mode, check this file first:
+
+```bash
+grep action /etc/fail2ban/jail.d/defaults-debian.conf
+```
+
 ### fq-hostname scope

 Setting `fq-hostname` in `[DEFAULT]` affects all action templates that use the `<fq-hostname>` tag — including both immediate emails and the digest subject. This is the desired behavior, but be aware that it overrides the system hostname globally within fail2ban.
--- a/02-selfhosting/security/wp-fail2ban-logpath-debian-ubuntu.md
+++ b/02-selfhosting/security/wp-fail2ban-logpath-debian-ubuntu.md
@ -0,0 +1,151 @@
+---
+title: "wp-fail2ban Plugin Logpath on Debian/Ubuntu (auth.log, not syslog)"
+domain: selfhosting
+category: security
+tags: [fail2ban, wordpress, wp-fail2ban, debugging, gotcha, debian, ubuntu]
+status: published
+created: 2026-04-30
+updated: 2026-04-30
+---
+# wp-fail2ban Plugin Logpath on Debian/Ubuntu (auth.log, not syslog)
+
+## The Problem
+
+You install the [WP fail2ban](https://wordpress.org/plugins/wp-fail2ban/) WordPress plugin, configure the fleet-standard `wordpress-hard`, `wordpress-soft`, and `wordpress-extra` jails, and… nothing. Weeks pass. `fail2ban-client status wordpress-hard` reports `Total failed: 0, Total banned: 0`. Your site is being attacked, but the jails are dead.
+
+Meanwhile the `wordpress-login` jail (which reads Apache access logs for `POST /wp-login.php` directly) is happily catching brute-forcers. So the problem isn't fail2ban itself — it's specifically the wp-fail2ban-plugin-derived jails.
+
+## The Cause
+
+The wp-fail2ban plugin emits events via PHP's `syslog()` call with facility `LOG_AUTH`. On Debian/Ubuntu, rsyslog routes the `auth` facility to **`/var/log/auth.log`**, NOT `/var/log/syslog`. On RHEL/Fedora it's `/var/log/secure`.
+
+A lot of tutorials, ansible-galaxy roles, and copy-paste config snippets specify:
+
+```ini
+logpath = /var/log/syslog
+```
+
+That's wrong on Debian/Ubuntu. The events never land there, so the filter regex has nothing to match, so the jail catches zero events forever. Silently.
+
+## Diagnostic Steps
+
+If a `wordpress-{hard,soft,extra}` jail shows `Total failed: 0` over a long window despite the plugin being active and the site getting attacked:
+
+**1. Check what the jail thinks it's watching:**
+
+```bash
+sudo fail2ban-client status wordpress-hard | grep "File list"
+```
+
+**2. Check where wp-fail2ban events actually land:**
+
+```bash
+sudo grep -c "wordpress(" /var/log/auth.log /var/log/syslog /var/log/secure 2>/dev/null
+```
+
+You'll see something like:
+
+```
+/var/log/auth.log:314
+/var/log/syslog:0
+```
+
+**3. If the jail's `File list` ≠ the file with events, fix the `logpath`.**
+
+A real event line on Debian/Ubuntu looks like:
+
+```
+2026-04-18T23:28:21.027004-04:00 hostname wordpress(example.com)[719989]: XML-RPC authentication failure for someone from 1.2.3.4
+```
+
+The `wordpress(domain)[pid]` syslog tag is the giveaway — those are wp-fail2ban events.
+
+## The Fix
+
+Edit the jail blocks in `/etc/fail2ban/jail.local` (or your Ansible source for the jail) and set:
+
+```ini
+[wordpress-hard]
+enabled = true
+port = http,https
+filter = wordpress-hard
+logpath = /var/log/auth.log
+maxretry = 1
+findtime = 60
+bantime = 30d
+backend = polling
+
+[wordpress-soft]
+enabled = true
+port = http,https
+filter = wordpress-soft
+logpath = /var/log/auth.log
+maxretry = 5
+findtime = 60
+bantime = 30d
+backend = polling
+
+[wordpress-extra]
+enabled = true
+port = http,https
+filter = wordpress-extra
+logpath = /var/log/auth.log
+maxretry = 5
+findtime = 60
+bantime = 30d
+backend = polling
+```
+
+Then:
+
+```bash
+sudo fail2ban-client -t          # validate
+sudo fail2ban-client reload
+sudo fail2ban-client status wordpress-hard | grep "File list"
+# should now show /var/log/auth.log
+```
+
+## Verification
+
+You can prove the filter regex actually matches your real events without waiting for an attack — run `fail2ban-regex` against the rotated log:
+
+```bash
+sudo fail2ban-regex /var/log/auth.log.1 /etc/fail2ban/filter.d/wordpress-hard.conf | grep -E "Failregex:|Lines:"
+```
+
+Healthy output looks like:
+
+```
+Failregex: 81 total
+Lines: 13008 lines, 0 ignored, 81 matched, 12927 missed
+```
+
+If you see `Failregex: 0 total`, the filter regex doesn't match what the plugin actually emits — which is a different bug (filter version skew vs. plugin version), not the logpath gotcha. Investigate `/etc/fail2ban/filter.d/wordpress-{hard,soft}.conf` against actual event lines.
+
+> **Note:** On a freshly-fixed jail, counters will sit at `Total failed: 0` for a while — the `polling` backend starts at the file's current EOF, so old events aren't retroactively counted. New events from the moment of `reload` onward will accumulate. Allow a few days of normal attack traffic before declaring the fix broken.
+
+## Distribution Cheat Sheet
+
+| Distro family | wp-fail2ban events land in |
+|---|---|
+| Debian / Ubuntu | `/var/log/auth.log` |
+| RHEL / CentOS / Fedora | `/var/log/secure` |
+| systemd-journal-only systems | `journalctl SYSLOG_FACILITY=4` (use `backend = systemd` + `journalmatch = SYSLOG_FACILITY=4`) |
+
+If you have a mixed fleet, parameterize the path:
+
+```yaml
+# Ansible vars
+wp_fail2ban_log_path: "{{ '/var/log/auth.log' if ansible_os_family == 'Debian' else '/var/log/secure' }}"
+```
+
+## Why wordpress-login Is Unaffected
+
+The `wordpress-login` jail is a different beast — it reads `/var/log/apache2/access.log` directly and matches `^<HOST> -.*"POST /wp-login.php` via the `wordpress-login` filter. No plugin involved, no syslog facility involved. So a host can have `wordpress-login` working perfectly while `wordpress-{hard,soft,extra}` are silently dead. Don't let a healthy `wordpress-login` reassure you that the rest of the wp-fail2ban stack is also fine.
+
+## Related
+
+- [[fail2ban-wordpress-login-jail]] — the access-log layer that catches WP brute force without any plugin dependency
+- [[fail2ban-apache-bad-request-jail]]
+- [[fail2ban-apache-php-probe-jail]]
+- [[clamav-fleet-deployment]]
--- a/02-selfhosting/services/mastodon-instance-tuning.md
+++ b/02-selfhosting/services/mastodon-instance-tuning.md
@ -10,7 +10,7 @@ tags:
  - docker
 status: published
 created: 2026-04-02
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # Mastodon Instance Tuning
--- a/05-troubleshooting/ansible-check-mode-false-positives.md
+++ b/05-troubleshooting/ansible-check-mode-false-positives.md
@ -11,7 +11,7 @@ tags:
  - troubleshooting
 status: published
 created: 2026-04-18
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # Ansible Check Mode False Positives in Verify/Assert Tasks

--- a/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
+++ b/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
@ -0,0 +1,119 @@
+---
+title: "LoRA adapter — GGUF conversion fails with 'config.json not found'"
+domain: troubleshooting
+category: gpu-display
+tags: [lora, qlora, gguf, llama.cpp, unsloth, fine-tuning, qwen]
+status: published
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LoRA adapter — GGUF conversion fails with 'config.json not found'
+
+## Problem
+
+After a QLoRA fine-tune, you point `llama.cpp/convert_hf_to_gguf.py` at the training output directory and it crashes immediately:
+
+```
+FileNotFoundError: [Errno 2] No such file or directory:
+  '/path/to/training-runs/<run>/final/config.json'
+```
+
+The output directory looks fine — it contains:
+
+```
+adapter_config.json
+adapter_model.safetensors  (~150 MB for a 7B base)
+chat_template.jinja
+tokenizer_config.json
+tokenizer.json
+```
+
+But no `config.json`, and `adapter_model.safetensors` is 150 MB — way smaller than the ~14 GB you'd expect for a full Qwen2.5-7B 16-bit checkpoint.
+
+## Root cause
+
+`model.save_pretrained()` after a LoRA/QLoRA train saves **only the adapter weights**, not a merged full-precision model. `convert_hf_to_gguf.py` expects a full HuggingFace model directory — it reads `config.json` to identify the architecture. Adapter-only directories don't have one.
+
+You need to merge the LoRA adapter into the base model first, then point the GGUF converter at the merged dir.
+
+## Solution
+
+### Quick fix — inline merge step
+
+Insert this block between training completion and `convert_hf_to_gguf.py`:
+
+```python
+from unsloth import FastLanguageModel
+
+adapter = "/path/to/training-runs/<run>/final"
+merged  = "/path/to/training-runs/<run>/merged"
+
+model, tok = FastLanguageModel.from_pretrained(
+    model_name=adapter,
+    max_seq_length=2048,
+    load_in_4bit=True,
+)
+model.save_pretrained_merged(merged, tok, save_method="merged_16bit")
+```
+
+Then run the GGUF converter against the **merged** dir, not the adapter dir:
+
+```bash
+python3 llama.cpp/convert_hf_to_gguf.py /path/to/training-runs/<run>/merged \
+  --outfile model-f16.gguf --outtype f16
+```
+
+The merged dir will contain `config.json`, `model-00001-of-00004.safetensors` (multiple shards totaling the full base model size), `generation_config.json`, etc.
+
+### Cleaner fix — use a wrapper
+
+If you do this often, encapsulate it:
+
+1. Wrapper Python script accepts `--adapter`, `--output`, `--skip-merge`, `--all-quants`
+2. Step 1: load adapter via `FastLanguageModel.from_pretrained()`, call `save_pretrained_merged()`
+3. Step 2: subprocess `convert_hf_to_gguf.py` on the merged dir
+4. Step 3: subprocess `llama-quantize` for each requested quant
+
+This is what `~/corpus/scripts/convert_gguf.py` does on MajorRig (rewritten 2026-04-09 for the MajorTwin v7b cycle).
+
+## Why this trips people up
+
+- Unsloth and PEFT both save adapter-only by default after `trainer.save_model()` or `model.save_pretrained()`. There's no warning that downstream tools expect a merged model.
+- The training output **looks** complete — there's a `tokenizer.json`, a `chat_template.jinja`, and a non-trivial `.safetensors`. It feels like a checkpoint.
+- A pipeline that uses `convert_gguf.py` (with merge) once and then someone reimplements Step 4 inline (skipping the wrapper) will silently lose the merge step. This is what happened in MajorTwin v8c (Apr 30, 2026) — see [[majortwin-v8b-plan#Pipeline Bug + Fix (2026-04-30)]].
+
+## Verification checklist
+
+After training, before running the GGUF converter, verify the directory you're pointing at:
+
+| File | Adapter-only dir | Merged dir |
+|---|---|---|
+| `adapter_config.json` | ✅ | ❌ |
+| `adapter_model.safetensors` | ✅ (~150 MB / 7B) | ❌ |
+| `config.json` | ❌ | ✅ |
+| `model-*.safetensors` (sharded) | ❌ | ✅ (~14 GB / 7B) |
+| `generation_config.json` | ❌ | ✅ |
+| `tokenizer.json` | ✅ | ✅ |
+
+If you see only the left column, you need to merge before converting.
+
+## Resuming a failed pipeline without re-training
+
+The adapter is small and self-contained. If your pipeline crashes at the GGUF step, you do NOT need to retrain — the LoRA adapter at `<run>/final/` is intact. Write a resume wrapper that runs only:
+
+1. Merge (`save_pretrained_merged`)
+2. F16 conversion (`convert_hf_to_gguf.py`)
+3. Quantization (`llama-quantize`)
+4. Deploy
+
+This saves the cost of however many GPU-hours the training took. See `~/corpus/scripts/resume_v8c_step4.sh` on MajorRig for an example.
+
+## Related
+
+- [[qwen-14b-oom-3080ti]] — base model size choice on a 12GB GPU
+- [[majortwin-v8b-plan]] — v8c pipeline architecture and resume
+
+## Maintenance
+
+- 2026-04-30 — Created after MajorTwin v8c pipeline failed Step 4. Root-caused, patched, resumed.
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@ -1,6 +1,6 @@
 ---
 created: 2026-03-15T06:37
-updated: 2026-04-29T22:45
+updated: 2026-04-30T10:41
 ---
 # 🔧 General Troubleshooting

@ -8,12 +8,14 @@ Practical fixes for common Linux, networking, and application problems.

 ## 🖥️ GPU & AI
 - [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](gpu-display/qwen-14b-oom-3080ti.md)
+- [LoRA adapter — GGUF conversion fails with 'config.json not found'](gpu-display/lora-adapter-gguf-conversion-fails.md)

 ## 🌐 Networking & Web
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
 - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
+- [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](networking/tailscale-status-json-hostname-localhost-ios.md)
 - [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)
 - [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
 - [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md)
--- a/05-troubleshooting/isp-sni-filtering-caddy.md
+++ b/05-troubleshooting/isp-sni-filtering-caddy.md
@ -1,11 +1,17 @@
 ---
-title: "ISP SNI Filtering & Caddy Troubleshooting"
+title: ISP SNI Filtering & Caddy Troubleshooting
 domain: troubleshooting
 category: general
-tags: [isp, sni, caddy, tls, dns, cloudflare]
+tags:
+  - isp
+  - sni
+  - caddy
+  - tls
+  - dns
+  - cloudflare
 status: published
 created: 2026-04-02
-updated: 2026-04-02
+updated: 2026-04-30T13:07
 ---
 # ISP SNI Filtering & Caddy Troubleshooting

@ -29,3 +35,89 @@ notes.majorshouse.com {
 ```

 Once the hostname was changed to one without the "wiki" keyword, the TLS handshake completed successfully.
+
+---
+
+## 🔁 2026-04-30 Update — Stale A Record + Cloudflare Proxy Fix
+
+The hostname rename held for ~4 weeks. On 2026-04-30 the wiki went down with a TLS handshake failure on `notes.majorshouse.com`. The on-the-spot framing was "ISP filter expanded to include 'notes'" — but Cloudflare DNS audit showed a different (and arguably worse) root cause: **the `notes` A record was pointing at `136.54.3.248`, an IP that is not majorlab's current home IP.** Whichever host responds at that address either does not run Caddy or does not know about the `notes.majorshouse.com` SNI, so the TLS handshake was rejected with `internal_error 80`.
+
+### Re-diagnosis
+
+```bash
+# Cert + Caddy + mkdocs all healthy on majorlab
+$ ssh majorlab 'systemctl is-active caddy; ss -tlnp | grep :443'
+active
+LISTEN 0  4096  *:443  users:(("caddy",pid=1549,fd=7))
+
+# Loopback-served TLS works fine — cert valid Mar 11 → Jun 9 2026
+$ ssh majorlab 'curl -sS -o /dev/null -w "%{http_code}\n" --resolve notes.majorshouse.com:443:127.0.0.1 https://notes.majorshouse.com/'
+200
+
+# External TLS handshake gets rejected with internal_error
+$ openssl s_client -servername notes.majorshouse.com -connect 136.54.3.248:443
+… SSL alert number 80 (internal_error) …
+```
+
+### The smoking-gun comparison
+
+Other `*.majorshouse.com` services worked because they were CNAMEs to the apex, which resolves to majorlab's actual home IP:
+
+| Subdomain | DNS shape | Final IP | Status |
+|---|---|---|---|
+| `notes.majorshouse.com` | **A → `136.54.3.248`** (stale) | `136.54.3.248` (wrong host) | ❌ TLS rejected |
+| `git.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
+| `n8n.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
+| `matrix.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
+
+None of the working subdomains were proxied through Cloudflare (`proxied=false` on all of them); they simply had the right IP. The `notes` A record was the only one pointing somewhere wrong — most likely a stale value from a prior ISP / IP change that never got cleaned up.
+
+### ✅ Fix — switch `notes` to a Cloudflare-proxied CNAME
+
+Rather than just correcting the A record (which would silently break again the next time the home IP changes), the fix is a CNAME to the apex with proxy on. That gives two protections in one move: it always tracks the apex (so home IP changes propagate automatically) and it puts the wiki behind Cloudflare's edge (so any future ISP-side weirdness like the original `wiki` SNI filter is also bypassed).
+
+```bash
+# via Cloudflare API (token from ansible-vault: vault_cloudflare_api_token)
+PUT /zones/{ZONE_ID}/dns_records/{NOTES_RECORD_ID}
+{
+  "type":    "CNAME",
+  "name":    "notes.majorshouse.com",
+  "content": "majorshouse.com",
+  "ttl":     1,
+  "proxied": true,
+  "comment": "switched A→CNAME proxied to bypass stale IP / ISP SNI filter"
+}
+```
+
+Or via the dashboard:
+
+1. Cloudflare → `majorshouse.com` zone → DNS → Records
+2. Edit the `notes` record: Type `CNAME`, Target `majorshouse.com`, Proxy `Proxied` (orange cloud)
+3. Save
+
+External clients now hit Cloudflare edge IPs (`104.21.x.x` / `172.67.x.x`) which TLS-terminate at the edge and tunnel back to majorlab's apex IP. ACME on majorlab keeps working — Cloudflare passes the HTTP-01 challenge through on port 80. Caddy's `notes.majorshouse.com {}` block needs no change.
+
+Verify (response should show `server: cloudflare` and `via: 1.0 Caddy`):
+
+```bash
+curl -sSI https://notes.majorshouse.com/
+```
+
+### Why a Cloudflare-proxied CNAME is the durable shape
+
+- **Apex follows the home IP automatically.** Update the apex A record once when the ISP changes; every subdomain inherits it without per-record fixes.
+- **TLS handshake is offloaded to CF.** Any ISP-level SNI weirdness (the original `wiki` ban; theoretical future bans) becomes irrelevant — external clients SNI=`notes.majorshouse.com` to Cloudflare, which the ISP doesn't filter.
+- **Free.** Cloudflare's free tier covers proxy + TLS termination.
+
+### Audit checklist for any home-hosted `*.majorshouse.com` subdomain
+
+- [ ] DNS record is a **CNAME** to `majorshouse.com.`, not an A record to a literal home IP.
+- [ ] Cloudflare proxy (orange cloud, `proxied=true`) enabled on the record — at minimum for any subdomain where TLS reachability matters.
+- [ ] Caddy entry on majorlab references the public hostname; `reverse_proxy` stays on the localhost port.
+- [ ] HTTPS verified from outside the LAN (phone on cellular is sufficient) within the first hour after the change.
+- [ ] If an A record is genuinely required (e.g. it must NOT go through CF), document why in the deploy notes for that service.
+
+### Related
+
+- [[majwiki-setup-and-pipeline]] — full wiki deploy pipeline; the DNS step there should reference this fix
+- [[Network-Overview]] — fleet IP table
--- a/05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md
+++ b/05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md
@ -0,0 +1,116 @@
+---
+title: iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators
+domain: troubleshooting
+category: networking
+tags:
+  - tailscale
+  - ios
+  - postfix
+  - etc-hosts
+  - jq
+status: published
+created: 2026-04-29
+updated: 2026-04-29
+---
+
+# iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators
+
+## Problem
+
+A homegrown script that builds an `/etc/hosts` block from `tailscale status --json` silently corrupted the file the moment any iOS device joined the tailnet. After the next run, services bound to `localhost` started failing.
+
+On the affected host (`majordiscord`), Postfix refused to start with:
+
+```
+postfix: fatal: parameter inet_interfaces: no local interface found for 100.127.114.10
+```
+
+`/etc/hosts` looked fine at the top — `127.0.0.1 localhost` was still present — but inside the Tailscale-managed block:
+
+```
+# TAILSCALE_START
+100.84.42.102 tttpod
+100.110.197.17 majortoot
+100.95.55.40 localhost          <-- WRONG (this is an iPhone)
+100.84.165.52 majormail
+...
+100.127.114.10 localhost         <-- WRONG (this is an iPad)
+# TAILSCALE_END
+```
+
+When Postfix resolved `localhost` (because `inet_interfaces = localhost` in `main.cf`), the **last matching entry** in `/etc/hosts` won — a Tailscale IP that doesn't exist on this host — and the daemon died on bind.
+
+## Root Cause
+
+The script used `.HostName` from the Tailscale JSON:
+
+```bash
+tailscale status --json \
+  | jq -r '.Peer[] | "\(.TailscaleIPs[0]) \(.HostName)"' \
+  >> "$TEMP_HOSTS"
+```
+
+iOS Tailscale clients (iPhone, iPad) **always report `HostName: "localhost"`** in the JSON. iOS doesn't expose the real device name to apps the way macOS/Linux/Windows do, so the Tailscale client falls back to the literal string `localhost`.
+
+Inspect it directly:
+
+```bash
+$ tailscale status --json | jq '.Peer[] | select(.OS == "iOS") | {DNSName, HostName, OS}'
+{
+  "DNSName": "iphone171.tail7f2d9.ts.net.",
+  "HostName": "localhost",
+  "OS": "iOS"
+}
+{
+  "DNSName": "ipad166.tail7f2d9.ts.net.",
+  "HostName": "localhost",
+  "OS": "iOS"
+}
+```
+
+Every iOS device contributes a line `<tailscale-ip> localhost` to `/etc/hosts`, hijacking the `localhost` lookup.
+
+## Fix
+
+Use `.DNSName` (the unique tailnet DNS name) and take the first dotted component instead of `.HostName`:
+
+```bash
+tailscale status --json \
+  | jq -r '.Peer[] | "\(.TailscaleIPs[0]) \(.DNSName | rtrimstr(".") | split(".")[0])"' \
+  >> "$TEMP_HOSTS"
+```
+
+`DNSName` is always set, always unique, and produces clean labels like `iphone171`, `ipad166`, `majorlab`, etc.
+
+After patching the script and re-running it:
+
+```bash
+$ bash /root/update_tailscale_hosts.sh
+$ systemctl restart postfix
+$ systemctl is-active postfix
+active
+```
+
+## Why It's Hard to Spot
+
+- The corruption only triggers when an iOS device is in the tailnet — so the script "worked" for months.
+- `/etc/hosts` files are commonly skimmed top-down. The bogus `localhost` line is buried in the Tailscale block, well below the legitimate `127.0.0.1 localhost` line, and looks superficially like a normal Tailscale entry.
+- Postfix's error message names the IP, not `localhost`, so the connection to `/etc/hosts` isn't obvious.
+- `getent hosts localhost` shows the *first* match (`127.0.0.1`), not the one Postfix's resolver actually picks for `inet_interfaces` lookup.
+
+## Verification Checklist
+
+If you suspect this on any host using a similar generator script:
+
+```bash
+# Any non-loopback "localhost" entries are bugs
+grep -nE '^[0-9]+\..* localhost\s*$' /etc/hosts
+
+# Look at iOS peers' HostName field
+tailscale status --json | jq '.Peer[] | select(.OS == "iOS") | .HostName'
+```
+
+## Related
+
+- [[majordiscord]] — affected host (incident logged 2026-04-29)
+- [[Network Overview]] — Tailscale fleet topology
--- a/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
+++ b/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
@ -11,7 +11,7 @@ tags:
  - powershell
 status: published
 created: 2026-04-03
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---

 # Windows OpenSSH: WSL as Default Shell Breaks Remote Commands
--- a/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
+++ b/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
@ -10,7 +10,7 @@ tags:
  - majorrig
 status: published
 created: 2026-04-02
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 # Windows OpenSSH Server (sshd) Stops After Reboot

--- a/05-troubleshooting/yt-dlp-fedora-js-challenge.md
+++ b/05-troubleshooting/yt-dlp-fedora-js-challenge.md
@ -10,7 +10,7 @@ tags:
  - deno
 status: published
 created: 2026-04-02
-updated: 2026-04-22T11:33
+updated: 2026-04-30T05:21
 ---
 # yt-dlp YouTube JS Challenge Fix (Fedora)

--- a/MajorWiki-Deploy-Status.md
+++ b/MajorWiki-Deploy-Status.md
@ -2,7 +2,7 @@
 title: MajorWiki Deployment Status
 status: deployed
 project: MajorTwin
-updated: 2026-04-07T10:48
+updated: 2026-04-30T05:30
 created: 2026-04-02T16:10
 ---

@ -79,6 +79,23 @@ git push

 Gitea receives the push → fires webhook → majorlab pulls → MkDocs rebuilds → `notes.majorshouse.com` updates automatically.

+> [!tip] One-liner wrapper
+> On MajorRig, the `~/bin/wiki-commit "msg"` helper runs `git pull --rebase --autostash` → `git add -A` → `git commit` → `git push` in one shot. Sidesteps fast-forward rejections from cowork pushes (e.g. MajorAir pushing in parallel) and the empty-credentials issue with HTTPS.
+
+## 🔒 Pre-Commit Hook (in repo)
+
+`.githooks/pre-commit` (tracked) blocks any commit that adds or renames a `*.md` article without a corresponding entry in `SUMMARY.md`. Bypass with `git commit --no-verify` if you genuinely need to.
+
+**Per-clone setup** (one-time, per workstation that uses the repo):
+
+```bash
+cd <wiki-repo>
+git config core.hooksPath .githooks
+git config pull.rebase true
+```
+
+The hooksPath line is required — git doesn't run hooks from a tracked directory by default. The `pull.rebase true` makes plain `git pull` always rebase locally, matching the `wiki-commit` wrapper's behavior.
+
 ## 📋 Wiki Maintenance Protocol

 Every time a new article is added, the following **MUST** be updated to maintain index integrity:
--- a/README.md
+++ b/README.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:46
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index

--- a/SUMMARY.md
+++ b/SUMMARY.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-04-29T22:45
+updated: 2026-04-30T11:24
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
@ -43,6 +43,7 @@ updated: 2026-04-29T22:45
    * [Fail2ban Custom Jail: Apache 404 Scanner Detection](02-selfhosting/security/fail2ban-apache-404-scanner-jail.md)
    * [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md)
    * [Fail2ban Custom Jail: WordPress Login Brute Force](02-selfhosting/security/fail2ban-wordpress-login-jail.md)
+    * [wp-fail2ban Plugin Logpath on Debian/Ubuntu (auth.log not syslog)](02-selfhosting/security/wp-fail2ban-logpath-debian-ubuntu.md)
    * [SELinux: Fixing Fail2ban grep execmem Denial](02-selfhosting/security/selinux-fail2ban-execmem-fix.md)
    * [UFW Firewall Management](02-selfhosting/security/ufw-firewall-management.md)
    * [Firewall Hardening with firewalld on Fedora Fleet](02-selfhosting/security/firewalld-fleet-hardening.md)
@ -77,6 +78,7 @@ updated: 2026-04-29T22:45
    * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
    * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)
    * [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md)
+    * [LoRA adapter — GGUF conversion fails with 'config.json not found'](05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md)
    * [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md)
    * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
    * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
@ -90,6 +92,7 @@ updated: 2026-04-29T22:45
    * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
    * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md)
    * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
+    * [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md)
    * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
    * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
--- a/index.md
+++ b/index.md
@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index