vault backup: 2026-03-13 01:31:25

2026-03-13 01:31:25 -04:00
parent 999e1107f0
commit 639b23f861
9 changed files with 446 additions and 180 deletions


@@ -0,0 +1,135 @@
# Docker & Caddy Recovery After Reboot (Fedora + SELinux)
## 🛑 Problem
After a system reboot on **majorlab** (Fedora 43, SELinux Enforcing), Docker containers and all Caddy-proxied services become unreachable. Browsers may show connection errors or 502 Bad Gateway responses.
## 🔍 Diagnosis
Three separate failures occur in sequence:
### 1. Docker fails to start
```bash
systemctl status docker.service
# → Active: inactive (dead)
# → Dependency failed for docker.service
systemctl status docker.socket
# → Active: failed (Result: resources)
# → Failed to create listening socket (/run/docker.sock): Invalid argument
```
**Cause:** `docker.socket` is disabled, so Docker's socket activation fails and `docker.service` never starts. All containers are down.
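A quick way to confirm the enablement state before touching anything (plain systemd query; the annotation is illustrative):
```bash
systemctl is-enabled docker.socket
# → disabled   (the root cause: nothing recreates /run/docker.sock at boot)
```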
---
### 2. Caddy fails to bind ports
```bash
journalctl -u caddy -n 20
# → Error: listen tcp :4443: bind: permission denied
# → Error: listen tcp :8448: bind: permission denied
```
**Cause:** SELinux's `http_port_t` type does not include ports `4443` (Tailscale HTTPS) or `8448` (Matrix federation), so Caddy is denied when trying to bind them.
---
### 3. Caddy returns 502 Bad Gateway
Even after Caddy starts, all reverse-proxied services return 502.
```bash
journalctl -u caddy | grep "permission denied"
# → dial tcp 127.0.0.1:<port>: connect: permission denied
```
**Cause:** The SELinux boolean `httpd_can_network_connect` is off, preventing Caddy from making outbound connections to upstream services.
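If you want to see the denial itself before flipping anything, the audit log has it (requires the audit tooling; output shape varies by system, shown abbreviated):
```bash
sudo ausearch -m avc -ts recent | grep caddy
# → type=AVC ... comm="caddy" ... denied { name_connect } ... tclass=tcp_socket
# (the port-bind failures from step 2 appear as denied { name_bind } instead)
```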
---
## ✅ Solution
### Step 1 — Re-enable and start Docker
```bash
sudo systemctl enable docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker.service
```
Verify containers are up:
```bash
sudo docker ps -a
```
---
### Step 2 — Add missing ports to SELinux http_port_t
```bash
sudo semanage port -m -t http_port_t -p tcp 4443
sudo semanage port -a -t http_port_t -p tcp 8448
```
(`-m` modifies a port mapping that already exists in policy, `-a` adds a new one; if `-a` fails with "already defined", rerun with `-m`.)
Verify:
```bash
sudo semanage port -l | grep http_port_t
# Should include 4443 and 8448
```
---
### Step 3 — Enable httpd_can_network_connect
```bash
sudo setsebool -P httpd_can_network_connect on
```
The `-P` flag makes this persistent across reboots.
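Confirm both the runtime value and the persisted default (standard SELinux tooling; output abbreviated):
```bash
getsebool httpd_can_network_connect
# → httpd_can_network_connect --> on
sudo semanage boolean -l | grep httpd_can_network_connect
# → httpd_can_network_connect  (on   ,   on)  ...
```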
---
### Step 4 — Start Caddy
```bash
sudo systemctl restart caddy
systemctl is-active caddy
# → active
```
---
## 🔁 Why This Happens
| Issue | Root Cause |
|---|---|
| Docker down | `docker.socket` was disabled, not just stopped; a disabled unit stays down across reboots until explicitly re-enabled |
| Port bind denied | SELinux requires non-standard ports to be explicitly added to `http_port_t` — this is not automatic on upgrades or reinstalls |
| 502 on all proxied services | `httpd_can_network_connect` defaults to `off` on Fedora — must be set once per installation |
---
## 🔎 Quick Diagnostic Commands
```bash
# Check Docker
systemctl status docker.socket docker.service
sudo docker ps -a
# Check Caddy
systemctl status caddy
journalctl -u caddy -n 30
# Check SELinux booleans
getsebool httpd_can_network_connect
# Check allowed HTTP ports
sudo semanage port -l | grep http_port_t
# Test upstream directly (bypass Caddy)
curl -sv http://localhost:8086
```
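If this recovery is needed after every unplanned reboot, the three fixes can be bundled into one idempotent script. A sketch to run as root; the script name is arbitrary, and the ports and unit names are the ones used above:
```bash
#!/usr/bin/env bash
# post-reboot-recovery.sh (hypothetical name): reapply all three fixes, safe to re-run
set -euo pipefail

# 1. Docker: enable socket activation (persists across reboots) and start the daemon
systemctl enable --now docker.socket
systemctl start docker.service

# 2. SELinux: ensure http_port_t covers the Caddy listen ports
for port in 4443 8448; do
  if ! semanage port -l | grep -E '^http_port_t\s+tcp' | grep -qw "$port"; then
    semanage port -a -t http_port_t -p tcp "$port" 2>/dev/null \
      || semanage port -m -t http_port_t -p tcp "$port"
  fi
done

# 3. SELinux: let Caddy dial upstream containers (persistent via -P)
setsebool -P httpd_can_network_connect on

# 4. Restart Caddy last, once ports and booleans are in place
systemctl restart caddy
systemctl is-active caddy
```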


@@ -6,3 +6,4 @@ Practical fixes for common Linux, networking, and application problems.
- [Obsidian Cache Hang Recovery](obsidian-cache-hang-recovery.md)
- [yt-dlp Fedora JS Challenge](yt-dlp-fedora-js-challenge.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)


@@ -1,129 +1,22 @@
---
title: ISP SNI Filtering Blocking Caddy Reverse Proxy
domain: troubleshooting
category: networking
tags:
- caddy
- tls
- sni
- isp
- google-fiber
- reverse-proxy
- troubleshooting
status: published
created: '2026-03-11'
updated: '2026-03-11'
---
# ISP SNI Filtering Blocking Caddy Reverse Proxy
## 🛑 Problem
When deploying the MajorWiki at `wiki.majorshouse.com`, the site was unreachable over HTTPS. Browsers reported a `TLS_CONNECTION_REFUSED` error.
Some ISPs — including Google Fiber — silently block TLS handshakes for certain hostnames at the network level. The connection reaches your server, TCP completes, but the TLS handshake never finishes. The symptom looks identical to a misconfigured Caddy setup or a missing certificate, which makes it a frustrating thing to debug.
## 🔍 Diagnosis
1. **Direct IP Check:** Accessing the server via IP on port 8092 worked fine.
2. **Tailscale Check:** Accessing via the Tailscale magic DNS worked fine.
3. **SNI Analysis:** Using `openssl s_client -connect <IP>:443 -servername wiki.majorshouse.com` resulted in an immediate reset by peer.
4. **Root Cause:** Google Fiber (the local ISP) appears to be performing SNI-based filtering on hostnames containing the string "wiki".
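The step-3 check is easy to turn into an A/B test from any external machine (substitute your server's public IP; the reset behavior shown is what was observed here, not a guarantee):
```bash
# Same IP, same port; only the SNI differs. Run from outside your own network.
openssl s_client -connect <IP>:443 -servername wiki.majorshouse.com </dev/null
# → write:errno=104 (connection reset by peer)
openssl s_client -connect <IP>:443 -servername notes.majorshouse.com </dev/null
# → full handshake and certificate chain
```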
## What Happened
Deployed a new Caddy vhost for `wiki.majorshouse.com` on a Google Fiber residential connection. Everything on the server was correct:
- Let's Encrypt cert provisioned successfully
- Caddy validated clean with `caddy validate`
- `curl --resolve wiki.majorshouse.com:443:127.0.0.1 https://wiki.majorshouse.com` returned 200 from loopback
- iptables had ACCEPT rules for ports 80 and 443
- All other Caddy vhosts on the same IP and port worked fine externally
But from any external host, `curl` timed out with no response. `ss -tn` showed SYN-RECV connections piling up on port 443: inbound connections were arriving but stalling half-open, and the TLS handshake never began.
## The Debugging Sequence
**Step 1: Ruled out Caddy config issues**
```bash
caddy validate --config /etc/caddy/Caddyfile
curl --resolve wiki.majorshouse.com:443:127.0.0.1 https://wiki.majorshouse.com
```
Both clean. Loopback returned 200.
**Step 2: Ruled out certificate issues**
```bash
ls /var/lib/caddy/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/wiki.majorshouse.com/
openssl x509 -in wiki.majorshouse.com.crt -noout -text | grep -E "Subject:|Not Before|Not After"
```
Valid cert, correct subject, not expired.
**Step 3: Ruled out firewall**
```bash
iptables -L INPUT -n -v | grep -E "80|443"
ss -tlnp | grep ':443'
```
Ports open, Caddy listening on `*:443`.
**Step 4: Ruled out hairpin NAT**
Testing `curl https://wiki.majorshouse.com` from the server itself returned "No route to host" — the server can't reach its own public IP. This is normal for residential connections without NAT loopback. It's not the problem.
**Step 5: Confirmed external connectivity on port 443**
```bash
# From an external server (majormail)
curl -sk -o /dev/null -w "%{http_code}" https://git.majorshouse.com # 200
curl -sk -o /dev/null -w "%{http_code}" https://wiki.majorshouse.com # 000
```
Same IP, same port, same Caddy process. `git` works, `wiki` doesn't.
**Step 6: Tested a different subdomain**
Added `notes.majorshouse.com` as a new Caddyfile entry pointing to the same upstream. Cert provisioned via HTTP-01 challenge successfully (proving port 80 is reachable). Then:
```bash
curl -sk -o /dev/null -w "%{http_code}" https://notes.majorshouse.com # 200
curl -sk -o /dev/null -w "%{http_code}" https://wiki.majorshouse.com # 000
```
`notes` worked immediately. `wiki` still timed out.
**Conclusion:** Google Fiber is performing SNI-based filtering and blocking TLS connections where the ClientHello contains `wiki.majorshouse.com` as the server name.
## ✅ Solution
Rename the subdomain. Use anything that doesn't trigger the filter; here the domain was changed from `wiki.majorshouse.com` to `notes.majorshouse.com`, which works fine.
### Caddy Configuration Update
```caddy
notes.majorshouse.com {
    reverse_proxy :8092
}
```
Remove the blocked entry and reload:
```bash
# Remove the blocked entry
sed -i '/^wiki\.majorshouse\.com/,/^}/d' /etc/caddy/Caddyfile
systemctl reload caddy
```
Update `mkdocs.yml` or whatever service config references the domain, add DNS for the new subdomain, and done. Once the hostname no longer contained the "wiki" keyword, the TLS handshake completed successfully.
## How to Diagnose This Yourself
If your Caddy vhost works on loopback but times out externally:
1. Confirm other vhosts on the same IP and port work externally
2. Test the specific domain from multiple external networks (different ISP, mobile data)
3. Add a second vhost with a different subdomain pointing to the same upstream
4. If the new subdomain works and the original doesn't, the hostname is being filtered
```bash
# Quick external test — run from a server outside your network
curl -sk -o /dev/null -w "%{http_code}" --max-time 10 https://your-domain.com
```
If you get `000` (connection timeout, not a TLS error like `curl: (35)`), the TCP connection isn't completing — pointing to network-level blocking rather than a Caddy or cert issue.
## Gotchas & Notes
- **`curl: (35) TLS error` is different from `000`.** A TLS error means TCP connected but the handshake failed — usually a missing or invalid cert. A `000` timeout means TCP never completed — a network or firewall issue.
- **SYN-RECV in `ss -tn` means TCP is partially open.** If you see SYN-RECV entries for your domain but the connection never moves to ESTAB, something between the client and your TLS stack is dropping the handshake.
- **ISP SNI filtering is uncommon but real.** Residential ISPs sometimes filter on SNI for terms associated with piracy, proxies, or certain categories of content. "Wiki" may trigger a content-type heuristic.
- **Loopback testing isn't enough.** Always test from an external host before declaring a service working. The server can't test its own public IP on most residential connections.
## See Also
- [[setting-up-caddy-reverse-proxy]]
- [[linux-server-hardening-checklist]]
- [[tailscale-homelab-remote-access]]


@@ -0,0 +1,186 @@
# Apache Outage: Fail2ban Self-Ban + Missing iptables Rules
## 🛑 Problem
A web server running Apache2 becomes completely unreachable (`ERR_CONNECTION_TIMED_OUT`) despite Apache running normally. SSH access via Tailscale is unaffected.
---
## 🔍 Diagnosis
### Step 1 — Confirm Apache is running
```bash
sudo systemctl status apache2
```
If Apache is `active (running)`, the problem is at the firewall layer, not the application.
---
### Step 2 — Test the public IP directly
```bash
curl -I --max-time 5 http://<PUBLIC_IP>
```
A **timeout** means traffic is being dropped by the firewall. A **connection refused** means Apache is down.
---
### Step 3 — Check the iptables INPUT chain
```bash
sudo iptables -L INPUT -n -v
```
Look for ACCEPT rules on ports 80 and 443. If they're missing and the chain policy is `DROP`, HTTP/HTTPS traffic is being silently dropped.
**Example of broken state:**
```
Chain INPUT (policy DROP)
ACCEPT tcp -- lo * ... # loopback only
ACCEPT tcp -- tailscale0 * ... tcp dpt:22
# no rules for port 80 or 443
```
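For contrast, a healthy chain after Fix 1 below looks roughly like this (same stylized output; interface names as used later in this doc):
```
Chain INPUT (policy DROP)
ACCEPT tcp -- eth0 * ... tcp dpt:80
ACCEPT tcp -- eth0 * ... tcp dpt:443
ACCEPT tcp -- tailscale0 * ... tcp dpt:22
```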
---
### Step 4 — Check the nftables ruleset for Fail2ban
```bash
sudo nft list tables
```
Look for `table inet f2b-table` — this is Fail2ban's nftables table. It operates at **priority `filter - 1`**, meaning it is evaluated *before* the main iptables INPUT chain.
```bash
sudo nft list ruleset | grep -A 10 'f2b-table'
```
Fail2ban rejects banned IPs with rules like:
```
tcp dport { 80, 443 } ip saddr @addr-set-wordpress-hard reject with icmp port-unreachable
```
A banned admin IP will be rejected here regardless of any ACCEPT rules downstream.
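To inspect the table and confirm the hook priority yourself (`f2b-table`/`f2b-chain` are the default names for Fail2ban's nftables action; exact formatting varies by version):
```bash
sudo nft list table inet f2b-table
# table inet f2b-table {
#     chain f2b-chain {
#         type filter hook input priority filter - 1; policy accept;
#         tcp dport { 80, 443 } ip saddr @addr-set-wordpress-hard reject
#     }
# }
```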
---
### Step 5 — Check if your IP is banned
```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
echo "=== $jail ==="; sudo fail2ban-client get $jail banip | tr ',' '\n' | grep <YOUR_IP>
done
```
---
## ✅ Solution
### Fix 1 — Add missing iptables ACCEPT rules for HTTP/HTTPS
If ports 80/443 are absent from the INPUT chain:
```bash
sudo iptables -I INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -i eth0 -p tcp --dport 443 -j ACCEPT
```
Persist the rules:
```bash
sudo netfilter-persistent save
```
If `netfilter-persistent` is not installed:
```bash
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
```
---
### Fix 2 — Unban your IP from all Fail2ban jails
```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```
---
### Fix 3 — Add your IP to Fail2ban's ignore list
Edit `/etc/fail2ban/jail.local`:
```bash
sudo nano /etc/fail2ban/jail.local
```
Add or update the `[DEFAULT]` section:
```ini
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 <YOUR_IP>
```
Restart Fail2ban:
```bash
sudo systemctl restart fail2ban
```
---
## 🔁 Why This Happens
| Issue | Root Cause |
|---|---|
| Missing port 80/443 rules | iptables INPUT chain left incomplete after a manual firewall rework (e.g., SSH lockdown) |
| Still blocked after adding iptables rules | Fail2ban uses a separate nftables table at higher priority — iptables ACCEPT rules are never reached for banned IPs |
| Admin IP gets banned | Automated WordPress/Apache probes trigger Fail2ban jails against the admin's own IP |
---
## ⚠️ Key Architecture Note
On servers running both iptables and Fail2ban, the evaluation order is:
1. **`inet f2b-table`** (nftables, priority `filter - 1`) — Fail2ban ban sets; evaluated first
2. **`ip filter` INPUT chain** (iptables/nftables, policy DROP) — explicit ACCEPT rules
3. **UFW chains** — IP-specific rules; evaluated last
A banned IP is stopped at step 1 and never reaches the ACCEPT rules in step 2. Always check Fail2ban *after* confirming iptables looks correct.
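A post-fix verification pass in the same order the kernel evaluates (placeholders as used above):
```bash
# 1. Fail2ban layer: expect no output for your IP
sudo nft list ruleset | grep <YOUR_IP>
# 2. iptables layer: ACCEPT rules for web ports present
sudo iptables -L INPUT -n | grep -E 'dpt:(80|443)'
# 3. End to end: expect an HTTP status line, not a timeout
curl -I --max-time 5 http://<PUBLIC_IP>
```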
---
## 🔎 Quick Diagnostic Commands
```bash
# Check Apache
sudo systemctl status apache2
# Test public connectivity
curl -I --max-time 5 http://<PUBLIC_IP>
# Check iptables INPUT chain
sudo iptables -L INPUT -n -v
# List nftables tables (look for inet f2b-table)
sudo nft list tables
# Check Fail2ban jail status
sudo fail2ban-client status
# Check a specific jail's banned IPs
sudo fail2ban-client status wordpress-hard
# Unban an IP from all jails
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```


@@ -135,3 +135,42 @@ This is a YouTube-side experiment. yt-dlp falls back to other clients automatica
yt-dlp --version
pip show yt-dlp
```
### Format Not Available: Strict AVC+M4A Selector
The format selector `bestvideo[vcodec^=avc]+bestaudio[ext=m4a]` hard-fails when YouTube doesn't offer an H.264 (AVC) stream for a given upload:
```
ERROR: [youtube] Requested format is not available. Use --list-formats for a list of available formats
```
This is separate from the n-challenge issue — the format simply doesn't exist for that video (common with newer uploads that are VP9/AV1-only).
**Fix 1 — Relax the selector to mp4 container without enforcing codec:**
```bash
yt-dlp -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' \
--merge-output-format mp4 \
-o "/plex/plex/%(title)s.%(ext)s" \
--write-auto-subs --embed-subs \
https://youtu.be/VIDEO_ID
```
**Fix 2 — Let yt-dlp pick best and re-encode to H.264 via ffmpeg (Plex-safe, slower):**
```bash
yt-dlp -f 'bestvideo+bestaudio' \
--merge-output-format mp4 \
--recode-video mp4 \
-o "/plex/plex/%(title)s.%(ext)s" \
--write-auto-subs --embed-subs \
https://youtu.be/VIDEO_ID
```
Use `--recode-video mp4` when Plex direct play is required and the source stream may be VP9/AV1. Requires ffmpeg.
**Inspect available formats first:**
```bash
yt-dlp --list-formats https://youtu.be/VIDEO_ID
```
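When in doubt, check up front whether an AVC stream exists at all (a small filter over the format table; codec tags as yt-dlp prints them, `VIDEO_ID` is a placeholder as above):
```bash
# avc1 = H.264, vp09/vp9 = VP9, av01 = AV1
yt-dlp --list-formats https://youtu.be/VIDEO_ID | grep -E 'avc1|vp0?9|av01'
```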