vault backup: 2026-03-13 01:31:25

2026-03-13 01:31:25 -04:00
parent 999e1107f0
commit 639b23f861
9 changed files with 446 additions and 180 deletions


@@ -0,0 +1,135 @@
# Docker & Caddy Recovery After Reboot (Fedora + SELinux)
## 🛑 Problem
After a system reboot on **majorlab** (Fedora 43, SELinux Enforcing), Docker containers and all Caddy-proxied services become unreachable. Browsers may show connection errors or 502 Bad Gateway responses.
## 🔍 Diagnosis
Three separate failures occur in sequence:
### 1. Docker fails to start
```bash
systemctl status docker.service
# → Active: inactive (dead)
# → Dependency failed for docker.service
systemctl status docker.socket
# → Active: failed (Result: resources)
# → Failed to create listening socket (/run/docker.sock): Invalid argument
```
**Cause:** `docker.socket` is disabled, so Docker's socket activation fails and `docker.service` never starts. All containers are down.
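A quick way to confirm the enablement state before touching anything (plain systemd query; the annotation is illustrative):
```bash
systemctl is-enabled docker.socket
# → disabled   (the root cause: nothing recreates /run/docker.sock at boot)
```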
---
### 2. Caddy fails to bind ports
```bash
journalctl -u caddy -n 20
# → Error: listen tcp :4443: bind: permission denied
# → Error: listen tcp :8448: bind: permission denied
```
**Cause:** SELinux's `http_port_t` type does not include ports `4443` (Tailscale HTTPS) or `8448` (Matrix federation), so Caddy is denied when trying to bind them.
---
### 3. Caddy returns 502 Bad Gateway
Even after Caddy starts, all reverse-proxied services return 502.
```bash
journalctl -u caddy | grep "permission denied"
# → dial tcp 127.0.0.1:<port>: connect: permission denied
```
**Cause:** The SELinux boolean `httpd_can_network_connect` is off, preventing Caddy from making outbound connections to upstream services.
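If you want to see the denial itself before flipping anything, the audit log has it (requires the audit tooling; output shape varies by system, shown abbreviated):
```bash
sudo ausearch -m avc -ts recent | grep caddy
# → type=AVC ... comm="caddy" ... denied { name_connect } ... tclass=tcp_socket
# (the port-bind failures from step 2 appear as denied { name_bind } instead)
```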
---
## ✅ Solution
### Step 1 — Re-enable and start Docker
```bash
sudo systemctl enable docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker.service
```
Verify containers are up:
```bash
sudo docker ps -a
```
---
### Step 2 — Add missing ports to SELinux http_port_t
```bash
sudo semanage port -m -t http_port_t -p tcp 4443
sudo semanage port -a -t http_port_t -p tcp 8448
```
(`-m` modifies a port mapping that already exists in policy, `-a` adds a new one; if `-a` fails with "already defined", rerun with `-m`.)
Verify:
```bash
sudo semanage port -l | grep http_port_t
# Should include 4443 and 8448
```
---
### Step 3 — Enable httpd_can_network_connect
```bash
sudo setsebool -P httpd_can_network_connect on
```
The `-P` flag makes this persistent across reboots.
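Confirm both the runtime value and the persisted default (standard SELinux tooling; output abbreviated):
```bash
getsebool httpd_can_network_connect
# → httpd_can_network_connect --> on
sudo semanage boolean -l | grep httpd_can_network_connect
# → httpd_can_network_connect  (on   ,   on)  ...
```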
---
### Step 4 — Start Caddy
```bash
sudo systemctl restart caddy
systemctl is-active caddy
# → active
```
---
## 🔁 Why This Happens
| Issue | Root Cause |
|---|---|
| Docker down | `docker.socket` was disabled, not just stopped; a disabled unit stays down across reboots until explicitly re-enabled |
| Port bind denied | SELinux requires non-standard ports to be explicitly added to `http_port_t` — this is not automatic on upgrades or reinstalls |
| 502 on all proxied services | `httpd_can_network_connect` defaults to `off` on Fedora — must be set once per installation |
---
## 🔎 Quick Diagnostic Commands
```bash
# Check Docker
systemctl status docker.socket docker.service
sudo docker ps -a
# Check Caddy
systemctl status caddy
journalctl -u caddy -n 30
# Check SELinux booleans
getsebool httpd_can_network_connect
# Check allowed HTTP ports
sudo semanage port -l | grep http_port_t
# Test upstream directly (bypass Caddy)
curl -sv http://localhost:8086
```
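If this recovery is needed after every unplanned reboot, the three fixes can be bundled into one idempotent script. A sketch to run as root; the script name is arbitrary, and the ports and unit names are the ones used above:
```bash
#!/usr/bin/env bash
# post-reboot-recovery.sh (hypothetical name): reapply all three fixes, safe to re-run
set -euo pipefail

# 1. Docker: enable socket activation (persists across reboots) and start the daemon
systemctl enable --now docker.socket
systemctl start docker.service

# 2. SELinux: ensure http_port_t covers the Caddy listen ports
for port in 4443 8448; do
  if ! semanage port -l | grep -E '^http_port_t\s+tcp' | grep -qw "$port"; then
    semanage port -a -t http_port_t -p tcp "$port" 2>/dev/null \
      || semanage port -m -t http_port_t -p tcp "$port"
  fi
done

# 3. SELinux: let Caddy dial upstream containers (persistent via -P)
setsebool -P httpd_can_network_connect on

# 4. Restart Caddy last, once ports and booleans are in place
systemctl restart caddy
systemctl is-active caddy
```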


@@ -6,3 +6,4 @@ Practical fixes for common Linux, networking, and application problems.
- [Obsidian Cache Hang Recovery](obsidian-cache-hang-recovery.md)
- [yt-dlp Fedora JS Challenge](yt-dlp-fedora-js-challenge.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)


@@ -1,129 +1,22 @@
---
title: ISP SNI Filtering Blocking Caddy Reverse Proxy
domain: troubleshooting
category: networking
tags:
- caddy
- tls
- sni
- isp
- google-fiber
- reverse-proxy
- troubleshooting
status: published
created: '2026-03-11'
updated: '2026-03-11'
---
# ISP SNI Filtering Blocking Caddy Reverse Proxy
## 🛑 Problem
When deploying the MajorWiki at `wiki.majorshouse.com`, the site was unreachable over HTTPS. Browsers reported a `TLS_CONNECTION_REFUSED` error.
Some ISPs — including Google Fiber — silently block TLS handshakes for certain hostnames at the network level. The connection reaches your server, TCP completes, but the TLS handshake never finishes. The symptom looks identical to a misconfigured Caddy setup or a missing certificate, which makes it a frustrating thing to debug.
## 🔍 Diagnosis
1. **Direct IP Check:** Accessing the server via IP on port 8092 worked fine.
2. **Tailscale Check:** Accessing via the Tailscale magic DNS worked fine.
3. **SNI Analysis:** Using `openssl s_client -connect <IP>:443 -servername wiki.majorshouse.com` resulted in an immediate reset by peer.
4. **Root Cause:** Google Fiber (the local ISP) appears to be performing SNI-based filtering on hostnames containing the string "wiki".
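The step-3 check is easy to turn into an A/B test from any external machine (substitute your server's public IP; the reset behavior shown is what was observed here, not a guarantee):
```bash
# Same IP, same port; only the SNI differs. Run from outside your own network.
openssl s_client -connect <IP>:443 -servername wiki.majorshouse.com </dev/null
# → write:errno=104 (connection reset by peer)
openssl s_client -connect <IP>:443 -servername notes.majorshouse.com </dev/null
# → full handshake and certificate chain
```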
## What Happened
Deployed a new Caddy vhost for `wiki.majorshouse.com` on a Google Fiber residential connection. Everything on the server was correct:
- Let's Encrypt cert provisioned successfully
- Caddy validated clean with `caddy validate`
- `curl --resolve wiki.majorshouse.com:443:127.0.0.1 https://wiki.majorshouse.com` returned 200 from loopback
- iptables had ACCEPT rules for ports 80 and 443
- All other Caddy vhosts on the same IP and port worked fine externally
But from any external host, `curl` timed out with no response. `ss -tn` showed SYN-RECV connections piling up on port 443: inbound connections were arriving but stalling half-open, and the TLS handshake never began.
## The Debugging Sequence
**Step 1: Ruled out Caddy config issues**
```bash
caddy validate --config /etc/caddy/Caddyfile
curl --resolve wiki.majorshouse.com:443:127.0.0.1 https://wiki.majorshouse.com
```
Both clean. Loopback returned 200.
**Step 2: Ruled out certificate issues**
```bash
ls /var/lib/caddy/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/wiki.majorshouse.com/
openssl x509 -in wiki.majorshouse.com.crt -noout -text | grep -E "Subject:|Not Before|Not After"
```
Valid cert, correct subject, not expired.
**Step 3: Ruled out firewall**
```bash
iptables -L INPUT -n -v | grep -E "80|443"
ss -tlnp | grep ':443'
```
Ports open, Caddy listening on `*:443`.
**Step 4: Ruled out hairpin NAT**
Testing `curl https://wiki.majorshouse.com` from the server itself returned "No route to host" — the server can't reach its own public IP. This is normal for residential connections without NAT loopback. It's not the problem.
**Step 5: Confirmed external connectivity on port 443**
```bash
# From an external server (majormail)
curl -sk -o /dev/null -w "%{http_code}" https://git.majorshouse.com # 200
curl -sk -o /dev/null -w "%{http_code}" https://wiki.majorshouse.com # 000
```
Same IP, same port, same Caddy process. `git` works, `wiki` doesn't.
**Step 6: Tested a different subdomain**
Added `notes.majorshouse.com` as a new Caddyfile entry pointing to the same upstream. Cert provisioned via HTTP-01 challenge successfully (proving port 80 is reachable). Then:
```bash
curl -sk -o /dev/null -w "%{http_code}" https://notes.majorshouse.com # 200
curl -sk -o /dev/null -w "%{http_code}" https://wiki.majorshouse.com # 000
```
`notes` worked immediately. `wiki` still timed out.
**Conclusion:** Google Fiber is performing SNI-based filtering and blocking TLS connections where the ClientHello contains `wiki.majorshouse.com` as the server name.
## ✅ Solution
Rename the subdomain. Use anything that doesn't trigger the filter; here the domain was changed from `wiki.majorshouse.com` to `notes.majorshouse.com`, which works fine.
### Caddy Configuration Update
```caddy
notes.majorshouse.com {
    reverse_proxy :8092
}
```
Remove the blocked entry and reload:
```bash
# Remove the blocked entry
sed -i '/^wiki\.majorshouse\.com/,/^}/d' /etc/caddy/Caddyfile
systemctl reload caddy
```
Update `mkdocs.yml` or whatever service config references the domain, add DNS for the new subdomain, and done. Once the hostname no longer contained the "wiki" keyword, the TLS handshake completed successfully.
## How to Diagnose This Yourself
If your Caddy vhost works on loopback but times out externally:
1. Confirm other vhosts on the same IP and port work externally
2. Test the specific domain from multiple external networks (different ISP, mobile data)
3. Add a second vhost with a different subdomain pointing to the same upstream
4. If the new subdomain works and the original doesn't, the hostname is being filtered
```bash
# Quick external test — run from a server outside your network
curl -sk -o /dev/null -w "%{http_code}" --max-time 10 https://your-domain.com
```
If you get `000` (connection timeout, not a TLS error like `curl: (35)`), the TCP connection isn't completing — pointing to network-level blocking rather than a Caddy or cert issue.
## Gotchas & Notes
- **`curl: (35) TLS error` is different from `000`.** A TLS error means TCP connected but the handshake failed — usually a missing or invalid cert. A `000` timeout means TCP never completed — a network or firewall issue.
- **SYN-RECV in `ss -tn` means TCP is partially open.** If you see SYN-RECV entries for your domain but the connection never moves to ESTAB, something between the client and your TLS stack is dropping the handshake.
- **ISP SNI filtering is uncommon but real.** Residential ISPs sometimes filter on SNI for terms associated with piracy, proxies, or certain categories of content. "Wiki" may trigger a content-type heuristic.
- **Loopback testing isn't enough.** Always test from an external host before declaring a service working. The server can't test its own public IP on most residential connections.
## See Also
- [[setting-up-caddy-reverse-proxy]]
- [[linux-server-hardening-checklist]]
- [[tailscale-homelab-remote-access]]


@@ -0,0 +1,186 @@
# Apache Outage: Fail2ban Self-Ban + Missing iptables Rules
## 🛑 Problem
A web server running Apache2 becomes completely unreachable (`ERR_CONNECTION_TIMED_OUT`) despite Apache running normally. SSH access via Tailscale is unaffected.
---
## 🔍 Diagnosis
### Step 1 — Confirm Apache is running
```bash
sudo systemctl status apache2
```
If Apache is `active (running)`, the problem is at the firewall layer, not the application.
---
### Step 2 — Test the public IP directly
```bash
curl -I --max-time 5 http://<PUBLIC_IP>
```
A **timeout** means traffic is being dropped by the firewall. A **connection refused** means Apache is down.
---
### Step 3 — Check the iptables INPUT chain
```bash
sudo iptables -L INPUT -n -v
```
Look for ACCEPT rules on ports 80 and 443. If they're missing and the chain policy is `DROP`, HTTP/HTTPS traffic is being silently dropped.
**Example of broken state:**
```
Chain INPUT (policy DROP)
ACCEPT tcp -- lo * ... # loopback only
ACCEPT tcp -- tailscale0 * ... tcp dpt:22
# no rules for port 80 or 443
```
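For contrast, a healthy chain after Fix 1 below looks roughly like this (same stylized output; interface names as used later in this doc):
```
Chain INPUT (policy DROP)
ACCEPT tcp -- eth0 * ... tcp dpt:80
ACCEPT tcp -- eth0 * ... tcp dpt:443
ACCEPT tcp -- tailscale0 * ... tcp dpt:22
```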
---
### Step 4 — Check the nftables ruleset for Fail2ban
```bash
sudo nft list tables
```
Look for `table inet f2b-table` — this is Fail2ban's nftables table. It operates at **priority `filter - 1`**, meaning it is evaluated *before* the main iptables INPUT chain.
```bash
sudo nft list ruleset | grep -A 10 'f2b-table'
```
Fail2ban rejects banned IPs with rules like:
```
tcp dport { 80, 443 } ip saddr @addr-set-wordpress-hard reject with icmp port-unreachable
```
A banned admin IP will be rejected here regardless of any ACCEPT rules downstream.
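To inspect the table and confirm the hook priority yourself (`f2b-table`/`f2b-chain` are the default names for Fail2ban's nftables action; exact formatting varies by version):
```bash
sudo nft list table inet f2b-table
# table inet f2b-table {
#     chain f2b-chain {
#         type filter hook input priority filter - 1; policy accept;
#         tcp dport { 80, 443 } ip saddr @addr-set-wordpress-hard reject
#     }
# }
```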
---
### Step 5 — Check if your IP is banned
```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
echo "=== $jail ==="; sudo fail2ban-client get $jail banip | tr ',' '\n' | grep <YOUR_IP>
done
```
---
## ✅ Solution
### Fix 1 — Add missing iptables ACCEPT rules for HTTP/HTTPS
If ports 80/443 are absent from the INPUT chain:
```bash
sudo iptables -I INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -i eth0 -p tcp --dport 443 -j ACCEPT
```
Persist the rules:
```bash
sudo netfilter-persistent save
```
If `netfilter-persistent` is not installed:
```bash
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
```
---
### Fix 2 — Unban your IP from all Fail2ban jails
```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```
---
### Fix 3 — Add your IP to Fail2ban's ignore list
Edit `/etc/fail2ban/jail.local`:
```bash
sudo nano /etc/fail2ban/jail.local
```
Add or update the `[DEFAULT]` section:
```ini
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 <YOUR_IP>
```
Restart Fail2ban:
```bash
sudo systemctl restart fail2ban
```
---
## 🔁 Why This Happens
| Issue | Root Cause |
|---|---|
| Missing port 80/443 rules | iptables INPUT chain left incomplete after a manual firewall rework (e.g., SSH lockdown) |
| Still blocked after adding iptables rules | Fail2ban uses a separate nftables table at higher priority — iptables ACCEPT rules are never reached for banned IPs |
| Admin IP gets banned | Automated WordPress/Apache probes trigger Fail2ban jails against the admin's own IP |
---
## ⚠️ Key Architecture Note
On servers running both iptables and Fail2ban, the evaluation order is:
1. **`inet f2b-table`** (nftables, priority `filter - 1`) — Fail2ban ban sets; evaluated first
2. **`ip filter` INPUT chain** (iptables/nftables, policy DROP) — explicit ACCEPT rules
3. **UFW chains** — IP-specific rules; evaluated last
A banned IP is stopped at step 1 and never reaches the ACCEPT rules in step 2. Always check Fail2ban *after* confirming iptables looks correct.
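A post-fix verification pass in the same order the kernel evaluates (placeholders as used above):
```bash
# 1. Fail2ban layer: expect no output for your IP
sudo nft list ruleset | grep <YOUR_IP>
# 2. iptables layer: ACCEPT rules for web ports present
sudo iptables -L INPUT -n | grep -E 'dpt:(80|443)'
# 3. End to end: expect an HTTP status line, not a timeout
curl -I --max-time 5 http://<PUBLIC_IP>
```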
---
## 🔎 Quick Diagnostic Commands
```bash
# Check Apache
sudo systemctl status apache2
# Test public connectivity
curl -I --max-time 5 http://<PUBLIC_IP>
# Check iptables INPUT chain
sudo iptables -L INPUT -n -v
# List nftables tables (look for inet f2b-table)
sudo nft list tables
# Check Fail2ban jail status
sudo fail2ban-client status
# Check a specific jail's banned IPs
sudo fail2ban-client status wordpress-hard
# Unban an IP from all jails
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```


@@ -135,3 +135,42 @@ This is a YouTube-side experiment. yt-dlp falls back to other clients automatica
yt-dlp --version
pip show yt-dlp
```
### Format Not Available: Strict AVC+M4A Selector
The format selector `bestvideo[vcodec^=avc]+bestaudio[ext=m4a]` hard-fails when YouTube doesn't offer an H.264 (AVC) stream for a given upload:
```
ERROR: [youtube] Requested format is not available. Use --list-formats for a list of available formats
```
This is separate from the n-challenge issue — the format simply doesn't exist for that video (common with newer uploads that are VP9/AV1-only).
**Fix 1 — Relax the selector to mp4 container without enforcing codec:**
```bash
yt-dlp -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' \
--merge-output-format mp4 \
-o "/plex/plex/%(title)s.%(ext)s" \
--write-auto-subs --embed-subs \
https://youtu.be/VIDEO_ID
```
**Fix 2 — Let yt-dlp pick best and re-encode to H.264 via ffmpeg (Plex-safe, slower):**
```bash
yt-dlp -f 'bestvideo+bestaudio' \
--merge-output-format mp4 \
--recode-video mp4 \
-o "/plex/plex/%(title)s.%(ext)s" \
--write-auto-subs --embed-subs \
https://youtu.be/VIDEO_ID
```
Use `--recode-video mp4` when Plex direct play is required and the source stream may be VP9/AV1. Requires ffmpeg.
**Inspect available formats first:**
```bash
yt-dlp --list-formats https://youtu.be/VIDEO_ID
```
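When in doubt, check up front whether an AVC stream exists at all (a small filter over the format table; codec tags as yt-dlp prints them, `VIDEO_ID` is a placeholder as above):
```bash
# avc1 = H.264, vp09/vp9 = VP9, av01 = AV1
yt-dlp --list-formats https://youtu.be/VIDEO_ID | grep -E 'avc1|vp0?9|av01'
```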