wiki: 2026-04-30 update to ISP SNI filtering article

Re-diagnoses today's notes.majorshouse.com outage. Original framing
was "ISP filter expanded to include 'notes'", but the actual root
cause was a stale A record pointing at 136.54.3.248 (not majorlab's
current home IP). Corrects the comparison table to show that CNAMEs to
the apex resolve to 136.56.0.55, and recommends a Cloudflare-proxied
CNAME as the durable shape, so the record follows the apex (and thus the
home IP) automatically and ISP-level SNI weirdness is bypassed too.

Includes the working CF API payload used to flip the record, and an
audit checklist for any new *.majorshouse.com subdomain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marcus Summers 2026-04-30 13:08:36 -04:00
parent 34cc5c3d0b
commit 74c4ed9959


@ -5,7 +5,7 @@ category: general
tags: [isp, sni, caddy, tls, dns, cloudflare]
status: published
created: 2026-04-02
updated: 2026-04-30
---
# ISP SNI Filtering & Caddy Troubleshooting
@ -29,3 +29,89 @@ notes.majorshouse.com {
```
Once the hostname was changed to one without the "wiki" keyword, the TLS handshake completed successfully.
---
## 🔁 2026-04-30 Update — Stale A Record + Cloudflare Proxy Fix
The hostname rename held for ~4 weeks. On 2026-04-30 the wiki went down with a TLS handshake failure on `notes.majorshouse.com`. The on-the-spot framing was "ISP filter expanded to include 'notes'", but a Cloudflare DNS audit showed a different (and arguably worse) root cause: **the `notes` A record was pointing at `136.54.3.248`, an IP that is not majorlab's current home IP.** Whichever host answers at that address either does not run Caddy or has no site configured for the `notes.majorshouse.com` SNI, so the TLS handshake was rejected with alert 80 (`internal_error`).
### Re-diagnosis
```bash
# Cert + Caddy + mkdocs all healthy on majorlab
$ ssh majorlab 'systemctl is-active caddy; ss -tlnp | grep :443'
active
LISTEN 0 4096 *:443 users:(("caddy",pid=1549,fd=7))
# Loopback-served TLS works fine — cert valid Mar 11 → Jun 9 2026
$ ssh majorlab 'curl -sS -o /dev/null -w "%{http_code}\n" --resolve notes.majorshouse.com:443:127.0.0.1 https://notes.majorshouse.com/'
200
# External TLS handshake gets rejected with internal_error
$ openssl s_client -servername notes.majorshouse.com -connect 136.54.3.248:443
… SSL alert number 80 (internal_error) …
```
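The loopback check above pins the hostname to a chosen IP with curl's `--resolve`. The following is a minimal sketch of the same trick as a reusable function. It lets the identical SNI be tried against the stale IP, the current home IP, and loopback side by side. `probe_via_ip` is a hypothetical helper name, not an existing tool. It assumes `curl` is installed and treats any non-zero curl exit (refused connection, TLS alert, timeout, certificate mismatch) as a failure.

```bash
# Sketch (hypothetical helper): request https://HOST/ but force the TCP
# connection to IP, the same --resolve trick as the loopback check above.
# Any non-zero curl exit is reported as FAIL along with the exit code.
probe_via_ip() {
  local host=$1 ip=$2 code
  if code=$(curl -sS -o /dev/null --connect-timeout 5 --max-time 10 \
      -w '%{http_code}' --resolve "${host}:443:${ip}" "https://${host}/" 2>/dev/null); then
    echo "OK   ${host} @ ${ip} -> HTTP ${code}"
  else
    # In the else branch, $? still holds the status of the if-condition (curl).
    echo "FAIL ${host} @ ${ip} (curl exit $?)"
    return 1
  fi
}
```

For example, `for ip in 136.54.3.248 136.56.0.55; do probe_via_ip notes.majorshouse.com "$ip"; done` tries the record's IP and the apex's IP in turn. This helps separate "wrong host at that address" from "Caddy itself is broken".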
### The smoking-gun comparison
Other `*.majorshouse.com` services worked because they were CNAMEs to the apex, which resolves to majorlab's actual home IP:
| Subdomain | DNS shape | Final IP | Status |
|---|---|---|---|
| `notes.majorshouse.com` | **A → `136.54.3.248`** (stale) | `136.54.3.248` (wrong host) | ❌ TLS rejected |
| `git.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
| `n8n.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
| `matrix.majorshouse.com` | CNAME → `majorshouse.com.` | `136.56.0.55` (majorlab) | ✅ |
None of the working subdomains were proxied through Cloudflare (`proxied=false` on all of them); they simply had the right IP. The `notes` A record was the only one pointing somewhere wrong — most likely a stale value from a prior ISP / IP change that never got cleaned up.
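The comparison above can be automated for DNS-only (grey-cloud) records like the ones in the table. The approach is to resolve each subdomain and flag any whose final IPv4 differs from the apex. This is a sketch that assumes `dig` is available. `resolve_ipv4`, `compare_to_apex`, and `audit_subdomains` are hypothetical helper names. Once a record is proxied it resolves to Cloudflare edge IPs instead, so this check only makes sense for unproxied names.

```bash
# Sketch: flag subdomains whose final IPv4 differs from the apex's.
# Assumes `dig` (dnsutils / bind-utils); only meaningful for DNS-only records.

# Last IPv4 address in dig's answer. dig +short prints CNAME targets too,
# so keep only dotted-quad lines (the addresses a client would connect to).
resolve_ipv4() {
  dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$' | tail -n 1
}

# compare_to_apex APEX_IP NAME IP: print OK/MISMATCH, non-zero exit on mismatch.
compare_to_apex() {
  local apex_ip=$1 name=$2 ip=$3
  if [ -n "$ip" ] && [ "$ip" = "$apex_ip" ]; then
    printf 'OK %s %s\n' "$name" "$ip"
  else
    printf 'MISMATCH %s %s (apex %s)\n' "$name" "${ip:-none}" "$apex_ip"
    return 1
  fi
}

# audit_subdomains APEX NAME...: check every NAME against APEX's address.
audit_subdomains() {
  local apex=$1 apex_ip name rc=0
  shift
  apex_ip=$(resolve_ipv4 "$apex")
  for name in "$@"; do
    compare_to_apex "$apex_ip" "$name" "$(resolve_ipv4 "$name")" || rc=1
  done
  return "$rc"
}
```

Usage: `audit_subdomains majorshouse.com notes.majorshouse.com git.majorshouse.com n8n.majorshouse.com matrix.majorshouse.com`.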
### ✅ Fix — switch `notes` to a Cloudflare-proxied CNAME
Rather than just correcting the A record (which would silently break again the next time the home IP changes), the fix is a CNAME to the apex with proxy on. That gives two protections in one move: it always tracks the apex (so home IP changes propagate automatically) and it puts the wiki behind Cloudflare's edge (so any future ISP-side weirdness like the original `wiki` SNI filter is also bypassed).
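```bash
# via Cloudflare API v4 (token from ansible-vault: vault_cloudflare_api_token,
# exported here as CF_API_TOKEN; ZONE_ID / NOTES_RECORD_ID set beforehand).
# "ttl": 1 means "automatic" in the Cloudflare API.
curl -sS -X PUT \
  "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${NOTES_RECORD_ID}" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{
    "type": "CNAME",
    "name": "notes.majorshouse.com",
    "content": "majorshouse.com",
    "ttl": 1,
    "proxied": true,
    "comment": "switched A→CNAME proxied to bypass stale IP / ISP SNI filter"
  }'
```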
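The PUT needs `ZONE_ID` and `NOTES_RECORD_ID`. The following sketch looks both up with the Cloudflare v4 list endpoints (`GET /zones?name=<zone>` and `GET /zones/<zone_id>/dns_records?name=<hostname>`). It assumes `curl` and `jq` are installed and that the vault token is exported as `CF_API_TOKEN`. That variable name is hypothetical, and so are the helper function names.

```bash
# Sketch: look up the IDs the PUT above needs (Cloudflare API v4).
# Assumes curl + jq, and the vault token exported as CF_API_TOKEN
# (hypothetical variable name).
CF_API=https://api.cloudflare.com/client/v4

# cf_get PATH: authenticated GET against the v4 API, raw JSON on stdout.
cf_get() {
  curl -sS -H "Authorization: Bearer ${CF_API_TOKEN:?export CF_API_TOKEN first}" "${CF_API}$1"
}

# cf_zone_id ZONE_NAME -> zone ID (empty output if no match)
cf_zone_id() {
  cf_get "/zones?name=$1" | jq -r '.result[0].id // empty'
}

# cf_record_id ZONE_ID HOSTNAME -> ID of the first matching DNS record
cf_record_id() {
  cf_get "/zones/$1/dns_records?name=$2" | jq -r '.result[0].id // empty'
}
```

Usage: `ZONE_ID=$(cf_zone_id majorshouse.com)`, then `NOTES_RECORD_ID=$(cf_record_id "$ZONE_ID" notes.majorshouse.com)`.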
Or via the dashboard:
1. Cloudflare → `majorshouse.com` zone → DNS → Records
2. Edit the `notes` record: Type `CNAME`, Target `majorshouse.com`, Proxy `Proxied` (orange cloud)
3. Save
External clients now hit Cloudflare edge IPs (`104.21.x.x` / `172.67.x.x`). These terminate TLS at the edge and open a separate connection back to the origin at majorlab's apex IP. ACME on majorlab keeps working because Cloudflare passes the HTTP-01 challenge through to the origin on port 80. Caddy's `notes.majorshouse.com {}` block needs no change.
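One way to confirm the proxy took effect is to check that the name now resolves into Cloudflare's address space. The `104.21.x.x` / `172.67.x.x` addresses above fall inside 104.16.0.0/13 and 172.64.0.0/13, two blocks on Cloudflare's published IP list. The sketch below checks only those two blocks, and the full list is longer. A miss therefore means "not in these two ranges", not proof that the record is unproxied. It assumes `dig`, and the function names are hypothetical.

```bash
# Sketch: does an IPv4 address fall in 104.16.0.0/13 or 172.64.0.0/13?
# These are two of Cloudflare's published ranges, not the complete list.
is_cf_edge_ip() {
  [[ $1 =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  # A /13 fixes the first octet and 8 values of the second: 16..23 and 64..71.
  # 10# forces base 10 so an octet like "08" is not parsed as octal.
  if (( 10#$a == 104 && 10#$b >= 16 && 10#$b <= 23 )); then return 0; fi
  if (( 10#$a == 172 && 10#$b >= 64 && 10#$b <= 71 )); then return 0; fi
  return 1
}

# check_proxied HOST: every A record for HOST should be in those ranges.
check_proxied() {
  local ip found=0
  for ip in $(dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$'); do
    found=1
    is_cf_edge_ip "$ip" || { echo "NOT IN CF RANGES: $1 -> $ip"; return 1; }
  done
  (( found )) || { echo "NO A RECORDS: $1"; return 1; }
  echo "CLOUDFLARE EDGE: $1"
}
```

Usage: `check_proxied notes.majorshouse.com`.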
Verify (response should show `server: cloudflare` and `via: 1.0 Caddy`):
```bash
curl -sSI https://notes.majorshouse.com/
```
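The same check can be turned into a pass/fail step, for example to run right after a DNS change. The sketch below reads `curl -sSI` output on stdin and looks for the two headers named above, matching case-insensitively because HTTP header names are case-insensitive. `headers_show_cf_proxy` is a hypothetical name.

```bash
# Sketch: pass/fail on `curl -sSI` output (response headers on stdin).
# HTTP/1.1 header lines end in CRLF, so strip carriage returns first.
headers_show_cf_proxy() {
  local headers
  headers=$(tr -d '\r')
  if ! grep -qi '^server:[[:space:]]*cloudflare' <<<"$headers"; then
    echo "FAIL: no 'server: cloudflare' header"
    return 1
  fi
  if ! grep -qi '^via:.*caddy' <<<"$headers"; then
    echo "FAIL: no Caddy entry in the 'via' header"
    return 1
  fi
  echo "OK: Cloudflare edge in front, Caddy at the origin"
}
```

Usage: `curl -sSI https://notes.majorshouse.com/ | headers_show_cf_proxy`.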
### Why a Cloudflare-proxied CNAME is the durable shape
- **Subdomains follow the apex automatically.** When the home IP changes, update the apex A record once; every CNAME'd subdomain inherits it without per-record fixes.
- **TLS handshake is offloaded to CF.** Any ISP-level SNI weirdness (the original `wiki` ban; theoretical future bans) becomes irrelevant: external clients send SNI `notes.majorshouse.com` to a Cloudflare edge IP, which the ISP doesn't filter.
- **Free.** Cloudflare's free tier covers proxy + TLS termination.
### Audit checklist for any home-hosted `*.majorshouse.com` subdomain
- [ ] DNS record is a **CNAME** to `majorshouse.com.`, not an A record to a literal home IP.
- [ ] Cloudflare proxy (orange cloud, `proxied=true`) enabled on the record — at minimum for any subdomain where TLS reachability matters.
- [ ] Caddy entry on majorlab references the public hostname; `reverse_proxy` stays on the localhost port.
- [ ] HTTPS verified from outside the LAN (phone on cellular is sufficient) within the first hour after the change.
- [ ] If an A record is genuinely required (e.g. it must NOT go through CF), document why in the deploy notes for that service.
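Checklist items 1 and 2 can be linted from the Cloudflare record list. The following is a sketch that assumes `jq` and the JSON shape returned by the v4 "list DNS records" endpoint (`.result[]` with `name`, `type`, `content`, `proxied`). `lint_dns_records` is a hypothetical name. It prints every A/AAAA/CNAME record under the apex that is either not a CNAME or not proxied. Each line it prints should either be fixed or have its exception documented, per the last checklist item.

```bash
# Sketch: list A/AAAA/CNAME records under APEX that break checklist items 1-2
# (not a CNAME, or not proxied). Reads a Cloudflare "list DNS records" JSON
# response on stdin; assumes jq is installed.
lint_dns_records() {
  jq -r --arg apex "${1:-majorshouse.com}" '
    .result[]
    | select(.name != $apex and (.name | endswith("." + $apex)))
    | select(.type == "A" or .type == "AAAA" or .type == "CNAME")
    | select(.type != "CNAME" or .proxied == false)
    | "\(.name)\t\(.type)\t\(.content)\tproxied=\(.proxied)"'
}
```

Usage: `curl -sS -H "Authorization: Bearer $CF_API_TOKEN" "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?per_page=100" | lint_dns_records majorshouse.com`. Here `CF_API_TOKEN` and `ZONE_ID` are assumed to be set already.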
### Related
- [[majwiki-setup-and-pipeline]] — full wiki deploy pipeline; the DNS step there should reference this fix
- [[Network-Overview]] — fleet IP table