diff --git a/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md b/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
index a87815f..f597174 100644
--- a/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
+++ b/05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md
@@ -12,7 +12,7 @@ tags:
   - troubleshooting
 status: published
 created: 2026-06-11
-updated: 2026-06-11
+updated: 2026-06-12
 ---
 
 # SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)
@@ -82,6 +82,12 @@ Host MyMac mymac
   IdentityFile ~/.ssh/id_ed25519
 ```
 
+> [!note] When pinning the IP is the *wrong* call
+> Pinning the IP is right while the host is **stable**. If the box gets migrated or
+> rebuilt — new Tailscale IP *and* new host key — the pin rots and `known_hosts`
+> mismatches. At that point switch to **MagicDNS names** so the alias self-heals. See
+> *[MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)*.
+
 Now `ssh MyMac` resolves to `100.74.124.81`, whose key is in `known_hosts`, and the check passes with no prompt.
 
 Verify non-interactively:
diff --git a/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md b/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
new file mode 100644
index 0000000..a07d72b
--- /dev/null
+++ b/05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md
@@ -0,0 +1,163 @@
+---
+title: "MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)"
+domain: troubleshooting
+category: networking
+tags:
+  - ssh
+  - ssh-config
+  - tailscale
+  - magicdns
+  - known-hosts
+  - host-key
+  - migration
+  - wsl2
+status: published
+created: 2026-06-12
+updated: 2026-06-12
+---
+
+# MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)
+
+You have SSH aliases for a Tailscale fleet (`alias tttpod='ssh root@100.84.42.102'`).
+They worked for months. Then you migrate or rebuild some nodes — and now a third of
+them hang on connect or refuse the host key. This is the failure mode that hardcoded
+addresses hit, and why the durable answer is **MagicDNS names**, not pinned IPs.
+
+> This is the sequel to *[SSH Alias Falls Through to MagicDNS — Host-Key Verification
+> Failure (No `Host` Block)](ssh-missing-host-block-magicdns-host-key-failure.md)*.
+> That article says **pin the IP** `known_hosts` already trusts — correct when the
+> node is stable. This one covers what happens when a migration changes the IP *and*
+> the host key, which is exactly when IP-pinning stops paying off.
+
+## The Three Failure Modes
+
+A migration/rebuild can trigger any of these — often several at once across a fleet,
+which is what makes it confusing:
+
+### 1. Stale hardcoded IP → connection times out
+
+The node re-registered on the tailnet with a **new** Tailscale IP, but your alias
+still names the old one:
+
+```
+$ tttpod
+ssh: connect to host 100.84.42.102 port 22: Operation timed out
+```
+
+The old address is dead; SSH waits the full timeout and gives up. Confirm by asking
+the tailnet for the node's *current* IP by name:
+
+```
+$ tailscale status | grep tttpod
+100.95.137.38 tttpod ... # alias points at 100.84.42.102 — stale
+```
+
+### 2. Cold-path teardown → first connect after idle times out
+
+The IP is correct and the node is up (it answers `ping`), but TCP/22 still times out
+on the *first* try after a quiet period, then works on retry. Tailscale 1.98.x is more
+aggressive about tearing down **idle direct UDP paths**; the first SSH has to
+re-establish NAT traversal, which can overrun SSH's default connect timeout.
+
+```
+$ tailscale status | grep tttpod
+100.95.137.38 tttpod ... idle, tx 9360 rx 0 # cold path
+$ tailscale ping tttpod
+pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms # warms instantly
+```
+
+### 3. Host-key verification failed → box was rebuilt
+
+The node was reinstalled, so it presents a **new** SSH host key. Your `known_hosts`
+still has the old one, so even `StrictHostKeyChecking=accept-new` aborts — `accept-new`
+only adds *genuinely new* hosts, it refuses a **mismatch**:
+
+```
+$ ssh root@tttpod hostname
+Host key verification failed.
+```
+
+## The Fix
+
+Three changes, applied on every **name-capable** machine (see the WSL2 caveat below):
+
+### a. Switch aliases from IPs to MagicDNS names
+
+```bash
+# before — rots on every migration
+alias tttpod='ssh root@100.84.42.102'
+# after — always resolves the node's current IP
+alias tttpod='ssh root@tttpod'
+```
+
+MagicDNS resolves the name to whatever IP the node currently has, so a future
+migration needs **zero** alias edits. This is the whole point: the tailnet already
+knows the mapping — stop duplicating (and stale-ing) it in your dotfiles.
+
+> **Exception:** if there's no tailnet device with that exact name (e.g. an alias
+> `teelia` pointing at a node actually named `temptedparadise`), MagicDNS can't
+> resolve it — keep the IP for that one.
+
+### b. Purge stale host keys, then re-accept
+
+After a rebuild, clear the old entries under **both** the name and the current IP,
+then reconnect with `accept-new` to record the fresh key. Over Tailscale's
+authenticated WireGuard tunnel, a key change from a known rebuild is safe to accept.
+
+```bash
+for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
+  n="${pair%%:*}"; ip="${pair##*:}"
+  ssh-keygen -R "$n"; ssh-keygen -R "$ip"
+done
+# repopulate
+ssh -o StrictHostKeyChecking=accept-new root@tttpod hostname
+```
+
+### c. Add a cold-path cushion to `~/.ssh/config`
+
+Give the first (cold) connection time to renegotiate instead of erroring:
+
+```sshconfig
+Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
+  ConnectTimeout 25
+  ServerAliveInterval 30
+  ServerAliveCountMax 4
+```
+
+`ConnectTimeout 25` turns the cold-path timeout into a ~1–2 s pause. The keepalives
+hold the path open during an active session so it doesn't drop mid-command.
+
+## Caveat: WSL2 Can't Use MagicDNS
+
+A Linux box under **WSL2** typically has **no `tailscale` CLI and no MagicDNS
+resolver** — it rides the Windows host's networking, and name lookups for tailnet
+nodes fail:
+
+```
+$ getent hosts tttpod     # (inside WSL2)
+                          # nothing — no resolution
+$ command -v tailscale    # nothing — CLI lives on the Windows side
+```
+
+On those machines you **must** keep hardcoded IPs in `~/.ssh/config` (or use `Host`
+blocks with explicit `HostName `), and refresh them by hand when a node migrates.
+There's no self-healing option there — the trade is unavoidable.
+
+## Diagnosis Checklist
+
+1. `tailscale status | grep ` — does your alias's IP match the **current** one?
+   (Mode 1: stale IP.)
+2. `ping`/`tailscale ping ` works but TCP/22 times out on first try, succeeds on
+   retry? (Mode 2: cold path.)
+3. `ssh root@ true` → `Host key verification failed` (not `Permission denied`)?
+   (Mode 3: rebuilt box, stale `known_hosts`.)
+4. Is the client a WSL2 box? `getent hosts ` returns nothing → MagicDNS
+   unavailable, stay on IPs.
+
+## Takeaway
+
+Pin the IP when a host is **stable** and the IP-keyed `known_hosts` entry is your
+durable trust anchor. Switch to **MagicDNS names** when hosts **move** — migrations,
+rebuilds, provider changes — so the tailnet's own name→IP mapping does the work your
+dotfiles kept getting wrong. And on WSL2, you don't get the choice: hardcoded IPs,
+refreshed by hand.
diff --git a/SUMMARY.md b/SUMMARY.md
index 8de7b0d..ee24d14 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -134,5 +134,6 @@ updated: 2026-05-15T09:00
   * [Ansible: Check Mode False Positives in Verify/Assert Tasks](05-troubleshooting/ansible-check-mode-false-positives.md)
   * [Ansible Fails with Permission Denied While `ssh ` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
   * [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
+  * [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
   * [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)
   * [claude-mem: --setting-sources Empty Arg Bug (Claude Code 2.1.x)](05-troubleshooting/claude-mem-setting-sources-empty-arg.md)