New troubleshooting/networking article covering the three SSH failure modes after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names + known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat. Cross-links the existing host-key article (adds a 'when pinning the IP is wrong' callout) and adds the SUMMARY nav entry.
| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration) | troubleshooting | networking | | published | 2026-06-12 | 2026-06-12 |
MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)
You have SSH aliases for a Tailscale fleet (alias tttpod='ssh root@100.84.42.102').
They worked for months. Then you migrate or rebuild some nodes — and now a third of
them hang on connect or refuse the host key. This is the failure mode that hardcoded
addresses hit, and why the durable answer is MagicDNS names, not pinned IPs.
This is the sequel to SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No
Host Block). That article says to pin the IP that known_hosts already trusts, which is correct when the node is stable. This one covers what happens when a migration changes both the IP and the host key, which is exactly when IP-pinning stops paying off.
The Three Failure Modes
A migration/rebuild can trigger any of these — often several at once across a fleet, which is what makes it confusing:
1. Stale hardcoded IP → connection times out
The node re-registered on the tailnet with a new Tailscale IP, but your alias still names the old one:
$ tttpod
ssh: connect to host 100.84.42.102 port 22: Operation timed out
The old address is dead; SSH waits the full timeout and gives up. Confirm by asking the tailnet for the node's current IP by name:
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... # alias points at 100.84.42.102 — stale
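The comparison above can be scripted. The following is a minimal sketch, assuming the `tailscale` CLI is on the PATH and that `tailscale ip -4 <peer>` prints the peer's current IPv4 address; the function name `check_pinned_ip` is hypothetical and not part of any tool.

```shell
# check_pinned_ip <pinned-ip> <node-name>
# Compares the IP hardcoded in an alias with the node's current tailnet IPv4.
# Returns 0 if they match, 1 if the pinned IP is stale, 2 if the lookup fails.
check_pinned_ip() {
  pinned_ip=$1
  node_name=$2
  # Ask the tailnet for the node's current address by name.
  current_ip=$(tailscale ip -4 "$node_name") || {
    echo "lookup failed: $node_name" >&2
    return 2
  }
  if [ "$pinned_ip" = "$current_ip" ]; then
    echo "ok: $node_name is still $current_ip"
  else
    echo "stale: $node_name moved from $pinned_ip to $current_ip"
    return 1
  fi
}
```

With the addresses from the example, `check_pinned_ip 100.84.42.102 tttpod` would report the alias as stale.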
2. Cold-path teardown → first connect after idle times out
The IP is correct and the node is up (it answers ping), but TCP/22 still times out
on the first try after a quiet period, then works on retry. Tailscale 1.98.x is more
aggressive about tearing down idle direct UDP paths; the first SSH has to
re-establish NAT traversal, which can overrun SSH's default connect timeout.
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... idle, tx 9360 rx 0 # cold path
$ tailscale ping tttpod
pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms # warms instantly
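Since `tailscale ping` warms the path, one workaround is to ping once before connecting. This is only a sketch: `warm_ssh` is a hypothetical helper, the `root` user is taken from the aliases in this article, and it assumes a single ping is enough to re-establish the path in your environment.

```shell
# warm_ssh <node> [ssh arguments...]
# Sends one tailscale ping to wake a cold path, then connects as root.
warm_ssh() {
  node=$1
  shift
  # Ignore ping failures here so that ssh reports the real error, if any.
  tailscale ping --c=1 "$node" >/dev/null 2>&1 || true
  ssh "root@$node" "$@"
}
```

For example, `warm_ssh tttpod hostname` pings `tttpod` once and then runs `hostname` on it.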
3. Host-key verification failed → box was rebuilt
The node was reinstalled, so it presents a new SSH host key. Your known_hosts
still has the old one, so even StrictHostKeyChecking=accept-new aborts — accept-new
only adds genuinely new hosts, it refuses a mismatch:
$ ssh root@tttpod hostname
Host key verification failed.
The Fix
Three changes, applied on every name-capable machine (see the WSL2 caveat below):
a. Switch aliases from IPs to MagicDNS names
# before — rots on every migration
alias tttpod='ssh root@100.84.42.102'
# after — always resolves the node's current IP
alias tttpod='ssh root@tttpod'
MagicDNS resolves the name to whatever IP the node currently has, so a future migration needs zero alias edits. This is the whole point: the tailnet already knows the mapping — stop duplicating (and stale-ing) it in your dotfiles.
Exception: if there's no tailnet device with that exact name (e.g. an alias
teelia pointing at a node actually named temptedparadise), MagicDNS can't resolve it, so keep the IP for that one.
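Before editing aliases in bulk, it may help to check which alias names actually correspond to tailnet devices. The sketch below assumes `tailscale ip -4 <peer>` exits non-zero when no peer has that name; `plan_aliases` is a hypothetical helper.

```shell
# plan_aliases <name>...
# For each alias name, reports whether it can be switched to a MagicDNS name.
plan_aliases() {
  for alias_name in "$@"; do
    if tailscale ip -4 "$alias_name" >/dev/null 2>&1; then
      echo "$alias_name: use MagicDNS name"
    else
      echo "$alias_name: keep pinned IP (no tailnet device by that name)"
    fi
  done
}
```

Running `plan_aliases tttpod teelia` would flag `teelia` as one to leave on its IP, matching the exception described above.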
b. Purge stale host keys, then re-accept
After a rebuild, clear the old entries under both the name and the current IP,
then reconnect with accept-new to record the fresh key. Over Tailscale's
authenticated WireGuard tunnel, a key change from a known rebuild is safe to accept.
for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
n="${pair%%:*}"; ip="${pair##*:}"
ssh-keygen -R "$n"; ssh-keygen -R "$ip"
done
# repopulate
ssh -o StrictHostKeyChecking=accept-new root@tttpod hostname
c. Add a cold-path cushion to ~/.ssh/config
Give the first (cold) connection time to renegotiate instead of erroring:
Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
ConnectTimeout 25
ServerAliveInterval 30
ServerAliveCountMax 4
ConnectTimeout 25 turns the cold-path timeout into a ~1–2 s pause. The keepalives
hold the path open during an active session so it doesn't drop mid-command.
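To confirm that the Host block matches a given name, `ssh -G` prints the effective configuration for a host without connecting. A small sketch follows; `effective_timeout` is a hypothetical helper name.

```shell
# effective_timeout <host>
# Prints the ConnectTimeout value ssh would apply to <host>.
# ssh -G emits option names in lowercase, one "key value" pair per line.
effective_timeout() {
  ssh -G "$1" | awk '$1 == "connecttimeout" { print $2 }'
}
```

With the config above in place, `effective_timeout tttpod` should print `25`. Any other value suggests the name did not match the Host line.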
Caveat: WSL2 Can't Use MagicDNS
A Linux box under WSL2 typically has no tailscale CLI and no MagicDNS
resolver — it rides the Windows host's networking, and name lookups for tailnet
nodes fail:
$ getent hosts tttpod # (inside WSL2)
# nothing — no resolution
$ command -v tailscale # nothing — CLI lives on the Windows side
On those machines you must keep hardcoded IPs in ~/.ssh/config (or use Host
blocks with explicit HostName <ip>), and refresh them by hand when a node migrates.
There's no self-healing option there — the trade is unavoidable.
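Keeping the WSL2 addresses in one list makes the manual refresh less error-prone. The sketch below prints Host blocks from the same `name:ip` pairs used in the purge loop. `host_blocks` is a hypothetical helper, and `User root` is assumed from the aliases in this article.

```shell
# host_blocks <name:ip>...
# Prints one ~/.ssh/config Host block per pair, with an explicit HostName.
host_blocks() {
  for pair in "$@"; do
    name=${pair%%:*}
    ip=${pair##*:}
    printf 'Host %s\n    HostName %s\n    User root\n\n' "$name" "$ip"
  done
}
```

Review the output of `host_blocks "tttpod:100.95.137.38" "majortoot:100.64.169.62"` before pasting it into `~/.ssh/config` on the WSL2 side. The IPs still have to be updated by hand after each migration.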
Diagnosis Checklist
- tailscale status | grep <host>: does your alias's IP match the current one? (Mode 1: stale IP.)
- ping / tailscale ping <host> works but TCP/22 times out on first try, succeeds on retry? (Mode 2: cold path.)
- ssh root@<host> true → Host key verification failed (not Permission denied)? (Mode 3: rebuilt box, stale known_hosts.)
- Is the client a WSL2 box? getent hosts <name> returns nothing → MagicDNS unavailable, stay on IPs.
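The first pass of the checklist can be sketched as a classifier over ssh's error text. It matches only the messages quoted in this article, and `classify_ssh_error` is a hypothetical helper. A timeout alone cannot separate Mode 1 from Mode 2, so that case still needs the IP comparison.

```shell
# classify_ssh_error <error-text>
# Maps an ssh error message to the failure mode it most likely indicates.
classify_ssh_error() {
  case $1 in
    *"Host key verification failed"*)
      echo "mode 3: rebuilt box, stale known_hosts" ;;
    *"timed out"*)
      echo "mode 1 or 2: compare the alias IP with tailscale status, then retry" ;;
    *"Permission denied"*)
      echo "authentication problem, not one of the three modes" ;;
    *)
      echo "unrecognized" ;;
  esac
}
```

For example, `classify_ssh_error "$(ssh root@tttpod true 2>&1)"` feeds ssh's own output to the classifier.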
Takeaway
Pin the IP when a host is stable and the IP-keyed known_hosts entry is your
durable trust anchor. Switch to MagicDNS names when hosts move — migrations,
rebuilds, provider changes — so the tailnet's own name→IP mapping does the work your
dotfiles kept getting wrong. And on WSL2, you don't get the choice: hardcoded IPs,
refreshed by hand.