majorlinux 950759da52 wiki: add MagicDNS-names-vs-pinned-IPs Tailscale SSH article
New troubleshooting/networking article covering the three SSH failure modes
after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path
teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names +
known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat.
Cross-links the existing host-key article (adds a 'when pinning the IP is
wrong' callout) and adds the SUMMARY nav entry.
2026-06-12 01:33:31 -04:00
---
title: "MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)"
domain: troubleshooting
category: networking
tags:
- ssh
- ssh-config
- tailscale
- magicdns
- known-hosts
- host-key
- migration
- wsl2
status: published
created: 2026-06-12
updated: 2026-06-12
---
# MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)
You have SSH aliases for a Tailscale fleet (`alias tttpod='ssh root@100.84.42.102'`).
They worked for months. Then you migrate or rebuild some nodes — and now a third of
them hang on connect or refuse the host key. This is the failure mode that hardcoded
addresses hit, and why the durable answer is **MagicDNS names**, not pinned IPs.
> This is the sequel to *[SSH Alias Falls Through to MagicDNS — Host-Key Verification
> Failure (No `Host` Block)](ssh-missing-host-block-magicdns-host-key-failure.md)*.
> That article says **pin the IP** that `known_hosts` already trusts, and that is correct
> when the node is stable. This one covers what happens when a migration changes the IP
> *and* the host key, which is exactly when IP-pinning stops paying off.
## The Three Failure Modes
A migration/rebuild can trigger any of these — often several at once across a fleet,
which is what makes it confusing:
### 1. Stale hardcoded IP → connection times out
The node re-registered on the tailnet with a **new** Tailscale IP, but your alias
still names the old one:
```
$ tttpod
ssh: connect to host 100.84.42.102 port 22: Operation timed out
```
The old address is dead, so SSH waits out the whole connect timeout (with no
`ConnectTimeout` set, that is the operating system's TCP timeout) and gives up.
Confirm by asking the tailnet for the node's *current* IP by name:
```
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... # alias points at 100.84.42.102 — stale
```
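
To check every alias at once instead of grepping node by node, a small bash sketch can compare each alias's pinned address with `tailscale ip -4 <name>`, which prints a peer's current IPv4 address. It assumes aliases have the `alias NAME='ssh user@100.x.y.z'` shape shown above and that the alias name is also the tailnet device name; `check_alias`, `pinned_ip`, and `current_ip` are hypothetical helper names.

```bash
# Sketch: flag aliases whose pinned Tailscale IP no longer matches the node's current IP.
# Assumption: aliases look like  alias NAME='ssh user@100.x.y.z'  and NAME is also
# the tailnet device name. check_alias/pinned_ip/current_ip are hypothetical helpers.

# pinned_ip: print the first 100.x.y.z address found in an alias definition.
pinned_ip() {
  grep -oE '100\.[0-9]+\.[0-9]+\.[0-9]+' <<<"$1" | head -n 1
}

# current_ip: ask the tailnet for a peer's current IPv4 address.
current_ip() {
  tailscale ip -4 "$1"
}

# check_alias: compare the pinned address with the tailnet's answer and print a verdict.
check_alias() {
  local line="$1" name pinned now
  name=$(sed -E 's/^alias ([^=]+)=.*/\1/' <<<"$line")
  pinned=$(pinned_ip "$line")
  if [ -z "$pinned" ]; then
    echo "$name: no pinned IP"
    return 0
  fi
  if ! now=$(current_ip "$name"); then
    echo "$name: tailnet lookup failed (no device with this name?)"
    return 1
  fi
  if [ "$pinned" = "$now" ]; then
    echo "$name: ok ($pinned)"
  else
    echo "$name: STALE (alias $pinned, tailnet $now)"
  fi
}
```

In an interactive bash shell with the aliases loaded, `alias | grep "='ssh " | while read -r line; do check_alias "$line"; done` runs it across the whole set.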
### 2. Cold-path teardown → first connect after idle times out
The IP is correct and the node is up (it answers `ping`), but TCP/22 still times out
on the *first* try after a quiet period, then works on retry. Tailscale 1.98.x is more
aggressive about tearing down **idle direct UDP paths**; the first SSH has to
re-establish NAT traversal, which can overrun SSH's default connect timeout.
```
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... idle, tx 9360 rx 0 # cold path
$ tailscale ping tttpod
pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms # warms instantly
```
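
Since a `tailscale ping` warms the path, one workaround is to ping before the first SSH. The sketch below wraps that in a hypothetical `tssh` function. It assumes the first argument is the `[user@]host` target and relies on `tailscale ping`'s `-c` flag (maximum number of pings) and on its default behavior of stopping once a direct path is established.

```bash
# tssh (hypothetical): warm the Tailscale path, then run ssh with the same arguments.
# Assumes the first argument is the target, e.g.  tssh root@tttpod hostname
tssh() {
  local host="${1#*@}"   # strip an optional user@ prefix to get the tailnet name
  # tailscale ping stops early once a direct path is up. It may exit non-zero if no
  # direct path came up; that is ignored, since SSH can still use a relayed path.
  tailscale ping -c 3 "$host" >/dev/null 2>&1 || true
  ssh "$@"
}
```

Usage is the same as plain `ssh`, e.g. `tssh root@tttpod hostname`. The `ConnectTimeout` cushion in fix (c) remains the simpler, always-on option; this only trades the cold-path wait for an explicit warm-up.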
### 3. Host-key verification failed → box was rebuilt
The node was reinstalled, so it presents a **new** SSH host key. Your `known_hosts`
still has the old one, so even `StrictHostKeyChecking=accept-new` aborts: `accept-new`
only adds keys for hosts with no existing entry and still refuses a **mismatch**:
```
$ ssh root@tttpod hostname
Host key verification failed.
```
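
Before purging anything, it can help to see which entries `known_hosts` actually holds for the node, since the key may have been recorded under its name, its IP, or both. A minimal sketch using `ssh-keygen -F`, which searches `~/.ssh/known_hosts` (hashed entries included) and exits non-zero when nothing matches; `show_known` is a hypothetical helper name.

```bash
# show_known (hypothetical): report whether known_hosts has an entry for a node
# under each given name or address, e.g.  show_known tttpod 100.95.137.38
show_known() {
  local h
  for h in "$@"; do
    # ssh-keygen -F prints matching lines and exits 1 when there is no match
    if ssh-keygen -F "$h" >/dev/null 2>&1; then
      echo "$h: recorded"
    else
      echo "$h: none"
    fi
  done
}
```

Running plain `ssh-keygen -F tttpod` instead prints the matching lines themselves.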
## The Fix
Three changes, applied on every **name-capable** machine (see the WSL2 caveat below):
### a. Switch aliases from IPs to MagicDNS names
```bash
# before — rots on every migration
alias tttpod='ssh root@100.84.42.102'
# after — always resolves the node's current IP
alias tttpod='ssh root@tttpod'
```
MagicDNS resolves the name to whatever IP the node currently has, so a future
migration needs **zero** alias edits. That is the whole point: the tailnet already
knows the name→IP mapping, so stop duplicating it in dotfiles where it goes stale.
> **Exception:** if there's no tailnet device with that exact name (e.g. an alias
> `teelia` pointing at a node actually named `temptedparadise`), MagicDNS can't
> resolve it — keep the IP for that one.
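
If the aliases live in a dotfile, the conversion can be scripted. A sketch with `sed`, assuming every alias to convert has the `alias NAME='ssh USER@100.x.y.z'` shape and that NAME is also the tailnet device name; aliases that break that rule (like the `teelia` exception above) must be excluded or restored by hand. `to_magicdns` is a hypothetical name.

```bash
# to_magicdns (hypothetical): rewrite  alias NAME='ssh USER@100.x.y.z'
# into  alias NAME='ssh USER@NAME'  on stdin, leaving every other line untouched.
# Assumes NAME is the node's MagicDNS/device name; exceptions must be excluded first.
to_magicdns() {
  sed -E "s/^alias ([A-Za-z0-9_-]+)='ssh ([^@' ]+)@100\.[0-9]+\.[0-9]+\.[0-9]+'/alias \1='ssh \2@\1'/"
}
```

For example, `to_magicdns < aliases.sh > aliases.new` (file names are placeholders), then diff the two files before replacing the original.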
### b. Purge stale host keys, then re-accept
After a rebuild, clear the old entries under **both** the name and the current IP,
then reconnect with `accept-new` to record the fresh key. Because the connection runs
over Tailscale's authenticated WireGuard tunnel, a key change from a rebuild you know
about is safe to accept.
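```bash
# name:current-IP pairs for the rebuilt nodes
for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
  n="${pair%%:*}"; ip="${pair##*:}"
  ssh-keygen -R "$n"    # drop the stale key recorded under the name
  ssh-keygen -R "$ip"   # ...and any recorded under the IP
done
# repopulate: accept-new records each fresh key (a mismatch would still be refused)
for n in tttpod majortoot dcaprod; do
  ssh -o StrictHostKeyChecking=accept-new "root@$n" hostname
done
```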
### c. Add a cold-path cushion to `~/.ssh/config`
Give the first (cold) connection time to renegotiate instead of erroring:
```sshconfig
Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
    ConnectTimeout 25
    ServerAliveInterval 30
    ServerAliveCountMax 4
```
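With `ConnectTimeout 25`, a cold first connect shows up as a pause of roughly 12 s
while the path renegotiates, instead of an error. The keepalives hold the path open
during an active session so it doesn't drop mid-command; with these values, ssh also
gives up on a server that stops answering after about 2 minutes (30 s × 4 missed replies).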
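
To check that the `Host` line above actually matches a given target, `ssh -G <host>` prints the configuration ssh would use for that host after evaluating `~/.ssh/config`, then exits without connecting (keywords come out lowercased). A hypothetical helper that filters it down to the three settings:

```bash
# ssh_timeouts (hypothetical): show the effective timeout/keepalive settings ssh
# would use for a host. ssh -G evaluates the config and exits without connecting.
ssh_timeouts() {
  ssh -G "$1" | grep -E '^(connecttimeout|serveraliveinterval|serveralivecountmax) '
}
```

For `tttpod`, the output should read `connecttimeout 25`, `serveraliveinterval 30`, and `serveralivecountmax 4`; different values suggest the name is missing from the `Host` line.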
## Caveat: WSL2 Can't Use MagicDNS
A Linux box under **WSL2** typically has **no `tailscale` CLI and no MagicDNS
resolver** — it rides the Windows host's networking, and name lookups for tailnet
nodes fail:
```
$ getent hosts tttpod # (inside WSL2)
# nothing — no resolution
$ command -v tailscale # nothing — CLI lives on the Windows side
```
On those machines you **must** keep hardcoded IPs in `~/.ssh/config` (or use `Host`
blocks with explicit `HostName <ip>`), and refresh them by hand when a node migrates.
There's no self-healing option there — the trade is unavoidable.
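
Keeping those IPs in one generated list at least turns the manual refresh into a single edit. A sketch that prints explicit `Host`/`HostName` blocks from `name:ip` pairs (the same format as the purge loop above); `emit_host_blocks` is hypothetical, and `User root` is an assumption matching the aliases in this article.

```bash
# emit_host_blocks (hypothetical): print ~/.ssh/config Host blocks that pin each
# node's current Tailscale IP, for clients (like WSL2) that cannot resolve MagicDNS.
# Input: name:ip pairs. "User root" assumes root logins, as in the aliases above.
emit_host_blocks() {
  local pair n ip
  for pair in "$@"; do
    n="${pair%%:*}"; ip="${pair##*:}"
    printf 'Host %s\n    HostName %s\n    User root\n\n' "$n" "$ip"
  done
}
```

For example, run `emit_host_blocks "tttpod:100.95.137.38" "majortoot:100.64.169.62"` and paste the output into `~/.ssh/config` (or a file it `Include`s) on the WSL2 side; after a migration, update the pairs and regenerate.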
## Diagnosis Checklist
1. `tailscale status | grep <host>` — does your alias's IP match the **current** one?
(Mode 1: stale IP.)
2. `ping`/`tailscale ping <host>` works but TCP/22 times out on first try, succeeds on
retry? (Mode 2: cold path.)
3. `ssh root@<host> true` → `Host key verification failed` (not `Permission denied`)?
(Mode 3: rebuilt box, stale `known_hosts`.)
4. Is the client a WSL2 box? `getent hosts <name>` returns nothing → MagicDNS
unavailable, stay on IPs.
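
Part of the checklist can be automated by matching on the error text. A rough sketch, assuming the OpenSSH messages shown in this article; `classify_ssh_error` is a hypothetical helper, and because a timeout alone cannot tell Mode 1 from Mode 2, it only points back to steps 1 and 2.

```bash
# classify_ssh_error (hypothetical): map ssh's stderr text to the failure modes above.
# A timeout is ambiguous (stale IP vs. cold path), so it only narrows things down.
classify_ssh_error() {
  case "$1" in
    *"Host key verification failed"*)
      echo "mode 3: rebuilt box, stale known_hosts" ;;
    *"timed out"*)
      echo "mode 1 or 2: compare IPs (step 1), then retry or tailscale ping (step 2)" ;;
    *"Permission denied"*)
      echo "auth failure: outside the scope of this checklist" ;;
    *)
      echo "unrecognized" ;;
  esac
}
```

Typical use: `classify_ssh_error "$(ssh -o BatchMode=yes root@tttpod true 2>&1)"`, where `BatchMode=yes` keeps ssh from stopping at an interactive prompt.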
## Takeaway
Pin the IP when a host is **stable** and the IP-keyed `known_hosts` entry is your
durable trust anchor. Switch to **MagicDNS names** when hosts **move** — migrations,
rebuilds, provider changes — so the tailnet's own name→IP mapping does the work your
dotfiles kept getting wrong. And on WSL2, you don't get the choice: hardcoded IPs,
refreshed by hand.