wiki: add MagicDNS-names-vs-pinned-IPs Tailscale SSH article
New troubleshooting/networking article covering the three SSH failure modes after a fleet migration (stale hardcoded IP, Tailscale 1.98.x cold-path teardown, rebuilt-box host-key mismatch) and the durable fix (MagicDNS names + known_hosts purge + ConnectTimeout), with the WSL2 no-resolver caveat. Cross-links the existing host-key article (adds a 'when pinning the IP is wrong' callout) and adds the SUMMARY nav entry.
parent `877c4b815f` · commit `950759da52`

3 changed files with 171 additions and 1 deletion

````diff
@@ -12,7 +12,7 @@ tags:
 - troubleshooting
 status: published
 created: 2026-06-11
-updated: 2026-06-11
+updated: 2026-06-12
 ---

 # SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)
````

````diff
@@ -82,6 +82,12 @@ Host MyMac mymac
 IdentityFile ~/.ssh/id_ed25519
 ```

+> [!note] When pinning the IP is the *wrong* call
+> Pinning the IP is right while the host is **stable**. If the box gets migrated or
+> rebuilt — new Tailscale IP *and* new host key — the pin rots and `known_hosts`
+> mismatches. At that point switch to **MagicDNS names** so the alias self-heals. See
+> *[MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)*.
+
 Now `ssh MyMac` resolves to `100.74.124.81`, whose key is in `known_hosts`, and the
 check passes with no prompt. Verify non-interactively:

````

@@ -0,0 +1,163 @@

```yaml
---
title: "MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)"
domain: troubleshooting
category: networking
tags:
- ssh
- ssh-config
- tailscale
- magicdns
- known-hosts
- host-key
- migration
- wsl2
status: published
created: 2026-06-12
updated: 2026-06-12
---
```

# MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)

You have SSH aliases for a Tailscale fleet (`alias tttpod='ssh root@100.84.42.102'`).
They worked for months. Then you migrate or rebuild some nodes, and now a third of
them hang on connect or refuse the host key. Hardcoded addresses are exactly what
breaks in this situation, which is why the durable answer is **MagicDNS names**, not
pinned IPs.

> This is the sequel to *[SSH Alias Falls Through to MagicDNS — Host-Key Verification
> Failure (No `Host` Block)](ssh-missing-host-block-magicdns-host-key-failure.md)*.
> That article says to **pin the IP** that `known_hosts` already trusts, which is
> correct while the node is stable. This one covers what happens when a migration
> changes the IP *and* the host key, which is exactly when IP-pinning stops paying off.

## The Three Failure Modes

A migration or rebuild can trigger any of these, often several at once across a fleet,
which is what makes it confusing:

### 1. Stale hardcoded IP → connection times out

The node re-registered on the tailnet with a **new** Tailscale IP, but your alias
still names the old one:

```
$ tttpod
ssh: connect to host 100.84.42.102 port 22: Operation timed out
```

The old address is dead; SSH waits the full timeout and gives up. Confirm by asking
the tailnet for the node's *current* IP by name:

```
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... # alias points at 100.84.42.102 (stale)
```

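The same comparison can be scripted one alias at a time. This is a minimal sketch. It assumes `tailscale ip -4 <peer>` prints the peer's current IPv4 on your client. `check_pin` is a hypothetical helper, not a Tailscale command:

```bash
# check_pin: hypothetical helper, not part of Tailscale.
# Compares the IP an alias pins against the node's current tailnet IPv4.
# Exit status: 0 = pin is current, 1 = stale, 2 = name unknown to the tailnet.
check_pin() {
  local name="$1" pinned="$2" current
  # Assumes `tailscale ip -4 <peer>` prints the peer's IPv4 (nothing on failure).
  current=$(tailscale ip -4 "$name" 2>/dev/null | head -n 1)
  if [ -z "$current" ]; then
    echo "$name: not found on tailnet (pinned $pinned)"
    return 2
  elif [ "$current" = "$pinned" ]; then
    echo "$name: ok ($pinned)"
    return 0
  else
    echo "$name: STALE (pinned $pinned, tailnet says $current)"
    return 1
  fi
}
```

With the addresses above, `check_pin tttpod 100.84.42.102` would report the alias as stale and return 1, because the tailnet now lists `100.95.137.38`.
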
### 2. Cold-path teardown → first connect after idle times out

The IP is correct and the node is up (it answers `ping`), but TCP/22 still times out
on the *first* try after a quiet period, then works on retry. Tailscale 1.98.x is more
aggressive about tearing down **idle direct UDP paths**; the first SSH has to
re-establish NAT traversal, which can overrun SSH's default connect timeout.

```
$ tailscale status | grep tttpod
100.95.137.38 tttpod ... idle, tx 9360 rx 0 # cold path
$ tailscale ping tttpod
pong from tttpod (100.95.137.38) via 5.161.118.84:41641 in 48ms # warms instantly
```

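One `tailscale ping` warms the path, so instead of waiting out the cold start you can ping just before connecting. This sketch rests on two assumptions. Your `tailscale ping` must accept `-c` to limit the ping count. The hypothetical name `tssh` must not clash with anything on your PATH:

```bash
# tssh: hypothetical wrapper that warms the tailnet path, then connects.
tssh() {
  local host="$1"
  shift
  # One ping is meant to trigger path re-establishment. Its exit status is
  # ignored on purpose, so a failed or relayed ping never blocks the ssh attempt.
  tailscale ping -c 1 "$host" >/dev/null 2>&1 || true
  ssh "root@$host" "$@"
}
```

Usage: `tssh tttpod hostname`. The cost is one extra round trip on every connect. The `ConnectTimeout` change in the fix section is the config-only alternative.
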
### 3. Host-key verification failed → box was rebuilt

The node was reinstalled, so it presents a **new** SSH host key. Your `known_hosts`
still has the old one, so even `StrictHostKeyChecking=accept-new` aborts. `accept-new`
only adds *genuinely new* hosts; it refuses a **mismatch**:

```
$ ssh root@tttpod hostname
Host key verification failed.
```

## The Fix

Apply three changes on every **name-capable** machine, meaning one that can resolve
MagicDNS names (see the WSL2 caveat below):

### a. Switch aliases from IPs to MagicDNS names

```bash
# before: rots on every migration
alias tttpod='ssh root@100.84.42.102'
# after: always resolves the node's current IP
alias tttpod='ssh root@tttpod'
```

MagicDNS resolves the name to whatever IP the node currently has, so a future
migration needs **zero** alias edits. That is the whole point: the tailnet already
knows the name→IP mapping, so stop duplicating it in your dotfiles, where it goes stale.

> **Exception:** if there's no tailnet device with that exact name (e.g. an alias
> `teelia` pointing at a node actually named `temptedparadise`), MagicDNS can't
> resolve it, so keep the IP for that one.

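You can apply both rules mechanically when regenerating aliases. This is a minimal sketch. It assumes a Linux client with `getent` (macOS doesn't ship it) and a hypothetical `alias:ip` input format. `emit_alias` is not an existing tool:

```bash
# emit_alias: hypothetical helper. Input is "alias:ip" pairs.
# Prints one alias line per pair: by name when the alias name resolves,
# otherwise (no tailnet device with that exact name) by the pinned IP.
emit_alias() {
  local pair name ip
  for pair in "$@"; do
    name="${pair%%:*}"
    ip="${pair##*:}"
    if getent hosts "$name" >/dev/null 2>&1; then
      printf "alias %s='ssh root@%s'\n" "$name" "$name"
    else
      printf "alias %s='ssh root@%s'\n" "$name" "$ip"
    fi
  done
}
```

Review the output before pasting it into your dotfiles. A name that resolves through some other source (LAN DNS, `/etc/hosts`) passes the check too.
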
### b. Purge stale host keys, then re-accept

After a rebuild, remove the old entries under **both** the name and the current IP,
then reconnect with `accept-new` to record the fresh key. Tailscale's WireGuard
tunnel already authenticates the peer node, so a key change you know comes from a
rebuild is safe to accept.

```bash
# drop stale known_hosts entries under both the MagicDNS name and the current IP
for pair in "tttpod:100.95.137.38" "majortoot:100.64.169.62" "dcaprod:100.98.223.93"; do
  n="${pair%%:*}"; ip="${pair##*:}"
  ssh-keygen -R "$n"; ssh-keygen -R "$ip"
done
# repopulate: accept-new records each host's fresh key on first contact
for n in tttpod majortoot dcaprod; do
  ssh -o StrictHostKeyChecking=accept-new "root@$n" hostname
done
```

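To avoid hand-maintaining the name/IP pairs, you can look up the current IP at purge time. This sketch assumes `tailscale ip -4 <peer>` works on the client. `purge_host_keys` is a hypothetical helper:

```bash
# purge_host_keys: hypothetical helper, not part of OpenSSH or Tailscale.
# For each node, removes known_hosts entries under its name and under its
# current tailnet IPv4 (assumed to come from `tailscale ip -4 <peer>`).
purge_host_keys() {
  local n ip
  for n in "$@"; do
    ssh-keygen -R "$n"
    ip=$(tailscale ip -4 "$n" 2>/dev/null | head -n 1)
    if [ -n "$ip" ]; then
      ssh-keygen -R "$ip"
    else
      # Name unknown to the tailnet: only the name-keyed entry was removed.
      echo "warning: $n not found on tailnet; purged by name only" >&2
    fi
  done
}
```

For example, run `purge_host_keys tttpod majortoot dcaprod`, then repopulate with `accept-new` as above.
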
### c. Add a cold-path cushion to `~/.ssh/config`

Give the first (cold) connection time to renegotiate instead of erroring:

```sshconfig
Host majorlinux tttpod majortoot majordiscord dcaprod majormail majorhome
    ConnectTimeout 25
    ServerAliveInterval 30
    ServerAliveCountMax 4
```

With `ConnectTimeout 25`, the cold-path timeout becomes a ~1–2 s pause while the path
renegotiates. The keepalives make ssh send a probe after 30 s without data from the
server, which holds the path open during an active session so it doesn't drop
mid-command.

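You can check that these values actually apply to a host. For each option, ssh uses the *first* value it finds, so an earlier matching `Host *` block can win. `ssh -G <host>` prints the resolved configuration without connecting. `show_ssh_timeouts` is a hypothetical helper, and the sketch assumes `ssh -G` prints lowercase `keyword value` lines:

```bash
# show_ssh_timeouts: hypothetical helper. Reads the effective config from
# `ssh -G` and prints the timeout/keepalive values plus the dead-peer window
# (ServerAliveInterval x ServerAliveCountMax).
show_ssh_timeouts() {
  ssh -G "$1" | awk '
    $1 == "connecttimeout"      { ct = $2 }
    $1 == "serveraliveinterval" { si = $2 }
    $1 == "serveralivecountmax" { sc = $2 }
    END {
      printf "connecttimeout=%s serveraliveinterval=%s serveralivecountmax=%s", ct, si, sc
      # Only meaningful when keepalives are enabled (interval > 0).
      if (si > 0) printf " dead-peer-window=%ds", si * sc
      printf "\n"
    }'
}
```

With the block above in effect, it should report `dead-peer-window=120s`. ssh disconnects after 4 unanswered keepalives sent 30 s apart.
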
## Caveat: WSL2 Can't Use MagicDNS

A Linux box under **WSL2** typically has **no `tailscale` CLI and no MagicDNS
resolver**. It rides the Windows host's networking, and name lookups for tailnet
nodes fail:

```
$ getent hosts tttpod     # (inside WSL2)
# nothing: no resolution
$ command -v tailscale    # nothing: the CLI lives on the Windows side
```

On those machines you **must** keep hardcoded IPs, either in the aliases themselves or
in `~/.ssh/config` `Host` blocks with an explicit `HostName <ip>`, and refresh them by
hand when a node migrates. There's no self-healing option there; the trade-off is
unavoidable.

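Hand refreshes are less error-prone when you generate the blocks from a current name/IP list, for example one copied from `tailscale status` on the Windows side. This is a minimal sketch. `emit_host_blocks` is hypothetical, and `User root` is an assumption carried over from the `root@` aliases:

```bash
# emit_host_blocks: hypothetical helper. Turns "name:ip" pairs into
# ssh_config Host blocks with an explicit HostName, for clients (like WSL2)
# that cannot resolve MagicDNS names.
emit_host_blocks() {
  local pair
  for pair in "$@"; do
    # User root is assumed from the root@ aliases; adjust per host.
    printf 'Host %s\n    HostName %s\n    User root\n\n' "${pair%%:*}" "${pair##*:}"
  done
}
```

For example: `emit_host_blocks tttpod:100.95.137.38 majortoot:100.64.169.62 >> ~/.ssh/config`. Delete the old blocks for those hosts first. Otherwise the earlier, stale block still wins.
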
## Diagnosis Checklist

1. `tailscale status | grep <host>`: does your alias's IP match the **current** one?
   (Mode 1: stale IP.)
2. `ping`/`tailscale ping <host>` works, but TCP/22 times out on the first try and
   succeeds on retry? (Mode 2: cold path.)
3. `ssh root@<host> true` → `Host key verification failed` (not `Permission denied`)?
   (Mode 3: rebuilt box, stale `known_hosts`.)
4. Is the client a WSL2 box? If `getent hosts <name>` returns nothing, MagicDNS is
   unavailable: stay on IPs.

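The error text alone narrows things down. A timeout points to mode 1 or 2, and a host-key failure points to mode 3. This is a rough triage sketch. `classify_ssh_error` is hypothetical and matches on substrings of the OpenSSH messages quoted in this article:

```bash
# classify_ssh_error: hypothetical helper. Reads ssh's stderr on stdin and
# maps it onto the failure modes above. A timeout cannot tell mode 1 from
# mode 2; checks 1 and 2 in the list above separate those.
classify_ssh_error() {
  local msg
  msg=$(cat)
  case "$msg" in
    *"Host key verification failed"*) echo "mode 3: host key changed (rebuilt box?)" ;;
    *"timed out"*)                    echo "mode 1 or 2: stale IP or cold path" ;;
    *"Permission denied"*)            echo "auth problem, not covered here" ;;
    *)                                echo "unrecognized" ;;
  esac
}
```

Usage: `ssh -o BatchMode=yes root@tttpod true 2>&1 >/dev/null | classify_ssh_error`. `BatchMode=yes` stops ssh from prompting. The redirection order sends only stderr into the pipe.
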
## Takeaway

Pin the IP when a host is **stable** and the IP-keyed `known_hosts` entry is your
durable trust anchor. Switch to **MagicDNS names** when hosts **move** (migrations,
rebuilds, provider changes), so the tailnet's own name→IP mapping does the work your
dotfiles kept getting wrong. And on WSL2 you don't get the choice: hardcoded IPs,
refreshed by hand.

````diff
@@ -134,5 +134,6 @@ updated: 2026-05-15T09:00
 * [Ansible: Check Mode False Positives in Verify/Assert Tasks](05-troubleshooting/ansible-check-mode-false-positives.md)
 * [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
 * [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
+* [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
 * [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)
 * [claude-mem: --setting-sources Empty Arg Bug (Claude Code 2.1.x)](05-troubleshooting/claude-mem-setting-sources-empty-arg.md)
````