---
title: "Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration"
domain: troubleshooting
category: networking
tags: [ansible, ssh, known-hosts, tailscale, host-key, migration]
status: published
created: 2026-06-12
updated: 2026-06-12
---

# Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration

## Symptom

A subset of hosts in an Ansible run fail at **Gathering Facts** while the rest succeed:

```
[ERROR]: Task failed: Data could not be sent to remote host "100.112.127.0". Make sure this host can be reached over ssh: Host key verification failed.
fatal: [majormail]: UNREACHABLE! => {"unreachable": true, ...}
```

The failing hosts are exactly the ones that were recently **rebuilt or migrated** (new server, new OS install, or a cloud move that issued a new Tailscale IP). Hosts that were never rebuilt connect fine.

Confusingly, **interactive `ssh root@<hostname>` works perfectly** for the same boxes; only Ansible fails.

## Cause

SSH stores each accepted host key in `~/.ssh/known_hosts` keyed by the **exact address you connected with**. A key accepted for `ssh root@tttpod` is saved under the hostname `tttpod`; it is *not* indexed under that node's IP.

Ansible inventories almost always set `ansible_host` to a **literal IP** (here, the Tailscale `100.x.x.x` address). Ansible's SSH lookup is therefore by IP. It finds no matching entry, and with `StrictHostKeyChecking=yes` (or in any non-interactive session where SSH cannot prompt to accept the key) it refuses the connection:

```
No ED25519 host key is known for 100.112.127.0 and you have requested strict checking.
Host key verification failed.
```

The hostname-form and IP-form entries are independent. Fixing interactive SSH (e.g. converting aliases to MagicDNS names and re-accepting keys) does **nothing** for Ansible, because Ansible never uses the hostname.

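
The independence of hostname-form and IP-form entries can be reproduced against a throwaway file, without touching the real `~/.ssh/known_hosts`. The following is a minimal sketch, not part of the original procedure: the helper name `known_hosts_has` and the temporary paths are illustrative, and the key is a freshly generated dummy standing in for a host key.

```bash
#!/usr/bin/env bash
# Sketch: known_hosts lookups match only the name an entry was stored under.
# known_hosts_has is a hypothetical helper, not part of OpenSSH.

# Succeeds if FILE has an entry for HOST.
# ssh-keygen -F prints the matching entries, and prints nothing when there are none.
known_hosts_has() {
  local file=$1 host=$2
  [ -n "$(ssh-keygen -F "$host" -f "$file" 2>/dev/null)" ]
}

demo_dir=$(mktemp -d)

# Generate a dummy ed25519 key pair to stand in for a real host key.
ssh-keygen -q -t ed25519 -N '' -f "$demo_dir/dummy_host_key"

# Store the public key under the hostname only, as an interactive
# `ssh root@majormail` would after the key is accepted.
printf 'majormail %s\n' "$(cut -d' ' -f1,2 "$demo_dir/dummy_host_key.pub")" \
  > "$demo_dir/known_hosts"

# Look the same key up by hostname and by the inventory IP.
for name in majormail 100.112.127.0; do
  if known_hosts_has "$demo_dir/known_hosts" "$name"; then
    echo "$name: entry found"
  else
    echo "$name: no entry"
  fi
done
```

The hostname lookup should report an entry and the IP lookup should report none, which mirrors what Ansible sees when `ansible_host` is a literal IP.
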
A rebuilt host also generates **brand-new host keys**, so any old IP-form entry would additionally be a mismatch. The common case after a migration to a *new* IP, however, is simply that no IP entry exists at all.

## Diagnosis

```bash
# 1. Is there any known_hosts entry for the failing IP? (no output = none)
ssh-keygen -F 100.112.127.0

# 2. Reproduce the exact failure without an interactive prompt:
ssh -o BatchMode=yes -o StrictHostKeyChecking=yes root@100.112.127.0 true
# -> "Host key verification failed." confirms the gap

# 3. Confirm the inventory IP is actually the host's CURRENT address
#    (guards against stale-IP drift, a separate problem):
tailscale status | grep majormail
ssh-keyscan -t ed25519 100.112.127.0 | ssh-keygen -lf -   # fingerprint it
```

If step 3 shows the inventory IP matches the live Tailscale node and the box answers `ssh-keyscan`, the only problem is the missing IP-form key.

## Fix

Add the **IP-form** host keys to the `known_hosts` of the user that runs Ansible. Back up first, scan over the tailnet, de-dup:

```bash
# Keep a dated backup before modifying the file.
cp ~/.ssh/known_hosts ~/.ssh/known_hosts.bak.$(date +%Y%m%d)

# Scan each migrated host by its Tailscale IP and append the keys it presents.
for ip in 100.98.223.93 100.112.127.0 100.73.85.46 100.95.137.38 100.76.51.16 100.64.169.62; do
  ssh-keyscan -T 5 -t rsa,ecdsa,ed25519 "$ip" >> ~/.ssh/known_hosts
done

# Drop exact duplicate lines (note: this also sorts the file).
sort -u ~/.ssh/known_hosts -o ~/.ssh/known_hosts
```

Verify before re-running the playbook:

```bash
ansible all -m ping   # or a narrower host pattern; expect "pong" from each
```

### Why `ssh-keyscan` is safe here

`ssh-keyscan` trusts whatever answers on the wire, which is normally a MITM risk. Over **Tailscale**, the connection rides WireGuard, which cryptographically authenticates the peer by its tailnet identity: reaching `100.x.x.x` *guarantees* you are talking to the node that owns that tailnet address. Scanning and trusting the key over the tailnet is therefore as trustworthy as the tailnet itself. Always cross-check the IP against `tailscale status` first (step 3) so you scan the right node.

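
The de-dup step in the Fix uses `sort -u`, which also reorders the whole file. Where the existing order of `known_hosts` should be preserved, the scan output could instead be merged through an order-preserving filter. This is a minimal sketch under stated assumptions: the scanned lines are first written to a separate file, and `append_new_lines` and the file names are hypothetical. It compares whole lines exactly, so it will not recognise a key that is already present in hashed form.

```bash
# Sketch: append only lines not already present, keeping the existing order.
# append_new_lines is a hypothetical helper, not an OpenSSH or Ansible tool.
append_new_lines() {
  local target=$1 incoming=$2 merged
  merged=$(mktemp)
  # Print each distinct line once, in first-seen order. Lines from the
  # target come first, so its order survives; exact duplicates that were
  # already inside the target are collapsed as well.
  awk '!seen[$0]++' "$target" "$incoming" > "$merged"
  # Overwrite the contents rather than replacing the file, so the
  # target keeps its existing permissions.
  cat "$merged" > "$target"
  rm -f "$merged"
}

# Example usage (scanned_keys is an illustrative scratch file):
#   ssh-keyscan -T 5 -t rsa,ecdsa,ed25519 100.112.127.0 > /tmp/scanned_keys
#   append_new_lines ~/.ssh/known_hosts /tmp/scanned_keys
```

Because the function is idempotent, re-running it after every migration phase adds only the keys that are genuinely new.
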
## Prevention

- **Per-workstation, not fleet-wide.** `known_hosts` is local to each machine and user. After a migration, *every* host that runs Ansible (each workstation, plus any control node like `majorlab`) needs the IP keys added independently. Adding them on one Mac does not help the others.
- **Sweep on every migration phase.** A rolling migration changes one node's IP at a time; fold the keyscan above into the post-cutover checklist so Ansible never breaks mid-rollout.
- **Alternative: disable host key checking.** Setting `host_key_checking = False` in `ansible.cfg` (or `ANSIBLE_HOST_KEY_CHECKING=False`) sidesteps the error but trades away host-key verification entirely. Prefer the explicit keyscan: it keeps strict checking on for every *future* run while accepting the new key exactly once, under your control.

## Related

- SSH-Aliases: Fleet SSH access; the MagicDNS-vs-pinned-IP strategy and the Ansible-by-IP `known_hosts` note
- Network Overview: Tailscale fleet inventory and current IPs
- Hetzner-Migration-Status: the migration that triggered the fleet-wide IP churn
- [[ssh-socket-tailscale-race-condition]]: a different "SSH unreachable after reboot" failure mode