diff --git a/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md b/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
new file mode 100644
index 0000000..5985ac4
--- /dev/null
+++ b/05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md
@@ -0,0 +1,160 @@
+---
+title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
+domain: selfhosting
+category: troubleshooting
+tags:
+  - ssh
+  - ssh-keys
+  - authorized-keys
+  - key-rotation
+  - publickey
+  - fleet
+  - troubleshooting
+status: published
+created: 2026-06-17
+updated: 2026-06-17
+---
+
+# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`
+
+## The Problem
+
+A host you've SSH'd into for months suddenly rejects you — but **only some hosts**, not all:
+
+```
+$ ssh root@host-a
+root@host-a: Permission denied (publickey).
+
+$ ssh root@host-b   # same key, same workstation — works fine
+host-b $
+```
+
+Nothing changed on the servers. The thing that changed is on **your** side: at some
+point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
+clobbered by a botched copy, a routine rotation). The new public key was pushed to a
+few hosts but never fanned out to the rest. Every host still holding only the *old*
+public key now rejects the new private key with `Permission denied (publickey)`.
+
+> The tell: it's `Permission denied (publickey)`, **not** `Host key verification
+> failed`. The former is an **authorization** failure (the server doesn't trust your
+> key); the latter is the server's key not matching your `known_hosts`. Different
+> problem — see *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.
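When triaging many hosts, the distinction in the note above can be checked mechanically. The following is a minimal sketch, assuming you have already captured the last line of `ssh`'s stderr. `classify_ssh_failure` is a hypothetical helper name (not part of OpenSSH), and the patterns are just the two client messages quoted above.

```shell
# classify_ssh_failure MESSAGE
# Hypothetical helper: maps one line of ssh client output to a failure class.
#   auth    -> the server refused the offered key (authorized_keys problem)
#   hostkey -> the server's host key does not match known_hosts
#   other   -> anything else (timeouts, DNS errors, a successful login, ...)
classify_ssh_failure() {
  case "$1" in
    *"Permission denied (publickey"*) echo "auth" ;;
    *"Host key verification failed"*) echo "hostkey" ;;
    *) echo "other" ;;
  esac
}
```

A typical call passes the captured message, for example `classify_ssh_failure "$(ssh -o BatchMode=yes root@host-a true 2>&1 | tail -1)"`. Only hosts classified as `auth` are candidates for the key backfill; `hostkey` results belong to the other article.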
+
+## Why It Happens
+
+Public-key auth is **per-host**: the server only lets you in if your public key is a
+line in that host's `~/.ssh/authorized_keys`. There is no central directory — each
+host is its own island. So when you rotate a key, *every* host needs the new public
+key appended independently.
+
+It's easy to do this partially without noticing. You regenerate the key, then over the
+next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
+other work. Those three now trust the new key. The other six don't — and you won't
+find out until weeks later when you reach for one of them.
+
+Confirm it's an authorization (key) failure and see which key is being offered:
+
+```
+$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
+debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
+debug1: Authentications that can continue: publickey
+root@host-a: Permission denied (publickey).
+```
+
+The server offered you nothing but `publickey`, you offered your current key, and it
+was refused → your key isn't in that host's `authorized_keys`.
+
+## Scope It First — Don't Fix One Host at a Time
+
+The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
+touching anything, so you fix the real set, not just the squeaky wheel:
+
+```bash
+for h in host-a host-b host-c host-d host-e host-f; do
+  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
+  echo "$h: $r"
+done
+```
+
+`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
+of hanging. Anything that doesn't print `OK` needs the backfill.
+
+## The Fix
+
+You need a **second, still-trusted** way onto each failing host to append the new key.
+Common transit options, best first:
+
+- **Another of your keys that still works** (e.g. a config-management / automation
+  user whose key is authorized fleet-wide, ideally with `sudo`).
+- **Another workstation** whose key those hosts still trust.
+- **The provider's web console / serial console** as a last resort.
+
+> [!warning] A jump host only helps if *it* can reach the target
+> "Bounce through a box that still trusts me" only works if that box's own key is in
+> the target's `authorized_keys`. A host can trust *your* key yet have no standing
+> trust to a third host (and hit its own `Host key verification failed` on the way).
+> Test the full two-hop path before relying on it.
+
+Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
+append the new key idempotently, with a backup, to every failing host:
+
+```bash
+PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
+STAMP=$(date +%Y%m%d-%H%M%S)
+for h in host-a host-c host-e; do          # only the hosts that failed the sweep
+  ssh deploy@"$h" "sudo bash -s" <<EOF
+set -e
+F=/root/.ssh/authorized_keys               # assumed: the root@ logins above use this file
+cp -a "\$F" "\$F.bak-$STAMP"                # back up first
+grep -qF '$PUBKEY' "\$F" || echo '$PUBKEY' >> "\$F"   # append only if absent
+chmod 600 "\$F"
+EOF
+done
+```
+
+Three things that keep this safe:
+
+- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
+  one line and only if it's missing. Re-running is a no-op — never clobber an
+  `authorized_keys` with `>` or you'll lock out every *other* key on the box.
+- **Back up first.** The `.bak-$STAMP` copy is your undo.
+- **`chmod 600`.** SSH silently ignores an `authorized_keys` that's group/world
+  writable, which looks exactly like "the key didn't take."
+
+Then verify directly — not through the transit user:
+
+```bash
+for h in host-a host-c host-e; do
+  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
+done
+```
+
+All `OK` means the new key authenticates on its own.
+
+## Prevention
+
+- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
+  is to fan the new public key out to **every** host's `authorized_keys` in one pass —
+  not opportunistically as you happen to log in. A short `for` loop over the full host
+  list (or a config-management task — see below) closes the gap immediately.
+- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
+  task (or equivalent) that lists the *current* set of keys makes "who can log in" a
+  reviewed, version-controlled fact instead of an append-only pile that drifts per host.
+- **Keep the old key authorized until the new one is verified everywhere**, then remove
+  the stale line in a deliberate cleanup pass.
+
+## How to Diagnose This (Checklist)
+
+1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
+   `Host key verification failed` (host key). Confirms which problem you have.
+2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
+   fingerprint.
+3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
+   hosts before fixing.
+4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
+   transit path.
+5. Re-verify each host with a direct `BatchMode` login.
+
+Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
+and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.
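Checklist item 4 (idempotent, backed up, `chmod 600`) can also be sketched as a local helper, for instance to run on the target itself from a console session. This is an illustration under stated assumptions, not the article's exact procedure: `backfill_key` is a hypothetical name, it takes the file path and the public key line as arguments, and it uses a whole-line match (`grep -qxF`), which is slightly stricter than the article's `grep -qF`.

```shell
# backfill_key FILE PUBKEY
# Hypothetical helper: append PUBKEY to FILE only if that exact line is absent.
# The body is a subshell "( ... )", so its variables do not leak to the caller.
backfill_key() (
  file=$1
  pubkey=$2
  mkdir -p "$(dirname "$file")"
  touch "$file"
  chmod 600 "$file"                      # sshd may ignore a group/world-writable file
  # Already present: nothing to do, and no new backup is written.
  if grep -qxF -- "$pubkey" "$file"; then
    exit 0
  fi
  cp -p "$file" "$file.bak-$(date +%Y%m%d-%H%M%S)"   # undo copy, taken before any change
  # If the last line has no trailing newline, add one so two keys are not glued together.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s\n' "$pubkey" >> "$file"     # append, never overwrite
)
```

Calling it twice with the same key leaves the file unchanged after the first call, which is the property that makes a fleet-wide loop safe to re-run. Skipping the backup when nothing changes keeps repeated runs from piling up identical `.bak-` copies.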
diff --git a/SUMMARY.md b/SUMMARY.md
index 6014b65..30e7597 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -141,6 +141,7 @@ updated: 2026-05-15T09:00
   * [Ansible Fails with Permission Denied While `ssh ` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
   * [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
   * [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
+  * [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
   * [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
   * [Logwatch Reports the Wrong Hostname (`-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
   * [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)