Marcus Summers e1767bc19e Add troubleshooting article: Permission denied (publickey) after key rotation

New 05-troubleshooting/networking article covering the per-host nature of
authorized_keys: rotating a workstation SSH key requires backfilling the new
pubkey to every host, or hosts holding only the old key reject it with
Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent
backed-up backfill via a still-trusted transit user, and prevention. Wired
into SUMMARY.md nav.

2026-06-17 13:14:41 -04:00

6.6 KiB

Raw Blame History

title

domain

SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`

The Problem

A host you've SSH'd into for months suddenly rejects you — but only some hosts, not all:

$ ssh root@host-a
root@host-a: Permission denied (publickey).

$ ssh root@host-b      # same key, same workstation — works fine
host-b $

Nothing changed on the servers. The thing that changed is on your side: at some point the workstation's SSH key was regenerated (lost laptop, rebuild, a key file clobbered by a botched copy, a routine rotation). The new public key was pushed to a few hosts but never fanned out to the rest. Every host still holding only the old public key now rejects the new private key with Permission denied (publickey).

The tell: it's Permission denied (publickey), not Host key verification failed. The former is an authorization failure (the server doesn't trust your key); the latter is the server's key not matching your known_hosts. Different problem — see SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure.

Why It Happens

Public-key auth is per-host: the server only lets you in if your public key is a line in that host's ~/.ssh/authorized_keys. There is no central directory — each host is its own island. So when you rotate a key, every host needs the new public key appended independently.

It's easy to do this partially without noticing. You regenerate the key, then over the next hour you happen to SSH into three boxes and (re-)deploy the key there as part of other work. Those three now trust the new key. The other six don't — and you won't find out until weeks later when you reach for one of them.

Confirm it's an authorization (key) failure and see which key is being offered:

$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
debug1: Authentications that can continue: publickey
root@host-a: Permission denied (publickey).

The server offered you nothing but publickey, you offered your current key, and it was refused → your key isn't in that host's authorized_keys.

Scope It First — Don't Fix One Host at a Time

The host you noticed is rarely the only one. Sweep the whole fleet in one pass before touching anything, so you fix the real set, not just the squeaky wheel:

for h in host-a host-b host-c host-d host-e host-f; do
  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
  echo "$h: $r"
done

BatchMode=yes suppresses password/passphrase prompts so a failure fails fast instead of hanging. Anything that doesn't print OK needs the backfill.

The Fix

You need a second, still-trusted way onto each failing host to append the new key. Common transit options, best first:

Another of your keys that still works (e.g. a config-management / automation user whose key is authorized fleet-wide, ideally with sudo).
Another workstation whose key those hosts still trust.
The provider's web console / serial console as a last resort.

[!warning] A jump host only helps if it can reach the target "Bounce through a box that still trusts me" only works if that box's own key is in the target's authorized_keys. A host can trust your key yet have no standing trust to a third host (and hit its own Host key verification failed on the way). Test the full two-hop path before relying on it.

Using a fleet-wide automation user (deploy) with passwordless sudo as the transit, append the new key idempotently, with a backup, to every failing host:

PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
STAMP=$(date +%Y%m%d-%H%M%S)
for h in host-a host-c host-e; do          # only the hosts that failed the sweep
  ssh deploy@"$h" "sudo bash -s" <<EOF
set -e
F=/root/.ssh/authorized_keys
mkdir -p /root/.ssh && touch "\$F"
cp "\$F" "\$F.bak-$STAMP"                   # backup before any change
grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F"   # append only if absent
chmod 600 "\$F"
EOF
done

Three things that keep this safe:

Append, never overwrite. >> "$F" and the grep -qF … || guard mean you add one line and only if it's missing. Re-running is a no-op — never clobber an authorized_keys with > or you'll lock out every other key on the box.
Back up first. The .bak-<stamp> copy is your undo.
chmod 600. SSH silently ignores an authorized_keys that's group/world writable, which looks exactly like "the key didn't take."

Then verify directly — not through the transit user:

for h in host-a host-c host-e; do
  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
done

All OK means the new key authenticates on its own.

Prevention

Treat rotation as fleet-wide. When a workstation key changes, the very next step is to fan the new public key out to every host's authorized_keys in one pass — not opportunistically as you happen to log in. A short for loop over the full host list (or a config-management task — see below) closes the gap immediately.
Manage authorized_keys declaratively. An Ansible ansible.posix.authorized_key task (or equivalent) that lists the current set of keys makes "who can log in" a reviewed, version-controlled fact instead of an append-only pile that drifts per host.
Keep the old key authorized until the new one is verified everywhere, then remove the stale line in a deliberate cleanup pass.

How to Diagnose This (Checklist)

ssh -o BatchMode=yes <host> true → Permission denied (publickey) (auth), not Host key verification failed (host key). Confirms which problem you have.
ssh -v <host> 2>&1 | grep Offering → which private key is being offered, and its fingerprint.
Sweep the whole fleet with the BatchMode loop → get the full list of affected hosts before fixing.
Append the new public key (idempotent, backed up, chmod 600) via a still-trusted transit path.
Re-verify each host with a direct BatchMode login.

6.6 KiB Raw Blame History

SSH Permission denied (publickey) After Rotating a Key — Backfill Every authorized_keys