Add troubleshooting article: Permission denied (publickey) after key rotation

New 05-troubleshooting/networking article covering the per-host nature of
authorized_keys: rotating a workstation SSH key requires backfilling the new
pubkey to every host, or hosts holding only the old key reject it with
Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent
backed-up backfill via a still-trusted transit user, and prevention. Wired
into SUMMARY.md nav.
Marcus Summers 2026-06-17 13:14:41 -04:00
parent 0d08e21ee4
commit e1767bc19e
2 changed files with 161 additions and 0 deletions

@@ -0,0 +1,160 @@
---
title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
domain: selfhosting
category: troubleshooting
tags:
- ssh
- ssh-keys
- authorized-keys
- key-rotation
- publickey
- fleet
- troubleshooting
status: published
created: 2026-06-17
updated: 2026-06-17
---
# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`
## The Problem
A host you've SSH'd into for months suddenly rejects you — but **only some hosts**, not all:
```
$ ssh root@host-a
root@host-a: Permission denied (publickey).
$ ssh root@host-b # same key, same workstation — works fine
host-b $
```
Nothing changed on the servers. The thing that changed is on **your** side: at some
point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
clobbered by a botched copy, a routine rotation). The new public key was pushed to a
few hosts but never fanned out to the rest. Every host still holding only the *old*
public key now rejects the new private key with `Permission denied (publickey)`.
> The tell: it's `Permission denied (publickey)`, **not** `Host key verification
> failed`. The former is an **authorization** failure (the server doesn't trust your
> key); the latter is the server's key not matching your `known_hosts`. Different
> problem — see *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.
## Why It Happens
Public-key auth is **per-host**: the server only lets you in if your public key is a
line in that host's `~/.ssh/authorized_keys`. There is no central directory — each
host is its own island. So when you rotate a key, *every* host needs the new public
key appended independently.

It's easy to do this partially without noticing. You regenerate the key, then over the
next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
other work. Those three now trust the new key. The other six don't — and you won't
find out until weeks later when you reach for one of them.

Confirm it's an authorization (key) failure and see which key is being offered:
```
$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
debug1: Authentications that can continue: publickey
root@host-a: Permission denied (publickey).
```
The server offered you nothing but `publickey`, you offered your current key, and it
was refused → your key isn't in that host's `authorized_keys`.
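
To tie the fingerprint in that debug output to a specific key file, a small filter can pull the offered fingerprints out of `ssh -v` output. This is a minimal sketch: `offered_fingerprints` is a hypothetical helper name, and the commented usage assumes a reasonably recent OpenSSH where `ssh-keygen -lf` prints one fingerprint per key in the file.

```bash
# Hypothetical helper: read `ssh -v` output on stdin and print each
# SHA256 fingerprint the client offered, one per line.
offered_fingerprints() {
  grep 'Offering public key' | grep -oE 'SHA256:[A-Za-z0-9+/]+'
}

# Usage (not run here):
#   ssh -v -o BatchMode=yes root@host-a true 2>&1 | offered_fingerprints
#   ssh-keygen -lf ~/.ssh/id_ed25519.pub        # fingerprint of the local key
#   ssh-keygen -lf /root/.ssh/authorized_keys   # on the server: fingerprints it trusts
```

If the offered fingerprint does not appear in the server-side list, that host is holding only the old key.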
## Scope It First — Don't Fix One Host at a Time
The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
touching anything, so you fix the real set, not just the squeaky wheel:
```bash
for h in host-a host-b host-c host-d host-e host-f; do
  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
  echo "$h: $r"
done
```
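`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
of hanging. Any host that prints `Permission denied (publickey)` needs the backfill; a
timeout or `Host key verification failed` is a different problem with its own fix.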
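
Not every non-`OK` line in the sweep means a missing key, so it can help to sort the results into categories. The following is a sketch only: `classify_ssh_result` is a hypothetical helper, and the category labels are made up for illustration.

```bash
# Hypothetical helper: map the last line of an ssh attempt to a category.
classify_ssh_result() {
  case "$1" in
    OK)                               echo ok ;;
    *"Permission denied (publickey"*) echo needs-backfill ;;
    *"Host key verification failed"*) echo host-key-problem ;;
    *)                                echo other ;;
  esac
}

# Usage (not run here), with the same kind of host list as the sweep:
#   for h in host-a host-b host-c; do
#     r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
#     echo "$h: $(classify_ssh_result "$r")"
#   done
```

Only the `needs-backfill` hosts belong in the fix loop; `other` usually means a timeout or an unreachable host and deserves a separate look.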
## The Fix
You need a **second, still-trusted** way onto each failing host to append the new key.
Common transit options, best first:
- **Another of your keys that still works** (e.g. a config-management / automation
user whose key is authorized fleet-wide, ideally with `sudo`).
- **Another workstation** whose key those hosts still trust.
- **The provider's web console / serial console** as a last resort.
> [!warning] A jump host only helps if *it* can reach the target
> "Bounce through a box that still trusts me" only works if that box's own key is in
> the target's `authorized_keys`. A host can trust *your* key yet have no standing
> trust to a third host (and hit its own `Host key verification failed` on the way).
> Test the full two-hop path before relying on it.

Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
append the new key idempotently, with a backup, to every failing host:
```bash
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
STAMP=$(date +%Y%m%d-%H%M%S)
for h in host-a host-c host-e; do # only the hosts that failed the sweep
ssh deploy@"$h" "sudo bash -s" <<EOF
set -e
F=/root/.ssh/authorized_keys
mkdir -p /root/.ssh && touch "\$F"
cp "\$F" "\$F.bak-$STAMP" # backup before any change
grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F" # append only if absent
chmod 600 "\$F"
EOF
done
```
Three things that keep this safe:
- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
one line and only if it's missing. Re-running is a no-op — never clobber an
`authorized_keys` with `>` or you'll lock out every *other* key on the box.
- **Back up first.** The `.bak-<stamp>` copy is your undo.
- **`chmod 600`.** With the default `StrictModes yes`, sshd ignores an `authorized_keys`
that's group/world writable and logs the refusal only on the server, which from the
client looks exactly like "the key didn't take."

Then verify directly, not through the transit user:
```bash
for h in host-a host-c host-e; do
  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
done
```
All `OK` means the new key authenticates on its own.
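
The append-if-absent step from the fix can be rehearsed against a scratch file before it goes near a real host. This is a minimal sketch under stated assumptions: `append_key_once` is a hypothetical helper, and it matches the whole line with `grep -x` where the remote script above uses a substring match.

```bash
# Hypothetical helper: append a public key line to an authorized_keys file
# only if that exact line is absent, taking a timestamped backup first.
append_key_once() {
  file=$1
  pubkey=$2
  touch "$file"
  cp "$file" "$file.bak-$(date +%Y%m%d-%H%M%S)"   # backup before any change
  grep -qxF -- "$pubkey" "$file" || printf '%s\n' "$pubkey" >> "$file"
  chmod 600 "$file"
}

# Usage (not run here):
#   append_key_once /tmp/scratch_authorized_keys "$(cat ~/.ssh/id_ed25519.pub)"
```

Calling it twice with the same key leaves a single copy of the line, which is the property that makes re-running the backfill safe.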
## Prevention
- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
is to fan the new public key out to **every** host's `authorized_keys` in one pass —
not opportunistically as you happen to log in. A short `for` loop over the full host
list (or a config-management task — see below) closes the gap immediately.
- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
task (or equivalent) that lists the *current* set of keys makes "who can log in" a
reviewed, version-controlled fact instead of an append-only pile that drifts per host.
- **Keep the old key authorized until the new one is verified everywhere**, then remove
the stale line in a deliberate cleanup pass.
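
The cleanup pass mentioned in the last item could look something like the sketch below. `remove_stale_key` is a hypothetical helper, not a standard tool; it assumes the old public key line is known verbatim and removes only lines that match it exactly.

```bash
# Hypothetical helper: drop every line equal to the old public key from an
# authorized_keys file, after taking a timestamped backup.
remove_stale_key() {
  file=$1
  oldkey=$2
  cp "$file" "$file.bak-$(date +%Y%m%d-%H%M%S)" || return 1
  # grep -v exits 1 when no lines remain, so tolerate that case.
  grep -vxF -- "$oldkey" "$file" > "$file.tmp" || true
  cat "$file.tmp" > "$file"   # rewrite in place so owner and mode are kept
  rm -f "$file.tmp"
}

# Usage (not run here), only after the new key is verified on that host:
#   remove_stale_key /root/.ssh/authorized_keys "$(cat old_id_ed25519.pub)"
```

Running it only after the direct `BatchMode` verification succeeds keeps the old key available as a fallback until the new one is proven.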
## How to Diagnose This (Checklist)
1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
`Host key verification failed` (host key). Confirms which problem you have.
2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
fingerprint.
3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
hosts before fixing.
4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
transit path.
5. Re-verify each host with a direct `BatchMode` login.

Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.

@@ -141,6 +141,7 @@ updated: 2026-05-15T09:00
* [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
* [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
* [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
* [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
* [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
* [Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
* [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)