Add troubleshooting article: Permission denied (publickey) after key rotation
New 05-troubleshooting/networking article covering the per-host nature of authorized_keys: rotating a workstation SSH key requires backfilling the new pubkey to every host, or hosts holding only the old key reject it with Permission denied (publickey). Includes fleet-sweep diagnosis, idempotent backed-up backfill via a still-trusted transit user, and prevention. Wired into SUMMARY.md nav.
parent 0d08e21ee4
commit e1767bc19e
2 changed files with 161 additions and 0 deletions

@ -0,0 +1,160 @@
---
title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
domain: selfhosting
category: troubleshooting
tags:
- ssh
- ssh-keys
- authorized-keys
- key-rotation
- publickey
- fleet
- troubleshooting
status: published
created: 2026-06-17
updated: 2026-06-17
---

# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`

## The Problem

A host you've SSH'd into for months suddenly rejects you, but **only some hosts**, not all:

```
$ ssh root@host-a
root@host-a: Permission denied (publickey).

$ ssh root@host-b   # same key, same workstation; works fine
host-b $
```

Nothing changed on the servers. What changed is on **your** side: at some
point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
clobbered by a botched copy, a routine rotation). The new public key was pushed to a
few hosts but never fanned out to the rest. Every host still holding only the *old*
public key now rejects the new private key with `Permission denied (publickey)`.

> The tell: it's `Permission denied (publickey)`, **not** `Host key verification
> failed`. The former is an **authentication** failure (the server doesn't accept your
> key); the latter is the server's key not matching your `known_hosts`. Different
> problem: see *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.

## Why It Happens

Public-key auth is **per-host**: the server only lets you in if your public key is a
line in the target account's `~/.ssh/authorized_keys` on that host. There is no central
directory; each host is its own island. So when you rotate a key, *every* host needs
the new public key appended independently.

It's easy to do this partially without noticing. You regenerate the key, then over the
next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
other work. Those three now trust the new key. The other six don't, and you won't
find out until weeks later when you reach for one of them.

Confirm it's an authentication (key) failure and see which key is being offered:

```
$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
debug1: Authentications that can continue: publickey
root@host-a: Permission denied (publickey).
```

The server accepts no method but `publickey`, you offered your current key, and it
was refused, so your key isn't in that host's `authorized_keys`.

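If you can read the target's `authorized_keys` by some other route (console, a transit user), that conclusion can be checked directly. The sketch below is an illustration, not part of the original procedure: `pubkey_in_file` is a hypothetical helper that compares only the base64 key blob (the second field of the `.pub` file), so a different comment or an options prefix on the `authorized_keys` line does not cause a false negative. The key material in the demo is fake placeholder text.

```bash
# Hypothetical helper: succeed if the key in PUB_FILE appears in AUTH_FILE.
# Only the base64 blob (field 2 of "type blob comment") is compared.
pubkey_in_file() {
  local pub_file=$1 auth_file=$2 blob
  blob=$(awk 'NR == 1 { print $2 }' "$pub_file")
  [ -n "$blob" ] && grep -qF -- "$blob" "$auth_file"
}

# Demo with fake placeholder keys in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' 'ssh-ed25519 AAAAfakeOLDblob you@old-laptop' > "$demo/authorized_keys"
printf '%s\n' 'ssh-ed25519 AAAAfakeNEWblob you@new-laptop' > "$demo/new.pub"

if pubkey_in_file "$demo/new.pub" "$demo/authorized_keys"; then
  echo "new key is authorized"
else
  echo "new key is missing: this host needs the backfill"
fi
rm -rf "$demo"
```

In the demo the file holds only the old key, so the second branch runs, which is the situation this article describes.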
## Scope It First — Don't Fix One Host at a Time

The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
touching anything, so you fix the real set, not just the squeaky wheel:

```bash
for h in host-a host-b host-c host-d host-e host-f; do
  # BatchMode: never prompt. ConnectTimeout: don't hang on an unreachable host.
  r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
  echo "$h: $r"
done
```

`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
of hanging. Any host that prints `Permission denied (publickey)` needs the backfill.
A timeout or `Host key verification failed` is a different problem and won't be fixed
by appending a key.

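To turn the sweep's output into the host list for the next step, the result lines can be filtered mechanically. This is a minimal sketch, assuming the `host: last-line` format printed by the loop above. `hosts_needing_backfill` is a hypothetical helper name, and the sample lines are illustrative, not captured output.

```bash
# Hypothetical helper: read "host: <last line of ssh output>" lines on stdin and
# print only the hosts that were refused with Permission denied (publickey...).
hosts_needing_backfill() {
  local line host result
  while IFS= read -r line; do
    host=${line%%:*}      # text before the first colon
    result=${line#*: }    # text after the first ": "
    case $result in
      *'Permission denied (publickey'*) printf '%s\n' "$host" ;;
    esac
  done
}

# Illustrative sweep output (made up for the demo).
hosts_needing_backfill <<'EOF'
host-a: root@host-a: Permission denied (publickey).
host-b: OK
host-c: root@host-c: Permission denied (publickey).
host-d: ssh: connect to host host-d port 22: Connection timed out
EOF
```

Hosts that time out or fail host-key verification are deliberately left out, since appending a key would not help them.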
## The Fix

You need a **second, still-trusted** way onto each failing host to append the new key.
Common transit options, best first:

- **Another of your keys that still works** (e.g. a config-management / automation
  user whose key is authorized fleet-wide, ideally with `sudo`).
- **Another workstation** whose key those hosts still trust.
- **The provider's web console / serial console** as a last resort.

> [!warning] A jump host only helps if *it* can reach the target
> "Bounce through a box that still trusts me" only works if that box's own key is in
> the target's `authorized_keys`. A host can trust *your* key yet have no standing
> trust to a third host (and hit its own `Host key verification failed` on the way).
> Test the full two-hop path before relying on it.

Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
append the new key idempotently, with a backup, to every failing host:

```bash
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
STAMP=$(date +%Y%m%d-%H%M%S)
for h in host-a host-c host-e; do   # only the hosts that failed the sweep
  # Unquoted heredoc: $PUBKEY and $STAMP expand locally; \$F is left for the remote shell.
  ssh deploy@"$h" "sudo bash -s" <<EOF
set -e
F=/root/.ssh/authorized_keys
mkdir -p /root/.ssh && touch "\$F"
cp "\$F" "\$F.bak-$STAMP"                                      # backup before any change
grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F"   # append only if absent
chmod 600 "\$F"
EOF
done
```

Three things that keep this safe:

- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
  one line, and only if it's missing. Re-running is a no-op. Never clobber an
  `authorized_keys` with `>` or you'll lock out every *other* key on the box.
- **Back up first.** The `.bak-<stamp>` copy is your undo.
- **`chmod 600`.** With the default `StrictModes yes`, sshd ignores an `authorized_keys`
  that's group/world writable and reports the reason only in the server's own log,
  which from the client looks exactly like "the key didn't take."

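The append-if-absent and backup rules can be tried locally before running anything against a real host. This is a minimal sketch, assuming a scratch file stands in for `authorized_keys`. `append_key_once` is a hypothetical helper and the keys are fake placeholders. Unlike the loop above, it uses `grep -x` to match the whole line and writes a backup only when it is about to change the file, so repeated runs do not pile up backups.

```bash
# Hypothetical helper: append KEY_LINE to FILE only if that exact line is absent.
append_key_once() {
  local file=$1 key_line=$2
  touch "$file"
  if ! grep -qxF -- "$key_line" "$file"; then
    cp -p "$file" "$file.bak-$(date +%Y%m%d-%H%M%S)"   # undo copy, taken before the change
    printf '%s\n' "$key_line" >> "$file"               # append, never overwrite
  fi
  chmod 600 "$file"
}

# Demo on a scratch file with fake placeholder keys.
demo=$(mktemp -d)
printf '%s\n' 'ssh-ed25519 AAAAfakeOTHERblob colleague@laptop' > "$demo/authorized_keys"
append_key_once "$demo/authorized_keys" 'ssh-ed25519 AAAAfakeNEWblob you@new-laptop'
append_key_once "$demo/authorized_keys" 'ssh-ed25519 AAAAfakeNEWblob you@new-laptop'  # no-op
cat "$demo/authorized_keys"   # the colleague's line survives; the new line appears once
rm -rf "$demo"
```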
Then verify directly, not through the transit user:

```bash
for h in host-a host-c host-e; do
  echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
done
```

All `OK` means the new key authenticates on its own.

## Prevention

- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
  is to fan the new public key out to **every** host's `authorized_keys` in one pass,
  not opportunistically as you happen to log in. A short `for` loop over the full host
  list (or a config-management task; see the next item) closes the gap immediately.
- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
  task (or equivalent) that lists the *current* set of keys makes "who can log in" a
  reviewed, version-controlled fact instead of an append-only pile that drifts per host.
- **Keep the old key authorized until the new one is verified everywhere**, then remove
  the stale line in a deliberate cleanup pass.

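The cleanup pass in the last item can follow the same backup-first pattern as the backfill. This is a minimal sketch, assuming it runs on the host itself against a given `authorized_keys` path. `remove_stale_key` is a hypothetical helper, the keys are fake placeholders, and it should only be run once the new key has been verified on that host.

```bash
# Hypothetical helper: drop every line of FILE that contains the key from OLD_PUB.
remove_stale_key() {
  local file=$1 old_pub=$2 blob tmp
  blob=$(awk 'NR == 1 { print $2 }' "$old_pub")   # base64 blob of the old key
  [ -n "$blob" ] || return 1                       # refuse to run with an empty pattern
  grep -qF -- "$blob" "$file" || return 0          # already gone: nothing to do
  cp -p "$file" "$file.bak-$(date +%Y%m%d-%H%M%S)" # undo copy before the change
  tmp=$(mktemp "$file.XXXXXX")
  grep -vF -- "$blob" "$file" > "$tmp" || true     # grep exits 1 when no lines remain
  chmod 600 "$tmp"
  mv "$tmp" "$file"
}

# Demo on a scratch file with fake placeholder keys.
demo=$(mktemp -d)
printf '%s\n' 'ssh-ed25519 AAAAfakeOLDblob you@old-laptop' > "$demo/old.pub"
printf '%s\n' \
  'ssh-ed25519 AAAAfakeOLDblob you@old-laptop' \
  'ssh-ed25519 AAAAfakeNEWblob you@new-laptop' > "$demo/authorized_keys"
remove_stale_key "$demo/authorized_keys" "$demo/old.pub"
cat "$demo/authorized_keys"   # only the new-laptop line remains
rm -rf "$demo"
```

Because the match is on the key blob and not the comment, an old line whose comment was edited by hand is still found by the cleanup.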
## How to Diagnose This (Checklist)

1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
   `Host key verification failed` (host key). Confirms which problem you have.
2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
   fingerprint.
3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
   hosts before fixing.
4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
   transit path.
5. Re-verify each host with a direct `BatchMode` login.

Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.

@ -141,6 +141,7 @@ updated: 2026-05-15T09:00
* [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
* [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
* [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
* [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
* [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
* [Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
* [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)