Merge branch 'code/majormac/ssh-key-backfill-article'
This commit is contained in:
commit
06162273f7
2 changed files with 161 additions and 0 deletions
|
|
@ -0,0 +1,160 @@
|
|||
---
|
||||
title: "SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`"
|
||||
domain: selfhosting
|
||||
category: troubleshooting
|
||||
tags:
|
||||
- ssh
|
||||
- ssh-keys
|
||||
- authorized-keys
|
||||
- key-rotation
|
||||
- publickey
|
||||
- fleet
|
||||
- troubleshooting
|
||||
status: published
|
||||
created: 2026-06-17
|
||||
updated: 2026-06-17
|
||||
---
|
||||
|
||||
# SSH `Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`
|
||||
|
||||
## The Problem
|
||||
|
||||
A host you've SSH'd into for months suddenly rejects you — but **only some hosts**, not all:
|
||||
|
||||
```
|
||||
$ ssh root@host-a
|
||||
root@host-a: Permission denied (publickey).
|
||||
|
||||
$ ssh root@host-b # same key, same workstation — works fine
|
||||
host-b $
|
||||
```
|
||||
|
||||
Nothing changed on the servers. The thing that changed is on **your** side: at some
|
||||
point the workstation's SSH key was **regenerated** (lost laptop, rebuild, a key file
|
||||
clobbered by a botched copy, a routine rotation). The new public key was pushed to a
|
||||
few hosts but never fanned out to the rest. Every host still holding only the *old*
|
||||
public key now rejects the new private key with `Permission denied (publickey)`.
|
||||
|
||||
> The tell: it's `Permission denied (publickey)`, **not** `Host key verification
|
||||
> failed`. The former is an **authorization** failure (the server doesn't trust your
|
||||
> key); the latter is the server's key not matching your `known_hosts`. Different
|
||||
> problem — see *[SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure](ssh-missing-host-block-magicdns-host-key-failure.md)*.
|
||||
|
||||
## Why It Happens
|
||||
|
||||
Public-key auth is **per-host**: the server only lets you in if your public key is a
|
||||
line in that host's `~/.ssh/authorized_keys`. There is no central directory — each
|
||||
host is its own island. So when you rotate a key, *every* host needs the new public
|
||||
key appended independently.
|
||||
|
||||
It's easy to do this partially without noticing. You regenerate the key, then over the
|
||||
next hour you happen to SSH into three boxes and (re-)deploy the key there as part of
|
||||
other work. Those three now trust the new key. The other six don't — and you won't
|
||||
find out until weeks later when you reach for one of them.
|
||||
|
||||
Confirm it's an authorization (key) failure and see which key is being offered:
|
||||
|
||||
```
|
||||
$ ssh -v root@host-a 2>&1 | grep -E 'Offering|Authentications|Permission denied'
|
||||
debug1: Offering public key: /home/you/.ssh/id_ed25519 ED25519 SHA256:XeY1/N9qwB…
|
||||
debug1: Authentications that can continue: publickey
|
||||
root@host-a: Permission denied (publickey).
|
||||
```
|
||||
|
||||
The server offered you nothing but `publickey`, you offered your current key, and it
|
||||
was refused → your key isn't in that host's `authorized_keys`.
|
||||
|
||||
## Scope It First — Don't Fix One Host at a Time
|
||||
|
||||
The host you noticed is rarely the only one. Sweep the whole fleet in one pass before
|
||||
touching anything, so you fix the real set, not just the squeaky wheel:
|
||||
|
||||
```bash
|
||||
for h in host-a host-b host-c host-d host-e host-f; do
|
||||
r=$(ssh -o BatchMode=yes -o ConnectTimeout=8 root@"$h" 'echo OK' 2>&1 | tail -1)
|
||||
echo "$h: $r"
|
||||
done
|
||||
```
|
||||
|
||||
`BatchMode=yes` suppresses password/passphrase prompts so a failure fails fast instead
|
||||
of hanging. Anything that doesn't print `OK` needs the backfill.
|
||||
|
||||
## The Fix
|
||||
|
||||
You need a **second, still-trusted** way onto each failing host to append the new key.
|
||||
Common transit options, best first:
|
||||
|
||||
- **Another of your keys that still works** (e.g. a config-management / automation
|
||||
user whose key is authorized fleet-wide, ideally with `sudo`).
|
||||
- **Another workstation** whose key those hosts still trust.
|
||||
- **The provider's web console / serial console** as a last resort.
|
||||
|
||||
> [!warning] A jump host only helps if *it* can reach the target
|
||||
> "Bounce through a box that still trusts me" only works if that box's own key is in
|
||||
> the target's `authorized_keys`. A host can trust *your* key yet have no standing
|
||||
> trust to a third host (and hit its own `Host key verification failed` on the way).
|
||||
> Test the full two-hop path before relying on it.
|
||||
|
||||
Using a fleet-wide automation user (`deploy`) with passwordless `sudo` as the transit,
|
||||
append the new key idempotently, with a backup, to every failing host:
|
||||
|
||||
```bash
|
||||
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
|
||||
STAMP=$(date +%Y%m%d-%H%M%S)
|
||||
for h in host-a host-c host-e; do # only the hosts that failed the sweep
|
||||
ssh deploy@"$h" "sudo bash -s" <<EOF
|
||||
set -e
|
||||
F=/root/.ssh/authorized_keys
|
||||
mkdir -p /root/.ssh && touch "\$F"
|
||||
cp "\$F" "\$F.bak-$STAMP" # backup before any change
|
||||
grep -qF "$PUBKEY" "\$F" || printf '%s\n' "$PUBKEY" >> "\$F" # append only if absent
|
||||
chmod 600 "\$F"
|
||||
EOF
|
||||
done
|
||||
```
|
||||
|
||||
Three things that keep this safe:
|
||||
|
||||
- **Append, never overwrite.** `>> "$F"` and the `grep -qF … ||` guard mean you add
|
||||
one line and only if it's missing. Re-running is a no-op — never clobber an
|
||||
`authorized_keys` with `>` or you'll lock out every *other* key on the box.
|
||||
- **Back up first.** The `.bak-<stamp>` copy is your undo.
|
||||
- **`chmod 600`.** SSH silently ignores an `authorized_keys` that's group/world
|
||||
writable, which looks exactly like "the key didn't take."
|
||||
|
||||
Then verify directly — not through the transit user:
|
||||
|
||||
```bash
|
||||
for h in host-a host-c host-e; do
|
||||
echo "$h: $(ssh -o BatchMode=yes root@"$h" 'echo OK' 2>&1 | tail -1)"
|
||||
done
|
||||
```
|
||||
|
||||
All `OK` means the new key authenticates on its own.
|
||||
|
||||
## Prevention
|
||||
|
||||
- **Treat rotation as fleet-wide.** When a workstation key changes, the very next step
|
||||
is to fan the new public key out to **every** host's `authorized_keys` in one pass —
|
||||
not opportunistically as you happen to log in. A short `for` loop over the full host
|
||||
list (or a config-management task — see below) closes the gap immediately.
|
||||
- **Manage `authorized_keys` declaratively.** An Ansible `ansible.posix.authorized_key`
|
||||
task (or equivalent) that lists the *current* set of keys makes "who can log in" a
|
||||
reviewed, version-controlled fact instead of an append-only pile that drifts per host.
|
||||
- **Keep the old key authorized until the new one is verified everywhere**, then remove
|
||||
the stale line in a deliberate cleanup pass.
|
||||
|
||||
## How to Diagnose This (Checklist)
|
||||
|
||||
1. `ssh -o BatchMode=yes <host> true` → `Permission denied (publickey)` (auth), not
|
||||
`Host key verification failed` (host key). Confirms which problem you have.
|
||||
2. `ssh -v <host> 2>&1 | grep Offering` → which private key is being offered, and its
|
||||
fingerprint.
|
||||
3. Sweep the whole fleet with the `BatchMode` loop → get the **full** list of affected
|
||||
hosts before fixing.
|
||||
4. Append the new public key (idempotent, backed up, `chmod 600`) via a still-trusted
|
||||
transit path.
|
||||
5. Re-verify each host with a direct `BatchMode` login.
|
||||
|
||||
Related: *[SSH Config & Key Management](../../01-linux/networking/ssh-config-key-management.md)*
|
||||
and *[SSH Hardening Across a Fleet with Ansible](../../02-selfhosting/security/ssh-hardening-ansible-fleet.md)*.
|
||||
|
|
@ -141,6 +141,7 @@ updated: 2026-05-15T09:00
|
|||
* [Ansible Fails with Permission Denied While `ssh <alias>` Works (Host Alias Bypass)](05-troubleshooting/ansible-ssh-host-alias-bypass.md)
|
||||
* [SSH Alias Falls Through to MagicDNS — Host-Key Verification Failure (No `Host` Block)](05-troubleshooting/networking/ssh-missing-host-block-magicdns-host-key-failure.md)
|
||||
* [MagicDNS Names vs Pinned IPs for Tailscale SSH (After a Fleet Migration)](05-troubleshooting/networking/tailscale-ssh-magicdns-vs-pinned-ip-after-migration.md)
|
||||
* [`Permission denied (publickey)` After Rotating a Key — Backfill Every `authorized_keys`](05-troubleshooting/networking/ssh-rotated-key-not-backfilled-authorized-keys.md)
|
||||
* [Ansible UNREACHABLE: Host Key Verification Failed After a Host Rebuild or Migration](05-troubleshooting/networking/ansible-host-key-verification-failed-rebuilt-host.md)
|
||||
* [Logwatch Reports the Wrong Hostname (`<host>-hetzner`) After a Migration](05-troubleshooting/logwatch-wrong-hostname-after-migration.md)
|
||||
* [Ghost EmailAnalytics Lag Warning — What It Means and When to Worry](05-troubleshooting/ghost-emailanalytics-lag-warning.md)
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue