troubleshooting: ssh-race article — fleet audited & reconciled 2026-06-07
dcaprod-hetzner + tttpod-hetzner were missing tailscale-wait-ready.service (inert ssh.service gate -> latent bind race); corrected playbook applied to both. teelia uses Tailscale SSH (no sshd, immune). All Ubuntu hosts now on the dependency-free-socket + ssh.service-gate pattern.
parent 0cde19e064
commit c3045e33dd
1 changed files with 3 additions and 1 deletions
@@ -84,7 +84,9 @@ ss -tlnp | grep :22 # verify bound to Tailscale IP
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner.
> [!danger] The Ubuntu playbook shipped the cycle pattern until 2026-06-07
-> Despite the 2026-06-04 resolution above, `configure_tailscale_ssh_only.yml` in the repo kept deploying the `[Unit] Requires=tailscale-wait-ready.service` gate on **ssh.socket** (the cycle-causer) and never added the ssh.service gate, so re-running it *re-armed* the ordering cycle. Caught 2026-06-07: it clobbered majorlinux's hand-fix, and **majortoot-hetzner was found already armed** with the latent cycle (would have lost SSH on its next reboot). Both restored/defused; playbook corrected in MajorAnsible `e0d35aa` (gate on ssh.service, dependency-free socket). ⚠️ dcaprod-hetzner / tttpod-hetzner lack `tailscale-wait-ready.service` and teelia has no socket override; the Ubuntu SSH-lockdown state is **inconsistent across the fleet and needs a deliberate per-host audit**.
+> Despite the 2026-06-04 resolution above, `configure_tailscale_ssh_only.yml` in the repo kept deploying the `[Unit] Requires=tailscale-wait-ready.service` gate on **ssh.socket** (the cycle-causer) and never added the ssh.service gate, so re-running it *re-armed* the ordering cycle. Caught 2026-06-07: it clobbered majorlinux's hand-fix, and **majortoot-hetzner was found already armed** with the latent cycle (would have lost SSH on its next reboot). Both restored/defused; playbook corrected in MajorAnsible `e0d35aa` (gate on ssh.service, dependency-free socket).
+>
+> **Fleet audited & reconciled 2026-06-07:** dcaprod-hetzner + tttpod-hetzner already had the dependency-free socket but were **missing `tailscale-wait-ready.service`** (their ssh.service gate referenced a non-existent unit → inert → latent *bind* race, not a cycle); the corrected playbook was applied to both, deploying the service and activating the gate. teelia uses **Tailscale SSH** (no sshd; ssh.socket/ssh.service disabled), so it is immune to both races. All Ubuntu hosts now run the same pattern: dependency-free `ssh.socket` bind + `ssh.service` readiness gate + `tailscale-wait-ready.service`.
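
The per-host audit described above could be sketched roughly as follows. This is a minimal illustration, not the playbook's actual logic: the function name `gate_state`, the `/etc/systemd/system` root, and the `ssh.service.d/*.conf` drop-in location are assumptions, and a plain text match will also hit commented-out lines.

```shell
#!/bin/sh
# Hypothetical helper: classify the ssh.service readiness gate on one host.
# $1: systemd config root (assumed default: /etc/systemd/system).
gate_state() {
    root="${1:-/etc/systemd/system}"
    unit="tailscale-wait-ready.service"
    # Does any ssh.service drop-in mention the gate unit at all?
    if ! grep -qs -- "$unit" "$root"/ssh.service.d/*.conf; then
        echo "no-gate"
    elif [ ! -e "$root/$unit" ]; then
        # The gate names a unit that is not installed: the inert case.
        echo "inert"
    else
        echo "gated"
    fi
}
```

Under these assumptions, `gate_state` with no argument would print `inert` on a host in the state dcaprod-hetzner and tttpod-hetzner were found in, and `gated` once the service has been deployed.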
> [!warning] Fedora hosts are NOT automatically immune (corrected 2026-06-07)
> The firewalld method (`configure_tailscale_ssh_only_fedora.yml`) binds sshd on `0.0.0.0:22` and enforces Tailscale-only via the firewall, so it has no dependency on the Tailscale address — **unless** a host also carries a leftover manual `ListenAddress <tailscale-ip>` drop-in (`/etc/ssh/sshd_config.d/tailscale-only.conf`) from the pre-firewall lockdown. Then sshd.service hits the same boot bind-race (`Bind to port 22 on <ts-ip> failed: Cannot assign requested address`) and flaps every reboot. Hit on **majordiscord 2026-06-07**; fixed by removing the redundant drop-in (firewall stays the enforcing layer). The Fedora playbook now removes it automatically (MajorAnsible `b4a9090`).
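
A quick way to spot the leftover drop-in on a Fedora host might look like the sketch below. The function name `find_listen_dropins` is hypothetical, and the default directory is simply the parent of the path named in the note above; this is an illustrative check, not what the Fedora playbook runs.

```shell
#!/bin/sh
# Hypothetical helper: list sshd drop-ins that pin a ListenAddress.
# $1: drop-in directory (default taken from the path in the note above).
find_listen_dropins() {
    dir="${1:-/etc/ssh/sshd_config.d}"
    # -l prints matching file names only; -i because sshd keywords are
    # case-insensitive. Lines starting with '#' do not match the pattern.
    grep -lEi '^[[:space:]]*ListenAddress[[:space:]=]' "$dir"/*.conf 2>/dev/null
}
```

On a host using the firewalld method, any file this prints is a candidate for the boot bind race, since sshd would then depend on that address existing at start-up.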