troubleshooting: correct ssh tailscale-race article (Fedora ListenAddress variant + playbook cycle landmine)

- Fedora hosts are NOT automatically immune: a leftover manual
  `ListenAddress <tailscale-ip>` drop-in reintroduces the sshd boot bind-race
  even under firewalld (hit on majordiscord 2026-06-07; fix = remove it).
- The Ubuntu playbook kept shipping the cycle-causing [Unit] gate on
  ssh.socket despite the 2026-06-04 resolution; re-running it re-armed the
  ordering cycle (clobbered majorlinux; majortoot-hetzner found armed).
  Corrected in MajorAnsible e0d35aa. Fleet ssh-lockdown state is inconsistent
  (dcaprod/tttpod lack wait-ready; teelia no override) — needs a per-host audit.
This commit is contained in:
Marcus Summers 2026-06-07 05:56:25 -04:00
parent fda2d35ea5
commit 8d4dee5da3

View file

@ -81,7 +81,13 @@ ss -tlnp | grep :22 # verify bound to Tailscale IP
### Affected Hosts
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner.
> [!danger] The Ubuntu playbook shipped the cycle pattern until 2026-06-07
> Despite the 2026-06-04 resolution above, `configure_tailscale_ssh_only.yml` in the repo kept deploying the `[Unit] Requires=tailscale-wait-ready.service` gate on **ssh.socket** (the cycle-causer) and never added the ssh.service gate — so re-running it *re-armed* the ordering cycle. Caught 2026-06-07: it clobbered majorlinux's hand-fix, and **majortoot-hetzner was found already armed** with the latent cycle (would have lost SSH on its next reboot). Both restored/defused; playbook corrected in MajorAnsible `e0d35aa` (gate on ssh.service, dependency-free socket). ⚠️ dcaprod-hetzner / tttpod-hetzner lack `tailscale-wait-ready.service` and teelia has no socket override — the Ubuntu SSH-lockdown state is **inconsistent across the fleet and needs a deliberate per-host audit**.
> [!warning] Fedora hosts are NOT automatically immune (corrected 2026-06-07)
> The firewalld method (`configure_tailscale_ssh_only_fedora.yml`) binds sshd on `0.0.0.0:22` and enforces Tailscale-only via the firewall, so it has no dependency on the Tailscale address — **unless** a host also carries a leftover manual `ListenAddress <tailscale-ip>` drop-in (`/etc/ssh/sshd_config.d/tailscale-only.conf`) from the pre-firewall lockdown. Then sshd.service hits the same boot bind-race (`Bind to port 22 on <ts-ip> failed: Cannot assign requested address`) and flaps every reboot. Hit on **majordiscord 2026-06-07**; fixed by removing the redundant drop-in (firewall stays the enforcing layer). The Fedora playbook now removes it automatically (MajorAnsible `b4a9090`).
---