majorwiki/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
majorlinux 7dc591d257 wiki: add ssh.socket Tailscale race condition troubleshooting article
Documents the systemd socket activation race where ssh.socket binds
to the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
2026-05-19 19:35:16 -04:00


# ssh.socket Unreachable After Reboot (Tailscale Race Condition)
## Symptom
SSH to a host via its Tailscale IP times out. `tailscale ping` succeeds and `tailscale status` shows `active; direct`, but connections to port 22 never complete. If no root password is set, the Hetzner console cannot be used to log in either.
## Cause
Ubuntu 24.04 uses systemd **socket activation** for SSH (`ssh.socket` instead of a persistent `ssh.service`). When the socket override binds to a Tailscale IP, the socket can start *before* `tailscaled.service` is ready. The bind may succeed initially (the Tailscale state file caches the IP), but a later Tailscale reconnect or interface reset silently invalidates the bound address, and SSH stays down with no automatic recovery.
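One way to check the start order described above is to compare the monotonic activation timestamps that systemd records for both units. The following is a minimal sketch, not part of the original incident notes: the function names `active_since`, `started_before`, and `report_start_order` are hypothetical, and the timestamps only reflect the most recent activation, so a manual restart of either unit changes the result.

```bash
# Hypothetical helper: print the time (microseconds since boot) at which
# a unit last entered the active state. Prints 0 if it never did.
active_since() {
    systemctl show "$1" -p ActiveEnterTimestampMonotonic --value
}

# Succeeds if the first timestamp is strictly earlier than the second.
started_before() {
    [ "$1" -lt "$2" ]
}

# Report whether ssh.socket became active before tailscaled.service.
report_start_order() {
    local sock ts
    sock="$(active_since ssh.socket)"
    ts="$(active_since tailscaled.service)"
    if started_before "$sock" "$ts"; then
        echo "ssh.socket became active before tailscaled.service"
    else
        echo "ssh.socket did not become active before tailscaled.service"
    fi
}
```

Run `report_start_order` on the affected host. If it reports that the socket came up first, the ordering matches the race described in this section.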
## Diagnosis
```bash
# From another host:
tailscale ping <IP> # succeeds — host is up
ssh root@<IP> # times out — sshd not listening
# After gaining console access or reboot:
systemctl status ssh.socket # check Listen: address
journalctl -b -1 -u ssh # likely empty — sshd never spawned
journalctl -b -1 -u ssh.socket # socket started before tailscaled
```
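As a complement to the commands above, the sketch below checks whether anything is actually listening on the host's current Tailscale IPv4 address. It assumes `ss` (iproute2) and the `tailscale` CLI are available on the host. The function names `listens_on_ssh` and `check_ssh_listener` are hypothetical. Whether a stale bind shows up as a missing listener may depend on how the bind failed, so treat the result as a hint rather than proof.

```bash
# Hypothetical helper: read listener lines on stdin (for example the
# output of `ss -tln`) and succeed if one of them is exactly <ip>:22.
# -F matches the string literally, -w rejects partial matches such as :2222.
listens_on_ssh() {
    grep -qwF -- "$1:22"
}

# Compare the kernel's listening sockets with the current Tailscale IPv4.
check_ssh_listener() {
    local ip
    ip="$(tailscale ip -4)" || return 2
    if ss -tln | listens_on_ssh "$ip"; then
        echo "a listener is bound to ${ip}:22"
    else
        echo "nothing is listening on ${ip}:22" >&2
        return 1
    fi
}
```

Run `check_ssh_listener` from the console session. A non-zero exit status means no TCP listener matches the address Tailscale currently reports.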
## Fix
Add a dependency on `tailscaled.service` to the socket override:
```ini
# /etc/systemd/system/ssh.socket.d/override.conf
[Unit]
After=tailscaled.service
BindsTo=tailscaled.service
[Socket]
ListenStream=
ListenStream=<TAILSCALE_IP>:22
```
Then reload and restart:
```bash
systemctl daemon-reload
systemctl restart ssh.socket
systemctl status ssh.socket # verify Listen: shows correct IP
```
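- `After=` orders the socket after `tailscaled.service`, so it does not bind until Tailscale has started
- `BindsTo=` ties the socket's lifetime to `tailscaled.service`: the socket is stopped when Tailscale stops and restarted when Tailscale is explicitly restarted, preventing stale binds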
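To confirm that systemd has picked up both directives, the unit's effective `After` and `BindsTo` properties can be inspected with `systemctl show`. The following is a minimal sketch; `has_unit` and `verify_ssh_socket_deps` are hypothetical helper names, not part of the Ansible playbook.

```bash
# Hypothetical helper: succeed if the space-separated unit list in $1
# contains the unit name given in $2.
has_unit() {
    local list="$1" unit="$2" u
    for u in $list; do
        [ "$u" = "$unit" ] && return 0
    done
    return 1
}

# Check that ssh.socket lists tailscaled.service in After= and BindsTo=.
verify_ssh_socket_deps() {
    local prop
    for prop in After BindsTo; do
        if has_unit "$(systemctl show ssh.socket -p "$prop" --value)" tailscaled.service; then
            echo "${prop}= ok"
        else
            echo "${prop}= is missing tailscaled.service" >&2
            return 1
        fi
    done
}
```

Run `verify_ssh_socket_deps` after `systemctl daemon-reload`. It stops at the first property that does not mention `tailscaled.service`.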
## Prevention
- Set root passwords on all Hetzner hosts for emergency console access
- Ansible playbook `configure_tailscale_ssh_only.yml` includes both directives as of commit `7ef182b`
## Affected Hosts
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner. Fedora hosts (majordiscord) restrict SSH with firewall rules instead and are not affected.
## References
- [[dcaprod#2026-05-19 — SSH unreachable due to ssh.socket race condition with Tailscale]]
- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
- Ansible: `configure_tailscale_ssh_only.yml`