diff --git a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
new file mode 100644
index 0000000..22e5b25
--- /dev/null
+++ b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
@@ -0,0 +1,63 @@
+# ssh.socket Unreachable After Reboot (Tailscale Race Condition)
+
+## Symptom
+
+SSH to a host via its Tailscale IP times out. `tailscale ping` works and `tailscale status` shows `active; direct`, but nothing accepts connections on port 22. The Hetzner console offers no way in if the root password is unset.
+
+## Cause
+
+Ubuntu 24.04 uses systemd **socket activation** for SSH (`ssh.socket` instead of a persistent `ssh.service`). When the socket override binds to a Tailscale IP, the socket can start *before* `tailscaled.service` is ready. The bind may succeed initially (the Tailscale state file caches the IP), but a later Tailscale reconnect or interface reset silently invalidates the bound address, and SSH dies with no recovery path.
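+
+For contrast, an override that binds to the Tailscale IP without any ordering dependency would presumably look like the sketch below. This is an assumption reconstructed from the description above, not a copy of the file on the affected hosts, and `100.64.0.1` is a hypothetical Tailscale address.
+
+```ini
+# /etc/systemd/system/ssh.socket.d/override.conf (assumed pre-fix state)
+# No [Unit] section: nothing orders ssh.socket relative to tailscaled.service.
+[Socket]
+# The empty assignment clears the default listeners.
+ListenStream=
+# Hypothetical Tailscale IP; substitute the host's own address.
+ListenStream=100.64.0.1:22
+```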
+
+## Diagnosis
+
+```bash
+# From another host:
+tailscale ping  # succeeds: host is up
+ssh root@  # times out: sshd not listening
+
+# After gaining console access or reboot:
+systemctl status ssh.socket  # check Listen: address
+journalctl -b -1 -u ssh  # likely empty: sshd never spawned
+journalctl -b -1 -u ssh.socket  # socket started before tailscaled
+```
+
+## Fix
+
+Add a Tailscale dependency to the socket override:
+
+```ini
+# /etc/systemd/system/ssh.socket.d/override.conf
+[Unit]
+After=tailscaled.service
+BindsTo=tailscaled.service
+
+[Socket]
+ListenStream=
+ListenStream=:22
+```
+
+Then reload and restart:
+
+```bash
+systemctl daemon-reload
+systemctl restart ssh.socket
+systemctl status ssh.socket  # verify Listen: shows correct IP
+```
+
+- `After=` orders the socket after `tailscaled.service`, so it does not start until Tailscale has started
+- `BindsTo=` ties the socket to `tailscaled.service`: the socket is stopped when Tailscale stops and restarted when Tailscale is explicitly restarted, preventing stale binds
+
+## Prevention
+
+- Set root passwords on all Hetzner hosts for emergency console access
+- Ansible playbook `configure_tailscale_ssh_only.yml` includes both directives as of commit `7ef182b`
+
+## Affected Hosts
+
+Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner. Fedora hosts (majordiscord) restrict SSH with firewall rules and are not affected.
+
+## References
+
+- [[dcaprod#2026-05-19 — SSH unreachable due to ssh.socket race condition with Tailscale]]
+- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
+- Ansible: `configure_tailscale_ssh_only.yml`
diff --git a/SUMMARY.md b/SUMMARY.md
index ed14416..f30579d 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -76,6 +76,7 @@ updated: 2026-05-15T09:00
  * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
  * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
  * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
+ * [ssh.socket Unreachable After Reboot (Tailscale Race Condition)](05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md)
  * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
  * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
  * [Tuning Netdata `web_log_1m_successful` for Redirect-Heavy WordPress Sites](05-troubleshooting/security/netdata-web-log-successful-redirect-heavy-tuning.md)