From 7dc591d2570b00c6b41faf993cc77d125a152e2a Mon Sep 17 00:00:00 2001
From: majorlinux
Date: Tue, 19 May 2026 19:35:16 -0400
Subject: [PATCH] wiki: add ssh.socket Tailscale race condition troubleshooting article

Documents the systemd socket activation race where ssh.socket binds to
the Tailscale IP before tailscaled is ready, causing SSH to become
unreachable after a Tailscale reconnect. Includes diagnosis steps and
the After=/BindsTo= fix.
---
 .../ssh-socket-tailscale-race-condition.md    | 63 +++++++++++++++++++
 SUMMARY.md                                    |  6 +-
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md

diff --git a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
new file mode 100644
index 0000000..22e5b25
--- /dev/null
+++ b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
@@ -0,0 +1,63 @@
+# ssh.socket Unreachable After Reboot (Tailscale Race Condition)
+
+## Symptom
+
+SSH to a host via its Tailscale IP times out. `tailscale ping` works and `tailscale status` shows `active; direct`, but SSH on port 22 does not accept connections. If the root password is unset, the Hetzner console offers no way in either.
+
+## Cause
+
+Ubuntu 24.04 uses systemd **socket activation** for SSH (`ssh.socket` instead of a persistent `ssh.service`). When the socket override binds to a Tailscale IP, the socket can start *before* `tailscaled.service` is ready. The bind may succeed initially (the Tailscale state file caches the IP), but a later Tailscale reconnect or interface reset silently invalidates the bound address, and SSH dies with no recovery path.
+
+## Diagnosis
+
+```bash
+# From another host:
+tailscale ping <host>            # succeeds: host is up
+ssh root@<tailscale-ip>          # times out: sshd not listening
+
+# After regaining access via console or reboot:
+systemctl status ssh.socket      # check Listen: address
+journalctl -b -1 -u ssh          # likely empty: sshd never spawned
+journalctl -b -1 -u ssh.socket   # socket started before tailscaled
+```
+
+## Fix
+
+Add a Tailscale dependency to the socket override:
+
+```ini
+# /etc/systemd/system/ssh.socket.d/override.conf
+[Unit]
+After=tailscaled.service
+BindsTo=tailscaled.service
+
+[Socket]
+ListenStream=
+ListenStream=<tailscale-ip>:22
+```
+
+Then reload and restart:
+
+```bash
+systemctl daemon-reload
+systemctl restart ssh.socket
+systemctl status ssh.socket   # verify Listen: shows correct IP
+```
+
+- `After=` orders the socket so that it starts only after `tailscaled.service` has started
+- `BindsTo=` stops the socket when `tailscaled.service` stops and restarts it along with `tailscaled.service`, preventing stale binds
+
+## Prevention
+
+- Set root passwords on all Hetzner hosts for emergency console access
+- Ansible playbook `configure_tailscale_ssh_only.yml` includes both directives as of commit `7ef182b`
+
+## Affected Hosts
+
+Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction and are not affected.
+
+## References
+
+- [[dcaprod#2026-05-19 — SSH unreachable due to ssh.socket race condition with Tailscale]]
+- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
+- Ansible: `configure_tailscale_ssh_only.yml`
diff --git a/SUMMARY.md b/SUMMARY.md
index 5e0079e..f30579d 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-05-11T07:35
+updated: 2026-05-15T09:00
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
@@ -70,11 +70,13 @@ updated: 2026-05-11T07:35
 * [Streaming & Podcasting](04-streaming/index.md)
   * [OBS Studio Setup & Encoding](04-streaming/obs/obs-studio-setup-encoding.md)
   * [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md)
+  * [HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)](04-streaming/plex/hevc-vaapi-batch-encode.md)
 * [Troubleshooting](05-troubleshooting/index.md)
   * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
   * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
   * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
   * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
+  * [ssh.socket Unreachable After Reboot (Tailscale Race Condition)](05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md)
   * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
   * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
   * [Tuning Netdata `web_log_1m_successful` for Redirect-Heavy WordPress Sites](05-troubleshooting/security/netdata-web-log-successful-redirect-heavy-tuning.md)
@@ -105,8 +107,10 @@ updated: 2026-05-11T07:35
   * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
   * [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](05-troubleshooting/networking/tailscale-status-json-hostname-localhost-ios.md)
   * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
+  * [OBS Studio: Stale Script Paths After Windows Profile Rename](05-troubleshooting/obs-stale-script-paths-after-windows-profile-rename.md)
   * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
   * [Fedora CA Bundle Missing Symlink — TLS Breaks Fleet-Wide](05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md)
+  * [Netdata apps-group FD-utilisation false 100% (silenced fleet-wide)](05-troubleshooting/security/netdata-apps-fds-group-false-positive.md)
   * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
   * [Ansible: ansible.cfg Ignored on WSL2 Windows Mounts](05-troubleshooting/ansible-wsl2-world-writable-mount-ignores-cfg.md)
   * [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md)
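The article's Prevention section relies on the override carrying both `After=` and `BindsTo=`. A minimal sketch of a check for that follows. It is not part of this patch: the function name `check_override` is hypothetical, and it assumes each directive sits on its own line, spelled exactly as in the Fix section.

```bash
# Hypothetical helper (not part of the patch): report which of the two
# dependency directives from the Fix section are absent from an override file.
# Assumption: each directive is on its own line with no extra whitespace.
check_override() {
    file=$1
    missing=0
    for directive in 'After=tailscaled.service' 'BindsTo=tailscaled.service'; do
        # -x: match the whole line, -F: fixed string, -q: no output.
        # An unreadable or absent file counts as "missing" for both directives.
        if ! grep -qxF -- "$directive" "$file" 2>/dev/null; then
            echo "missing: $directive"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be run as `check_override /etc/systemd/system/ssh.socket.d/override.conf`, which returns non-zero if either directive is absent. This inspects only the file's text. To see what systemd has actually loaded, `systemctl show ssh.socket -p After -p BindsTo` prints the effective values.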