# Tailscale Boot Race Conditions (SSH Unreachable After Reboot)

Two related race conditions can make a host unreachable via Tailscale after reboot. Both stem from systemd units starting before Tailscale or the network is ready.

---

## Race 1: ssh.socket Binds Before Tailscale Is Up (Ubuntu)

### Symptom

SSH to a host via its Tailscale IP times out. `tailscale ping` works and `tailscale status` shows `active; direct`, but SSH on port 22 does not accept connections. There is no access via the Hetzner console if the root password is unset.

### Cause

Ubuntu 24.04 uses systemd **socket activation** for SSH (`ssh.socket` instead of a persistent `ssh.service`). When the socket override binds to a Tailscale IP, the socket can start *before* `tailscaled.service` is ready. The bind may succeed initially (the Tailscale state file caches the IP), but a later Tailscale reconnect or interface reset silently invalidates the bound address, and SSH dies with no recovery path.

### Diagnosis

```bash
# From another host:
tailscale ping                    # succeeds: host is up
ssh root@                         # times out: sshd not listening

# After gaining console access or reboot:
systemctl status ssh.socket       # check Listen: address
journalctl -b -1 -u ssh           # likely empty: sshd never spawned
journalctl -b -1 -u ssh.socket    # socket started before tailscaled
```

### Fix

Add a Tailscale dependency to the socket override:

```ini
# /etc/systemd/system/ssh.socket.d/override.conf
[Unit]
After=tailscaled.service
BindsTo=tailscaled.service

[Socket]
ListenStream=
ListenStream=:22
```

Then reload and restart:

```bash
systemctl daemon-reload
systemctl restart ssh.socket
systemctl status ssh.socket    # verify Listen: shows correct IP
```

- `After=` orders the socket after `tailscaled.service`, so it waits for Tailscale to start
- `BindsTo=` stops the socket when Tailscale stops and restarts it when Tailscale restarts, preventing stale binds

### Affected Hosts

Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction and are not affected by this race.

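
To confirm that the Race 1 override took effect, one option is to ask systemd for the socket's resolved dependencies rather than reading the drop-in file. The sketch below assumes `systemctl show --value` is available (systemd 230 or later). The helper names `unit_in_list` and `check_socket_deps` are hypothetical and are not part of the playbooks.

```bash
#!/usr/bin/env bash

# unit_in_list LIST UNIT
# Succeeds if UNIT appears as a whole entry in the space-separated LIST,
# which is the format `systemctl show --value` prints for dependency properties.
unit_in_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_socket_deps (hypothetical helper)
# Prints one line per property saying whether ssh.socket depends on tailscaled.service.
check_socket_deps() {
    local prop deps
    for prop in After BindsTo; do
        deps="$(systemctl show -p "$prop" --value ssh.socket)"
        if unit_in_list "$deps" tailscaled.service; then
            echo "ok: $prop includes tailscaled.service"
        else
            echo "MISSING: $prop does not include tailscaled.service"
        fi
    done
}
```

Running `check_socket_deps` on an affected Ubuntu host after `systemctl daemon-reload` should report both properties as present. A `MISSING` line suggests the drop-in was not picked up.
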

---

## Race 2: tailscaled Starts Before Network Is Online (All Hosts)

### Symptom

The host reboots but never appears on Tailscale. `tailscale ping` times out entirely, and SSH is dead because Tailscale never connects. The host is up (accessible via the provider console) but isolated from the Tailscale network.

### Cause

`tailscaled.service` ships with `After=network-pre.target`, which is reached *before* the network interface has an IP. On VPS hosts (especially Hetzner), the interface can take several seconds to come online. Tailscale starts, sees no network (`SetNetworkUp(false)`, `link state: defaultRoute= ifs={} v4=false v6=false`), fails DNS bootstrap and DERP relay connections, and gets stuck without retrying.

### Diagnosis

```bash
# From Hetzner console or another access method:
journalctl -b -u tailscaled | grep -E "SetNetworkUp|link state|error|DERP"

# Look for:
#   magicsock: SetNetworkUp(false)
#   link state: interfaces.State{defaultRoute= ifs={} v4=false v6=false}
#   health: Tailscale could not connect to any relay server
```

### Fix

Deploy a systemd drop-in that makes `tailscaled` wait for full network connectivity:

```ini
# /etc/systemd/system/tailscaled.service.d/override.conf
[Unit]
After=network-online.target
Wants=network-online.target
```

Then reload and restart:

```bash
systemctl daemon-reload
systemctl restart tailscaled
```

### Affected Hosts

All hosts where Tailscale is the primary access path. The impact is greatest on VPS hosts with slow interface bring-up. Both Fedora and Ubuntu hosts are affected.

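
The Race 2 diagnosis can be wrapped into a small check that runs from the console. This is a minimal sketch: it looks only for the `SetNetworkUp(false)` marker quoted in the Cause section, and the function names `started_without_network` and `report_boot_race` are hypothetical. A match is a hint that `tailscaled` started before the network was up, not proof that it stayed stuck.

```bash
#!/usr/bin/env bash

# started_without_network
# Reads tailscaled log lines on stdin; succeeds if any line contains the
# literal "network down" marker from the Cause section.
started_without_network() {
    grep -qF 'SetNetworkUp(false)'
}

# report_boot_race (hypothetical helper)
# Inspects the tailscaled journal for the current boot and prints a verdict.
report_boot_race() {
    if journalctl -b -u tailscaled --no-pager | started_without_network; then
        echo "tailscaled logged SetNetworkUp(false) this boot: possible Race 2"
    else
        echo "no SetNetworkUp(false) in this boot's tailscaled log"
    fi
}
```

After deploying the drop-in, `systemctl show -p After --value tailscaled.service` should list `network-online.target`. The ordering only delays startup if a wait-online service (`systemd-networkd-wait-online.service` or `NetworkManager-wait-online.service`, depending on the network stack) is enabled, so that is worth checking on hosts where the race persists.
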

---

## Prevention

- Set root passwords on all VPS hosts for emergency console access
- Ansible playbooks deploy both fixes automatically:
  - `configure_tailscale_network_wait.yml`: tailscaled network-online dependency (all hosts)
  - `configure_tailscale_ssh_only.yml`: ssh.socket Tailscale dependency (Ubuntu only)

## References

- [[dcaprod#2026-05-19 — SSH unreachable due to ssh.socket race condition with Tailscale]]
- [[majordiscord#2026-05-19 — Tailscale boot race: unreachable after Ansible reboot]]
- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
- Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`