---
title: "rsync over Tailscale: Hung in TCP Teardown After Transfer Completes"
domain: troubleshooting
category: networking
tags: [rsync, ssh, tailscale, hang, tcp-fin, hash-mismatch]
status: published
created: 2026-04-25
updated: 2026-04-25
---

# rsync over Tailscale: Hung in TCP Teardown After Transfer Completes

A long rsync transfer over Tailscale finishes (the destination file is at full size and rsync's own summary line is in the log), but the rsync, ssh client, and parent bash processes never exit. The `&&` chain that should run after rsync (e.g. `&& echo DONE`) never fires. Watcher scripts polling for completion can stall indefinitely.

## The Short Answer

The data is almost certainly fine. Verify with `md5sum` (or `md5 -q` on macOS) against the source, then kill the hung pipeline.

```bash
# 1. confirm the size in bytes matches rsync's reported total size
ls -l ~/your-file.gguf
tail ~/rsync.log   # look for the "total size is N" line

# 2. checksum end-to-end (compare the hash field; md5sum also prints the path)
md5 -q ~/your-file.gguf   # macOS
ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf'   # Linux source

# 3. if hashes match, kill the hung pipeline by name
pkill -f 'rsync.*your-file' || true
pkill -f 'ssh .*rsync --server' || true
```

## How to Notice

`ps aux | grep rsync` shows the rsync client, the spawned ssh, and the wrapping bash all in `S` state with **0 CPU activity** and timestamps from minutes to hours ago. The destination file already exists at the final (non-`.partial` / non-dotfile) path at full size. The trailing summary in the rsync log reads:

```
sent N bytes  received M bytes  ... bytes/sec
total size is X  speedup is Y
```

…but the bash `&&` followup that depends on rsync's exit code never runs.

## Why This Happens

rsync's exit waits for the underlying ssh transport to close cleanly. Over Tailscale (especially after a long-running connection that bridged a sleep, reconnect, or NAT shuffle), the TCP FIN/ACK handshake from the remote sshd can be lost or delayed indefinitely.
The local end has all the data, has finalized the file, and has printed its summary, but it is still blocked in `read()` on a socket that will never close on its own.

This is amplified when:

- The transfer hits a hash-mismatch retry mid-flight (rsync re-pulls the temp file). Each retry re-establishes connection state that's more vulnerable to teardown weirdness.
- The link briefly drops and reconnects via DERP relay during the transfer.
- The source machine is on WSL2, where Windows network stack rewrites can defer FINs.

The upshot: the data was transferred correctly long before the pipeline reports done. Don't wait; verify and move on.

## Don't Just Kill: Verify First

Killing a hung rsync **before the file is complete** can leave a partial file that looks complete by size alone. Always:

1. Compare the on-disk size to the `total size is N` line in the rsync log
2. md5 (or sha256) against the source to confirm bit-for-bit equality
3. Only then kill the hung processes

Skipping the checksum step risks silently corrupting downstream consumers of the file (Ollama blobs, archive pipelines, etc.).

## Watcher Threshold Gotcha

If you have a polling watcher script that fires a notification when the file reaches some threshold size, **set the threshold below the actual file size**, not above it. Example: a 4.68 GB GGUF transferred fine but the watcher's threshold was set to 4.7 GB (`4_700_000_000` bytes), so the threshold never triggered even though the transfer completed.

```bash
# bad: threshold above true size
TARGET=4700000000   # 4.7 GB

# good: threshold below true size
TARGET=4600000000   # 4.6 GB, fires at ~98% complete
```

Or better: trust the rsync exit code / the `RSYNC_DONE` marker line your wrapper writes after `&&`, not file size.

## Prevention

- Wrap rsync in a watchdog. If rsync hasn't exited within `expected_runtime + 2 minutes`, snapshot status, md5-verify, and kill.
- For very large files, use `rsync --partial-dir=DIR` so a fresh re-run resumes from the kept partial file instead of redoing the transfer.
- Consider `rsync --inplace` for files that consumers will copy out of the destination anyway (Ollama blob copy step).
- Add `ServerAliveInterval=30` / `ServerAliveCountMax=3` to your ssh config for the source host. With these values ssh drops the transport after roughly 90 seconds of unanswered keepalives.

## Related

- [[tailscale-ssh-reauth-prompt]]: a different Tailscale-over-ssh gotcha
- [[../../02-selfhosting/storage-backup/rsync-backup-patterns|rsync backup patterns]]: general rsync usage in MajorInfrastructure
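
The watchdog idea from the Prevention list could be sketched roughly as below. This is a minimal sketch, assuming bash and an rsync that was started in the background from the same shell. `wait_with_deadline` is a hypothetical helper name, and the host, paths, and the 3600-second runtime estimate in the commented wiring are placeholders. The helper deliberately does not kill anything itself, so the checksum step can still run before the kill.

```bash
# Hypothetical helper: poll until PID exits or DEADLINE seconds have passed.
# Returns 0 if the process exited on its own, 124 if it is still running.
wait_with_deadline() {
  local pid=$1 deadline=$2 waited=0
  # kill -0 sends no signal; it only checks that the process still exists
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$deadline" ]; then
      return 124
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example wiring (placeholders throughout, left commented out):
#   rsync -a user@host:/source/path/your-file.gguf ~/ > ~/rsync.log 2>&1 &
#   pid=$!
#   if wait_with_deadline "$pid" $((3600 + 120)); then
#     wait "$pid" && echo RSYNC_DONE >> ~/rsync.log
#   else
#     # still hung past the deadline: compare checksums first, then
#     kill "$pid"
#   fi
```

Returning 124 mirrors the exit status that GNU `timeout` uses for a timed-out command, but the helper avoids depending on `timeout`, which is not installed by default on macOS.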