| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| rsync over Tailscale: Hung in TCP Teardown After Transfer Completes | troubleshooting | networking | | published | 2026-04-25 | 2026-04-25 |
rsync over Tailscale: Hung in TCP Teardown After Transfer Completes
A long rsync transfer over Tailscale finishes — the destination file is at full size, rsync's own summary line is in the log — but the rsync, ssh client, and parent bash processes never exit. The && chain that should run after rsync (e.g. && echo DONE) never fires. Watcher scripts polling for completion can stall indefinitely.
The Short Answer
The data is fine. Verify with md5sum (or md5 -q on macOS) against the source, then kill the hung pipeline.
# 1. confirm the byte count matches the "total size is N" line in rsync's log
ls -l ~/your-file.gguf # -l rather than -lh, so the size is shown in exact bytes
tail ~/rsync.log # look for the "total size is N" line
# 2. checksum end-to-end
md5 -q ~/your-file.gguf # macOS
ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf' # Linux source
# 3. if hashes match, kill the hung pipeline by name
pkill -f 'rsync.*your-file' || true
pkill -f 'ssh .*rsync --server' || true
How to Notice
ps aux | grep rsync shows the rsync client, the spawned ssh, and the wrapping bash all in S state with 0 CPU activity and timestamps from minutes-to-hours ago. The destination file already exists at the final (non-.partial / non-dotfile) path at full size. The trailing summary in the rsync log reads:
sent N bytes received M bytes ... bytes/sec
total size is X speedup is Y
…but the bash && followup that depends on rsync's exit code never runs.
Why This Happens
rsync's exit waits for the underlying ssh transport to close cleanly. Over Tailscale (especially after a long-running connection that bridged a sleep, reconnect, or NAT shuffle), the TCP FIN/ACK handshake from the remote sshd can be lost or delayed indefinitely. The local end has all the data, has finalized the file, has printed its summary — but it's still blocked in read() on a socket that will never close on its own.
This is amplified when:
- The transfer hits a hash-mismatch retry mid-flight (rsync re-pulls the temp file). Each retry re-establishes connection state that's more vulnerable to teardown weirdness.
- The link briefly drops and reconnects via DERP relay during the transfer.
- The source machine is on WSL2 — Windows network stack rewrites can defer FINs.
The upshot: the data was transferred correctly long before the pipeline reports done. Don't wait — verify and move on.
Don't Just Kill — Verify First
Killing a hung rsync before the file is complete can leave a partial file that looks complete by size alone. Always:
- Compare the on-disk size to the "total size is N" line in the rsync log
- Checksum (md5 or sha256) against the source to confirm bit-for-bit equality
- Only then kill the hung processes
Skipping the checksum step risks silently corrupting downstream consumers of the file (Ollama blobs, archive pipelines, etc.).
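The checklist above can be sketched as a small bash helper. This is a minimal sketch, not part of the original workflow: the names local_md5 and verify_transfer are hypothetical, it assumes the source checksum has already been fetched (for example with the ssh md5sum command shown earlier), and it assumes rsync was not run with -h, so the logged total is a plain byte count.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from any tool).

# Print the MD5 of a file: md5sum on Linux, md5 -q on macOS.
local_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'
  else
    md5 -q "$1"
  fi
}

# verify_transfer FILE RSYNC_LOG EXPECTED_MD5
# Returns 0 only if the on-disk size equals rsync's "total size is N"
# value AND the local MD5 equals the checksum taken on the source.
verify_transfer() {
  local file=$1 log=$2 expected_md5=$3
  local logged actual

  # Use the last "total size is N" line; strip thousands separators.
  logged=$(awk '/^total size is /{n=$4} END{print n}' "$log" | tr -d ',')
  actual=$(wc -c < "$file" | tr -d '[:space:]')

  [ -n "$logged" ] || { echo "no summary line in $log" >&2; return 1; }
  [ "$logged" = "$actual" ] || { echo "size mismatch: $actual != $logged" >&2; return 1; }
  [ "$(local_md5 "$file")" = "$expected_md5" ] || { echo "checksum mismatch" >&2; return 1; }
}

# Usage sketch (host and paths are the placeholders used in this article):
#   expected=$(ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf' | awk '{print $1}')
#   verify_transfer ~/your-file.gguf ~/rsync.log "$expected" && pkill -f 'rsync.*your-file'
```

Chaining the pkill behind && means the hung pipeline is only killed once both the size and the checksum agree with the source.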
Watcher Threshold Gotcha
If you have a polling watcher script that fires a notification when the file reaches some threshold size, set the threshold below the actual file size, not above it. Example: a 4.68 GB GGUF transferred fine but the watcher's threshold was set to 4.7 GB (4_700_000_000 bytes), so the threshold never triggered even though the transfer completed.
# bad — threshold above true size
TARGET=4700000000 # 4.7 GB
# good — threshold below true size
TARGET=4600000000 # 4.6 GB, fires at ~98% complete
Or better: trust the rsync exit code / the RSYNC_DONE marker line your wrapper writes after &&, not file size.
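A marker-based watcher might look like the sketch below. It is an illustration rather than the actual wrapper: the function names, the RSYNC_FAILED counterpart to RSYNC_DONE, and the polling defaults are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_with_marker LOG CMD [ARGS...]
# Runs CMD, appends its output to LOG, then appends a marker line that
# records the exit code, so a watcher never has to infer completion
# from file size.
run_with_marker() {
  local log=$1; shift
  local rc=0
  "$@" >>"$log" 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "RSYNC_DONE rc=0" >>"$log"
  else
    echo "RSYNC_FAILED rc=$rc" >>"$log"
  fi
  return "$rc"
}

# Hypothetical watcher: wait_for_marker LOG [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 on RSYNC_DONE, 1 on RSYNC_FAILED, 2 if polling runs out.
wait_for_marker() {
  local log=$1 interval=${2:-30} max_polls=${3:-120}
  local i
  for ((i = 0; i < max_polls; i++)); do
    if grep -q '^RSYNC_DONE' "$log" 2>/dev/null; then return 0; fi
    if grep -q '^RSYNC_FAILED' "$log" 2>/dev/null; then return 1; fi
    sleep "$interval"
  done
  return 2
}
```

If rsync hangs in teardown, no marker is ever written, so a return code of 2 is the cue to run the size and checksum verification by hand rather than evidence that the transfer failed.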
Prevention
- Wrap rsync in a watchdog. If rsync hasn't exited within expected_runtime + 2 minutes, snapshot status, md5-verify, and kill.
- For very large files, use rsync --partial-dir=DIR so a fresh re-run resumes from the temp file instead of redoing the transfer.
- Consider rsync --inplace for files that consumers will copy out of the destination anyway (Ollama blob copy step).
- Add ServerAliveInterval=30 and ServerAliveCountMax=3 to your ssh config for the source host; this kills the ssh transport if the remote stops responding to keepalives.
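The watchdog idea can be sketched in plain bash, which avoids depending on GNU timeout (not installed by default on macOS). The helper name wait_or_deadline and the 124 return code (borrowed from GNU timeout's convention) are assumptions for illustration. The helper only reports that the deadline has passed; verification and the kill are left to the caller, in that order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_or_deadline PID SECONDS
# Returns 0 once PID has exited, or 124 if PID is still running after
# roughly SECONDS seconds. It does not kill anything itself.
wait_or_deadline() {
  local pid=$1 deadline=$2 waited=0
  while kill -0 "$pid" 2>/dev/null; do
    [ "$waited" -lt "$deadline" ] || return 124
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage sketch. The host, paths, partial-dir name, and the two-hour
# runtime estimate are placeholders, not values from a real setup:
#   rsync --partial-dir=.rsync-partial \
#     -e 'ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=3' \
#     majorlinux@100.x.x.x:/source/path/your-file.gguf ~/ >~/rsync.log 2>&1 &
#   pid=$!
#   if ! wait_or_deadline "$pid" $((2 * 3600 + 120)); then
#     ps -o pid,stat,etime,command -p "$pid"   # snapshot status
#     # ...compare size and checksum against the source here...
#     kill "$pid"                               # only after verification passes
#   fi
```

Passing the keepalive settings with -o on the rsync -e command line has the same effect as putting them in the ssh config, which is convenient when the wrapper should not depend on per-host configuration.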
Related
- tailscale-ssh-reauth-prompt — different Tailscale-over-ssh gotcha
- ../../02-selfhosting/storage-backup/rsync-backup-patterns — general rsync usage in MajorInfrastructure