majorwiki/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md
majorlinux 91455fac39 Add 7 articles; update nav and existing articles (2026-04-25)
2026-04-25 17:52:48 +00:00

title: rsync over Tailscale: Hung in TCP Teardown After Transfer Completes
domain: troubleshooting
category: networking
tags: rsync, ssh, tailscale, hang, tcp-fin, hash-mismatch
status: published
created: 2026-04-25
updated: 2026-04-25

rsync over Tailscale: Hung in TCP Teardown After Transfer Completes

A long rsync transfer over Tailscale finishes — the destination file is at full size, rsync's own summary line is in the log — but the rsync, ssh client, and parent bash processes never exit. The && chain that should run after rsync (e.g. && echo DONE) never fires. Watcher scripts polling for completion can stall indefinitely.

The Short Answer

The data is fine. Verify with md5sum (or md5 -q on macOS) against the source, then kill the hung pipeline.

# 1. confirm the on-disk size matches rsync's reported "total size"
ls -l ~/your-file.gguf    # -l rather than -lh, so the size is an exact byte count
tail ~/rsync.log    # look for the "total size is N" line

# 2. checksum end-to-end
md5 -q ~/your-file.gguf                                                  # macOS
ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf'            # Linux source; compare the hash (first field) only

# 3. if hashes match, kill the hung pipeline by name
pkill -f 'rsync.*your-file' || true
pkill -f 'ssh .*rsync --server' || true

How to Notice

ps aux | grep rsync shows the rsync client, the spawned ssh, and the wrapping bash all in S state with 0 CPU activity and timestamps from minutes-to-hours ago. The destination file already exists at the final (non-.partial / non-dotfile) path at full size. The trailing summary in the rsync log reads:

sent N bytes  received M bytes  ... bytes/sec
total size is X  speedup is Y

…but the bash && followup that depends on rsync's exit code never runs.
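The same check can be scripted. This is a minimal sketch, assuming a ps that accepts the pid, stat, etime, time, and args output keywords (procps on Linux and the macOS ps both do). The function name sleeping_rsync_rows is made up for illustration.

```shell
# Hypothetical helper: keep the ps header plus any row whose STAT column
# starts with S (sleeping) and whose command line mentions rsync.
# The [r]sync bracket form stops the filter from matching its own awk process.
sleeping_rsync_rows() {
  awk 'NR == 1 || ($2 ~ /^S/ && /[r]sync/)'
}

# Usage (not executed here):
#   ps -A -o pid,stat,etime,time,args | sleeping_rsync_rows
```

A large ELAPSED value next to a TIME value that has stopped growing is the pattern described above. The ssh client and the wrapping bash show up too, as long as their command lines contain the rsync invocation.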

Why This Happens

rsync's exit waits for the underlying ssh transport to close cleanly. Over Tailscale (especially after a long-running connection that bridged a sleep, reconnect, or NAT shuffle), the TCP FIN/ACK handshake from the remote sshd can be lost or delayed indefinitely. The local end has all the data, has finalized the file, has printed its summary — but it's still blocked in read() on a socket that will never close on its own.

This is amplified when:

  • The transfer hits a hash-mismatch retry mid-flight (rsync re-pulls the temp file). Each retry re-establishes connection state that's more vulnerable to teardown weirdness.
  • The link briefly drops and reconnects via DERP relay during the transfer.
  • The source machine is on WSL2 — Windows network stack rewrites can defer FINs.

The upshot: the data was transferred correctly long before the pipeline reports done. Don't wait — verify and move on.
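To see what the local TCP stack thinks of the connection, filter netstat -an for the source host's SSH port. This is a sketch under assumptions: the helper name ssh_conn_states is hypothetical, and it expects the foreign address in column 5 and the state in column 6, which matches the usual Linux (addr:22) and macOS (addr.22) layouts.

```shell
# Hypothetical helper: from `netstat -an` output on stdin, print the foreign
# address and TCP state of connections to PEER's port 22.
# Handles both "addr:22" (Linux) and "addr.22" (macOS) spellings.
ssh_conn_states() {
  awk -v peer="$1" '
    $1 ~ /^tcp/ && ($5 == (peer ":22") || $5 == (peer ".22")) { print $5, $6 }
  '
}

# Usage (not executed here):
#   netstat -an | ssh_conn_states 100.x.x.x
```

If the remote's FIN never arrived, the local side would typically still list the connection as ESTABLISHED, even though the transfer finished long ago.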

Don't Just Kill — Verify First

Killing a hung rsync before the file is complete can leave a partial file that looks complete by size alone. Always:

  1. Compare the on-disk size to the total size is N line in the rsync log
  2. Compare an md5 (or sha256) checksum against the source to confirm bit-for-bit equality
  3. Only then kill the hung processes

Skipping the checksum step risks silently corrupting downstream consumers of the file (Ollama blobs, archive pipelines, etc.).
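The three steps can be wrapped so the kill only runs when both checks pass. This is a minimal sketch, not a definitive implementation: every function name is hypothetical, and it assumes a single-file transfer, so that rsync's "total size is N" equals the size of that one file.

```shell
# Hypothetical helpers; the names are illustrative and not part of rsync.

# MD5 of a local file: md5sum where available (Linux), else md5 -q (macOS).
local_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'
  else
    md5 -q "$1"
  fi
}

# Exact size in bytes (some wc builds pad the number, so strip spaces).
file_size() {
  wc -c < "$1" | tr -d ' '
}

# N from the last "total size is N" line in the log. Some rsync versions
# print digit-group commas, so drop them.
log_total_size() {
  sed -n 's/^total size is \([0-9,]*\).*/\1/p' "$1" | tail -n 1 | tr -d ','
}

# Succeeds only if both the byte count and the checksum agree.
verify_transfer() {
  vt_file="$1"; vt_log="$2"; vt_source_md5="$3"
  if [ "$(file_size "$vt_file")" != "$(log_total_size "$vt_log")" ]; then
    echo "size mismatch: do not kill yet" >&2
    return 1
  fi
  if [ "$(local_md5 "$vt_file")" != "$vt_source_md5" ]; then
    echo "checksum mismatch: do not kill yet" >&2
    return 1
  fi
}

# Usage (not executed here):
#   src_md5=$(ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf' | awk '{print $1}')
#   verify_transfer ~/your-file.gguf ~/rsync.log "$src_md5" && pkill -f 'rsync.*your-file'
```

Chaining pkill behind && means a failed check leaves the hung processes alone, so there is still something to inspect.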

Watcher Threshold Gotcha

If you have a polling watcher script that fires a notification when the file reaches some threshold size, set the threshold below the actual file size, not above it. Example: a 4.68 GB GGUF transferred fine but the watcher's threshold was set to 4.7 GB (4_700_000_000 bytes), so the threshold never triggered even though the transfer completed.

# bad — threshold above true size
TARGET=4700000000  # 4.7 GB

# good — threshold below true size
TARGET=4600000000  # 4.6 GB, fires at ~98% complete

Or better: trust the rsync exit code / the RSYNC_DONE marker line your wrapper writes after &&, not file size.
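A watcher can key off the log instead of the file size. The sketch below assumes the wrapper appends a line starting with RSYNC_DONE after &&, and that rsync's summary goes to the same log. The function name transfer_state and the three state labels are hypothetical.

```shell
# Hypothetical helper: classify a transfer from its log file.
#   done          the wrapper's RSYNC_DONE marker is present
#   summary-only  rsync printed "total size is ..." but the marker never came,
#                 which is the teardown stall this article describes
#   running       neither line is present yet (or the log does not exist)
transfer_state() {
  if grep -q '^RSYNC_DONE' "$1" 2>/dev/null; then
    echo done
  elif grep -q '^total size is ' "$1" 2>/dev/null; then
    echo summary-only
  else
    echo running
  fi
}

# Usage (not executed here): poll once a minute until something changes.
#   while [ "$(transfer_state ~/rsync.log)" = running ]; do sleep 60; done
```

A summary-only result that persists for more than a minute or two is the cue to verify the file and clean up, instead of waiting on a size threshold.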

Prevention

  • Wrap rsync in a watchdog. If rsync hasn't exited within expected_runtime + 2 minutes, snapshot status, md5-verify, and kill.
  • For very large files, use rsync --partial-dir=DIR so a fresh re-run resumes from the partial file kept in DIR instead of redoing the transfer.
  • Consider rsync --inplace for files that consumers will copy out of the destination anyway (Ollama blob copy step).
  • Add ServerAliveInterval=30 / ServerAliveCountMax=3 to your ssh config for the source host. The ssh client then tears down the transport after roughly 90 seconds (three unanswered keepalives at 30-second intervals) if the remote stops responding.
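A watchdog along these lines can be sketched in plain shell, without GNU timeout (which stock macOS lacks). The function name run_with_deadline, the 3720-second figure (an assumed one-hour transfer plus two minutes), and the .rsync-partial directory name are all hypothetical. This sketch kills first and leaves verification to the caller, which is a simplification of the bullet above.

```shell
# Hypothetical helper: run a command, but send it SIGTERM if it is still
# alive after DEADLINE seconds. Returns the command's own exit status,
# which is non-zero when the timer had to step in.
run_with_deadline() {
  deadline="$1"; shift
  "$@" &
  cmd_pid=$!
  # Background timer. Its output is discarded so a leftover sleep never
  # holds the caller's stdout or stderr open.
  ( sleep "$deadline"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  timer_pid=$!
  status=0
  wait "$cmd_pid" || status=$?
  # The command ended (on its own or via the timer): stop the timer subshell.
  kill "$timer_pid" 2>/dev/null
  return "$status"
}

# Usage (not executed here); the ssh -o options mirror the keepalive bullet:
#   run_with_deadline 3720 rsync -av --partial-dir=.rsync-partial \
#     -e 'ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=3' \
#     majorlinux@100.x.x.x:/source/path/your-file.gguf ~/ \
#     && echo RSYNC_DONE
```

A non-zero return does not mean the file is bad. After a timeout, compare size and checksum against the source before deciding whether to re-run.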