From 09968615122e8072dc857a7821c6211360bebe0e Mon Sep 17 00:00:00 2001
From: Marcus Summers
Date: Sat, 25 Apr 2026 12:55:41 -0400
Subject: [PATCH] wiki: add troubleshooting articles from MajorTwin v8 cycle
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two articles surfaced during the v8 deploy + eval on 2026-04-25:

- Ollama: `ollama run` with piped stdin bypasses the chat template and
  SYSTEM prompt — output looks like raw base-model completion. Caught
  during initial v8 smoke test. Fix: use /api/chat HTTP endpoint.
- rsync over Tailscale can hang in TCP teardown after the data has
  fully transferred. Verify with md5sum, then kill the hung pipeline.
  Includes a watcher-threshold gotcha (set below true file size, not
  above) and prevention tips.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 05-troubleshooting/index.md                   |  2 +
 .../rsync-tailscale-teardown-stall.md         | 89 +++++++++++++++++++
 .../ollama-chat-template-pipe-stdin-bypass.md | 88 ++++++++++++++++++
 SUMMARY.md                                    |  2 +
 4 files changed, 181 insertions(+)
 create mode 100644 05-troubleshooting/networking/rsync-tailscale-teardown-stall.md
 create mode 100644 05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md

diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index d16d37e..6008137 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -14,6 +14,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
 - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
+- [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)
 - [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
 - [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md)
 - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
@@ -44,5 +45,6 @@ Practical fixes for common Linux, networking, and application problems.
 ## 🤖 AI / Local LLM
 
 - [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)
+- [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](ollama-chat-template-pipe-stdin-bypass.md)
 - [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
 - [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md)
diff --git a/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md b/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md
new file mode 100644
index 0000000..d984760
--- /dev/null
+++ b/05-troubleshooting/networking/rsync-tailscale-teardown-stall.md
@@ -0,0 +1,89 @@
+---
+title: "rsync over Tailscale: Hung in TCP Teardown After Transfer Completes"
+domain: troubleshooting
+category: networking
+tags: [rsync, ssh, tailscale, hang, tcp-fin, hash-mismatch]
+status: published
+created: 2026-04-25
+updated: 2026-04-25
+---
+
+# rsync over Tailscale: Hung in TCP Teardown After Transfer Completes
+
+A long rsync transfer over Tailscale appears to finish: the destination file is at full size and rsync's own summary line is in the log. Yet the rsync, ssh client, and parent bash processes never exit, so the `&&` chain that should run after rsync (e.g. `&& echo DONE`) never fires and watcher scripts polling for completion can stall indefinitely.
+
+## The Short Answer
+
+The data is usually fine. Verify it with `md5sum` (or `md5 -q` on macOS) against the source, then kill the hung pipeline.
+
+```bash
+# 1. confirm the byte size matches rsync's "total size is N" line
+ls -l ~/your-file.gguf      # exact byte count (-h would round it)
+tail ~/rsync.log            # look for the "total size is N" line
+
+# 2. checksum end-to-end
+md5 -q ~/your-file.gguf     # macOS destination
+ssh majorlinux@100.x.x.x 'md5sum /source/path/your-file.gguf'   # Linux source
+
+# 3. if hashes match, kill the hung pipeline by name
+pkill -f 'rsync.*your-file' || true
+pkill -f 'ssh .*rsync --server' || true
+```
+
+## How to Notice
+
+`ps aux | grep rsync` shows the rsync client, the spawned ssh, and the wrapping bash all in `S` (sleeping) state with **no CPU activity** and timestamps from minutes to hours ago. The destination file already exists at its final path (not a `.partial` or dot-prefixed temp file) at full size. The trailing summary in the rsync log reads:
+
+```
+sent N bytes received M bytes ... bytes/sec
+total size is X speedup is Y
+```
+
+…but the bash `&&` follow-up that depends on rsync's exit code never runs.
+
+## Why This Happens
+
+rsync does not exit until its ssh transport shuts down cleanly. Over Tailscale (especially on a long-running connection that has bridged a sleep, reconnect, or NAT change), the TCP FIN/ACK exchange from the remote sshd can be lost or delayed indefinitely. The local side has all the data, has finalized the file, and has printed its summary, but it is still blocked in `read()` on a socket that will never close on its own.
+
+This is amplified when:
+- The transfer hits a hash-mismatch retry mid-flight (rsync discards the failed update and re-transfers the file in a second pass). Each retry stretches the session and adds protocol traffic late in it, which appears to make teardown more fragile.
+- The link briefly drops and reconnects via DERP relay during the transfer.
+- The source machine runs under WSL2, where the extra NAT layer between the Linux VM and the Windows network stack can delay FINs.
+
+The upshot: the data is usually complete long before the pipeline exits, if it exits at all. Don't wait for it; verify and move on.
+
+## Don't Just Kill — Verify First
+
+Killing a hung rsync **before the file is complete** can leave a partial file that looks complete by size alone. Always:
+
+1. Compare the on-disk size to the `total size is N` line in the rsync log
+2. Compare md5 (or sha256) checksums against the source to confirm the copies are identical
+3. Only then kill the hung processes
+
+Skipping the checksum step risks silently handing a corrupt file to downstream consumers (Ollama blobs, archive pipelines, etc.).
+
+## Watcher Threshold Gotcha
+
+If you have a polling watcher script that fires a notification when the file reaches some threshold size, **set the threshold below the actual file size**, not above it. Example: a 4.68 GB GGUF transferred fine, but the watcher's threshold was 4.7 GB (`4700000000` bytes), so it never fired even though the transfer completed.
+
+```bash
+# bad — threshold above true size
+TARGET=4700000000   # 4.7 GB
+
+# good — threshold below true size
+TARGET=4600000000   # 4.6 GB, fires at ~98% complete
+```
+
+Better still, key the watcher off the rsync exit code (e.g. the `RSYNC_DONE` marker line your wrapper writes after `&&`) rather than file size, and pair it with a watchdog so a teardown stall is still caught.
+
+## Prevention
+
+- Wrap rsync in a watchdog: if rsync hasn't exited within `expected_runtime + 2 minutes`, snapshot its status, md5-verify the file, and kill it.
+- For very large files, use `rsync --partial-dir=.rsync-partial` so a re-run resumes from the partially transferred file instead of starting over.
+- Consider `rsync --inplace` for files that consumers will copy out of the destination anyway (Ollama blob copy step). With `--inplace` a partial file sits at the final path, so the size and checksum checks above matter even more.
+- Add `ServerAliveInterval=30` / `ServerAliveCountMax=3` to your ssh config for the source host. ssh then drops the transport after roughly 90 seconds (30 s × 3) without a keepalive reply instead of waiting forever.
+
+## Related
+
+- [[tailscale-ssh-reauth-prompt]] — different Tailscale-over-ssh gotcha
+- [[../../02-selfhosting/storage-backup/rsync-backup-patterns|rsync backup patterns]] — general rsync usage in MajorInfrastructure
diff --git a/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md b/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md
new file mode 100644
index 0000000..4096378
--- /dev/null
+++ b/05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md
@@ -0,0 +1,88 @@
+---
+title: "Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt"
+domain: troubleshooting
+category: ai-inference
+tags: [ollama, eval, chat-template, system-prompt, majortwin, gotcha]
+status: published
+created: 2026-04-25
+updated: 2026-04-25
+---
+
+# Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt
+
+When evaluating or smoke-testing an Ollama model, piping a prompt via stdin to `ollama run` skips the model's chat template **and** the SYSTEM prompt baked into the Modelfile. The output looks like raw base-model completion (often Mastodon-shaped or otherwise training-data-shaped), which makes a healthy model look broken.
+
+## The Short Answer
+
+For evals and any test where you want the model's actual chat behavior, **use the HTTP API at `/api/chat`**, not `echo "..." | ollama run model`.
+
+```python
+import json, urllib.request
+body = json.dumps({
+    "model": "majortwin-v8",
+    "messages": [{"role": "user", "content": "What's your name?"}],
+    "stream": False,
+}).encode()
+req = urllib.request.Request(
+    "http://localhost:11434/api/chat",
+    data=body, headers={"Content-Type": "application/json"}, method="POST",
+)
+r = json.loads(urllib.request.urlopen(req).read())
+print(r["message"]["content"])
+```
+
+Or with curl piped through jq:
+
+```bash
+curl -s http://localhost:11434/api/chat -d '{
+  "model": "majortwin-v8",
+  "messages": [{"role": "user", "content": "What is your name?"}],
+  "stream": false
+}' | jq -r .message.content
+```
+
+## How to Notice
+
+Symptom: responses are strangely raw (Mastodon-style hashtag rants, news headlines, several unrelated thoughts strung together) even though the same model behaves normally in Open WebUI or via the chat API. That contrast is the telltale sign of a call that bypassed the chat template.
+
+## Why This Happens
+
+`ollama run` is the CLI's interactive REPL. When stdin is a TTY, it reads input as user turns and applies the chat template. When stdin is a **pipe** (`echo "..." | ollama run model`), the CLI treats stdin as raw text and forwards it to `/api/generate` (the completion endpoint), not `/api/chat`. `/api/generate` does **not** apply the chat template, and the SYSTEM prompt only takes effect when the chat template wraps it.
+
+The two endpoints serve different purposes:
+- `/api/generate` — raw completion, good for fill-in-the-blank or non-instruct base models
+- `/api/chat` — applies the model's chat template, includes SYSTEM, handles multi-turn message arrays
+
+For an instruct-tuned model (Qwen2.5-Instruct, Llama-3.1-Instruct, etc.), bypassing the chat template means the model never sees the role framing it was trained to expect (ChatML's `<|im_start|>system ... <|im_end|>` for Qwen2.5, `<|start_header_id|>system<|end_header_id|>` headers for Llama 3.1), and its responses regress toward base-model behavior.
+
+## When You Actually Want `/api/generate`
+
+Almost never, for instruct models. The legitimate use cases are base models without a chat template, or completion-style prompts where you want the model to continue a string exactly as given. For evals of a fine-tuned model packaged with a Modelfile, always use `/api/chat`.
+
+## Reusable Eval Pattern
+
+The stdlib-only harness used for MajorTwin evals lives at `~/MajorTwin/scripts/eval_v8.py`. Its key call is the `chat()` helper (excerpted; it relies on the same `json` and `urllib.request` imports as the first snippet in this article):
+
+```python
+def chat(host, model, prompt, timeout=180):
+    body = json.dumps({
+        "model": model,
+        "messages": [{"role": "user", "content": prompt}],
+        "stream": False,
+    }).encode()
+    req = urllib.request.Request(
+        f"{host}/api/chat",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    with urllib.request.urlopen(req, timeout=timeout) as r:
+        return json.loads(r.read())["message"]["content"].strip()
+```
+
+Because it goes through `/api/chat`, this applies the chat template and the Modelfile's SYSTEM prompt, so there is no need to re-specify SYSTEM on each call.
+
+## Related
+
+- [[ollama-macos-sleep-tailscale-disconnect]] — different Ollama gotcha (sleep + Tailscale)
+- [[20-Projects/MajorTwin/majortwin-v8-eval-report|MajorTwin v8 eval report]] — caught this issue during initial smoke test on 2026-04-25
diff --git a/SUMMARY.md b/SUMMARY.md
index 98f16af..ba92985 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -88,6 +88,8 @@ updated: 2026-04-22T19:58
   * [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
   * [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](05-troubleshooting/networking/pihole-blocks-claude-desktop.md)
   * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
+  * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md)
+  * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
   * [macOS: Repeating Alert Tone from Mirrored iPhone Notification](05-troubleshooting/macos-mirrored-notification-alert-loop.md)
   * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
   * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
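
The rsync article's Prevention advice ("wrap rsync in a watchdog … md5-verify, and kill") and its verify-first steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not tooling from the wiki. The helper names (`parse_total_size`, `md5_of`, `verify_transfer`, `run_with_watchdog`), the 1 MiB hashing chunk, and the use of MD5 are all illustrative. The regex also assumes rsync's summary reads `total size is N`, with or without comma digit grouping.

```python
import hashlib
import os
import re
import subprocess
from typing import List, Optional

# Hypothetical helpers sketching "verify first, then kill"; the names, the
# MD5 choice, and the deadline policy are illustrative assumptions.

# rsync's closing summary, e.g. "total size is 4,680,000,000  speedup is 1.00".
# Assumption: newer rsync groups digits with commas, older prints plain digits.
TOTAL_SIZE_RE = re.compile(r"total size is ([\d,]+)")


def parse_total_size(log_text: str) -> Optional[int]:
    """Return N from the last 'total size is N' line in an rsync log, if any."""
    matches = TOTAL_SIZE_RE.findall(log_text)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 a file in 1 MiB chunks so a multi-GB GGUF never sits fully in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(path: str, expected_size: int, source_md5: str) -> bool:
    """Step 1: on-disk size equals rsync's total. Step 2: MD5 equals the source's."""
    if not os.path.isfile(path) or os.path.getsize(path) != expected_size:
        return False
    # `md5sum` prints "<hex>  <path>"; keep only the hex digest.
    parts = source_md5.split()
    if not parts:
        return False
    return md5_of(path) == parts[0].lower()


def run_with_watchdog(cmd: List[str], deadline_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if the deadline expired.

    On timeout, subprocess.run() kills and reaps the direct child (rsync).
    An ssh process that rsync spawned is not targeted here and may still
    need the separate `pkill` shown in the article.
    """
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

A caller might pass the rsync command line and `expected_runtime + 120` seconds to `run_with_watchdog()`. If it returns `None`, call `verify_transfer()` with the size from `parse_total_size()` on the rsync log and the digest from `md5sum` on the source before trusting the file. Keeping verification separate from the kill mirrors the article's ordering: the watchdog only stops waiting, and the checksum decides whether the data is good.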