---
title: "Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt"
domain: troubleshooting
category: ai-inference
tags: [ollama, eval, chat-template, system-prompt, majortwin, gotcha]
status: published
created: 2026-04-25
updated: 2026-04-25
---

# Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt

When evaluating or smoke-testing an Ollama model, piping a prompt via stdin to `ollama run` skips the model's chat template **and** the SYSTEM prompt baked into the Modelfile. Output looks like raw base-model completion (often Mastodon-shaped or training-data-shaped), and you'll think the model is broken when it isn't.

## The Short Answer

For evals and any test where you want the model's actual chat behavior, **use the HTTP API at `/api/chat`**. Never pipe the prompt into the CLI with `echo "..." | ollama run model`.

```python
import json
import urllib.request

# Send a single user turn to the chat endpoint (non-streaming).
body = json.dumps({
    "model": "majortwin-v8",
    "messages": [{"role": "user", "content": "What's your name?"}],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With "stream": False the reply arrives as one JSON object.
with urllib.request.urlopen(req) as resp:
    r = json.loads(resp.read())
print(r["message"]["content"])
```

Or with curl piped through jq:

```bash
curl -s http://localhost:11434/api/chat -d '{
  "model": "majortwin-v8",
  "messages": [{"role": "user", "content": "What is your name?"}],
  "stream": false
}' | jq -r .message.content
```

## How to Notice

Symptom: model responses are weirdly raw (Mastodon-style hashtag rants, news headlines, multiple unrelated thoughts strung together) even though the same model behaves normally in Open WebUI or via the chat API. This is the canonical fingerprint of a chat-template-bypassed call.

## Why This Happens

`ollama run` is the CLI's interactive REPL. When stdin is a TTY, it reads input as user turns and applies the chat template. When stdin is a **pipe** (`echo "..." | ollama run model`), the CLI treats stdin as raw text and forwards it to `/api/generate` (the completion endpoint), not `/api/chat`. On that path the prompt reached the model without the chat-template framing, and the SYSTEM prompt only takes effect when the chat template wraps it.

The two endpoints serve different purposes:

- `/api/generate`: completion-style endpoint that takes a single prompt string; good for fill-in-the-blank or non-instruct base models
- `/api/chat`: applies the model's chat template, includes SYSTEM, handles multi-turn message arrays

For an instruct-tuned model (Qwen2.5-Instruct, Llama-3.1-Instruct, etc.), bypassing the chat template means the model never sees the framing it was trained to expect (for Qwen2.5, `<|im_start|>system ... <|im_end|>`), and its responses regress toward base-model behavior.

## When You Actually Want `/api/generate`

Almost never, for instruct models. The legitimate use cases are base models without a chat template and completion-style prompts where you want the model to continue a string verbatim. For evals of a fine-tuned Modelfile, always use `/api/chat`.

## Reusable Eval Pattern

A minimal stdlib-only eval harness used for MajorTwin evals lives at `~/MajorTwin/scripts/eval_v8.py`. The key call is the `chat()` helper:

```python
import json
import urllib.request


def chat(host, model, prompt, timeout=180):
    """Send one user turn to /api/chat and return the reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return json.loads(r.read())["message"]["content"].strip()
```

This applies the chat template and the SYSTEM prompt baked into the Modelfile. No need to re-specify SYSTEM per call.

## Related

- [[ollama-macos-sleep-tailscale-disconnect]]: a different Ollama gotcha (sleep + Tailscale)
- [[20-Projects/MajorTwin/majortwin-v8-eval-report|MajorTwin v8 eval report]]: caught this issue during the initial smoke test on 2026-04-25
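
## Quick Endpoint Comparison

To check whether the symptom described under "How to Notice" shows up on a given install, one option is to send the same prompt to both endpoints and read the two replies side by side. The following is a minimal sketch rather than part of the MajorTwin harness: the helper names (`build_request`, `chat_request`, `generate_request`, `compare`) are hypothetical, and it assumes a server reachable at the `host` you pass in. It makes no claim about what the two replies will look like on your version of Ollama.

```python
import json
import urllib.request


def build_request(host, endpoint, payload):
    """Build a JSON POST request for an Ollama endpoint (hypothetical helper)."""
    return urllib.request.Request(
        f"{host}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_request(host, model, prompt):
    """Request for /api/chat with a single user message, non-streaming."""
    return build_request(host, "/api/chat", {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })


def generate_request(host, model, prompt):
    """Request for /api/generate with a single prompt string, non-streaming."""
    return build_request(host, "/api/generate", {
        "model": model,
        "prompt": prompt,
        "stream": False,
    })


def compare(host, model, prompt, timeout=180):
    """Return (chat_reply, generate_reply) for the same prompt."""
    with urllib.request.urlopen(chat_request(host, model, prompt), timeout=timeout) as r:
        chat_text = json.loads(r.read())["message"]["content"].strip()
    with urllib.request.urlopen(generate_request(host, model, prompt), timeout=timeout) as r:
        gen_text = json.loads(r.read())["response"].strip()
    return chat_text, gen_text
```

Usage, assuming the local default address used elsewhere in this note: `chat_text, gen_text = compare("http://localhost:11434", "majortwin-v8", "What's your name?")`, then print both and compare them by eye.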