---
title: "Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt"
domain: troubleshooting
category: ai-inference
tags: [ollama, eval, chat-template, system-prompt, majortwin, gotcha]
status: published
created: 2026-04-25
updated: 2026-04-25
---

# Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt

When evaluating or smoke-testing an Ollama model, piping a prompt via stdin to `ollama run` skips the model's chat template **and** the SYSTEM prompt baked into the Modelfile. The output looks like raw base-model completion (often Mastodon-shaped or otherwise training-data-shaped), so the model appears broken when it isn't.

## The Short Answer

For evals and any test where you want the model's actual chat behavior, **use the HTTP API at `/api/chat`**, not a piped invocation such as `echo "..." | ollama run model`.

```python
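import json
import urllib.request

# /api/chat takes a list of role-tagged messages, not a flat prompt string.
body = json.dumps({
    "model": "majortwin-v8",
    "messages": [{"role": "user", "content": "What's your name?"}],
    "stream": False,  # return one JSON object instead of a stream of chunks
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=body, headers={"Content-Type": "application/json"}, method="POST",
)
# Close the connection once the response has been read.
with urllib.request.urlopen(req) as resp:
    r = json.loads(resp.read())
print(r["message"]["content"])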
```

Or with curl piped through jq:

```bash
curl -s http://localhost:11434/api/chat -d '{
  "model": "majortwin-v8",
  "messages": [{"role": "user", "content": "What is your name?"}],
  "stream": false
}' | jq -r .message.content
```

## How to Notice

Symptom: model responses are weirdly raw — Mastodon-style hashtag rants, news headlines, multiple unrelated thoughts strung together — even though the same model behaves normally in Open WebUI or via the chat API. This is the canonical fingerprint of a chat-template-bypassed call.

## Why This Happens

`ollama run` is the CLI's interactive REPL. When stdin is a TTY, it reads input as user turns and applies the chat template. When stdin is a **pipe** (`echo "..." | ollama run model`), the CLI treats stdin as one raw prompt and forwards it to `/api/generate` (the completion endpoint), not `/api/chat`. In the behavior observed here, that path did **not** apply the chat template, and the SYSTEM prompt only takes effect when the chat template wraps it. Templating on `/api/generate` also depends on request options (its `raw` flag disables formatting entirely), so results may vary with the Ollama version.
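
The TTY-versus-pipe branching described above can be sketched in a few lines. This is only an illustration of the behavior as this note describes it, not code taken from the Ollama CLI; `pick_endpoint` is a hypothetical helper name.

```python
import io


def pick_endpoint(stdin) -> str:
    """Return the endpoint this note says `ollama run` ends up using.

    Hypothetical helper: it mirrors the behavior described in this note,
    not the actual Ollama source.
    """
    # isatty() is True for an interactive terminal, False for a pipe or file.
    if stdin.isatty():
        return "/api/chat"      # interactive REPL: input handled as chat turns
    return "/api/generate"      # piped input: forwarded as a single prompt


# An in-memory stream is never a TTY, so it behaves like a pipe here.
print(pick_endpoint(io.StringIO("What's your name?")))
```

Passing `sys.stdin` instead of the `StringIO` object shows the same split in a real script: run it directly in a terminal for one branch, and behind `echo ... |` for the other.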

The two endpoints serve different purposes:
- `/api/generate` — raw completion, good for fill-in-the-blank or non-instruct base models
- `/api/chat` — applies the model's chat template, includes SYSTEM, handles multi-turn message arrays
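
The difference also shows up in the request shape: `/api/generate` takes a single `prompt` string, while `/api/chat` takes a `messages` list of role-tagged turns. A minimal sketch of both bodies follows; the helper names `generate_body` and `chat_body` are made up for illustration.

```python
import json


def generate_body(model: str, prompt: str) -> bytes:
    """JSON body for POST /api/generate: one flat prompt string."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
    }).encode()


def chat_body(model: str, user_text: str) -> bytes:
    """JSON body for POST /api/chat: a list of role-tagged messages."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }).encode()


# Same question, two different request shapes.
print(generate_body("majortwin-v8", "What's your name?").decode())
print(chat_body("majortwin-v8", "What's your name?").decode())
```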

For an instruct-tuned model (Qwen2.5-Instruct, Llama-3.1-Instruct, etc.), bypassing the chat template means the model never sees the role-tagged framing it was trained to expect, such as the ChatML-style `<|im_start|>system ... <|im_end|>` markers used by Qwen2.5-Instruct (Llama-3.1-Instruct uses its own header tokens), and its responses regress toward base-model behavior.
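
To make the framing concrete, here is a rough sketch of what a ChatML-style template does to a single user turn. It is illustrative only: the real template is whatever the model's Modelfile defines, `render_chatml` is a hypothetical helper, and the system text is a placeholder rather than the actual MajorTwin SYSTEM prompt.

```python
def render_chatml(system: str, user: str) -> str:
    """Wrap a system prompt and one user turn in ChatML-style markers.

    Illustrative only: the real template comes from the model's Modelfile
    and may differ in whitespace or extra tokens.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"   # left open so the model writes the reply
    )


raw_prompt = "What's your name?"                             # what a bypassed call sends
templated = render_chatml("You are MajorTwin.", raw_prompt)  # placeholder system text
print(templated)
```

The bare string carries no role markers at all, which is why a model receiving it has nothing to distinguish a chat turn from arbitrary text to continue.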

## When You Actually Want `/api/generate`

Almost never, for instruct models. The legitimate use case is base models without a chat template, or specific completion-style prompts where you want the model to continue a string verbatim. For evals of a fine-tuned Modelfile, always use `/api/chat`.
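
For the completion-style case, a hedged sketch of a `/api/generate` call might look like the following. Setting `raw` to true is an addition of this sketch that makes the no-templating intent explicit; the helper names are hypothetical, and the generated text is read from the `response` field of the reply.

```python
import json
import urllib.request


def completion_request(host: str, model: str, prefix: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/generate for verbatim continuation."""
    body = json.dumps({
        "model": model,
        "prompt": prefix,
        "raw": True,       # send the prompt as-is, with no formatting applied
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(host: str, model: str, prefix: str, timeout: int = 180) -> str:
    """Send the request and return the generated continuation."""
    req = completion_request(host, model, prefix)
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return json.loads(r.read())["response"]
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.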

## Reusable Eval Pattern

A minimal stdlib-only eval harness used for MajorTwin evals lives at `~/MajorTwin/scripts/eval_v8.py`. The key call is the `chat()` helper:

```python
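import json
import urllib.request


def chat(host, model, prompt, timeout=180):
    """Send one user turn to /api/chat and return the assistant's reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object back, not a stream of chunks
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout is generous because the first call may have to load the model.
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return json.loads(r.read())["message"]["content"].strip()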
```

This applies the chat template and the SYSTEM prompt baked into the Modelfile. No need to re-specify SYSTEM per-call.
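
A loop around that helper could look roughly like this. It is a sketch, not the contents of `eval_v8.py`: the substring check, the `run_cases` name, and the `fake_chat` stand-in are all assumptions made so the example runs without a server.

```python
from typing import Callable, Iterable


def run_cases(chat_fn: Callable[[str], str],
              cases: Iterable[tuple[str, str]]) -> list[dict]:
    """Send each prompt through chat_fn and check for an expected substring."""
    results = []
    for prompt, expected in cases:
        reply = chat_fn(prompt)
        results.append({
            "prompt": prompt,
            "reply": reply,
            # Case-insensitive substring match: crude, but enough for a smoke test.
            "passed": expected.lower() in reply.lower(),
        })
    return results


# Stand-in for the real model so the sketch runs without a server.
def fake_chat(prompt: str) -> str:
    return "My name is MajorTwin."


for row in run_cases(fake_chat, [("What's your name?", "majortwin")]):
    print("PASS" if row["passed"] else "FAIL", "-", row["prompt"])
```

To run it against a live model, swap `fake_chat` for a wrapper such as `lambda p: chat("http://localhost:11434", "majortwin-v8", p)`.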

## Related

- [[ollama-macos-sleep-tailscale-disconnect]] — different Ollama gotcha (sleep + Tailscale)
- [[20-Projects/MajorTwin/majortwin-v8-eval-report|MajorTwin v8 eval report]] — caught this issue during initial smoke test on 2026-04-25