wiki: add troubleshooting article — LoRA adapter GGUF conversion fails

Documents the gotcha where convert_hf_to_gguf.py crashes with 'config.json not found' because the training output directory holds only the LoRA adapter, not a merged HF model. Includes inline save_pretrained_merged() fix snippet, verification checklist, and resume-pipeline-without-retraining pattern. Discovered today during the MajorTwin v8c pipeline failure (Step 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:22:59 -04:00
commit 6e7a0ca21f
parent 85f8a5df2d
3 changed files with 121 additions and 0 deletions
--- a/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
+++ b/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
@@ -0,0 +1,119 @@
+---
+title: "LoRA adapter — GGUF conversion fails with 'config.json not found'"
+domain: troubleshooting
+category: gpu-display
+tags: [lora, qlora, gguf, llama.cpp, unsloth, fine-tuning, qwen]
+status: published
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LoRA adapter — GGUF conversion fails with 'config.json not found'
+
+## Problem
+
+After a QLoRA fine-tune, you point `llama.cpp/convert_hf_to_gguf.py` at the training output directory and it crashes immediately:
+
+```
+FileNotFoundError: [Errno 2] No such file or directory:
+  '/path/to/training-runs/<run>/final/config.json'
+```
+
+The output directory looks fine — it contains:
+
+```
+adapter_config.json
+adapter_model.safetensors  (~150 MB for a 7B base)
+chat_template.jinja
+tokenizer_config.json
+tokenizer.json
+```
+
+But there is no `config.json`, and `adapter_model.safetensors` is only about 150 MB, far smaller than the ~14 GB you'd expect for a full Qwen2.5-7B 16-bit checkpoint.
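
The ~14 GB figure follows from simple arithmetic: a nominal 7 billion parameters at 2 bytes each. A minimal sketch of that estimate, where the function name is illustrative and the parameter count is rounded to 7e9 (the exact count for a given checkpoint will differ somewhat):

```python
def full_checkpoint_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate on-disk size, in GB (10^9 bytes), of an unquantized checkpoint."""
    return n_params * bytes_per_param / 1e9


full_gb = full_checkpoint_gb(7e9)   # nominal 7B parameters, 16 bits each
adapter_gb = 0.15                   # the ~150 MB adapter file from the listing

print(f"full 16-bit checkpoint: ~{full_gb:.0f} GB")
print(f"adapter is ~{adapter_gb / full_gb:.1%} of that size")
```

A weights file that is roughly one percent of the expected size is a strong hint that the directory holds an adapter, not a model.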
+
+## Root cause
+
+Calling `model.save_pretrained()` after a LoRA/QLoRA training run saves **only the adapter weights**, not a merged full-precision model. `convert_hf_to_gguf.py` expects a full HuggingFace model directory and reads `config.json` to identify the architecture. Adapter-only directories don't have one.
+
+You need to merge the LoRA adapter into the base model first, then point the GGUF converter at the merged dir.
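
One way to confirm that a directory only points at a base model rather than containing one is to inspect `adapter_config.json`. The sketch below assumes the adapter was saved in PEFT's format, where the `base_model_name_or_path` field names the base checkpoint; the helper name `describe_adapter_dir` is made up for illustration.

```python
import json
from pathlib import Path


def describe_adapter_dir(path: str) -> dict:
    """Report which config files exist and which base model the adapter references."""
    d = Path(path)
    info = {
        "has_model_config": (d / "config.json").is_file(),
        "has_adapter_config": (d / "adapter_config.json").is_file(),
        "base_model": None,
    }
    if info["has_adapter_config"]:
        cfg = json.loads((d / "adapter_config.json").read_text())
        # PEFT records the checkpoint the adapter was trained against.
        info["base_model"] = cfg.get("base_model_name_or_path")
    return info
```

If `has_model_config` is false and `base_model` is set, the directory is an adapter that still needs to be merged into that base.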
+
+## Solution
+
+### Quick fix — inline merge step
+
+Insert this block between training completion and `convert_hf_to_gguf.py`:
+
+```python
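+from unsloth import FastLanguageModel
+
+adapter = "/path/to/training-runs/<run>/final"   # adapter-only training output
+merged  = "/path/to/training-runs/<run>/merged"  # destination for the full merged model
+
+# Loading the adapter dir also loads the base model named in adapter_config.json.
+model, tok = FastLanguageModel.from_pretrained(
+    model_name=adapter,
+    max_seq_length=2048,
+    load_in_4bit=True,
+)
+# Fold the LoRA weights into the base and write a standard 16-bit HF checkpoint.
+model.save_pretrained_merged(merged, tok, save_method="merged_16bit")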
+```
+
+Then run the GGUF converter against the **merged** dir, not the adapter dir:
+
+```bash
+python3 llama.cpp/convert_hf_to_gguf.py /path/to/training-runs/<run>/merged \
+  --outfile model-f16.gguf --outtype f16
+```
+
+The merged dir will contain `config.json`, `generation_config.json`, and sharded weights such as `model-00001-of-00004.safetensors` (the shard count varies; together the shards total the full base model size).
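
For adapters trained with plain PEFT rather than Unsloth, a comparable merge can be sketched with `merge_and_unload()`. This is an alternative to the snippet above, not something the original pipeline used. The function name and arguments are illustrative, the base model has to be supplied explicitly, and loading it in 16-bit needs enough memory for the full model.

```python
def merge_with_peft(base_model: str, adapter_dir: str, merged_dir: str) -> None:
    """Merge a PEFT LoRA adapter into its base model and save a full HF checkpoint."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base in 16-bit; merging is done on unquantized weights here.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_dir)

    # Fold the LoRA deltas into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(merged_dir, safe_serialization=True)

    # Assumes the tokenizer files saved next to the adapter are loadable on their own.
    AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(merged_dir)
```

Either way, the result should be a directory with `config.json` and full-size weights that the GGUF converter can read.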
+
+### Cleaner fix — use a wrapper
+
+If you do this often, encapsulate it:
+
+A wrapper Python script accepts `--adapter`, `--output`, `--skip-merge`, and `--all-quants`, then runs three steps:
+
+1. Load the adapter via `FastLanguageModel.from_pretrained()` and call `save_pretrained_merged()`
+2. Run `convert_hf_to_gguf.py` on the merged dir as a subprocess
+3. Run `llama-quantize` as a subprocess for each requested quant
+
+This is what `~/corpus/scripts/convert_gguf.py` does on MajorRig (rewritten 2026-04-09 for the MajorTwin v7b cycle).
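
The actual `convert_gguf.py` is not reproduced here. As a rough sketch of the shape such a wrapper might take, assuming the four flags listed above: the output file names, the default quant, and the set behind `--all-quants` are placeholders chosen for illustration, and `llama-quantize` is assumed to be on `PATH`.

```python
import argparse
import subprocess
from pathlib import Path

DEFAULT_QUANTS = ["Q4_K_M"]                 # assumption: quant produced by default
ALL_QUANTS = ["Q4_K_M", "Q5_K_M", "Q8_0"]   # assumption: what --all-quants expands to


def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Merge a LoRA adapter, then convert to GGUF")
    p.add_argument("--adapter", required=True, help="adapter-only training output dir")
    p.add_argument("--output", required=True, help="dir for merged model and GGUF files")
    p.add_argument("--skip-merge", action="store_true", help="reuse an existing merged dir")
    p.add_argument("--all-quants", action="store_true", help="produce every quant in ALL_QUANTS")
    return p.parse_args(argv)


def plan_commands(args):
    """Return the merged dir plus the converter and quantizer command lines."""
    out = Path(args.output)
    merged = out / "merged"
    f16 = out / "model-f16.gguf"
    cmds = [["python3", "llama.cpp/convert_hf_to_gguf.py", str(merged),
             "--outfile", str(f16), "--outtype", "f16"]]
    for quant in (ALL_QUANTS if args.all_quants else DEFAULT_QUANTS):
        cmds.append(["llama-quantize", str(f16), str(out / f"model-{quant}.gguf"), quant])
    return merged, cmds


def merge_adapter(adapter, merged):
    """Step 1: merge the adapter into its base model (same calls as the quick fix)."""
    from unsloth import FastLanguageModel  # imported lazily; needs a GPU environment
    model, tok = FastLanguageModel.from_pretrained(
        model_name=adapter, max_seq_length=2048, load_in_4bit=True)
    model.save_pretrained_merged(str(merged), tok, save_method="merged_16bit")


def main(argv=None):
    args = parse_args(argv)
    merged, cmds = plan_commands(args)
    if not args.skip_merge:
        merge_adapter(args.adapter, merged)
    for cmd in cmds:  # steps 2 and 3: convert, then quantize
        subprocess.run(cmd, check=True)
```

Keeping command construction in `plan_commands()` separate from execution makes the merge step hard to drop by accident: anyone reusing the wrapper goes through `main()`, which always merges unless `--skip-merge` is passed.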
+
+## Why this trips people up
+
+- Unsloth and PEFT both save adapter-only by default after `trainer.save_model()` or `model.save_pretrained()`. There's no warning that downstream tools expect a merged model.
+- The training output **looks** complete — there's a `tokenizer.json`, a `chat_template.jinja`, and a non-trivial `.safetensors`. It feels like a checkpoint.
+- If a pipeline runs `convert_gguf.py` (with merge) once and someone later reimplements Step 4 inline, bypassing the wrapper, the merge step is silently lost. This is what happened in MajorTwin v8c (Apr 30, 2026); see [[majortwin-v8b-plan#Pipeline Bug + Fix (2026-04-30)]].
+
+## Verification checklist
+
+After training, before running the GGUF converter, verify the directory you're pointing at:
+
+| File | Adapter-only dir | Merged dir |
+|---|---|---|
+| `adapter_config.json` | ✅ | ❌ |
+| `adapter_model.safetensors` | ✅ (~150 MB / 7B) | ❌ |
+| `config.json` | ❌ | ✅ |
+| `model-*.safetensors` (sharded) | ❌ | ✅ (~14 GB / 7B) |
+| `generation_config.json` | ❌ | ✅ |
+| `tokenizer.json` | ✅ | ✅ |
+
+If the directory matches the adapter-only column, you need to merge before converting.
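
The checklist can be automated as a preflight check before the converter runs. A minimal sketch, assuming the file names from the table; the function name and the three return labels are illustrative:

```python
from pathlib import Path


def classify_model_dir(path: str) -> str:
    """Classify a directory as 'merged', 'adapter-only', or 'unknown'."""
    d = Path(path)
    has_adapter = (d / "adapter_config.json").is_file()
    has_config = (d / "config.json").is_file()
    # Matches sharded (model-00001-of-0000N) and single-file (model.safetensors) weights.
    has_weights = any(d.glob("model*.safetensors"))

    if has_config and has_weights:
        return "merged"
    if has_adapter and not has_config:
        return "adapter-only"
    return "unknown"
```

A pipeline could call this on the converter's input directory and stop with a clear message when the result is not `"merged"`, instead of failing later on a missing `config.json`.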
+
+## Resuming a failed pipeline without re-training
+
+The adapter is small and self-contained. If your pipeline crashes at the GGUF step, you do NOT need to retrain — the LoRA adapter at `<run>/final/` is intact. Write a resume wrapper that runs only:
+
+1. Merge (`save_pretrained_merged`)
+2. F16 conversion (`convert_hf_to_gguf.py`)
+3. Quantization (`llama-quantize`)
+4. Deploy
+
+This avoids repeating the GPU-hours already spent on training. See `~/corpus/scripts/resume_v8c_step4.sh` on MajorRig for an example.
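
A resume wrapper mostly needs to work out which of the four steps above are still outstanding. The sketch below is not the `resume_v8c_step4.sh` script; it assumes a hypothetical layout in which the run directory holds `final/` (adapter), `merged/`, `model-f16.gguf`, and quantized files named `model-Q*.gguf`.

```python
from pathlib import Path


def remaining_steps(run_dir: str) -> list:
    """Return the pipeline steps still needed, based on which artifacts exist."""
    run = Path(run_dir)
    if not (run / "final" / "adapter_config.json").is_file():
        raise FileNotFoundError("no adapter in final/; nothing to resume from")

    have_merged = (run / "merged" / "config.json").is_file()
    have_f16 = (run / "model-f16.gguf").is_file()
    have_quants = any(run.glob("model-Q*.gguf"))

    steps = []
    if not have_f16:
        if not have_merged:
            steps.append("merge")      # save_pretrained_merged
        steps.append("convert")        # convert_hf_to_gguf.py
    if not have_f16 or not have_quants:
        steps.append("quantize")       # llama-quantize
    steps.append("deploy")             # always the final step
    return steps
```

Training never appears in the result: as long as the adapter in `final/` is intact, every later artifact can be rebuilt from it.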
+
+## Related
+
+- [[qwen-14b-oom-3080ti]] — base model size choice on a 12GB GPU
+- [[majortwin-v8b-plan]] — v8c pipeline architecture and resume
+
+## Maintenance
+
+- 2026-04-30 — Created after MajorTwin v8c pipeline failed Step 4. Root-caused, patched, resumed.
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -8,6 +8,7 @@ Practical fixes for common Linux, networking, and application problems.

 ## 🖥️ GPU & AI
 - [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](gpu-display/qwen-14b-oom-3080ti.md)
+- [LoRA adapter — GGUF conversion fails with 'config.json not found'](gpu-display/lora-adapter-gguf-conversion-fails.md)

 ## 🌐 Networking & Web
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -78,6 +78,7 @@ updated: 2026-04-29T23:55
    * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
    * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)
    * [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md)
+    * [LoRA adapter — GGUF conversion fails with 'config.json not found'](05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md)
    * [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md)
    * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
    * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)