From 6e7a0ca21fcb52603e748d1b854989d4856a3dfb Mon Sep 17 00:00:00 2001
From: Marcus Summers
Date: Thu, 30 Apr 2026 11:22:59 -0400
Subject: [PATCH] =?UTF-8?q?wiki:=20add=20troubleshooting=20article=20?=
 =?UTF-8?q?=E2=80=94=20LoRA=20adapter=20GGUF=20conversion=20fails?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Documents the gotcha where convert_hf_to_gguf.py crashes with 'config.json
not found' because the training output directory holds only the LoRA
adapter, not a merged HF model. Includes inline save_pretrained_merged()
fix snippet, verification checklist, and resume-pipeline-without-retraining
pattern.

Discovered today during the MajorTwin v8c pipeline failure (Step 4).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../lora-adapter-gguf-conversion-fails.md          | 119 ++++++++++++++++++
 05-troubleshooting/index.md                        |   1 +
 SUMMARY.md                                         |   1 +
 3 files changed, 121 insertions(+)
 create mode 100644 05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md

diff --git a/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md b/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
new file mode 100644
index 0000000..2b74b08
--- /dev/null
+++ b/05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md
@@ -0,0 +1,119 @@
+---
+title: "LoRA adapter — GGUF conversion fails with 'config.json not found'"
+domain: troubleshooting
+category: gpu-display
+tags: [lora, qlora, gguf, llama.cpp, unsloth, fine-tuning, qwen]
+status: published
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LoRA adapter — GGUF conversion fails with 'config.json not found'
+
+## Problem
+
+After a QLoRA fine-tune, you point `llama.cpp/convert_hf_to_gguf.py` at the training output directory and it crashes immediately:
+
+```
+FileNotFoundError: [Errno 2] No such file or directory:
+  '/path/to/training-runs//final/config.json'
+```
+
+The output directory looks fine — it contains:
+
+```
+adapter_config.json
+adapter_model.safetensors    (~150 MB for a 7B base)
+chat_template.jinja
+tokenizer_config.json
+tokenizer.json
+```
+
+There is no `config.json`, and `adapter_model.safetensors` is only ~150 MB, far smaller than the ~14 GB you'd expect for a full Qwen2.5-7B 16-bit checkpoint.
+
+## Root cause
+
+After LoRA/QLoRA training, `model.save_pretrained()` saves **only the adapter weights**, not a full merged model. `convert_hf_to_gguf.py` expects a full Hugging Face model directory: it reads `config.json` to identify the architecture, and adapter-only directories don't have one.
+
+You need to merge the LoRA adapter into the base model first, then point the GGUF converter at the merged dir.
+
+## Solution
+
+### Quick fix — inline merge step
+
+Insert this block between training completion and `convert_hf_to_gguf.py`:
+
+```python
+from unsloth import FastLanguageModel
+
+adapter = "/path/to/training-runs//final"  # adapter-only dir written by training
+merged = "/path/to/training-runs//merged"  # new dir that will hold the full 16-bit model
+
+model, tok = FastLanguageModel.from_pretrained(
+    model_name=adapter,  # Unsloth finds the base model via adapter_config.json
+    max_seq_length=2048,
+    load_in_4bit=True,
+)
+model.save_pretrained_merged(merged, tok, save_method="merged_16bit")  # merge LoRA into base; writes config.json + full weights
+```
+
+Then run the GGUF converter against the **merged** dir, not the adapter dir:
+
+```bash
+python3 llama.cpp/convert_hf_to_gguf.py /path/to/training-runs//merged \
+    --outfile model-f16.gguf --outtype f16
+```
+
+The merged dir will contain `config.json`, `model-00001-of-00004.safetensors` (multiple shards totaling the full base model size), `generation_config.json`, etc.
+
+### Cleaner fix — use a wrapper
+
+If you do this often, encapsulate it:
+
+1. A wrapper Python script accepts `--adapter`, `--output`, `--skip-merge`, `--all-quants`
+2. Step 1: load the adapter via `FastLanguageModel.from_pretrained()`, then call `save_pretrained_merged()`
+3. Step 2: run `convert_hf_to_gguf.py` on the merged dir as a subprocess
+4. Step 3: run `llama-quantize` as a subprocess for each requested quant
+
+This is what `~/corpus/scripts/convert_gguf.py` does on MajorRig (rewritten 2026-04-09 for the MajorTwin v7b cycle).
+
+## Why this trips people up
+
+- Unsloth and PEFT both save adapter-only by default after `trainer.save_model()` or `model.save_pretrained()`. There's no warning that downstream tools expect a merged model.
+- The training output **looks** complete: there's a `tokenizer.json`, a `chat_template.jinja`, and a non-trivial `.safetensors`. It feels like a checkpoint.
+- If a pipeline once called `convert_gguf.py` (which merges) and someone later reimplements Step 4 inline without the wrapper, the merge step silently disappears and the failure only surfaces at conversion time. This is what happened in MajorTwin v8c (Apr 30, 2026); see [[majortwin-v8b-plan#Pipeline Bug + Fix (2026-04-30)]].
+
+## Verification checklist
+
+After training, before running the GGUF converter, verify the directory you're pointing at:
+
+| File | Adapter-only dir | Merged dir |
+|---|---|---|
+| `adapter_config.json` | ✅ | ❌ |
+| `adapter_model.safetensors` | ✅ (~150 MB / 7B) | ❌ |
+| `config.json` | ❌ | ✅ |
+| `model-*.safetensors` (sharded) | ❌ | ✅ (~14 GB / 7B) |
+| `generation_config.json` | ❌ | ✅ |
+| `tokenizer.json` | ✅ | ✅ |
+
+If the directory matches the **Adapter-only dir** column, you need to merge before converting.
+
+## Resuming a failed pipeline without re-training
+
+The adapter is small, and as long as the base model it was trained on is still available (cached or downloadable), it is everything the merge needs. If your pipeline crashes at the GGUF step, you do NOT need to retrain: the LoRA adapter at `/final/` is intact. Write a resume wrapper that runs only:
+
+1. Merge (`save_pretrained_merged`)
+2. F16 conversion (`convert_hf_to_gguf.py`)
+3. Quantization (`llama-quantize`)
+4. Deploy
+
+This avoids repeating however many GPU-hours the training took. See `~/corpus/scripts/resume_v8c_step4.sh` on MajorRig for an example.
+
+## Related
+
+- [[qwen-14b-oom-3080ti]] — base model size choice on a 12GB GPU
+- [[majortwin-v8b-plan]] — v8c pipeline architecture and resume
+
+## Maintenance
+
+- 2026-04-30 — Created after MajorTwin v8c pipeline failed Step 4. Root-caused, patched, resumed.
diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index 9bd55ef..c6de078 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -8,6 +8,7 @@ Practical fixes for common Linux, networking, and application problems.
 
 ## 🖥️ GPU & AI
 - [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](gpu-display/qwen-14b-oom-3080ti.md)
+- [LoRA adapter — GGUF conversion fails with 'config.json not found'](gpu-display/lora-adapter-gguf-conversion-fails.md)
 
 ## 🌐 Networking & Web
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
diff --git a/SUMMARY.md b/SUMMARY.md
index 45a8d18..a60b4e5 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -78,6 +78,7 @@ updated: 2026-04-29T23:55
   * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
   * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)
   * [Qwen2.5-14B OOM on RTX 3080 Ti (12GB)](05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md)
+  * [LoRA adapter — GGUF conversion fails with 'config.json not found'](05-troubleshooting/gpu-display/lora-adapter-gguf-conversion-fails.md)
   * [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md)
   * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
   * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
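
For reference, the resume steps described in the new article (merge, F16 conversion, quantization) and its verification checklist can be sketched in Python as follows. This is a hypothetical illustration, not the `~/corpus/scripts/convert_gguf.py` or `resume_v8c_step4.sh` scripts on MajorRig. The function names, the `model-f16.gguf` / `model-Q4_K_M.gguf` output names, and the `llama.cpp/build/bin/llama-quantize` location (a typical CMake build) are all assumptions. The merge reuses the same `FastLanguageModel.from_pretrained()` and `save_pretrained_merged()` calls as the article's quick fix. The deploy step is omitted because it is site-specific.

```python
# Hypothetical resume helper: merge if needed, then F16 GGUF, then quantize.
# A sketch only: names, output file names, and the llama.cpp layout are assumptions.
import subprocess
import sys
from pathlib import Path


def classify_checkpoint(path: Path) -> str:
    """Apply the verification checklist: return 'merged', 'adapter', or 'unknown'."""
    names = {p.name for p in path.iterdir()} if path.is_dir() else set()
    # Full weights are model.safetensors or model-0000N-of-0000M.safetensors shards.
    has_full_weights = any(
        n.startswith("model") and n.endswith(".safetensors") for n in names
    )
    if "config.json" in names and has_full_weights:
        return "merged"
    if "adapter_config.json" in names:
        return "adapter"
    return "unknown"


def merge(adapter: Path, merged: Path) -> None:
    """Fold the LoRA adapter into its base model, as in the inline quick fix."""
    from unsloth import FastLanguageModel  # lazy import: only the merge needs Unsloth

    model, tok = FastLanguageModel.from_pretrained(
        model_name=str(adapter), max_seq_length=2048, load_in_4bit=True
    )
    model.save_pretrained_merged(str(merged), tok, save_method="merged_16bit")


def build_commands(merged: Path, out_dir: Path, quants: list[str],
                   llama_cpp: Path) -> list[list[str]]:
    """argv lists: one F16 conversion, then one llama-quantize call per quant type."""
    f16 = out_dir / "model-f16.gguf"
    cmds = [[sys.executable, str(llama_cpp / "convert_hf_to_gguf.py"), str(merged),
             "--outfile", str(f16), "--outtype", "f16"]]
    quantize = llama_cpp / "build" / "bin" / "llama-quantize"  # assumed CMake build path
    for q in quants:
        cmds.append([str(quantize), str(f16), str(out_dir / f"model-{q}.gguf"), q])
    return cmds


def resume(adapter: Path, merged: Path, out_dir: Path, quants: list[str],
           llama_cpp: Path, dry_run: bool = False) -> list[list[str]]:
    """Re-run only the post-training steps; training itself is never repeated."""
    if classify_checkpoint(merged) != "merged":
        # No usable merged dir yet, so there must be an adapter to merge from.
        if classify_checkpoint(adapter) != "adapter":
            raise ValueError(f"{adapter} does not look like an adapter-only dir")
        if not dry_run:
            merge(adapter, merged)
    cmds = build_commands(merged, out_dir, quants, llama_cpp)
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises on the first failing step
    return cmds
```

Calling `resume(..., dry_run=True)` only checks the directories and returns the planned command lines. This is a cheap way to confirm you are pointed at the right directories before committing to a long conversion. If the merged dir already passes the checklist (for example, because the crash happened after the merge), the merge step is skipped.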