---
title: "LoRA adapter — GGUF conversion fails with 'config.json not found'"
domain: troubleshooting
category: gpu-display
tags: [lora, qlora, gguf, llama.cpp, unsloth, fine-tuning, qwen]
status: published
created: 2026-04-30
updated: 2026-04-30
---

# LoRA adapter — GGUF conversion fails with 'config.json not found'

## Problem

After a QLoRA fine-tune, you point `llama.cpp/convert_hf_to_gguf.py` at the training output directory and it crashes immediately:

```
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/training-runs//final/config.json'
```

The output directory looks fine — it contains:

```
adapter_config.json
adapter_model.safetensors   (~150 MB for a 7B base)
chat_template.jinja
tokenizer_config.json
tokenizer.json
```

But no `config.json`, and `adapter_model.safetensors` is 150 MB — way smaller than the ~14 GB you'd expect for a full Qwen2.5-7B 16-bit checkpoint.

## Root cause

`model.save_pretrained()` after a LoRA/QLoRA train saves **only the adapter weights**, not a merged full-precision model. `convert_hf_to_gguf.py` expects a full HuggingFace model directory — it reads `config.json` to identify the architecture. Adapter-only directories don't have one.

You need to merge the LoRA adapter into the base model first, then point the GGUF converter at the merged dir.

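The distinction described above can be checked programmatically before anything is handed to the converter. The following is a minimal sketch, not part of the original pipeline: `describe_model_dir` is a hypothetical helper name, and it relies on PEFT recording the base checkpoint under the `base_model_name_or_path` key of `adapter_config.json`.

```python
import json
from pathlib import Path


def describe_model_dir(path):
    """Classify a training output directory by the files it contains.

    Returns a (kind, base_model) tuple where kind is "merged", "adapter",
    or "unknown". base_model is only set for adapter-only directories.
    Hypothetical helper for illustration.
    """
    root = Path(path)
    if (root / "config.json").is_file():
        # A full HuggingFace model directory: the converter can read it.
        return "merged", None
    adapter_cfg = root / "adapter_config.json"
    if adapter_cfg.is_file():
        # PEFT stores the base checkpoint the adapter was trained against.
        cfg = json.loads(adapter_cfg.read_text(encoding="utf-8"))
        return "adapter", cfg.get("base_model_name_or_path")
    return "unknown", None
```

Calling `describe_model_dir("/path/to/training-runs//final")` on an adapter-only directory should return `"adapter"` together with the base model the adapter must be merged into.
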
## Solution

### Quick fix — inline merge step

Insert this block between training completion and `convert_hf_to_gguf.py`:

```python
from unsloth import FastLanguageModel

adapter = "/path/to/training-runs//final"   # adapter-only training output
merged = "/path/to/training-runs//merged"   # destination for the merged model

# Loading the adapter dir pulls in the base model and applies the LoRA weights.
model, tok = FastLanguageModel.from_pretrained(
    model_name=adapter,
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the adapter into the base weights and write a full 16-bit checkpoint.
model.save_pretrained_merged(merged, tok, save_method="merged_16bit")
```

Then run the GGUF converter against the **merged** dir, not the adapter dir:

```bash
python3 llama.cpp/convert_hf_to_gguf.py /path/to/training-runs//merged \
    --outfile model-f16.gguf --outtype f16
```

The merged dir will contain `config.json`, `model-00001-of-00004.safetensors` (multiple shards totaling the full base model size), `generation_config.json`, etc.

### Cleaner fix — use a wrapper

If you do this often, encapsulate it:

1. Wrapper Python script accepts `--adapter`, `--output`, `--skip-merge`, `--all-quants`
2. Step 1: load adapter via `FastLanguageModel.from_pretrained()`, call `save_pretrained_merged()`
3. Step 2: subprocess `convert_hf_to_gguf.py` on the merged dir
4. Step 3: subprocess `llama-quantize` for each requested quant

This is what `~/corpus/scripts/convert_gguf.py` does on MajorRig (rewritten 2026-04-09 for the MajorTwin v7b cycle).

## Why this trips people up

- Unsloth and PEFT both save adapter-only by default after `trainer.save_model()` or `model.save_pretrained()`. There's no warning that downstream tools expect a merged model.
- The training output **looks** complete: there's a `tokenizer.json`, a `chat_template.jinja`, and a non-trivial `.safetensors`. It feels like a checkpoint.
- If a pipeline originally ran `convert_gguf.py` (which includes the merge) and someone later reimplements Step 4 inline, bypassing the wrapper, the merge step is silently lost. This is what happened in MajorTwin v8c (Apr 30, 2026); see [[majortwin-v8b-plan#Pipeline Bug + Fix (2026-04-30)]].

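One way to keep an inline reimplementation from silently dropping the merge is to fail fast when the conversion step is pointed at a directory without `config.json`. The sketch below is illustrative only: `build_gguf_commands`, the `out_prefix` naming scheme, and the default `Q4_K_M` quant are assumptions, and it presumes `llama-quantize` is on `PATH` with the converter at `llama.cpp/convert_hf_to_gguf.py`.

```python
from pathlib import Path


def build_gguf_commands(model_dir, out_prefix, quants=("Q4_K_M",)):
    """Return argv lists for the F16 conversion and each quantization.

    Raises ValueError when model_dir has no config.json, which is the
    symptom of pointing the converter at an adapter-only directory.
    Hypothetical helper; output names and default quant are assumptions.
    """
    root = Path(model_dir)
    if not (root / "config.json").is_file():
        hint = ""
        if (root / "adapter_config.json").is_file():
            hint = " (adapter-only directory: merge it into the base model first)"
        raise ValueError(f"{root} has no config.json{hint}")

    f16 = f"{out_prefix}-f16.gguf"
    commands = [[
        "python3", "llama.cpp/convert_hf_to_gguf.py", str(root),
        "--outfile", f16, "--outtype", "f16",
    ]]
    for quant in quants:
        # llama-quantize takes: <input gguf> <output gguf> <quant type>
        commands.append(["llama-quantize", f16, f"{out_prefix}-{quant}.gguf", quant])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so a missing merge surfaces as a clear error before any conversion starts.
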
## Verification checklist

After training, before running the GGUF converter, verify the directory you're pointing at:

| File | Adapter-only dir | Merged dir |
|---|---|---|
| `adapter_config.json` | ✅ | ❌ |
| `adapter_model.safetensors` | ✅ (~150 MB / 7B) | ❌ |
| `config.json` | ❌ | ✅ |
| `model-*.safetensors` (sharded) | ❌ | ✅ (~14 GB / 7B) |
| `generation_config.json` | ❌ | ✅ |
| `tokenizer.json` | ✅ | ✅ |

If you see only the left column, you need to merge before converting.

## Resuming a failed pipeline without re-training

The adapter is small and self-contained. If your pipeline crashes at the GGUF step, you do NOT need to retrain — the LoRA adapter at `/final/` is intact. Write a resume wrapper that runs only:

1. Merge (`save_pretrained_merged`)
2. F16 conversion (`convert_hf_to_gguf.py`)
3. Quantization (`llama-quantize`)
4. Deploy

This saves the cost of however many GPU-hours the training took. See `~/corpus/scripts/resume_v8c_step4.sh` on MajorRig for an example.

## Related

- [[qwen-14b-oom-3080ti]] — base model size choice on a 12GB GPU
- [[majortwin-v8b-plan]] — v8c pipeline architecture and resume

## Maintenance

- 2026-04-30 — Created after MajorTwin v8c pipeline failed Step 4. Root-caused, patched, resumed.