wiki: add manual update guide for Gemini CLI

2026-03-13 22:45:52 -04:00
parent 70d9657b7f
commit 2861cade55
10 changed files with 323 additions and 21 deletions
--- a/05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md
+++ b/05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md
@@ -0,0 +1,58 @@
+# Qwen2.5-14B OOM on RTX 3080 Ti (12GB)
+
+## Problem
+
+When attempting to run or fine-tune **Qwen2.5-14B** on an NVIDIA RTX 3080 Ti with 12GB of VRAM, the process fails with an Out of Memory (OOM) error:
+
+```
+torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X GiB (GPU 0; 12.00 GiB total capacity; Y GiB already allocated; Z GiB free; ...)
+```
+
+The 12GB VRAM limit is hit during the initial model load or immediately upon starting the first training step.
+
+## Root Causes
+
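+1. **Model Size:** A 14B parameter model in FP16/BF16 (2 bytes per parameter) requires ~28GB of VRAM just for the weights, more than double the card's 12GB.
+2. **Context Length:** Long context lengths (e.g., 4096+ tokens) significantly increase VRAM usage during training, because activation memory grows with sequence length.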
+3. **Training Overhead:** Even with QLoRA (4-bit quantized base weights), the quantized weights plus gradients, optimizer states, and activations can exceed 12GB for a 14B model.
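+
+As a rough check on the figures above, weight memory can be estimated as parameter count times bytes per parameter. This sketch assumes a nominal 14 billion parameters and ignores quantization metadata, the CUDA context, and any KV cache or activations, so real usage will be higher:
+
+$$
+\begin{aligned}
+M_{\text{weights}} &\approx N \times b \\
+\text{FP16/BF16: } & 14 \times 10^{9} \times 2\ \text{bytes} = 28\ \text{GB} \\
+\text{4-bit: } & 14 \times 10^{9} \times 0.5\ \text{bytes} = 7\ \text{GB}
+\end{aligned}
+$$
+
+On this estimate, 4-bit weights alone take more than half of the 12GB card, leaving roughly 5GB for everything else.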
+
+---
+
+## Solutions
+
+### 1. Pivot to a 7B Model (Recommended)
+
+For a 12GB GPU, a 7B parameter model (like **Qwen2.5-7B-Instruct**) is the practical choice. With QLoRA it fits comfortably, leaving VRAM headroom for longer context lengths and larger batch sizes.
+
+- **VRAM Usage (7B QLoRA):** ~6-8GB
+- **Pros:** Stable, fast, supports long context.
+- **Cons:** Slightly lower reasoning capability than 14B.
+
+### 2. Aggressive Quantization
+
+If you MUST run 14B, use 4-bit quantization (GGUF or EXL2) for inference only. Training 14B on 12GB is not reliably possible even with extreme offloading.
+
+```bash
+# Example Ollama run (uses 4-bit quantization by default)
+ollama run qwen2.5:14b
+```
+
+### 3. Training Optimizations (if attempting 14B)
+
+If you have no choice but to try 14B training:
+- Set `max_seq_length` to 512 or 1024.
+- Use `Unsloth` (it is highly memory-efficient).
+- Enable `gradient_checkpointing`.
+- Set `per_device_train_batch_size = 1`.
+
+---
+
+## Maintenance
+
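+Keep your NVIDIA drivers and CUDA toolkit updated. On Windows (MajorRig), ensure WSL2 is allocated enough system RAM in `.wslconfig` (the `memory` setting under `[wsl2]`). This setting controls host RAM available to WSL2, not GPU VRAM.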
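+
+A minimal `.wslconfig` sketch, placed at `%UserProfile%\.wslconfig` on the Windows host. The values are illustrative assumptions, not recommendations from this guide; size them to the RAM actually installed:
+
+```ini
+# Hypothetical values; adjust to the host's installed RAM
+[wsl2]
+memory=24GB
+swap=8GB
+```
+
+Run `wsl --shutdown` from Windows and restart the distribution for the change to take effect.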
+
+---
+
+## Tags
+
+#gpu #cuda #oom #qwen #majortwin #llm #fine-tuning