majorwiki/04-streaming/plex/hevc-vaapi-batch-encode.md
MajorLinux ce2e761d33 hevc-vaapi-batch-encode: add already_failed() skip for streaming content
Document that VAAPI HEVC on Polaris can't beat already-efficient H.264 (YouTube/
Twitch/stream archives), so output comes out larger and lands in hevc_failed.txt.
Add already_failed() guard so the batch skips known-bad files on queue rebuilds
instead of re-attempting them. Also: MIN_FREE_GB note (start-only check) and a
source-bitrate triage snippet for picking real encode candidates.
2026-06-11 20:16:19 -04:00

---
title: "HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)"
domain: streaming
category: plex
tags: [plex, ffmpeg, hevc, vaapi, amd, gpu, encode, storage, rx480]
status: published
created: 2026-05-15
updated: 2026-06-05
---
# HEVC Batch Re-Encode for Plex Using VAAPI (AMD GPU)
## Problem
Plex NVMe storage is filling up from a large library of H.264-encoded video files (YouTube downloads, stream archives, etc.). Re-encoding to HEVC (H.265) reclaims 30-50% of disk space. The catch: Plex tracks each file's "date added" in a SQLite database, and that order matters for playback queues. Naive re-encode-and-replace approaches can corrupt or reset that metadata.
## Solution
Use `ffmpeg` with `hevc_vaapi` (AMD GPU hardware encoder) to batch re-encode files in-place using an atomic rename swap that preserves the Plex database record — including `added_at` — without any Plex downtime or database editing.
---
## How Plex Stores "Date Added"
Plex does **not** use file modification time (`mtime`) for "date added." It stores a Unix timestamp in its SQLite database:
```sql
-- Plex DB location (override via systemd unit may differ — check):
-- /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/
-- Plug-in Support/Databases/com.plexapp.plugins.library.db
-- (or wherever PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR points)
SELECT mi.added_at, datetime(mi.added_at, 'unixepoch'), mp.file
FROM metadata_items mi
JOIN media_items me ON me.metadata_item_id = mi.id
JOIN media_parts mp ON mp.media_item_id = me.id
WHERE mp.file LIKE '%your-file%';
```
> **Note:** If the default path returns 0 rows, check your actual data directory:
> ```bash
> systemctl cat plexmediaserver | grep APPLICATION_SUPPORT
> ```
The `added_at` field is keyed to the **file path** in `media_parts`. As long as the file path doesn't change, the database record — including `added_at` — is untouched even after the file's content is replaced.
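
A minimal sketch of how `added_at` could be read for one exact path, for example before and after a swap. The helper names `sql_quote`, `added_at_query` and `plex_added_at` are illustrative and not taken from `hevc_batch.sh`; the query reuses the joins shown above, and the `sqlite3` CLI is assumed to be installed.

```bash
# Hypothetical helpers (not from hevc_batch.sh) for reading added_at by exact path.

# Double any single quotes so the path is safe inside an SQL string literal
# (titles such as "Giant Bomb's ..." contain apostrophes).
sql_quote() {
  local q="'"
  printf '%s' "${1//$q/$q$q}"
}

# Print the lookup query for one file path.
added_at_query() {
  printf "SELECT mi.added_at FROM metadata_items mi JOIN media_items me ON me.metadata_item_id = mi.id JOIN media_parts mp ON mp.media_item_id = me.id WHERE mp.file = '%s';\n" "$(sql_quote "$1")"
}

# Run the query read-only against the Plex database: plex_added_at DB FILE
plex_added_at() {
  sqlite3 -readonly "$1" "$(added_at_query "$2")"
}
```

Calling `plex_added_at "$DB" "$file"` before the encode and again after the swap, then comparing the two values, is one way to confirm the record was preserved.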
---
## Why VAAPI Instead of libx265
On a host with an AMD RX 480/580 (or similar Polaris GPU), hardware HEVC encoding via VAAPI is roughly **9× faster** than software libx265 at comparable quality:
| Encoder | Speed (1080p) | Notes |
|---|---|---|
| libx265 -preset medium | ~21 fps / 0.35× | Best quality/size ratio |
| hevc_vaapi QP 28 | ~186 fps / 3.1× | Sufficient for streaming content |
For 1080p streaming content (game streams, podcasts, YouTube archival), the quality difference is imperceptible. libx265 is preferable only for archival encodes where absolute quality matters.
### Verify VAAPI is working
```bash
vainfo 2>&1 | grep -E "vaapi|HEVC|hevc|Driver"
ls /dev/dri/renderD128
```
You need `VAProfileHEVCMain : VAEntrypointEncSlice` in the output. If missing, install `mesa-va-drivers-freeworld` (RPM Fusion) for AMD hardware.
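
The check above can be turned into a pre-flight test that a batch run could call before touching any file. This is a sketch; `has_hevc_encode` is a hypothetical name, and it only inspects text piped into it.

```bash
# Hypothetical pre-flight helper: succeeds if vainfo output on stdin lists
# the HEVC Main encode entrypoint. Output is discarded with a redirect rather
# than grep -q, so the producer is never cut off early under `set -o pipefail`.
has_hevc_encode() {
  grep -E 'VAProfileHEVCMain[[:space:]]*:[[:space:]]*VAEntrypointEncSlice' >/dev/null
}

# Usage (requires vainfo and a render node):
#   vainfo 2>&1 | has_hevc_encode || { echo "no HEVC VAAPI encoder" >&2; exit 1; }
```

A decode-only entry (`VAEntrypointVLD`) or a `VAProfileHEVCMain10` line does not satisfy the pattern.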
---
## The Atomic Swap Strategy
The key insight: `mv file.tmp file` on the **same filesystem** is an atomic inode rename at the kernel level. Plex sees the same path still present — it never fires a "file removed" event, so the `metadata_items` record (including `added_at`) is preserved.
**Safe sequence:**
1. Encode source → `.hevc.tmp.mp4` alongside the original
2. Verify the output with `ffprobe`
3. `touch -r original.mp4 temp.mp4` — copy mtime (cosmetic, not required)
4. `mv temp.mp4 original.mp4` — atomic replace
**The one pitfall:** if the original file is deleted *before* the `mv`, Plex orphans the DB record (removes `metadata_items` entry on next scan) and re-indexes the new file with a fresh `added_at`. The original must still exist at swap time.
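
Steps 3 and 4 of the sequence can be sketched as a single function. `swap_in_place` is a hypothetical name, not the function used in `hevc_batch.sh`; it assumes GNU `stat` and that the `ffprobe` verification has already passed.

```bash
# Hypothetical helper illustrating steps 3-4: swap_in_place ORIGINAL TMP
swap_in_place() {
  local original="$1" tmp="$2"
  # The original must still exist at swap time, and the temp file must be there.
  [[ -f "$original" && -f "$tmp" ]] || { echo "missing file" >&2; return 1; }
  # A rename is only atomic within one filesystem: compare device numbers.
  if [[ "$(stat -c %d -- "$original")" != "$(stat -c %d -- "$tmp")" ]]; then
    echo "tmp is on a different filesystem; refusing to swap" >&2
    return 1
  fi
  touch -r "$original" -- "$tmp"   # copy mtime (cosmetic)
  mv -f -- "$tmp" "$original"      # atomic replace at the same path
}
```

Refusing to swap when the original is missing keeps the failure mode described above from happening silently.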
---
## The Batch Script
Script lives at `~/hevc_batch.sh` on majorhome.
```bash
# Dry run — scan and report what would be encoded, no changes
bash ~/hevc_batch.sh --dry-run
# Full run (default: files >1GB, QP 28)
tmux new-session -d -s hevc_batch 'bash ~/hevc_batch.sh'
# Custom options
bash ~/hevc_batch.sh --min-size-gb 2 --qp 26
```
### Queue and resume
The script writes a queue file at `~/hevc_queue.txt` on first run (scanning all files with ffprobe — takes ~10 min for a large library). On subsequent runs it resumes from where it left off. Completed files are logged to `~/hevc_done.txt`. Failed files go to `~/hevc_failed.txt`.
To restart from scratch: `rm ~/hevc_queue.txt ~/hevc_done.txt`
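
To see what a resumed run still has to do, the three files can be compared directly. A minimal sketch, assuming each file holds one full path per line (the actual file format is not shown here); `pending_files` is a hypothetical helper.

```bash
# Hypothetical helper: print queue entries that are in neither the done list
# nor the failed list. Requires bash (process substitution).
# pending_files QUEUE DONE FAILED
pending_files() {
  local queue="$1" done_list="$2" failed_list="$3"
  # Missing lists count as empty; -x matches whole lines, -F disables regex.
  grep -vxF -f <(cat -- "$done_list" "$failed_list" 2>/dev/null) -- "$queue" || true
}
```

For example, `pending_files ~/hevc_queue.txt ~/hevc_done.txt ~/hevc_failed.txt | wc -l` gives the number of entries left.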
### Log output
```bash
# Structured log lines only (skip ffmpeg progress noise)
grep '^\[20' ~/hevc_batch.log
# Watch live progress
tail -f ~/hevc_batch.log | grep '^\[20'
```
Each file logs:
- Source size and codec
- `Plex added_at before: <unix timestamp>`
- ffmpeg exit code and elapsed time
- Output size and savings
- `DB check: added_at PRESERVED ✓` (or WARN if changed)
### Space guard
The script aborts if free space on the Plex volume drops below 10GB (`MIN_FREE_GB`). Worst-case headroom needed is `source_size + tmp_size` simultaneously — on a 4GB source file that's ~8GB peak. Note: the space check only runs at the **start** of each encode, not during — a large file can still consume significant disk mid-encode.
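
A per-file variant of the guard could take the source size into account. This is a sketch under the worst-case assumption stated above (the temp file grows as large as the source); `has_headroom` and `avail_bytes` are hypothetical names, and `avail_bytes` relies on GNU `df`.

```bash
# Hypothetical per-file headroom check.
# has_headroom AVAIL_BYTES SOURCE_BYTES MIN_FREE_GB
has_headroom() {
  local avail="$1" source_size="$2" min_free_gb="$3"
  local reserve=$(( min_free_gb * 1024 * 1024 * 1024 ))
  # After a temp file as large as the source is written, at least
  # MIN_FREE_GB must still be free.
  (( avail - source_size >= reserve ))
}

# Free bytes on the filesystem holding PATH (GNU df).
avail_bytes() {
  df --output=avail -B1 -- "$1" | tail -n 1 | tr -d ' '
}

# Usage:
#   has_headroom "$(avail_bytes /plex)" "$(stat -c %s -- "$file")" 10 || echo "skip: not enough space"
```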
---
## ffmpeg Command
```bash
ffmpeg \
-vaapi_device /dev/dri/renderD128 \
-i "input.mp4" \
-vf 'format=nv12,hwupload' \
-c:v hevc_vaapi -rc_mode CQP -qp 28 \
-c:a copy \
-movflags +faststart \
-y "output.tmp.mp4"
```
- `-rc_mode CQP -qp 28` — constant quantizer; higher value = smaller file / lower quality. QP 24 is high quality, QP 28 is good for streaming content.
- `-vf 'format=nv12,hwupload'` — required to move frames to GPU memory for VAAPI encoding.
- `-c:a copy` — passes audio through untouched.
- `hevc_vaapi` does not support 10-bit output on Polaris (RX 480/580). For 10-bit HDR sources, fall back to `libx265` with color signaling flags.
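
One way to catch 10-bit sources before they reach `hevc_vaapi` is to look at the pixel format of the first video stream. A sketch with hypothetical helper names; the pattern covers the common 10-bit format names and is not exhaustive.

```bash
# Hypothetical pre-check for 10-bit sources.

# Succeeds if a pixel format name denotes 10-bit samples (e.g. yuv420p10le, p010le).
is_10bit_pix_fmt() {
  case "$1" in
    *p10le|*p10be|p010le|p010be) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the pixel format of the first video stream.
source_pix_fmt() {
  ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of csv=p=0 "$1"
}

# Usage:
#   is_10bit_pix_fmt "$(source_pix_fmt "$file")" && echo "10-bit source: use libx265"
```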
---
## Plex Data Directory Override
On majorhome, the Plex data directory is overridden in the systemd unit — the default path `/var/lib/plexmediaserver/` is empty:
```bash
systemctl cat plexmediaserver | grep APPLICATION_SUPPORT
# Environment=PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR=/plex/plexdata/Library/Application Support
```
The actual DB path is therefore:
```
/plex/plexdata/Library/Application Support/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db
```
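
The path can also be derived instead of hard-coded. A minimal sketch, assuming the override appears as a single `Environment=` line as shown above; `support_dir_from_unit` and `plex_db_path` are hypothetical names.

```bash
# Hypothetical helpers for locating the Plex library database.

# Read `systemctl cat` output on stdin and print the overridden support dir, if any.
support_dir_from_unit() {
  sed -n 's/^Environment="\{0,1\}PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR=\([^"]*\)"\{0,1\}$/\1/p' | head -n 1
}

# Print the DB path for a support dir; falls back to the default location.
plex_db_path() {
  local support_dir="${1:-/var/lib/plexmediaserver/Library/Application Support}"
  printf '%s\n' "${support_dir%/}/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db"
}

# Usage:
#   DB=$(plex_db_path "$(systemctl cat plexmediaserver | support_dir_from_unit)")
```

With no override in the unit, the first function prints nothing and `plex_db_path` returns the default path.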
---
## Troubleshooting
### Encode keeps stopping after a few files
**Symptom:** The script runs, encodes a handful of files, then exits. Restarting it produces the same behavior — processes a few, then exits again.
**Cause:** `hevc_batch.sh` is a **one-shot batch processor**, not a daemon. It reads through the queue file once from top to bottom, encodes whatever hasn't been done, then exits cleanly with `Batch complete: N processed`. It does not loop or restart itself.
On subsequent restarts, the script reuses the existing `hevc_queue.txt` rather than rebuilding it — the rebuild only runs if the queue file is missing or empty:
```bash
if [[ ! -f "$QUEUE" ]] || [[ ! -s "$QUEUE" ]]; then
build_queue
fi
```
This means restarts process only the few items left in the stale queue that haven't been marked done, then exit.
**Fix:** Delete the queue file before restarting so the script rescans the library and builds a fresh queue:
```bash
su - majorlinux -c 'rm ~/hevc_queue.txt && tmux new-session -d -s hevc_batch "bash ~/hevc_batch.sh"'
```
> Do **not** delete `hevc_done.txt` — that's the deduplication record. The rebuilt queue will skip anything already in `hevc_done.txt`.
---
### "Parse error, at least 3 arguments" in the log
**Symptom:** Log lines like `Parse error, at least 3 arguments were expected, only 1 given in string 'h.mp4'` scattered between encode entries.
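**Cause:** ffmpeg printing its own parsing errors to stderr. The quoted string (`'h.mp4'`) is the tail of a filename, which points at ffmpeg reading from standard input: by default ffmpeg accepts interactive commands on stdin, so when it runs inside the script's `while IFS= read -r` loop it can consume bytes of the queue and try to parse them as a command. Filenames with the Unicode special characters used in Giant Bomb / YouTube-DL titles (fullwidth variants of ASCII punctuation such as `|`, U+FF5C) are read correctly by the bash loop and are not the cause by themselves.
**Action:** The messages do not affect the encode in progress. If the ffmpeg call in the script does not already pass `-nostdin` (or redirect stdin from `/dev/null`), adding it should stop them; otherwise they are safe to ignore.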
---
### "SKIP (not found): uiem DLC & Far Far West.mp4" — truncated filenames
**Symptom:** "not found" skip entries in the log show what look like the *ends* of filenames (e.g., `uiem DLC & Far Far West.mp4` instead of `Resident Evil Requiem DLC & Far Far West.mp4`).
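**Cause:** The beginning of the path was lost before the script looked it up. One possibility is that the queue file itself has corrupt or truncated entries, for example from a write error or an interrupted pipe when the queue was originally built. If the same line is intact in `~/hevc_queue.txt`, the more likely culprit is ffmpeg consuming the first bytes of the next queue line from the loop's standard input (the same mechanism that likely produces the "Parse error" messages). Either way, the script cannot find the truncated path on disk and skips it.
**Fix:** Delete the queue file to force a full rebuild (see above). The rebuild uses `find` with a fresh scan, so the queue itself starts clean. If truncated entries reappear after a rebuild, add `-nostdin` to the ffmpeg invocation so it stops reading the queue.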
---
### Checking real progress
```bash
# Files done, failed, and remaining in queue
wc -l ~/hevc_done.txt ~/hevc_failed.txt ~/hevc_queue.txt
# Remaining = queue total - done - failed
# (some "remaining" may be not-found or parse-error skips)
# Last 10 log entries
grep '^\[20' ~/hevc_batch.log | tail -10
# Watch live
tail -f ~/hevc_batch.log | grep '^\[20'
# Disk free on /plex
df -h /plex | tail -1
```
---
### Script exits with `set -euo pipefail`
The script uses `set -euo pipefail` — any unhandled non-zero exit code kills it immediately. If the script exits with no "Batch complete" line in the log, look for the last log entry before the gap to identify the failing command. Most encode-path errors are handled with `|| echo ""` guards, but external tools (sqlite3, ffprobe) can still trip this under unusual conditions.
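
One way to make such silent exits easier to trace is an `ERR` trap that records the failing command. This is a sketch, not the script's current behavior: `log_err` and `probe_codec` are hypothetical names, and it assumes the script's stderr ends up in `~/hevc_batch.log`.

```bash
# Sketch: make strict-mode exits visible in the log.
# -E lets functions and subshells inherit the ERR trap.
set -Eeuo pipefail

log_err() {
  printf '[%s] ERR exit=%s line=%s cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" >&2
}
trap 'log_err "$?" "$LINENO" "$BASH_COMMAND"' ERR

# Guarded probe in the "|| echo" style: a failing ffprobe yields an empty
# string instead of terminating the script.
probe_codec() {
  ffprobe -v error -select_streams v:0 -show_entries stream=codec_name \
    -of csv=p=0 "$1" 2>/dev/null || echo ""
}
```

Because the trap line starts with a `[20..]` timestamp, it would show up in the same `grep '^\[20'` filter used for the structured log lines.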
---
## Related
- [[plex-4k-codec-compatibility]] — Apple TV Direct Play compatibility, HEVC HDR notes
- [[plex-transcoding-troubleshooting]] — Playback stops, software transcode CPU limits, VAAPI setup
- [[snapraid-mergerfs-setup]] — MajorRAID storage pool setup
- [[SnapRAID-Majorhome]] — majorhome SnapRAID project
---
### ffmpeg "Error opening output file" / "Invalid argument" on specific files
**Symptom:** One or more files fail with this in the log:
```
Error opening output file /plex/plex/Giant Bomb's Sub-A-Thon Day 3 PART 4.hevc.tmp.mp4.
Error opening output files: Invalid argument
[YYYY-MM-DD HH:MM:SS] ffmpeg exited 234 in 0s
[YYYY-MM-DD HH:MM:SS] FAILED: ffmpeg error — keeping original, removing tmp
```
The file ends up in `hevc_failed.txt` and the original is untouched.
**Cause:** ffmpeg has its own URL/protocol parser that runs on all input and output path strings before any filesystem access. The ASCII pipe character `|` (U+007C) triggers ffmpeg's pipe protocol handler: it tries to interpret `output|file.mp4` as "pipe output to the process named `file.mp4`" and fails with EINVAL. This happens even though the shell variable is properly quoted and the Linux filesystem supports `|` in filenames. The fullwidth variant `|` (U+FF5C) can also cause issues depending on ffmpeg's build.
Common in libraries with Giant Bomb, YouTube, or Twitch downloads, where titles frequently use `|` as a visual separator.
**Fix:** Sanitize the `stem` used for the `.hevc.tmp.` output filename. The *source* file keeps its original name (the final `mv` writes back to the original path, which the filesystem handles fine); only the temp file needs a clean name for ffmpeg:
```bash
# In encode_file(), replace:
local tmp="${dir}/${stem}.hevc.tmp.${ext}"
# With:
local safe_stem="${stem//|/-}"
safe_stem="${safe_stem//|/-}"
local tmp="${dir}/${safe_stem}.hevc.tmp.${ext}"
```
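After patching, restart the batch. Without the `already_failed()` guard described in the next section, the affected files are re-queued on the next run because they are not in `hevc_done.txt`. With the guard in place, delete their entries from `hevc_failed.txt` first, or they will be skipped.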
---
### Many files failing: output larger than source (streaming content)
**Symptom:** A large portion of the queue ends up in `hevc_failed.txt` with log lines like:
```
[2026-06-05 ...] Output: 4.7G savings=0 (output larger than source)
[2026-06-05 ...] WARN: output is larger than source — skipping swap, keeping original
```
**Cause:** These files are YouTube downloads or streaming archives (Giant Bomb, Twitch VODs, etc.) that were already encoded with an efficient H.264 encoder (typically YouTube's VP9-to-AVC pipeline or a broadcast H.264 encoder at a reasonable bitrate). The VAAPI HEVC encoder on a Polaris GPU (RX 480/580), run at QP 28, is a hardware encoder with limited rate control precision, so it cannot beat a well-tuned software H.264 encode on already-compressed talking-head/gaming content. The output reliably comes out 15-25% *larger* than the source.
The script handles this correctly: it detects output > source, deletes the tmp, keeps the original, and writes to `hevc_failed.txt`. The files are not corrupted. However, without the `already_failed()` guard, the script will re-attempt these files on every queue rebuild, wasting encode time and briefly consuming 4-8 GB of disk per failed attempt.
**Fix — add `already_failed()` skip logic:**
Patch `~/hevc_batch.sh` to skip files already in `hevc_failed.txt`:
```bash
# After the existing already_done() function, add:
already_failed() {
[[ -f "$FAILED" ]] && grep -qF "$1" "$FAILED"
}
# In build_queue(), after the already_done "$f" && continue line:
already_failed "$f" && continue
# In the main loop, after the already_done "$file" check:
already_failed "$file" && { log "SKIP (already failed): $file"; continue; }
```
After patching, the batch will skip all 132+ known-bad files on the next pass and only attempt fresh queue entries.
**Tuning options to improve savings on dense content:**
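- Raise QP: `--qp 30` or `--qp 32` gives coarser quantization and smaller output, so a better chance of beating the source size. Trade-off: lower visual quality on every file, including those that already compress well. Lowering QP to 24 or 22 does the opposite (higher quality, larger output) and makes these failures more likely.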
- Accept the failures: for streaming content archives, the source is already "good enough." Only files that are genuinely oversized H.264 (old stream captures at very high bitrate) will benefit from HEVC re-encode.
**Identifying which files are worth encoding:**
```bash
# Show source bitrate for all queued files — high-bitrate sources are candidates
while IFS= read -r f; do
bitrate=$(ffprobe -v quiet -show_entries format=bit_rate -of csv=p=0 "$f" 2>/dev/null)
echo "$bitrate $f"
done < ~/hevc_queue.txt | sort -rn | head -20
```
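Files above ~8,000 kbit/s are typically good encode candidates. Files at 3,000-5,000 kbit/s (typical YouTube/Twitch 1080p) will usually fail. Note that `ffprobe` prints `bit_rate` in bits per second, so 8,000 kbit/s shows up as roughly `8000000` in the output above.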
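
The thresholds can be applied mechanically to the bitrates printed by the loop above. A minimal sketch: `classify_bitrate` is a hypothetical helper, the "borderline" band between the two stated ranges is an assumption, and so is treating everything below 5,000 kbit/s as unlikely to shrink.

```bash
# Hypothetical triage helper. ffprobe reports format=bit_rate in bits per
# second, so the thresholds from the text are converted from kbit/s.
# classify_bitrate BITS_PER_SECOND -> candidate | borderline | unlikely | unknown
classify_bitrate() {
  local bps="$1"
  # ffprobe can print "N/A" or nothing when the container has no bitrate.
  [[ "$bps" =~ ^[0-9]+$ ]] || { echo "unknown"; return 0; }
  local kbps=$(( 10#$bps / 1000 ))
  if   (( kbps > 8000 )); then echo "candidate"
  elif (( kbps > 5000 )); then echo "borderline"   # assumption: range not covered by the text
  else echo "unlikely"
  fi
}
```

Inside the loop, `echo "$(classify_bitrate "$bitrate") $bitrate $f"` would label each queued file.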