rmlint — Extreme Duplicate File Scanning
Problem
Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified ~4.0 TB (113,584 files) of duplicate data across three different storage points.
Solution
rmlint is an extremely fast tool for finding (and optionally removing) duplicate files. It is significantly faster than fdupes or rdfind because it uses a multi-stage approach: files are first grouped by size, then compared by partial checksums, and only the surviving candidates are fully hashed.
1. Installation (Fedora)
sudo dnf install rmlint
2. Scanning Multiple Directories
To scan for duplicates across multiple mount points and compare them:
rmlint /majorstorage /majorRAID /mnt/usb
This writes a removal script (rmlint.sh) and a machine-readable report (rmlint.json) to the current directory, along with a summary of the findings.
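rmlint can also be told which tree holds the originals. Paths listed after a literal `//` separator are "tagged" as preferred, and `-k` (`--keep-all-tagged`) protects everything inside them, so only the copies on the untagged side are listed for removal. A sketch with hypothetical paths:

```shell
# Treat /majorstorage as the canonical copy: files there are kept,
# and only the duplicates under /mnt/usb are flagged for removal.
rmlint -k /mnt/usb // /majorstorage
```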
3. Reviewing Results
DO NOT run the generated script without reviewing it first. The JSON report lists every duplicate that was found:
# Pretty-print the JSON report
jq . rmlint.json
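The report can also answer "how much space would this free?" directly. A sketch assuming the default report layout: a JSON array whose "duplicate_file" entries carry a "size" field, with "is_original" marking the copy rmlint would keep:

```shell
# Sum the bytes rmlint would reclaim (originals excluded).
jq '[.[] | select(.type == "duplicate_file" and .is_original == false) | .size] | add' rmlint.json
```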
4. Advanced Usage: Hidden Files and Hardlinks
rmlint matches files by content, so duplicates with different filenames are found by default. To also scan hidden files and report hardlinked copies as duplicates:
rmlint --hidden --hardlinked /path/to/search
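On multi-terabyte pools it can be worth ignoring small files entirely. A sketch assuming rmlint's `--size` option with a min-max range (check `man rmlint` for the exact range syntax on your version):

```shell
# Only hash files between 100 MB and 1 TB (hypothetical path).
rmlint --size 100M-1T /path/to/search
```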
5. Repurposing Storage
After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12TB USB drive as a SnapRAID parity drive.
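Before running the removal script, it helps to see which individual files account for most of the wasted space. A sketch that ranks removable duplicates by size, assuming the report's "duplicate_file" entries carry "size", "path", and "is_original" fields:

```shell
# List the ten largest removable duplicates, biggest first.
jq -r '.[] | select(.type == "duplicate_file" and .is_original == false)
       | "\(.size)\t\(.path)"' rmlint.json | sort -rn | head
```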
Maintenance
Run a scan monthly or before any major storage consolidation project.
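A hands-off way to schedule the monthly scan is a cron entry that drops each report into a dated directory. The paths and schedule below are illustrative, and note that `%` must be escaped in crontab lines:

```shell
# crontab -e: scan on the 1st of each month at 03:00, report-only
# (rmlint never deletes anything until you run the generated script).
0 3 1 * * mkdir -p "$HOME/rmlint-reports/$(date +\%Y-\%m)" && cd "$HOME/rmlint-reports/$(date +\%Y-\%m)" && rmlint /majorstorage /majorRAID /mnt/usb
```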