---
title: "rmlint — Extreme Duplicate File Scanning"
domain: opensource
category: productivity
tags: [rmlint, duplicates, storage, cleanup, linux]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# rmlint — Extreme Duplicate File Scanning

## Problem

Over time, backups and media collections accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different storage points.

## Solution

`rmlint` is an extremely fast tool for finding (and optionally removing) duplicates. It is significantly faster than `fdupes` or `rdfind` because it uses a multi-stage approach (grouping by size and comparing partial checksums before computing full hashes) to avoid unnecessary hashing.

### 1. Installation (Fedora)

```bash
sudo dnf install rmlint
```

### 2. Scanning Multiple Directories

To scan for duplicates across multiple mount points and compare them:

```bash
rmlint /majorstorage /majorRAID /mnt/usb
```

This generates a removal script (`rmlint.sh`), a JSON report (`rmlint.json`), and a summary of the findings.

### 3. Reviewing Results

**DO NOT** run the generated script without reviewing it first. You can inspect the JSON report to see which paths contain the most duplicates:

```bash
# Pretty-print the JSON report
jq . rmlint.json
```

### 4. Advanced Usage: Hidden Files and Hardlinks

rmlint matches files by content, so duplicates with different filenames are found by default. To also traverse hidden files and report hardlinked files as duplicates:

```bash
rmlint --hidden --hardlinked /path/to/search
```

### 5. Repurposing Storage

After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12 TB USB drive as a **SnapRAID parity drive**.

---

## Maintenance

Run a scan monthly or before any major storage consolidation project.

---
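To see at a glance which storage root carries the most duplicate bytes, the JSON report can be aggregated with a short script. This is a hedged sketch: it assumes `rmlint.json` is a JSON array whose entries carry `type`, `path`, `size`, and `is_original` fields (with a leading header object), and the sample data below is purely hypothetical.

```python
import json
from collections import defaultdict

def duplicate_bytes_by_root(entries):
    """Sum sizes of non-original duplicate files, grouped by top-level directory.

    Assumes each duplicate entry has "type" == "duplicate_file", a "path",
    a "size" in bytes, and an "is_original" flag marking the kept copy.
    """
    totals = defaultdict(int)
    for entry in entries:
        if entry.get("type") == "duplicate_file" and not entry.get("is_original", False):
            root = "/" + entry["path"].lstrip("/").split("/", 1)[0]
            totals[root] += entry.get("size", 0)
    return dict(totals)

# Hypothetical sample mirroring the assumed rmlint.json layout;
# in practice you would load it with: entries = json.load(open("rmlint.json"))
sample = [
    {"description": "rmlint json-dump of lint files"},  # header object
    {"type": "duplicate_file", "path": "/majorRAID/a.iso", "size": 100, "is_original": True},
    {"type": "duplicate_file", "path": "/mnt/usb/a.iso", "size": 100, "is_original": False},
    {"type": "duplicate_file", "path": "/mnt/usb/b.iso", "size": 50, "is_original": False},
]

print(duplicate_bytes_by_root(sample))  # {'/mnt': 150}
```

The original copy of each group is excluded, so the totals reflect only space you could actually reclaim.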