
---
title: "rmlint — Extreme Duplicate File Scanning"
domain: opensource
category: productivity
tags:
  - rmlint
  - duplicates
  - storage
  - cleanup
  - linux
status: published
created: 2026-04-02
updated: 2026-04-02
---

# rmlint — Extreme Duplicate File Scanning

## Problem

Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified ~4.0 TB (113,584 files) of duplicate data across three different storage points.

## Solution

rmlint is an extremely fast tool for finding (and optionally removing) duplicate files. It is significantly faster than fdupes or rdfind because it works in stages: files are first grouped by cheap criteria like size, and expensive full hashing is only done on files that still look identical after the earlier checks.
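The core idea (group cheaply first, hash only where sizes collide) can be sketched in a few lines of shell. This is a toy illustration of the principle, not rmlint's actual implementation; it uses GNU `find` and `md5sum`:

```shell
set -eu
dir=$(mktemp -d)
printf 'hello' > "$dir/a"            # duplicate of b
printf 'hello' > "$dir/b"
printf 'unique data' > "$dir/c"      # unique size: never hashed

# Stage 1: group files by size; Stage 2: hash only sizes seen more than once.
dupes=$(find "$dir" -type f -printf '%s %p\n' |
    awk '{n[$1]++; f[$1] = f[$1] " " $2}
         END {for (s in n) if (n[s] > 1) print f[s]}' |
    xargs md5sum)
echo "$dupes"
rm -rf "$dir"
```

Only `a` and `b` are ever hashed; `c` is eliminated by its size alone. rmlint's real pipeline has more stages than this, but the principle is the same.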

### 1. Installation (Fedora)

```shell
sudo dnf install rmlint
```

### 2. Scanning Multiple Directories

To scan for duplicates across multiple mount points and compare them:

```shell
rmlint /majorstorage /majorRAID /mnt/usb
```

This will generate a removal script named rmlint.sh and a machine-readable rmlint.json in the current directory, along with a summary of the findings.
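The output destinations can also be chosen explicitly with `-o`/`--output`. The format names (`sh`, `json`) follow rmlint's documentation; the file paths here are just examples:

```shell
# Write the cleanup script and a machine-readable report to explicit paths:
rmlint -o sh:cleanup.sh -o json:report.json /majorstorage /majorRAID /mnt/usb
```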

### 3. Reviewing Results

DO NOT run the generated script without reviewing it first. You can use the summary to see which paths contain the most duplicates:

```shell
# View the summary
jq . rmlint.json
```
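To actually see which paths hold the most duplicates, the JSON can be aggregated. The field names below (`type`, `path`, `is_original`) match the rmlint JSON output I have seen, but verify them against your own report; the heredoc is a small mock standing in for a real rmlint.json:

```shell
# Mock report in rmlint's JSON shape (replace with your real rmlint.json):
report=$(mktemp)
cat > "$report" <<'EOF'
[ {"description": "rmlint json-dump header"},
  {"type": "duplicate_file", "path": "/majorRAID/a.iso",    "is_original": true},
  {"type": "duplicate_file", "path": "/majorRAID/b.iso",    "is_original": false},
  {"type": "duplicate_file", "path": "/mnt/usb/a-copy.iso", "is_original": false} ]
EOF

# Count removable duplicates (non-originals) per top-level directory:
jq -r '.[] | select(.type == "duplicate_file" and (.is_original | not)) | .path' "$report" |
    cut -d/ -f2 | sort | uniq -c | sort -rn
```

Each line of the result is a count of removable copies under one top-level directory, which makes it easy to see where the bulk of the reclaimable space lives.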

### 4. Advanced Usage: Hidden Files and Hard Links

rmlint compares files by content, not name, so duplicates with different filenames are found by default. To also scan hidden files and report hardlinked copies as duplicates:

```shell
rmlint --hidden --hard-links /path/to/search
```
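When one tree is the canonical copy (here /majorstorage, as in my setup), rmlint can "tag" it: paths listed after the `//` separator are treated as preferred originals, and `-k` (`--keep-all-tagged`) keeps every tagged file out of the removal script. Check `man rmlint` for the exact semantics on your version:

```shell
# Everything after '//' is tagged; -k protects all tagged files
# from ever being scheduled for deletion.
rmlint /mnt/usb /majorRAID // /majorstorage -k
```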

### 5. Repurposing Storage

After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12TB USB drive as a SnapRAID parity drive.


## Maintenance

Run a scan monthly or before any major storage consolidation project.
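A crontab entry (added via `crontab -e` as root) can automate the monthly scan. The schedule and report path below are examples, not part of my actual setup; note that `%` must be escaped as `\%` in crontab lines:

```
# Run on the 1st of each month at 03:00, writing a dated JSON report
# to /root/rmlint-reports (hypothetical path; create it first):
0 3 1 * * rmlint -o json:/root/rmlint-reports/$(date +\%Y-\%m).json /majorstorage /majorRAID
```

Scan-only runs like this are safe to automate; the generated removal script should still be reviewed and executed by hand.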