# rmlint — Extreme Duplicate File Scanning
## Problem
Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different storage points.
## Solution
`rmlint` is a fast tool for finding (and optionally removing) duplicates. It is typically faster than `fdupes` or `rdfind`, largely because it works in stages (for example, comparing file sizes before hashing anything) so that most files never need to be hashed at all.
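To make the multi-stage idea concrete, here is a minimal sketch using GNU coreutils. It is an illustration of the principle only, not how `rmlint` is implemented; the function name `find_dupes` is made up for this example, and it assumes GNU `find`, `xargs`, and `uniq`, plus file names without tabs or newlines.

```bash
#!/usr/bin/env bash
# Illustration only (not rmlint's real implementation).
# Stage 1: list "size<TAB>path" and keep only files whose size occurs more than once.
# Stage 2: hash just those candidates and print groups that share a SHA-256 digest.
# Assumes GNU tools and paths without tabs or newlines.
find_dupes() {
    local dir="$1"
    find "$dir" -type f -printf '%s\t%p\n' |
        awk -F'\t' '
            { size[NR] = $1; path[NR] = $2; seen[$1]++ }
            END {
                for (i = 1; i <= NR; i++)
                    if (seen[size[i]] > 1) print path[i]
            }' |
        tr '\n' '\0' |
        xargs -0 -r sha256sum |
        sort |
        uniq -w64 --all-repeated=separate
}
```

Calling `find_dupes /path/to/search` prints each group of identical files as consecutive lines. Files with a unique size are dropped in the first stage and never read, which is where most of the time is saved on large media collections.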
### 1. Installation (Fedora)
```bash
sudo dnf install rmlint
```
### 2. Scanning Multiple Directories
To find duplicates both within and across several mount points, pass all of them in a single run:
```bash
rmlint /majorstorage /majorRAID /mnt/usb
```
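This prints a summary to the terminal and writes two files to the current directory: `rmlint.sh`, a removal script, and `rmlint.json`, a machine-readable report of the findings.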
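Because the output files land in the current directory, a later scan run from the same place will replace the earlier results. One way to keep each run separate is a small wrapper like the sketch below. The function name `scan_to_dir` and the `RMLINT_BIN` override are hypothetical helpers for this example, not part of `rmlint`, and the scan paths must be absolute because the wrapper changes directory before scanning.

```bash
#!/usr/bin/env bash
# Sketch: run rmlint from a dedicated output directory so each scan keeps
# its own rmlint.sh and rmlint.json. scan_to_dir and RMLINT_BIN are
# hypothetical names for this example. Pass absolute scan paths.
scan_to_dir() {
    local outdir="$1"; shift
    local rmlint_bin="${RMLINT_BIN:-rmlint}"   # override for testing, e.g. RMLINT_BIN=echo
    local d
    # Refuse to start if any scan path is missing (e.g. a drive that is not attached).
    for d in "$@"; do
        [ -d "$d" ] || { echo "not a directory: $d" >&2; return 1; }
    done
    mkdir -p "$outdir" && ( cd "$outdir" && "$rmlint_bin" "$@" )
}
```

For example, `scan_to_dir "$HOME/rmlint-reports/$(date +%F)" /majorstorage /majorRAID /mnt/usb` would leave that day's script and report in their own dated folder.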
### 3. Reviewing Results
**DO NOT** run the generated script without reviewing it first. The JSON report written alongside it lists every file `rmlint` flagged, which helps you see which paths contain the most duplicates:
```bash
# Pretty-print the full JSON report
jq . rmlint.json
```
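To see which directories hold the most duplicates, the report can be aggregated with `jq`. The sketch below assumes each entry in `rmlint.json` carries `type`, `is_original`, and `path` fields, which is how the reports I have seen are laid out; check your own file first. The function name `dupes_per_dir` is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: count duplicate (non-original) files per directory, largest first.
# Assumes rmlint.json is an array whose file entries have "type",
# "is_original" and "path" fields; verify against your own report.
dupes_per_dir() {
    local report="${1:-rmlint.json}"
    jq -r '.[]
           | select(.type == "duplicate_file" and .is_original == false)
           | .path
           | sub("/[^/]*$"; "")' "$report" |
        sort | uniq -c | sort -rn
}
```

Running `dupes_per_dir rmlint.json | head` shows the directories with the most removable copies, which is a quick sanity check before opening `rmlint.sh` itself.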
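### 4. Advanced Usage: Including Hidden and Hardlinked Files
`rmlint` compares file contents, not names, so duplicates with different filenames are already found by a normal scan. To widen the scan to hidden files and to explicitly treat hardlinked files as duplicates:
```bash
rmlint --hidden --hardlinked /path/to/search
```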
### 5. Repurposing Storage
After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12 TB USB drive as a **SnapRAID parity drive**.
---
## Maintenance
Run a scan monthly or before any major storage consolidation project.
---