---
title: "rmlint — Extreme Duplicate File Scanning"
domain: opensource
category: productivity
tags: [rmlint, duplicates, storage, cleanup, linux]
status: published
created: 2026-04-02
updated: 2026-04-02
---
# rmlint — Extreme Duplicate File Scanning
## Problem
Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different storage points.
## Solution
`rmlint` is a fast tool for finding (and optionally removing) duplicate files. It is typically faster than `fdupes` or `rdfind` because it works in stages: it groups files by size first and only hashes files that still have a possible match, so a file with a unique size is never hashed at all.
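
To make the staging idea concrete, here is a rough shell sketch of a size-then-hash search. It is only an illustration and not how rmlint is implemented; the function name `staged_dupes` is made up, and it assumes GNU `find`, `xargs`, and `sha256sum` and file names without tabs or newlines.

```bash
# Illustration only: two-stage duplicate search (size first, then hash).
# staged_dupes is a hypothetical helper, not part of rmlint.
staged_dupes() {
  local dir="$1"
  # Stage 1: emit "size<TAB>path" and keep only paths whose size occurs more than once.
  find "$dir" -type f -printf '%s\t%p\n' |
    awk -F'\t' '{ size[NR] = $1; path[NR] = substr($0, length($1) + 2); count[$1]++ }
                END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }' |
    # Stage 2: hash only the remaining candidates and sort so equal hashes are adjacent.
    xargs -r -d '\n' sha256sum |
    sort |
    # Print every line whose hash also appears on a neighbouring line.
    awk '{ if ($1 == prev) { if (!shown) print prevline; print; shown = 1 } else shown = 0
           prev = $1; prevline = $0 }'
}
```

Running `staged_dupes /path/to/search` prints one `hash  path` line per file that has at least one identical copy. Files whose size is unique drop out in stage 1 without being read.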
### 1. Installation (Fedora)
```bash
sudo dnf install rmlint
```
### 2. Scanning Multiple Directories
To scan for duplicates across multiple mount points and compare them:
```bash
rmlint /majorstorage /majorRAID /mnt/usb
```
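By default, rmlint prints a summary to the terminal and writes two files to the current directory: `rmlint.sh`, a removal script, and `rmlint.json`, a machine-readable report. Nothing is deleted until the script is run.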
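
When one location should always win, rmlint can be told which side holds the originals. The sketch below wraps that in a small helper; the function name `scan_keep_tagged` and the choice of which tree to keep are assumptions for illustration. Paths listed after `//` are tagged, and `--keep-all-tagged` stops rmlint from marking anything in a tagged path for removal.

```bash
# Hypothetical helper: the first argument is the tree to keep, the remaining
# arguments are trees that may contain removable copies.
scan_keep_tagged() {
  local keep_dir="$1"
  shift
  # Everything after the // separator is treated as a tagged (original) path.
  rmlint --keep-all-tagged "$@" // "$keep_dir"
}
```

For example, `scan_keep_tagged /majorRAID /majorstorage /mnt/usb` would treat `/majorRAID` as the side to keep. The generated script should still be reviewed before it is run.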
### 3. Reviewing Results
**DO NOT** run the generated script without reviewing it first. The JSON report lists every file rmlint flagged, so you can use it to see which paths contain the most duplicates:
```bash
# Pretty-print the full report
jq . rmlint.json
```
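
To see where the duplicates live, the report can be grouped by top-level directory. This is a sketch that assumes the rmlint 2.x report layout, where each flagged file is an object with `type`, `path`, `size`, and `is_original` fields; the function name `dupes_by_root` is made up for this example.

```bash
# Hypothetical helper: summarise removable duplicates per top-level directory.
# Output columns: total bytes, file count, root directory (largest first).
dupes_by_root() {
  jq -r '
    # Keep only duplicate entries that are not the copy rmlint chose to keep.
    [ .[] | select(.type == "duplicate_file" and .is_original == false) ]
    # "/majorRAID/a/b" splits to ["", "majorRAID", "a", "b"], so index 1 is the root.
    | group_by(.path | split("/")[1])
    | map({ root: (.[0].path | split("/")[1]),
            files: length,
            bytes: (map(.size) | add) })
    | sort_by(-.bytes)
    | .[] | "\(.bytes)\t\(.files)\t/\(.root)"
  ' "${1:-rmlint.json}"
}
```

Calling `dupes_by_root` with no argument reads `rmlint.json` from the current directory. Recent rmlint versions also generate a script that accepts `-n` for a dry run (`sh rmlint.sh -n`), which prints what would be removed without touching anything; `sh rmlint.sh -h` shows whether your version supports it.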
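### 4. Advanced Usage: Hidden Files and Hard Links
rmlint compares file contents, so copies with different filenames are already found by a default scan. To also scan hidden files and treat hard links to the same file as duplicates:
```bash
rmlint --hidden --hardlinked /path/to/search
```
To report only duplicates whose filenames differ, add `--unmatched-basename`.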
### 5. Repurposing Storage
After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12 TB USB drive as a **SnapRAID parity drive**.

---
## Maintenance
Run a scan monthly or before any major storage consolidation project.
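
One way to make the monthly scan routine is a small wrapper that writes each run's report into a dated directory. This is a sketch under assumptions: `REPORT_ROOT`, the function name `monthly_scan`, and the script path in the cron comment are all illustrative. It only scans and writes reports; it never runs the generated removal script.

```bash
# Hypothetical wrapper for scheduled scans; reports land in one directory per date.
REPORT_ROOT="${REPORT_ROOT:-$HOME/rmlint-reports}"

monthly_scan() {
  local outdir
  outdir="$REPORT_ROOT/$(date +%Y-%m-%d)"
  mkdir -p "$outdir" || return 1
  # -o replaces rmlint's default outputs, so every wanted output is listed explicitly.
  rmlint \
    -o "sh:$outdir/rmlint.sh" \
    -o "json:$outdir/rmlint.json" \
    -o summary:stdout \
    "$@"
}

# Example crontab entry (03:00 on the 1st of each month), assuming this file is
# saved as a script that ends by calling monthly_scan with the paths to scan:
#   0 3 1 * * /usr/local/bin/rmlint-monthly.sh
```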
---