# rmlint — Extreme Duplicate File Scanning

## Problem

Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different mount points.

## Solution

`rmlint` is an extremely fast tool for finding (and optionally removing) duplicates. It is significantly faster than `fdupes` or `rdfind` because it uses a multi-stage approach to avoid unnecessary hashing.
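
The idea behind that multi-stage approach can be sketched in plain shell (a toy illustration only, not rmlint's actual pipeline): files with a unique size can never be duplicates, so only size-colliding files need to be hashed at all.

```bash
# Toy sketch of the multi-stage idea (illustration, not rmlint's code):
# Stage 1 groups files by size; Stage 2 hashes only files whose size
# collides with another file's size.
dir=$(mktemp -d)
printf 'hello' > "$dir/a"
printf 'hello' > "$dir/b"
printf 'unique-size-content' > "$dir/c"

# Stage 1: keep only files whose size occurs more than once
candidates=$(find "$dir" -type f -printf '%s %p\n' \
  | awk '{n[$1]++; f[NR]=$0; s[NR]=$1}
         END {for (i = 1; i <= NR; i++) if (n[s[i]] > 1) print f[i]}' \
  | cut -d' ' -f2-)

# Stage 2: full hash only for the surviving candidates
# (a and b get the same hash; c was skipped without hashing)
echo "$candidates" | xargs md5sum
```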

### 1. Installation (Fedora)

```bash
sudo dnf install rmlint
```

### 2. Scanning Multiple Directories

To scan for duplicates across multiple mount points and compare them:

```bash
rmlint /majorstorage /majorRAID /mnt/usb
```

By default this writes a removal script, `rmlint.sh`, and a JSON report, `rmlint.json`, to the current directory, along with a summary of the findings.

### 3. Reviewing Results

**DO NOT** run the generated script without reviewing it first. You can use the summary to see which paths contain the most duplicates:

```bash
# Pretty-print the JSON report
jq . rmlint.json
```
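
Beyond pretty-printing, `jq` can pull out specific fields. The sketch below assumes the report's shape (an array whose last element is a footer carrying aggregate totals such as `total_lint_size`); the sample file here is fabricated purely for illustration:

```bash
# Fabricated miniature of rmlint.json's assumed structure:
# a header object, duplicate_file entries, and a footer with totals.
cat > sample-rmlint.json <<'EOF'
[
  {"description": "rmlint json-dump of lint files"},
  {"type": "duplicate_file", "path": "/data/a.iso", "size": 1048576, "is_original": true},
  {"type": "duplicate_file", "path": "/backup/a.iso", "size": 1048576, "is_original": false},
  {"total_lint_size": 1048576, "duplicates": 1}
]
EOF

# The footer (last array element) carries the aggregate summary
jq '.[-1]' sample-rmlint.json

# List only the removable copies (entries not marked as originals)
jq -r '.[] | select(.type == "duplicate_file" and .is_original == false) | .path' sample-rmlint.json
```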

### 4. Advanced Usage: Including Hidden Files and Hard Links

Note that rmlint compares files by content, not by name, so duplicates with different filenames are found by default. To also scan hidden files and report hardlinked copies as duplicates:

```bash
rmlint --hidden --hardlinked /path/to/search
```

### 5. Repurposing Storage

After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12TB USB drive as a **SnapRAID parity drive**.
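
For context, a minimal `snapraid.conf` sketch for that kind of layout might look like this (drive names and paths are illustrative, not my actual configuration):

```text
# Hypothetical snapraid.conf: the cleared 12TB USB drive holds parity
parity /mnt/usb/snapraid.parity

# Content files (the state database), kept on more than one drive
content /var/snapraid.content
content /majorstorage/.snapraid.content

# Data drives to protect
data d1 /majorstorage
data d2 /majorRAID
```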

---

## Maintenance

Run a scan monthly or before any major storage consolidation project.
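
To automate this, a crontab entry along these lines works (schedule and paths are illustrative); rmlint's `-o` option directs its formatters to explicit output files:

```text
# Hypothetical crontab entry: scan at 03:00 on the 1st of each month,
# writing the report and removal script where they can be reviewed later.
0 3 1 * * rmlint /majorstorage /majorRAID -o json:/var/log/rmlint.json -o sh:/var/log/rmlint.sh
```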

---

## Tags

#rmlint #linux #storage #cleanup #duplicates