
# rmlint — Extreme Duplicate File Scanning

## Problem

Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different storage points.

## Solution

`rmlint` is a very fast tool for finding (and optionally removing) duplicate files. It is typically faster than `fdupes` or `rdfind` because it works in stages and avoids unnecessary hashing: for example, files whose size is unique are discarded before any content is read.

### 1. Installation (Fedora)

```bash
sudo dnf install rmlint
```

### 2. Scanning Multiple Directories

To find duplicates across several mount points, pass all of them in a single invocation so files are compared across every path:

```bash
rmlint /majorstorage /majorRAID /mnt/usb
```

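By default this writes two files to the current directory: `rmlint.sh`, a shell script that removes the duplicates it found, and `rmlint.json`, a machine-readable report of the same findings. A short summary is also printed to the terminal.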
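
If one of the locations is the copy you intend to clear out, rmlint's `//` separator can mark the remaining paths as preferred originals. The following is a minimal sketch, assuming the USB drive is the candidate and the other two mounts hold the copies to keep (that split is an assumption for illustration, and `scan_against_originals` is a hypothetical helper name, not part of rmlint):

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan one candidate path against "tagged" reference paths.
# Usage: scan_against_originals CANDIDATE ORIGINAL...
scan_against_originals() {
    local candidate="$1"
    shift
    # Paths listed after // are tagged as preferred originals.
    # --keep-all-tagged:   never mark a tagged file for removal.
    # --must-match-tagged: only report duplicates that have a twin in a tagged path.
    rmlint --keep-all-tagged --must-match-tagged "$candidate" // "$@"
}
```

Calling `scan_against_originals /mnt/usb /majorstorage /majorRAID` should then only propose removals on the USB drive. The generated script still needs the same review as any other run.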

### 3. Reviewing Results

**DO NOT** run the generated script without reviewing it first. The JSON report lists every file rmlint flagged, so you can use it to see which paths contain the most duplicates:

```bash
# Pretty-print the JSON report
jq . rmlint.json
```
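
The full report is long for a scan of this size, so a per-location total is easier to read. This is a rough sketch, assuming each duplicate entry in `rmlint.json` carries `type`, `is_original`, `path` and `size` fields (true for the versions I have seen, but check your own report); `summarize_duplicates` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count removable duplicates and sum their bytes,
# grouped by the first component of each path.
# Usage: summarize_duplicates [REPORT]   (defaults to ./rmlint.json)
summarize_duplicates() {
    local report="${1:-rmlint.json}"
    # Keep only duplicate entries that are not the chosen original, then
    # group by first path component and print: name <TAB> count <TAB> bytes.
    jq -r '
        [ .[] | select(.type == "duplicate_file" and .is_original == false) ]
        | group_by(.path | split("/")[1])
        | .[]
        | "\(.[0].path | split("/")[1])\t\(length)\t\(map(.size) | add)"
    ' "$report"
}
```

Run it as `summarize_duplicates rmlint.json`. Grouping by the first path component means everything under `/mnt` is reported together, so adjust the index if your mounts sit deeper. The generated script should also offer a dry-run mode (`./rmlint.sh -n`); `./rmlint.sh -h` lists what your version supports.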

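### 4. Advanced Usage: Hidden Files and Hardlinks

`rmlint` already matches files by content, so duplicates with different filenames are found by default. To widen the scan to hidden files and directories (skipped by default) and to make the handling of hardlinked files explicit: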

```bash
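# --hidden: also scan hidden files and directories
# --hardlinked: treat hardlinked files as duplicates
rmlint --hidden --hardlinked /path/to/search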
```

### 5. Repurposing Storage

After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12 TB USB drive as a **SnapRAID parity drive**.

---

## Maintenance

Run a scan monthly or before any major storage consolidation project.
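
To keep monthly reports from overwriting each other, the scan can write into a dated directory. This is a sketch only: `monthly_scan` and the report directory layout are hypothetical, and it relies on rmlint's `-o formatter:path` option to redirect the default outputs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a scan and store its outputs under REPORT_ROOT/YYYY-MM.
# Usage: monthly_scan REPORT_ROOT PATH...
monthly_scan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: monthly_scan REPORT_ROOT PATH..." >&2
        return 2
    fi
    local report_root="$1"
    shift
    local outdir
    outdir="$report_root/$(date +%Y-%m)"
    mkdir -p "$outdir" || return 1
    # -o replaces rmlint's default outputs, so each one is listed explicitly.
    rmlint \
        -o sh:"$outdir/rmlint.sh" \
        -o json:"$outdir/rmlint.json" \
        -o summary:stdout \
        "$@"
}
```

For example, `monthly_scan ~/rmlint-reports /majorstorage /majorRAID /mnt/usb` could be run by hand or from cron or a systemd timer. It only scans; the generated script is never executed automatically.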
