---
title: Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
domain: selfhosting
category: security
tags:
  - fail2ban
  - security
  - email
  - ansible
  - fleet
  - cron
  - digest
status: published
created: 2026-04-22
updated: 2026-05-02T14:56
---

# Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts

## The Problem

With `action = %(action_mwl)s` configured, Fail2Ban sends an email for every single ban, complete with the IP address, a whois lookup, and the relevant log lines. On a fleet of 9 servers running 50+ jails, this floods the inbox with hundreds of emails daily. The vast majority are routine scanner bans (probing `.env`, `wp-login.php`, random PHP paths) that require no human attention.

The signal-to-noise ratio is terrible. Genuinely important events, such as SSH brute-force attempts and recidive escalations, get buried in the noise.

## The Solution — Tiered Alert Model

Three tiers replace the firehose:

| Tier | Jails | Action | Why |
|------|-------|--------|-----|
| **Immediate email** | `recidive` | `action_mwl` | Repeat offenders only: someone has been banned multiple times across jails |
| **Silent ban** | Everything else | `action_` (default) | Ban happens, firewall rule applied, no email sent |
| **Daily digest** | All jails | Cron script at 08:00 UTC | One summary email per host with ban counts across all jails |

This reduces email volume from hundreds per day to about 10: one digest per host plus occasional recidive alerts.

## jail.local Configuration

### Set silent ban as the default

In `[DEFAULT]`:

```ini
[DEFAULT]
action = %(action_)s
```

This replaces `action_mwl` as the default action for every jail. Bans still happen and the firewall rule is still applied, but no email is sent.

### Keep immediate alerts for recidive only

```ini
[sshd]
enabled = true
action = %(action_)s

[recidive]
enabled = true
action = %(action_mwl)s
```

> **Updated 2026-05-02:** sshd was moved to silent (`action_`). Only recidive (repeat offenders) now triggers immediate email. sshd bans are captured in the daily digest.

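
To sanity-check which jails end up on the email tier, the inheritance from `[DEFAULT]` can be sketched with Python's `configparser`. This is a minimal sketch, not part of the playbook: the function names and the sample text are illustrative, and it looks at a single file only, ignoring `jail.conf` and `jail.d/`.

```python
import configparser

# Illustrative sample mirroring the two snippets above.
SAMPLE_JAIL_LOCAL = """\
[DEFAULT]
action = %(action_)s

[sshd]
enabled = true

[recidive]
enabled = true
action = %(action_mwl)s
"""


def effective_actions(jail_local_text):
    """Map each jail section to the raw `action` value it ends up with."""
    # RawConfigParser does no %(...)s interpolation, so macros stay as written.
    parser = configparser.RawConfigParser()
    parser.read_string(jail_local_text)
    # Keys from [DEFAULT] are inherited by any section that does not override them.
    return {jail: parser.get(jail, "action", fallback="") for jail in parser.sections()}


def emailing_jails(jail_local_text):
    """Return the jails whose action is one of the mailing macros."""
    mail_macros = ("%(action_mw)s", "%(action_mwl)s")
    actions = effective_actions(jail_local_text)
    return sorted(jail for jail, action in actions.items() if action.strip() in mail_macros)


if __name__ == "__main__":
    print(emailing_jails(SAMPLE_JAIL_LOCAL))
```

With the sample above, only `recidive` is reported as an emailing jail; `sshd` inherits the silent default even when its own `action` line is omitted.
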
### Clean up email subjects with fq-hostname

By default, fail2ban uses the system FQDN in email subjects. On Tailscale hosts, this produces ugly subjects like `[Fail2Ban] sshd: banned 1.2.3.4 on MajorToot.tail7f2d9.ts.net`. Override it in `[DEFAULT]`:

```ini
[DEFAULT]
fq-hostname = majortoot
```

This sets the `<fq-hostname>` tag used in action templates, producing cleaner subjects: `[Fail2Ban] sshd: banned 1.2.3.4 on majortoot`.

## Daily Digest Script

A shell script at `/usr/local/bin/fail2ban-digest.sh` runs via cron at 08:00 UTC. It does the following:

1. Queries every active jail via `fail2ban-client status <jail>`
2. Collects four metrics per jail: currently banned, total banned, currently failed, total failed
3. Builds a plain-text email body with one line per jail
4. Separates "active" jails (with bans) from "quiet" jails (zero bans) for quick scanning
5. Sets the subject line to include the total banned count (e.g., `[Fail2Ban Digest] majortoot — 47 total bans`)
6. Sends via `sendmail`

The script is rendered from a template and deployed to each host by Ansible.

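
The deployed script is shell and is not reproduced here. The parsing and grouping steps above can be sketched in Python as follows. Everything in this block is an assumption for illustration: the sample status text follows the usual `fail2ban-client status <jail>` layout with invented numbers, "active" is taken to mean a non-zero total ban count, and the function names are hypothetical. Sending through `sendmail` is left out.

```python
import re

# Typical layout of `fail2ban-client status sshd`; the numbers are invented.
SAMPLE_STATUS = """\
Status for the jail: sshd
|- Filter
|  |- Currently failed:\t2
|  |- Total failed:\t31
|  `- File list:\t/var/log/auth.log
`- Actions
   |- Currently banned:\t3
   |- Total banned:\t12
   `- Banned IP list:\t192.0.2.10 192.0.2.11 192.0.2.12
"""

METRICS = ("Currently banned", "Total banned", "Currently failed", "Total failed")


def parse_jail_status(text):
    """Extract the four counters from one jail's status output (0 if missing)."""
    counts = {}
    for name in METRICS:
        match = re.search(rf"{name}:\s+(\d+)", text)
        counts[name] = int(match.group(1)) if match else 0
    return counts


def build_digest(hostname, per_jail):
    """Return (subject, body) for a dict of {jail name: counters}."""
    active = {jail: c for jail, c in per_jail.items() if c["Total banned"] > 0}
    quiet = sorted(set(per_jail) - set(active))
    total = sum(c["Total banned"] for c in per_jail.values())
    lines = ["Active jails:"]
    for jail in sorted(active):
        c = active[jail]
        lines.append(
            f"  {jail}: banned {c['Currently banned']}/{c['Total banned']}, "
            f"failed {c['Currently failed']}/{c['Total failed']}"
        )
    lines.append("Quiet jails: " + (", ".join(quiet) or "none"))
    # \u2014 is the dash used in the example subject line above.
    subject = f"[Fail2Ban Digest] {hostname} \u2014 {total} total bans"
    return subject, "\n".join(lines)


if __name__ == "__main__":
    stats = {"sshd": parse_jail_status(SAMPLE_STATUS)}
    subject, body = build_digest("majortoot", stats)
    print(subject)
    print(body)
```
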
### Cron entry

```
0 8 * * * /usr/local/bin/fail2ban-digest.sh
```

The entry is managed by `ansible.builtin.cron`, so no manual crontab editing is needed.

## Ansible Deployment

The playbook `configure_fail2ban_digest.yml` deploys the full digest model fleet-wide.

### What it does

1. Deploys a Python helper script that performs **section-aware editing** of `jail.local` (see gotchas below)
2. Sets `action = %(action_)s` in `[DEFAULT]` and `[sshd]`
3. Sets `action = %(action_mwl)s` in `[recidive]`
4. Removes stale `action = %(action_mwl)s` from `defaults-debian.conf` if present
5. Sets `fq-hostname` per host using an override dict
6. Deploys the digest script from a Jinja2 template
7. Creates the cron job via `ansible.builtin.cron`
8. Restarts fail2ban

### Host-specific overrides

Two dictionaries in the playbook vars handle per-host customization:

```yaml
fail2ban_sender_overrides:
  majormail: "fail2ban@majorshouse.com"
  dcaprod: "fail2ban@dcanalysts.net"

fail2ban_hostname_overrides:
  majortoot: "majortoot"
  teelia: "teelia"
  majormail: "majormail"
```

These feed into the Python editor script and the digest template.

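
How the two dictionaries might be consumed can be sketched as a lookup with a fallback. The fallback behaviour is an assumption, since the article does not say what hosts without an override receive; `resolve_host_settings`, `system_fqdn`, and `default_sender` are hypothetical names.

```python
# Values copied from the playbook vars shown above.
SENDER_OVERRIDES = {
    "majormail": "fail2ban@majorshouse.com",
    "dcaprod": "fail2ban@dcanalysts.net",
}
HOSTNAME_OVERRIDES = {
    "majortoot": "majortoot",
    "teelia": "teelia",
    "majormail": "majormail",
}


def resolve_host_settings(inventory_name, system_fqdn, default_sender):
    """Pick per-host values, falling back when a host has no override.

    Assumption: hosts without an entry keep their system FQDN and a
    fleet-wide default sender passed in by the caller.
    """
    return {
        "fq_hostname": HOSTNAME_OVERRIDES.get(inventory_name, system_fqdn),
        "sender": SENDER_OVERRIDES.get(inventory_name, default_sender),
    }


if __name__ == "__main__":
    # example.com values are placeholders, not real fleet settings.
    print(resolve_host_settings("majormail", "majormail.example.com", "fail2ban@example.com"))
```
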
### Why not lineinfile?

The playbook uses a Python script for `jail.local` editing instead of Ansible's `lineinfile` module. This is deliberate; see the Gotchas section below.

## Gotchas

### lineinfile matches stock action definitions

Using `lineinfile` with a regex like `regexp: '^action\s*=\s*%(action_mwl)s'` is dangerous. `lineinfile` scans the whole file with no notion of INI sections, and the unescaped parentheses in this pattern form a regex group instead of matching literally. In a full `jail.local` that includes stock action *definitions* (copied from `jail.conf`), a pattern along these lines can end up matching lines like:

```ini
action_mwl = %(action_mwl)s
```

This is the definition of the `action_mwl` macro itself, not the `action =` assignment in `[DEFAULT]`. `lineinfile` then replaces the wrong line and corrupts the config. Fail2ban refuses to start because the `action_mwl` macro is undefined.

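
The matching behaviour can be checked with Python's `re` module, which is what `lineinfile` uses for `regexp`. This is a small illustration on a trimmed, hypothetical file; the "loose" pattern is an invented example of an under-anchored regex. As written, the unescaped pattern matches nothing, the escaped pattern matches the assignment in both sections (indexes 2 and 4), and the loose pattern also catches the macro definition (indexes 1, 2 and 4). Since `lineinfile` replaces only the last matching line, even the correctly escaped pattern would rewrite `[recidive]` rather than `[DEFAULT]`.

```python
import re

# A trimmed, hypothetical jail.local built from the lines quoted in this section.
SAMPLE = [
    "[DEFAULT]",
    "action_mwl = %(action_mwl)s",  # macro definition line as quoted above
    "action = %(action_mwl)s",      # the assignment we actually want to change
    "[recidive]",
    "action = %(action_mwl)s",      # must stay as it is
]

PATTERNS = {
    "unescaped": r"^action\s*=\s*%(action_mwl)s",   # (...) is a group here
    "escaped": r"^action\s*=\s*%\(action_mwl\)s",   # literal parentheses
    "loose": r"action.*mwl",                        # under-anchored example
}


def matching_indexes(pattern, lines):
    """Return the index of every line the pattern matches, scanning the whole file."""
    regex = re.compile(pattern)
    return [i for i, line in enumerate(lines) if regex.search(line)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, matching_indexes(pattern, SAMPLE))
```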
**Solution:** Use a Python script that parses INI sections and only modifies the `action` key within the correct `[section]`. This is what the Ansible playbook does.

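
The playbook's helper is not shown in this article. A minimal sketch of section-aware editing, assuming a line-based approach that leaves comments and unrelated lines untouched, could look like this (`set_option` is a hypothetical name):

```python
import re

SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def set_option(text, section, key, value):
    """Set `key = value` inside `[section]` only; other sections are untouched."""
    key_re = re.compile(rf"^{re.escape(key)}\s*=")
    new_line = f"{key} = {value}"
    out = []
    current = None    # name of the section currently being scanned
    written = False   # True once new_line has been emitted
    dropping = False  # True while skipping continuation lines of a replaced value
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            if current == section and not written:
                out.append(new_line)  # key was absent: add it before leaving the section
                written = True
            current = header.group("name")
            dropping = False
            out.append(line)
            continue
        if current == section:
            if key_re.match(line):
                if not written:
                    out.append(new_line)  # replace the first occurrence
                    written = True
                dropping = True  # later duplicates of the key are dropped
                continue
            if dropping and line[:1] in (" ", "\t") and line.strip():
                continue  # indented continuation of the old multi-line value
            dropping = False
        out.append(line)
    if current == section and not written:
        out.append(new_line)  # target section was the last one in the file
        written = True
    if not written:
        out.extend([f"[{section}]", new_line])  # section missing entirely
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[DEFAULT]\naction = %(action_mwl)s\n\n[recidive]\naction = %(action_mwl)s\n"
    print(set_option(sample, "DEFAULT", "action", "%(action_)s"))
```

Because the key pattern requires `action` to be followed directly by `=`, a line such as `action_mwl = ...` is never touched, and running the function twice yields the same file.
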
### Duplicate action lines crash fail2ban

If a section ends up with two `action =` lines (e.g., from a botched `lineinfile` run), fail2ban refuses to load:

```
option 'action' in section 'DEFAULT' already exists
```

The Python editor script handles this by replacing existing keys rather than appending.

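
Python's standard `configparser` rejects repeated keys with the same wording in strict mode, which makes it a convenient pre-flight check before restarting the service. This is a hedged sketch with a hypothetical function name; it checks one file in isolation and is not taken from the playbook.

```python
import configparser


def find_duplicate_option(text):
    """Return (section, option) for the first repeated key, or None if there is none."""
    # strict=True (the default) raises on a key repeated within one section.
    # RawConfigParser leaves %(...)s macros alone, so no interpolation errors occur.
    parser = configparser.RawConfigParser(strict=True)
    try:
        parser.read_string(text)
    except configparser.DuplicateOptionError as err:
        return err.section, err.option
    return None


if __name__ == "__main__":
    broken = "[DEFAULT]\naction = %(action_)s\naction = %(action_mwl)s\n"
    print(find_duplicate_option(broken))
```

The same key appearing once in `[DEFAULT]` and once in a jail section is legitimate and is not reported.
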
### defaults-debian.conf can reintroduce action_mwl

On Debian/Ubuntu, `/etc/fail2ban/jail.d/defaults-debian.conf` is part of the effective configuration alongside `jail.local`. Fail2Ban reads `jail.conf`, then `jail.d/*.conf`, then `jail.local`, then `jail.d/*.local`, with later files taking precedence. If `defaults-debian.conf` contains `action = %(action_mwl)s` and the silent default in `jail.local` is missing or did not land in the matching section, every affected jail sends email on every ban. The Ansible playbook now removes this line automatically. If you see per-ban emails after deploying digest mode, check this file first:

```bash
grep action /etc/fail2ban/jail.d/defaults-debian.conf
```

### fq-hostname scope

Setting `fq-hostname` in `[DEFAULT]` affects every action template that uses the `<fq-hostname>` tag, including both immediate emails and the digest subject. This is the desired behavior, but be aware that it overrides the system hostname globally within fail2ban.

## Verifying the Setup

```bash
# Check that ordinary jails are silent (sshd as an example)
fail2ban-client get sshd actions
# Should NOT list a 'sendmail' or 'mail' action

# Check that recidive still sends email
fail2ban-client get recidive actions
# Should list 'sendmail-whois-lines' or similar

# Trigger a test digest
/usr/local/bin/fail2ban-digest.sh

# Check cron is installed (run as the user that owns the job)
crontab -l | grep fail2ban-digest
```

## See Also
- [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md) — custom jail for catching 404 scanners
- [fail2ban-wordpress-login-jail](fail2ban-wordpress-login-jail.md) — WordPress brute-force jail
- [ssh-hardening-ansible-fleet](ssh-hardening-ansible-fleet.md) — fleet SSH hardening
- [clamav-fleet-deployment](clamav-fleet-deployment.md) — another fleet-wide security deployment via Ansible