wiki: add fail2ban UFW rule bloat and Apache dirscan jail articles (56 articles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:54:06 -04:00
parent 565b37a605
commit c7c7c9e5be
3 changed files with 273 additions and 0 deletions
--- a/05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md
+++ b/05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md
@@ -0,0 +1,158 @@
+# Fail2ban & UFW Rule Bloat: 30k Rules Slowing Down a VPS
+
+## 🛑 Problem
+
+A small VPS (1–2 GB RAM) running Fail2ban with permanent bans (`bantime = -1`) gradually accumulates thousands of UFW DENY rules or nftables entries. Over time this causes:
+
+- High memory usage from Fail2ban (100+ MB RSS)
+- Bloated nftables ruleset (30k+ rules) — every incoming packet must traverse the full list
+- Netdata alerts flapping on RAM/swap thresholds
+- Degraded packet processing performance
+
+---
+
+## 🔍 Diagnosis
+
+### Step 1 — Check Fail2ban memory and thread count
+
+```bash
+grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status
+```
+
+On a small VPS, Fail2ban RSS over 80 MB is a red flag. Thread count scales with jail count (roughly 2 threads per jail + overhead).
+
+---
+
+### Step 2 — Count nftables/UFW rules
+
+```bash
+# Total drop/reject rules in nftables
+nft list ruleset | grep -c "reject\|drop"
+
+# UFW rule file size
+wc -l /etc/ufw/user.rules
+```
+
+A healthy UFW setup has 10–30 rules. Thousands means manual `ufw deny` commands or permanent Fail2ban bans have accumulated.
+
+---
+
+### Step 3 — Identify dead jails
+
+```bash
+# Print the "Total banned" counter for every active jail
+for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
+  total=$(fail2ban-client status "$jail" | grep "Total banned" | awk '{print $NF}')
+  echo "$jail: $total total bans"
+done
+```
+
+Jails with zero total bans are dead weight — burning threads and regex cycles for nothing.
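To turn the Step 3 output into a short list of candidates, the `jail: N total bans` lines can be filtered for zero counts. This is a minimal sketch that assumes exactly that output format; `dead_jails` is a hypothetical helper name, not a Fail2ban command.

```shell
# Hypothetical helper: reads "jail: N total bans" lines on stdin
# (the format printed by the Step 3 loop) and prints jails where N is 0.
dead_jails() {
  awk -F': ' '$2 ~ /^0 total bans$/ { print $1 }'
}

# Usage (on the server): wrap the Step 3 loop in a subshell and pipe it in:
#   ( for jail in ...; do ...; done ) | dead_jails
```

Reviewing the list by hand before disabling anything is still advisable, since a jail with no bans yet may simply be guarding a service that has not been probed.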
+
+---
+
+### Step 4 — Check ban policy
+
+```bash
+grep bantime /etc/fail2ban/jail.local
+```
+
+`bantime = -1` means permanent. On a public-facing server, scanner IPs rotate constantly — permanent bans just pile up with no benefit.
+
+---
+
+## ✅ Solution
+
+### Fix 1 — Disable dead jails
+
+Edit `/etc/fail2ban/jail.local` and set `enabled = false` for any jail with zero historical bans.
+
+### Fix 2 — Switch to time-limited bans
+
+```ini
+[DEFAULT]
+bantime = 30d
+
+[recidive]
+bantime = 90d
+```
+
+30 days is long enough to block active campaigns; repeat offenders get 90 days via recidive. Scanner IPs rarely persist beyond a week.
+
+### Fix 3 — Flush accumulated bans
+
+```bash
+fail2ban-client unban --all
+```
+
+### Fix 4 — Reset bloated UFW rules
+
+**Back up first:**
+
+```bash
+cp /etc/ufw/user.rules /etc/ufw/user.rules.bak
+cp /etc/ufw/user6.rules /etc/ufw/user6.rules.bak
+```
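Before resetting, it may help to list the ALLOW rules recorded in the backup so none are forgotten when re-adding. This is a rough sketch that assumes UFW's `### tuple ### allow ...` comment lines are present in `user.rules` (worth confirming on your system); `list_allow_tuples` is a hypothetical helper.

```shell
# Hypothetical helper: print the ALLOW tuples recorded in a UFW rules file.
# Assumes each rule is preceded by a "### tuple ### <action> ..." comment line.
list_allow_tuples() {
  grep -E '^### tuple ### allow' "$1" | sed 's/^### tuple ### //'
}

# Usage (on the server):
#   list_allow_tuples /etc/ufw/user.rules.bak
```

Rules with other actions (for example `limit`) are not shown by this filter and would need a separate check.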
+
+**Reset and re-add only legitimate ALLOW rules:**
+
+```bash
+ufw --force reset
+ufw default deny incoming
+ufw default allow outgoing
+ufw allow 443/tcp
+ufw allow 80/tcp
+ufw allow in on tailscale0 to any port 22 comment "SSH via Tailscale"
+# Add any other ALLOW rules specific to your server
+ufw --force enable
+```
+
+**Restart Fail2ban** so it re-creates its nftables chains:
+
+```bash
+systemctl restart fail2ban
+```
+
+---
+
+## 🔁 Why This Happens
+
+| Cause | Effect |
+|---|---|
+| `bantime = -1` (permanent) | Banned IP list grows forever; nftables rules never expire |
+| Manual `ufw deny from <IP>` | Each adds a persistent rule to `user.rules`; survives reboots |
+| Many jails with no hits | Each jail spawns 2+ threads, runs regex against logs continuously |
+| Small VPS (1–2 GB RAM) | Fail2ban + nftables overhead becomes significant fraction of total RAM |
+
+---
+
+## ⚠️ Key Notes
+
+- **Deleting UFW rules one-by-one is impractical** at scale — `ufw delete` with 30k rules takes hours. A full reset + re-add is the only efficient path.
+- **`ufw --force reset` also resets `before.rules` and `after.rules`.** UFW saves timestamped backup copies of these files in `/etc/ufw/` during the reset, but verify any custom chains afterwards.
+- **After flushing bans, expect a brief spike in 4xx responses** as scanners that were previously blocked hit Apache again. Fail2ban will re-ban them within minutes.
+- **The Netdata `web_log_1m_successful` alert may fire** during this window — it will self-clear once bans repopulate.
+
+---
+
+## 🔎 Quick Diagnostic Commands
+
+```bash
+# Fail2ban memory usage
+grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status
+
+# Count nftables rules
+nft list ruleset | grep -c "reject\|drop"
+
+# UFW rule count
+ufw status numbered | grep -c '^\['
+
+# List all jails with ban counts
+for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
+  banned=$(fail2ban-client status $jail | grep "Currently banned" | awk '{print $NF}')
+  total=$(fail2ban-client status $jail | grep "Total banned" | awk '{print $NF}')
+  echo "$jail: $banned current / $total total"
+done
+
+# Flush all bans
+fail2ban-client unban --all
+```
--- a/05-troubleshooting/security/apache-dirscan-fail2ban-jail.md
+++ b/05-troubleshooting/security/apache-dirscan-fail2ban-jail.md
@@ -0,0 +1,113 @@
+# Custom Fail2ban Jail: Apache Directory Scanning & Junk Methods
+
+## 🛑 Problem
+
+Bots and vulnerability scanners enumerate WordPress directories (`/wp-admin/`, `/wp-includes/`, `/wp-content/`), probe for access-denied paths, or send junk HTTP methods (e.g., `YQEILVHZ`, `DUTEDCEM`). These generate Apache error log entries but are not caught by any default Fail2ban jail:
+
+- `AH01276` — directory index forbidden (autoindex:error)
+- `AH01630` — client denied by server configuration (authz_core:error)
+- `AH00135` — invalid method in request (core:error)
+
+The result is a low success ratio on Netdata's `web_log_1m_successful` metric and wasted server resources processing scanner requests.
+
+---
+
+## ✅ Solution
+
+### Step 1 — Create the filter
+
+Create `/etc/fail2ban/filter.d/apache-dirscan.conf`:
+
+```ini
+# Fail2ban filter for Apache scanning/probing
+# Catches: directory enumeration (AH01276), access denied (AH01630), invalid methods (AH00135)
+
+[Definition]
+failregex = ^\[.*\] \[autoindex:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH01276:
+            ^\[.*\] \[authz_core:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH01630:
+            ^\[.*\] \[core:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH00135:
+
+ignoreregex =
+```
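For a quick sanity check on a machine without Fail2ban, the three patterns can be approximated with `grep -E`. This is only a sketch: `<HOST>` is replaced by a loose address class, the `pid` field also tolerates the `pid N:tid M` form logged by threaded MPMs, and the sample lines are made-up entries using documentation addresses, not real log output. `fail2ban-regex` remains the authoritative test.

```shell
# grep -E approximation of the failregex lines (not Fail2ban's own matcher).
DIRSCAN_RE='^\[[^]]*\] \[(autoindex|authz_core|core):error\] \[pid [0-9]+(:tid [0-9]+)?\] \[client [0-9A-Fa-f:.]+:[0-9]+\] AH(01276|01630|00135):'

# Hypothetical helper: count matching lines read from stdin.
count_dirscan_hits() {
  grep -cE "$DIRSCAN_RE"
}

# Made-up sample lines, for illustration only.
sample_log() {
  cat <<'EOF'
[Sat Mar 28 00:10:01.123456 2026] [autoindex:error] [pid 1234] [client 203.0.113.5:51234] AH01276: sample directory index message
[Sat Mar 28 00:10:02.123456 2026] [authz_core:error] [pid 1234:tid 5678] [client 203.0.113.5:51236] AH01630: sample client denied message
[Sat Mar 28 00:10:03.123456 2026] [core:error] [pid 1234] [client 198.51.100.9:40000] AH00135: sample invalid method message
[Sat Mar 28 00:10:04.123456 2026] [php:error] [pid 1234] [client 198.51.100.9:40002] unrelated sample message
EOF
}
```

Running `sample_log | count_dirscan_hits` prints `3`: the three scanner codes match and the unrelated `php:error` line does not.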
+
+### Step 2 — Add the jail
+
+Add to `/etc/fail2ban/jail.local`:
+
+```ini
+[apache-dirscan]
+enabled = true
+port = http,https
+filter = apache-dirscan
+logpath = /var/log/apache2/error.log
+maxretry = 3
+findtime = 60
+```
+
+Three hits in 60 seconds is aggressive enough to catch active scanners while avoiding false positives from legitimate 403s.
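The threshold can be explored offline with a simplified model of the counting: given one client's hit timestamps, check whether any `maxretry` consecutive hits fall within `findtime` seconds. This sketch is an approximation for intuition, not Fail2ban's actual implementation; `would_ban` is a hypothetical helper.

```shell
# Hypothetical helper: read sorted epoch timestamps (one per line) for a single
# client and report whether any `maxretry` hits fall within `findtime` seconds.
# Arguments: maxretry (default 3), findtime in seconds (default 60).
would_ban() {
  awk -v maxretry="${1:-3}" -v findtime="${2:-60}" '
    { t[NR] = $1 }
    NR >= maxretry && (t[NR] - t[NR - maxretry + 1]) <= findtime { hit = 1 }
    END { print (hit ? "ban" : "no ban") }
  '
}

# Usage:
#   printf '0\n20\n45\n'  | would_ban 3 60
#   printf '0\n50\n100\n' | would_ban 3 60
```

In the first usage line all three hits land within 45 seconds, so the model reports a ban; in the second they span 100 seconds, so it does not.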
+
+### Step 3 — Test the regex
+
+```bash
+fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf
+```
+
+This reports how many lines each regex matched and how many lines were missed. Add `--print-all-missed` to list the missed lines themselves.
+
+### Step 4 — Reload Fail2ban
+
+```bash
+fail2ban-client reload
+fail2ban-client status apache-dirscan
+```
+
+---
+
+## 🔍 What Each Pattern Catches
+
+| Error Code | Apache Module | Trigger |
+|---|---|---|
+| `AH01276` | `autoindex:error` | Bot requests a directory with no index file and `Options -Indexes` is set. Classic WordPress/CMS directory enumeration. |
+| `AH01630` | `authz_core:error` | Request denied by `<Directory>` or `<Location>` rules (e.g., probing `/wp-content/plugins/`). |
+| `AH00135` | `core:error` | Request uses a garbage HTTP method that Apache can't parse. Scanners use these to fingerprint servers. |
+
+---
+
+## 🔁 Why Default Jails Miss This
+
+| Default Jail | What It Catches | Gap |
+|---|---|---|
+| `apache-badbots` | Bad User-Agent strings in access log | Doesn't look at error log; many scanners use normal UAs |
+| `apache-botsearch` | File-not-found errors for common exploit paths | Matches only "File does not exist" / "script not found" error-log entries, not `AH01276` or `AH01630` |
+| `apache-noscript` | Requests for non-existent scripts | Narrow regex, doesn't cover directory probes |
+| `apache-overflows` | Overlong URIs and invalid method/URI errors | Stock filter may already match `AH00135` if the jail is enabled; doesn't cover `AH01276` or `AH01630` |
+| `apache-invaliduri` | `AH10244` invalid URI encoding | Different error code — catches URL-encoded traversal, not directory scanning |
+
+The `apache-dirscan` filter fills the gap by monitoring the error log for three common scanner signatures in a single jail. At least two of them (`AH01276` and `AH01630`) are not matched by the default jails above.
+
+---
+
+## ⚠️ Key Notes
+
+- **`logpath` must point to the error log**, not the access log. All three patterns are logged to `error.log`.
+- **Adjust `logpath`** for your distribution: Debian/Ubuntu uses `/var/log/apache2/error.log`, RHEL/Fedora uses `/var/log/httpd/error_log`.
+- **The `allowipv6` warning** on reload is cosmetic (Fail2ban 1.0+) and can be ignored.
+- **Pair with `recidive`** to escalate repeat offenders to longer bans.
+
+---
+
+## 🔎 Quick Diagnostic Commands
+
+```bash
+# Test filter against current error log
+fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf
+
+# Check jail status
+fail2ban-client status apache-dirscan
+
+# Watch bans in real time
+tail -f /var/log/fail2ban.log | grep apache-dirscan
+
+# Count current error types
+grep -c "AH01276\|AH01630\|AH00135" /var/log/apache2/error.log
+```
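To see which clients trigger which codes, the error log can be tallied per address. This sketch assumes entries follow the `[client IP:port] AHnnnnn:` layout matched by the filter; `tally_dirscan` is a hypothetical helper.

```shell
# Hypothetical helper: read Apache error-log lines on stdin and print
# "count ip code" for the three scanner codes, busiest first.
tally_dirscan() {
  sed -nE 's/.*\[client ([0-9A-Fa-f:.]+):[0-9]+\] (AH01276|AH01630|AH00135):.*/\1 \2/p' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2, $3 }'
}

# Usage (on the server):
#   tally_dirscan < /var/log/apache2/error.log | head
```

Addresses near the top that are not yet banned are reasonable candidates for checking against `fail2ban-client status apache-dirscan`.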
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -44,6 +44,8 @@
    * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
    * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
    * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
+    * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
+    * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
    * [Nextcloud AIO Unhealthy 20h After Nightly Update](05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md)
    * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
    * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)