diff --git a/05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md b/05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md
new file mode 100644
index 0000000..6b0214f
--- /dev/null
+++ b/05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md
@@ -0,0 +1,158 @@
+# Fail2ban & UFW Rule Bloat: 30k Rules Slowing Down a VPS
+
+## 🛑 Problem
+
+A small VPS (1–2 GB RAM) running Fail2ban with permanent bans (`bantime = -1`) gradually accumulates thousands of UFW DENY rules or nftables entries. Over time this causes:
+
+- High memory usage from Fail2ban (100+ MB RSS)
+- Bloated nftables ruleset (30k+ rules): per-IP DENY rules are evaluated in order, so every new inbound connection is checked against the full list
+- Netdata alerts flapping on RAM/swap thresholds
+- Degraded packet processing performance
+
+---
+
+## 🔍 Diagnosis
+
+### Step 1 — Check Fail2ban memory and thread count
+
+```bash
+grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status
+```
+
+On a small VPS, a Fail2ban RSS above 80 MB is a red flag. Thread count scales with the number of jails (roughly 2 threads per jail, plus overhead).
+
+---
+
+### Step 2 — Count nftables/UFW rules
+
+```bash
+# Total drop/reject rules in nftables
+nft list ruleset | grep -c "reject\|drop"
+
+# UFW user rules (each rule is preceded by one "### tuple ###" line)
+grep -c '^### tuple' /etc/ufw/user.rules
+```
+
+A typical UFW setup has 10–30 rules. A count in the thousands means manual `ufw deny` commands or permanent Fail2ban bans (with `banaction = ufw`) have piled up. If Fail2ban uses an nftables `banaction` instead, banned IPs are stored as set elements rather than individual rules, so they barely show up in the first count.
+
+---
+
+### Step 3 — Identify dead jails
+
+```bash
+for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
+  total=$(fail2ban-client status "$jail" | grep "Total banned" | awk '{print $NF}')
+  echo "$jail: $total total bans"
+done
+```
+
+`Total banned` is an in-memory counter that resets when Fail2ban restarts, so run this after the service has been up for a while. Jails that still show zero are dead weight, burning threads and regex cycles for nothing.
+
+---
+
+### Step 4 — Check ban policy
+
+```bash
+grep -rn bantime /etc/fail2ban/jail.local /etc/fail2ban/jail.d/
+```
+
+`bantime = -1` means permanent. On a public-facing server, scanner IPs rotate constantly, so permanent bans just pile up with little benefit.
+
+---
+
+## ✅ Solution
+
+### Fix 1 — Disable dead jails
+
+Edit `/etc/fail2ban/jail.local` and set `enabled = false` for any jail with zero historical bans.
+
+### Fix 2 — Switch to time-limited bans
+
+```ini
+[DEFAULT]
+bantime = 30d
+
+[recidive]
+bantime = 90d
+```
+
+Thirty days is long enough to block active campaigns, and repeat offenders get 90 days through the `recidive` jail (which must itself be enabled). Scanner IPs rarely persist beyond a week.
+
+### Fix 3 — Flush accumulated bans
+
+```bash
+fail2ban-client unban --all
+```
+
+### Fix 4 — Reset bloated UFW rules
+
+**Back up first:**
+
+```bash
+cp /etc/ufw/user.rules /etc/ufw/user.rules.bak
+cp /etc/ufw/user6.rules /etc/ufw/user6.rules.bak
+```
+
+**Reset and re-add only legitimate ALLOW rules:**
+
+```bash
+ufw --force reset
+ufw default deny incoming
+ufw default allow outgoing
+ufw allow 443/tcp
+ufw allow 80/tcp
+ufw allow in on tailscale0 to any port 22 comment "SSH via Tailscale"
+# Add any other ALLOW rules your server needs (including public SSH if you don't use Tailscale) BEFORE enabling
+ufw --force enable
+```
+
+**Restart Fail2ban** so it re-creates its nftables chains:
+
+```bash
+systemctl restart fail2ban
+```
+
+---
+
+## 🔁 Why This Happens
+
+| Cause | Effect |
+|---|---|
+| `bantime = -1` (permanent) | Banned IP list grows forever; nftables rules never expire |
+| Manual `ufw deny from <ip>` | Each adds a persistent rule to `user.rules` that survives reboots |
+| Many jails with no hits | Each jail runs 2+ threads and applies its regexes to every new log line |
+| Small VPS (1–2 GB RAM) | Fail2ban + nftables overhead becomes a significant fraction of total RAM |
+
+---
+
+## ⚠️ Key Notes
+
+- **Deleting UFW rules one-by-one is impractical** at scale: `ufw delete` with 30k rules takes hours. A full reset followed by re-adding the ALLOW rules is the only efficient path.
+- **`ufw --force reset` also resets `before.rules` and `after.rules`**. UFW saves timestamped backups of these in `/etc/ufw/`, but verify your custom chains if any exist.
+- **After flushing bans, expect a brief spike in 4xx responses** as scanners that were previously blocked hit Apache again. Fail2ban typically re-bans them within minutes.
+- **The Netdata `web_log_1m_successful` alert may fire** during this window; it clears on its own once bans repopulate.
+
+---
+
+## 🔎 Quick Diagnostic Commands
+
+```bash
+# Fail2ban memory usage
+grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status
+
+# Count nftables rules
+nft list ruleset | grep -c "reject\|drop"
+
+# UFW rule count (numbered rules start with "[")
+ufw status numbered | grep -c '^\['
+
+# List all jails with ban counts
+for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
+  banned=$(fail2ban-client status "$jail" | grep "Currently banned" | awk '{print $NF}')
+  total=$(fail2ban-client status "$jail" | grep "Total banned" | awk '{print $NF}')
+  echo "$jail: $banned current / $total total"
+done
+
+# Flush all bans
+fail2ban-client unban --all
+```
diff --git a/05-troubleshooting/security/apache-dirscan-fail2ban-jail.md b/05-troubleshooting/security/apache-dirscan-fail2ban-jail.md
new file mode 100644
index 0000000..4f2a7f4
--- /dev/null
+++ b/05-troubleshooting/security/apache-dirscan-fail2ban-jail.md
@@ -0,0 +1,113 @@
+# Custom Fail2ban Jail: Apache Directory Scanning & Junk Methods
+
+## 🛑 Problem
+
+Bots and vulnerability scanners enumerate WordPress directories (`/wp-admin/`, `/wp-includes/`, `/wp-content/`), probe for access-denied paths, or send junk HTTP methods (e.g., `YQEILVHZ`, `DUTEDCEM`). These requests produce Apache error log entries that no default Fail2ban jail catches:
+
+- `AH01276` — directory index forbidden (autoindex:error)
+- `AH01630` — client denied by server configuration (authz_core:error)
+- `AH00135` — invalid method in request (core:error)
+
+The result is a low success ratio on Netdata's `web_log_1m_successful` metric and server resources wasted on scanner requests.
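+
+To see which signature dominates and which clients send them, a quick summary helps before writing the filter. A minimal sketch, assuming the Apache 2.4 error-log format with a `[client IP:port]` field; `scanner_summary` is a hypothetical helper, not part of Apache or Fail2ban:
+
+```bash
+# Hypothetical helper: per-signature counts plus top client IPs.
+# Assumes Apache 2.4 error-log lines containing "[client IP:port]".
+scanner_summary() {
+  local log="${1:-/var/log/apache2/error.log}"
+  local code
+  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
+
+  # Number of lines per scanner signature
+  for code in AH01276 AH01630 AH00135; do
+    printf '%s %s\n' "$code" "$(grep -c "$code" "$log")"
+  done
+
+  # Top 10 client IPs behind those lines (port stripped)
+  echo "Top clients:"
+  grep -E 'AH0(1276|1630|0135)' "$log" \
+    | grep -oE '\[client [^]]+\]' \
+    | sed -E 's/^\[client (.*):[0-9]+\]$/\1/' \
+    | sort | uniq -c | sort -rn | head -n 10
+}
+```
+
+Run it as `scanner_summary` on Debian/Ubuntu, or `scanner_summary /var/log/httpd/error_log` on RHEL/Fedora.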
+
+---
+
+## ✅ Solution
+
+### Step 1 — Create the filter
+
+Create `/etc/fail2ban/filter.d/apache-dirscan.conf`:
+
+```ini
+# Fail2ban filter for Apache scanning/probing (matches both "[pid N]" and threaded-MPM "[pid N:tid M]" formats)
+# Catches: directory enumeration (AH01276), access denied (AH01630), invalid methods (AH00135)
+
+[Definition]
+failregex = ^\[.*\] \[autoindex:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH01276:
+            ^\[.*\] \[authz_core:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH01630:
+            ^\[.*\] \[core:error\] \[pid \d+(?::tid \d+)?\] \[client <HOST>:\d+\] AH00135:
+
+ignoreregex =
+```
+
+### Step 2 — Add the jail
+
+Add to `/etc/fail2ban/jail.local`:
+
+```ini
+[apache-dirscan]
+enabled = true
+port = http,https
+filter = apache-dirscan
+logpath = /var/log/apache2/error.log
+maxretry = 3
+findtime = 60
+```
+
+Three hits within 60 seconds is aggressive enough to catch active scanners while avoiding false positives from occasional legitimate 403s.
+
+### Step 3 — Test the regex
+
+```bash
+fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf
+```
+
+This reports how many lines each `failregex` pattern matched, along with totals for matched and missed lines.
+
+### Step 4 — Reload Fail2ban
+
+```bash
+fail2ban-client reload
+fail2ban-client status apache-dirscan
+```
+
+---
+
+## 🔍 What Each Pattern Catches
+
+| Error Code | Apache Module | Trigger |
+|---|---|---|
+| `AH01276` | `autoindex:error` | Bot requests a directory that has no index file while `Options -Indexes` is set. Classic WordPress/CMS directory enumeration. |
+| `AH01630` | `authz_core:error` | Request denied by a `Require` rule (e.g., `Require all denied` inside a `<Directory>`, `<Files>`, or `<Location>` block), such as probes of `/wp-content/plugins/`. |
+| `AH00135` | `core:error` | Request uses a garbage HTTP method that Apache rejects as invalid. Scanners use these to fingerprint servers. |
+
+---
+
+## 🔁 Why Default Jails Miss This
+
+| Default Jail | What It Catches | Gap |
+|---|---|---|
+| `apache-badbots` | Bad User-Agent strings in access log | Doesn't look at error log; many scanners use normal UAs |
+| `apache-botsearch` | Error-log `File does not exist` entries for a fixed list of common exploit paths | Matches only those paths and messages, not AH01276, AH01630, or AH00135 |
+| `apache-noscript` | Requests for non-existent scripts | Narrow regex; doesn't cover directory probes |
+| `apache-overflows` | Long request URIs | Only catches buffer overflow attempts |
+| `apache-invaliduri` | `AH10244` invalid URI encoding | Different error code — catches URL-encoded traversal, not directory scanning |
+
+The `apache-dirscan` filter fills this gap by watching the error log for the three scanner signatures above, none of which the default jails match.
+
+---
+
+## ⚠️ Key Notes
+
+- **`logpath` must point to the error log**, not the access log. All three patterns are logged to `error.log`.
+- **Adjust `logpath`** for your distribution: Debian/Ubuntu uses `/var/log/apache2/error.log`, RHEL/Fedora uses `/var/log/httpd/error_log`. Alternatively, `logpath = %(apache_error_log)s` picks up the path from Fail2ban's distribution-specific `paths-*.conf`.
+- **The `allowipv6` warning** on reload is cosmetic (Fail2ban 1.0+) and can be ignored.
+- **Pair with `recidive`** to escalate repeat offenders to longer bans.
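+
+To confirm the filter matches before any scanner shows up, `fail2ban-regex` also accepts a single log line in place of a log file. A minimal sketch, assuming the filter is installed at the Step 1 path; the sample line is synthetic (documentation IP `203.0.113.7`, made-up path) and uses the prefork `[pid N]` form, whereas threaded MPMs (event/worker) log `[pid N:tid M]`:
+
+```bash
+# Synthetic, hypothetical AH01276 line in Apache 2.4 error-log format
+sample='[Mon Jan 01 12:00:00.123456 2024] [autoindex:error] [pid 1234] [client 203.0.113.7:51234] AH01276: Cannot serve directory /var/www/html/wp-includes/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive'
+filter=/etc/fail2ban/filter.d/apache-dirscan.conf
+
+if command -v fail2ban-regex >/dev/null 2>&1; then
+  # fail2ban-regex takes a literal log line as its first argument
+  fail2ban-regex "$sample" "$filter"
+else
+  echo "fail2ban-regex not found; run this on the Fail2ban host" >&2
+fi
+```
+
+A working filter reports the line as matched; if it is counted as missed, compare the sample with a real line from your `error.log`.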
+
+---
+
+## 🔎 Quick Diagnostic Commands
+
+```bash
+# Test filter against current error log
+fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf
+
+# Check jail status
+fail2ban-client status apache-dirscan
+
+# Watch bans in real time
+tail -f /var/log/fail2ban.log | grep apache-dirscan
+
+# Count scanner-related error lines (all three codes combined)
+grep -c "AH01276\|AH01630\|AH00135" /var/log/apache2/error.log
+```
diff --git a/SUMMARY.md b/SUMMARY.md
index b8c1628..b1e5991 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -44,6 +44,8 @@
   * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
   * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
   * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
+  * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
+  * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
   * [Nextcloud AIO Unhealthy 20h After Nightly Update](05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md)
   * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
   * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)