ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets

DO's hypervisor-level CPU metric doesn't know about nice/ionice: a "polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization and trips a default >85%/5m alert. Adds a new section explaining the trade-off and providing the DO API recipe (PUT existing alert with explicit entities, POST a new relaxed alert scoped to the small droplet) plus when not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired the fleet-wide CPU alert despite the cron script already wrapping clamscan in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:24:17 -04:00
commit af14e36caf
parent 545df9f5c6
2 changed files with 73 additions and 1 deletion
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@@ -11,7 +11,7 @@ tags:
  - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-30T05:21
+updated: 2026-05-10T01:50
 ---
 # ClamAV Fleet Deployment with Ansible

@@ -147,6 +147,77 @@ clamscan /tmp/eicar-test.txt
 rm /tmp/eicar-test.txt
 ```

+## DigitalOcean Monitoring Caveat (1 vCPU droplets)
+
+`nice -n 19 ionice -c 3` makes clamscan "polite" to the Linux CPU and I/O schedulers, so it yields to PHP-FPM, MySQL, etc. instantly, and the `MemoryMax`/`MemorySwapMax` cgroup limits cap its memory. **But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness.** It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default `>85%/5m` CPU alert every week, even though the niceness settings are genuinely insulating real traffic from the scan.
+
+**Symptoms:**
+- Weekly `[ALERT] CPU is running high` email from DO at the same time/day every week
+- The alert clears within 10–60 min (when scan finishes)
+- No actual user-visible service degradation
+- Netdata shows CPU 80–100% but PHP-FPM/MySQL response times barely move
+
+**Fix: per-droplet alert scoping.** Two changes via the DO API:
+
+1. **Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets** by setting `entities` to an explicit array of *all other* droplet IDs.
+2. **Add a new alert scoped to just the affected droplet(s)** with a relaxed threshold:
+   - `value: 95`
+   - `window: "30m"`
+   - `entities: [<droplet_id>]`
+
+The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation — but ignores the weekly polite scan.
+
+### Apply via DO API
+
+```bash
+TOKEN="<your DigitalOcean PAT>"
+
+# 1. Scope existing CPU alert (PUT requires the full alert spec)
+curl -sS -X PUT \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "alerts": {"email": ["you@example.com"], "slack": []},
+    "compare": "GreaterThan",
+    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
+    "enabled": true,
+    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
+    "tags": [],
+    "type": "v1/insights/droplet/cpu",
+    "value": 85,
+    "window": "5m"
+  }' \
+  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"
+
+# 2. Create a relaxed alert for the small box
+curl -sS -X POST \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "alerts": {"email": ["you@example.com"], "slack": []},
+    "compare": "GreaterThan",
+    "description": "<host> CPU sustained high (clamscan-aware)",
+    "enabled": true,
+    "entities": ["<small_droplet_id>"],
+    "tags": [],
+    "type": "v1/insights/droplet/cpu",
+    "value": 95,
+    "window": "30m"
+  }' \
+  "https://api.digitalocean.com/v2/monitoring/alerts"
+```
+
+To list current alerts (find UUIDs and current `entities`):
+
+```bash
+curl -sS -H "Authorization: Bearer $TOKEN" \
+  "https://api.digitalocean.com/v2/monitoring/alerts" | jq .
+```
+
+**When *not* to do this:** If your droplet has 2+ vCPUs, a single-threaded clamscan tops out at one vCPU (~50% of total on a 2 vCPU box, less on larger ones), so you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
+
+**Alternative considered: switch to `clamdscan`** — uses a resident `clamd` daemon, signatures stay loaded, scan finishes ~10× faster with much less CPU/RAM. Better long-term answer, but requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident vs the cron approach which only holds RAM during scan). Trade-off, not strictly better.
+
 ## See Also

 - [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) — troubleshooting CPU spikes from unthrottled scans
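
Step 1 of the recipe added in this hunk needs an `entities` array naming every droplet *except* the exempted one. A minimal sketch of building that array is shown here, assuming the droplet IDs have already been collected (for instance from the DO droplet listing endpoint). `build_entities` is a hypothetical helper name, not something defined by the article or the DO API.

```bash
# Hypothetical helper (not part of the article or the DO API).
# Usage: build_entities EXCLUDE_ID ID [ID...]
# Prints a JSON array of quoted droplet IDs, omitting EXCLUDE_ID.
build_entities() {
  be_exclude="$1"
  shift
  be_out=""
  for be_id in "$@"; do
    # Skip the droplet that gets its own relaxed alert.
    if [ "$be_id" = "$be_exclude" ]; then
      continue
    fi
    # Append the quoted ID, comma-separated after the first entry.
    be_out="${be_out:+$be_out, }\"$be_id\""
  done
  printf '[%s]\n' "$be_out"
}
```

With placeholder IDs, `build_entities 333 111 222 333` prints `["111", "222"]`, which can be pasted into the `entities` field of the PUT body. The IDs here are made up for illustration; real ones would come from your own account.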
--- a/index.md
+++ b/index.md
@@ -217,6 +217,7 @@ updated: 2026-05-10T01:30

 | Date | Article | Domain |
 |---|---|---|
+| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets | Self-Hosting |
 | 2026-05-10 | [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md) | Troubleshooting |
 | 2026-05-10 | [Castopod Posts Don't Appear on Mastodon — Diagnosing the Federation Path](05-troubleshooting/security/castopod-broadcast-not-on-mastodon.md) | Troubleshooting |
 | 2026-05-08 | [Castopod: Stale Federated Avatar URLs After Remote Profile Updates](05-troubleshooting/security/castopod-stale-federated-avatar.md) | Troubleshooting |