ClamAV fleet caveat: add follow-up on the polite-CPU-on-1vCPU edge case

Same-day correction. The proposed per-droplet relaxed alert (>95%/30m) turned out to also trip on a 1 vCPU box during low-traffic weekly scans. With no real load on the box, nice 19 has nothing to yield to, so clamscan opportunistically fills the vCPU and DO sees 100% utilization regardless of the `%nice` vs `%user` split. Documents the three realistic options (accept the page / switch to clamdscan / disable the alert) and the underlying limit (no DO threshold can distinguish polite from impolite CPU when the box is fully utilized).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:32:35 -04:00 · a852f7b7bd
commit a852f7b7bd
parent af14e36caf
2 changed files with 9 additions and 1 deletions
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@@ -216,6 +216,14 @@ curl -sS -H "Authorization: Bearer $TOKEN" \

 **When *not* to do this:** If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.

+**When the per-droplet relaxed alert *also* trips (and what to do):** On a 1 vCPU droplet during low-traffic hours (e.g., the default Sunday-morning weekly cron window), clamscan has *nothing real to yield to* — `nice 19` only matters when something else wants the CPU. The kernel correctly schedules clamscan as nice/idle (`iostat` shows `%nice ~94, %idle 0`) but DO sees `100% - 0% idle = 100% CPU` and trips even the 95%/30m threshold for the duration of the scan (~30–50 min on small webserver boxes). At that point the realistic options are:
+
+1. **Accept the weekly page** as expected noise — simplest, no further engineering
+2. **Switch to `clamdscan`** (daemon-backed) — scans finish ~3–5× faster and fit in a 30m window, but `clamd` adds ~250 MB resident memory continuously
+3. **Disable the per-droplet CPU alert entirely** for that host and rely on Netdata for the real signal
+
+The "polite CPU is invisible to DO" trick stops working once the box is small enough that the polite work fills the entire core unopposed. There is no DO threshold that distinguishes "polite scan filling idle CPU" from "runaway process pinning the vCPU" — that distinction lives in `iostat`'s `%nice` vs `%user` split, which DO doesn't expose.
+
 **Alternative considered: switch to `clamdscan`** — uses a resident `clamd` daemon, signatures stay loaded, scan finishes ~10× faster with much less CPU/RAM. Better long-term answer, but requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident vs the cron approach which only holds RAM during scan). Trade-off, not strictly better.

 ## See Also
--- a/index.md
+++ b/index.md
@@ -217,7 +217,7 @@ updated: 2026-05-10T01:30

 | Date | Article | Domain |
 |---|---|---|
-| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets | Self-Hosting |
+| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets (with follow-up note: per-droplet relaxed alert can still trip; accept-the-page decision) | Self-Hosting |
 | 2026-05-10 | [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md) | Troubleshooting |
 | 2026-05-10 | [Castopod Posts Don't Appear on Mastodon — Diagnosing the Federation Path](05-troubleshooting/security/castopod-broadcast-not-on-mastodon.md) | Troubleshooting |
 | 2026-05-08 | [Castopod: Stale Federated Avatar URLs After Remote Profile Updates](05-troubleshooting/security/castopod-stale-federated-avatar.md) | Troubleshooting |
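
The new paragraph in `clamav-fleet-deployment.md` says the polite-versus-runaway distinction lives in the `%nice` vs `%user` split, which DO does not expose. As a rough illustration, that split can be computed on the droplet from two samples of the aggregate `cpu` line in `/proc/stat`. This is a minimal sketch, not part of the commit: the function names, the 80% `nice_floor` cutoff, and the `busy = 100 - idle` formula (mirroring the article's description of what DO reports) are all assumptions for illustration.

```python
# Hypothetical helper: names and thresholds are illustrative, not from the article.
# Only the first eight counters are used; on Linux, guest time is already
# folded into user/nice, so including it would double count.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_split(before, after):
    """Return the percentage of time spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    pct = {k: 100.0 * v / total for k, v in delta.items()}
    # Assumption: approximates the article's "100% - idle" view of utilization.
    pct["busy"] = 100.0 - pct["idle"]
    return pct


def looks_polite(pct, nice_floor=80.0):
    """True when most of the interval was niced work (assumed cutoff)."""
    return pct["nice"] >= nice_floor


# Fixed samples resembling the article's iostat reading (%nice ~94, %idle 0).
before = "cpu 1000 2000 300 5000 100 0 20 0 0 0"
after = "cpu 1020 2940 340 5000 100 0 20 0 0 0"
split = cpu_split(before, after)
print(f"busy={split['busy']:.0f}% nice={split['nice']:.0f}% user={split['user']:.0f}%")
print("polite" if looks_polite(split) else "not polite")
```

On a live host the two strings would come from reading the first line of `/proc/stat` a few seconds apart. Both a niced scan and a runaway process produce the same `busy` figure here, which is the limit the commit describes; only the `nice` share separates them.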