From af14e36caf3e73b51815846e5f55ed6fbe91eef6 Mon Sep 17 00:00:00 2001
From: MajorLinux
Date: Sun, 10 May 2026 02:24:17 -0400
Subject: [PATCH] ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

DO's hypervisor-level CPU metric doesn't know about nice/ionice: a
"polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization
and trips a default >85%/5m alert. Adds a new section explaining the
trade-off and providing the DO API recipe (PUT existing alert with explicit
entities, POST a new relaxed alert scoped to the small droplet) plus when
not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired
the fleet-wide CPU alert despite the cron script already wrapping clamscan
in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../security/clamav-fleet-deployment.md | 73 ++++++++++++++++++-
 index.md | 1 +
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/02-selfhosting/security/clamav-fleet-deployment.md b/02-selfhosting/security/clamav-fleet-deployment.md
index b731795..e3071fb 100644
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@@ -11,7 +11,7 @@ tags:
 - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-30T05:21
+updated: 2026-05-10T01:50
 ---
 
 # ClamAV Fleet Deployment with Ansible
@@ -147,6 +147,77 @@ clamscan /tmp/eicar-test.txt
 rm /tmp/eicar-test.txt
 ```
 
+## DigitalOcean Monitoring Caveat (1 vCPU droplets)
+
+`nice -n 19 ionice -c 3` plus `MemoryMax`/`MemorySwapMax` cgroup limits make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. instantly. **But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness.** It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default `>85%/5m` CPU alert every week, even though the scan is genuinely yielding to real traffic.
+
+**Symptoms:**
+- Weekly `[ALERT] CPU is running high` email from DO at the same time/day every week
+- The alert clears within 10–60 min (when the scan finishes)
+- No actual user-visible service degradation
+- Netdata shows CPU at 80–100%, but PHP-FPM/MySQL response times barely move
+
+**Fix: per-droplet alert scoping.** Two changes via the DO API:
+
+1. **Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets** by setting `entities` to an explicit array of *all other* droplet IDs.
+2. **Add a new alert scoped to just the affected droplet(s)** with a relaxed threshold:
+   - `value: 95`
+   - `window: "30m"`
+   - `entities: []`
+
+The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation, but ignores the weekly polite scan, provided the scan does not hold the vCPU above 95% for a full 30-minute window.
+
+### Apply via DO API
+
+```bash
+TOKEN=""
+
+# 1. Scope existing CPU alert (PUT requires the full alert spec)
+curl -sS -X PUT \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "alerts": {"email": ["you@example.com"], "slack": []},
+    "compare": "GreaterThan",
+    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
+    "enabled": true,
+    "entities": ["", ""],
+    "tags": [],
+    "type": "v1/insights/droplet/cpu",
+    "value": 85,
+    "window": "5m"
+  }' \
+  "https://api.digitalocean.com/v2/monitoring/alerts/"
+
+# 2. Create a relaxed alert for the small box
+curl -sS -X POST \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "alerts": {"email": ["you@example.com"], "slack": []},
+    "compare": "GreaterThan",
+    "description": " CPU sustained high (clamscan-aware)",
+    "enabled": true,
+    "entities": [""],
+    "tags": [],
+    "type": "v1/insights/droplet/cpu",
+    "value": 95,
+    "window": "30m"
+  }' \
+  "https://api.digitalocean.com/v2/monitoring/alerts"
+```
+
+To list current alerts (to find UUIDs and the current `entities`):
+
+```bash
+curl -sS -H "Authorization: Bearer $TOKEN" \
+  "https://api.digitalocean.com/v2/monitoring/alerts" | jq .
+```
+
+**When *not* to do this:** If your droplet has 2+ vCPUs and clamscan only consumes ~50% of the total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
+
+**Alternative considered: switch to `clamdscan`.** It uses a resident `clamd` daemon, so signatures stay loaded and the scan finishes ~10× faster with much less CPU/RAM. It is the better long-term answer, but it requires running `clamd` continuously (about 250 MB resident on small boxes, whereas the cron approach only holds RAM during the scan). It is a trade-off, not strictly better.
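Scoping the fleet-wide alert needs an explicit list of every droplet ID except the exempted ones. The sketch below is one possible way to build that `entities` array with `jq` from a `GET /v2/droplets` response. `build_entities` is a hypothetical helper name, the droplet IDs in the example are made up, and the sketch assumes the alert API expects droplet IDs as strings, so check the current DigitalOcean API reference before relying on it.

```bash
#!/usr/bin/env bash
# Sketch only: build_entities is a hypothetical helper, not part of any DO tooling.
set -euo pipefail

# Reads a /v2/droplets JSON response on stdin. Arguments are the droplet IDs
# to exclude. Prints a compact JSON array of the remaining IDs as strings
# (assumption: the alert "entities" field takes string IDs).
build_entities() {
  local exclude
  # Turn the argument list into a JSON array of strings, e.g. ["222"].
  exclude=$(printf '%s\n' "$@" | jq -R . | jq -sc .)
  # Keep only the IDs that differ from every excluded ID.
  jq -c --argjson exclude "$exclude" \
    '[.droplets[].id | tostring | select(. as $id | all($exclude[]; . != $id))]'
}

# Offline example with made-up IDs; 222 stands in for the 1 vCPU droplet.
printf '%s' '{"droplets":[{"id":111},{"id":222},{"id":333}]}' | build_entities 222
# prints ["111","333"]
```

Against the live API, the input would come from `curl -sS -H "Authorization: Bearer $TOKEN" "https://api.digitalocean.com/v2/droplets?per_page=200"`. Fleets larger than one page would need pagination, which this sketch does not handle.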
+
 ## See Also
 
 - [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) — troubleshooting CPU spikes from unthrottled scans
diff --git a/index.md b/index.md
index 7c37d11..591bd51 100644
--- a/index.md
+++ b/index.md
@@ -217,6 +217,7 @@ updated: 2026-05-10T01:30
 | Date | Article | Domain |
 |---|---|---|
+| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets | Self-Hosting |
 | 2026-05-10 | [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md) | Troubleshooting |
 | 2026-05-10 | [Castopod Posts Don't Appear on Mastodon — Diagnosing the Federation Path](05-troubleshooting/security/castopod-broadcast-not-on-mastodon.md) | Troubleshooting |
 | 2026-05-08 | [Castopod: Stale Federated Avatar URLs After Remote Profile Updates](05-troubleshooting/security/castopod-stale-federated-avatar.md) | Troubleshooting |