ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets

DO's hypervisor-level CPU metric doesn't know about nice/ionice: a "polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization and trips a default >85%/5m alert. Adds a new section explaining the trade-off and providing the DO API recipe (PUT existing alert with explicit entities, POST a new relaxed alert scoped to the small droplet) plus when not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired the fleet-wide CPU alert despite the cron script already wrapping clamscan in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

parent 545df9f5c6
commit af14e36caf
2 changed files with 73 additions and 1 deletion

@@ -11,7 +11,7 @@ tags:
 - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-30T05:21
+updated: 2026-05-10T01:50
 ---
 # ClamAV Fleet Deployment with Ansible

@@ -147,6 +147,77 @@ clamscan /tmp/eicar-test.txt
rm /tmp/eicar-test.txt
```

## DigitalOcean Monitoring Caveat (1 vCPU droplets)

`nice -n 19 ionice -c 3` plus `MemoryMax`/`MemorySwapMax` cgroup limits make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. almost immediately. **But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness.** It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default `>85%/5m` CPU alert every week, even though the throttling is genuinely insulating real traffic from the scan.
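
To see why niceness does not hide the scan from a utilization metric, a minimal sketch can compute the busy percentage from two `/proc/stat`-style samples. It assumes the monitor counts every non-idle jiffy, including the `nice` column, as busy; the provider's real calculation may differ. The function name `cpu_busy_pct` and the sample values are hypothetical.

```bash
#!/usr/bin/env bash
# cpu_busy_pct "<cpu line at t0>" "<cpu line at t1>"
# Prints the integer busy percentage between two aggregate "cpu" lines
# in /proc/stat format (cpu user nice system idle iowait irq softirq steal ...).
# Assumption: busy = everything except idle and iowait, so niced time is busy.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Sum user..steal only; guest columns are already included in user/nice.
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle = $5 + $6                      # idle + iowait
      if (NR == 1) { t0 = total; i0 = idle }
      else         { t1 = total; i1 = idle }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print 0; exit }
      printf "%d\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Hypothetical samples: 950 jiffies went to "nice" and 50 to idle in between.
before="cpu 1000 0 500 8000 0 0 0 0 0 0"
after="cpu 1000 950 500 8050 0 0 0 0 0 0"
cpu_busy_pct "$before" "$after"   # prints 95
```

On a live box, the same function can be fed two `head -n1 /proc/stat` readings taken a few seconds apart while the scan runs.
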

**Symptoms:**

- Weekly `[ALERT] CPU is running high` email from DO at the same time/day every week
- The alert clears within 10–60 min (when scan finishes)
- No actual user-visible service degradation
- Netdata shows CPU 80–100% but PHP-FPM/MySQL response times barely move

**Fix: per-droplet alert scoping.** Two changes via the DO API:

1. **Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets** by setting `entities` to an explicit array of *all other* droplet IDs.
2. **Add a new alert scoped to just the affected droplet(s)** with a relaxed threshold:
   - `value: 95`
   - `window: "30m"`
   - `entities: [<droplet_id>]`

The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation, but ignores the weekly polite scan.
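
One way to build the "all other droplets" array for step 1 is to filter the droplet listing with `jq`. This is only a sketch: it assumes the `GET /v2/droplets` response has a top-level `droplets` array with numeric `id` fields, and that alert `entities` are droplet IDs written as strings. `other_entities` is a hypothetical helper name, and accounts with more droplets than fit on one page would need pagination.

```bash
#!/usr/bin/env bash
# other_entities <excluded_id>...
# Reads a droplet-list JSON document on stdin and prints a compact JSON array
# of droplet IDs (as strings), leaving out every ID passed as an argument.
other_entities() {
  local excluded
  # Turn the argument list into a JSON array of strings, e.g. ["222"].
  excluded=$(printf '%s\n' "$@" | jq -R . | jq -s -c .)
  jq -c --argjson skip "$excluded" \
    '[.droplets[].id | tostring | select(. as $id | ($skip | index($id)) == null)]'
}

# Usage (assumed endpoint and page size; <small_droplet_id> is a placeholder):
#   curl -sS -H "Authorization: Bearer $TOKEN" \
#     "https://api.digitalocean.com/v2/droplets?per_page=200" \
#     | other_entities "<small_droplet_id>"
```

The printed array can be pasted into the `entities` field of the PUT body for the fleet-wide alert.
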

### Apply via DO API

```bash
TOKEN="<your DigitalOcean PAT>"

# 1. Scope existing CPU alert (PUT requires the full alert spec)
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
    "enabled": true,
    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 85,
    "window": "5m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"

# 2. Create a relaxed alert for the small box
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "<host> CPU sustained high (clamscan-aware)",
    "enabled": true,
    "entities": ["<small_droplet_id>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 95,
    "window": "30m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts"
```

To list current alerts (find UUIDs and current `entities`):

```bash
curl -sS -H "Authorization: Bearer $TOKEN" \
  "https://api.digitalocean.com/v2/monitoring/alerts" | jq
```
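
To narrow that listing to the fields the PUT request needs, a `jq` filter along these lines may help. It is a sketch that assumes the list response wraps alerts in a top-level `policies` array whose items carry `uuid`, `type`, `description`, `value`, `window`, and `entities`; `cpu_alerts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# cpu_alerts: reads the alert-list JSON on stdin and prints one compact JSON
# object per droplet CPU alert, keeping only the fields needed to re-PUT it.
cpu_alerts() {
  jq -c '.policies[]
         | select(.type == "v1/insights/droplet/cpu")
         | {uuid, description, value, window, entities}'
}

# Usage:
#   curl -sS -H "Authorization: Bearer $TOKEN" \
#     "https://api.digitalocean.com/v2/monitoring/alerts" | cpu_alerts
```
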

**When *not* to do this:** If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
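
The arithmetic behind that rule of thumb: a single-threaded process can saturate at most one vCPU, so its ceiling on whole-droplet utilization is roughly 100/N percent on N vCPUs. A small sketch that ignores all other load on the box (`scan_ceiling_pct` and `can_trip` are hypothetical names):

```bash
#!/usr/bin/env bash
# scan_ceiling_pct <vcpus>
# Max whole-droplet CPU % that one single-threaded job can contribute.
scan_ceiling_pct() {
  echo $(( 100 / $1 ))
}

# can_trip <vcpus> <threshold_pct>
# Succeeds if the scan alone can push utilization above the threshold.
can_trip() {
  [ "$(scan_ceiling_pct "$1")" -gt "$2" ]
}

for vcpus in 1 2 4; do
  if can_trip "$vcpus" 85; then verdict="can trip"; else verdict="stays under"; fi
  echo "$vcpus vCPU: ceiling $(scan_ceiling_pct "$vcpus")%, $verdict an 85% alert"
done
```

Real traffic adds to the scan's share, so a busy 2 vCPU box can still cross 85% during a scan; the sketch only bounds what the scan contributes by itself.
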

**Alternative considered: switch to `clamdscan`.** It uses a resident `clamd` daemon, so signatures stay loaded and the scan finishes ~10× faster with much less CPU/RAM. This is the better long-term answer, but it requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident, whereas the cron approach only holds RAM during the scan). It is a trade-off, not strictly better.

## See Also

- [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md): troubleshooting CPU spikes from unthrottled scans

index.md

@@ -217,6 +217,7 @@ updated: 2026-05-10T01:30

| Date | Article | Domain |
|---|---|---|
| 2026-05-10 | [ClamAV Fleet Deployment with Ansible](02-selfhosting/security/clamav-fleet-deployment.md) — added DigitalOcean monitoring caveat for 1vCPU droplets | Self-Hosting |
| 2026-05-10 | [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md) | Troubleshooting |
| 2026-05-10 | [Castopod Posts Don't Appear on Mastodon — Diagnosing the Federation Path](05-troubleshooting/security/castopod-broadcast-not-on-mastodon.md) | Troubleshooting |
| 2026-05-08 | [Castopod: Stale Federated Avatar URLs After Remote Profile Updates](05-troubleshooting/security/castopod-stale-federated-avatar.md) | Troubleshooting |