DO's hypervisor-level CPU metric doesn't know about nice/ionice — a "polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization and trips a default >85%/5m alert. Adds a new section explaining the trade-off and providing the DO API recipe (PUT existing alert with explicit entities, POST a new relaxed alert scoped to the small droplet) plus when not to bother (2+ vCPU boxes won't trip). Triggered by the 2026-05-10 teelia incident where the weekly cron fired the fleet-wide CPU alert despite the cron script already wrapping clamscan in nice 19 + ionice idle + cgroup memory limits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| ClamAV Fleet Deployment with Ansible | selfhosting | security |  | published | 2026-04-18 | 2026-05-10T01:50 |
ClamAV Fleet Deployment with Ansible
Overview
ClamAV is the standard open-source antivirus for Linux servers. For internet-facing hosts, a weekly scan with fresh definitions catches known malware, web shells, and suspicious files before they cause damage. The key operational concern is CPU impact — an unthrottled clamscan will saturate a core for hours on a busy host. The solution is nice and ionice wrappers.
This guide covers deployment to internet-facing hosts. Internal-only hosts (storage, inference, gaming) are lower priority and can be skipped.
What Gets Deployed
- `clamav` + `clamav-update` packages (provide `clamscan` + `freshclam`)
- `freshclam` service enabled for automatic definition updates
- A quarantine directory at `/var/lib/clamav/quarantine/`
- A weekly `clamscan` cron job, niced to background priority
- SELinux context set on the quarantine directory (Fedora hosts)
Ansible Playbook
```yaml
- name: Deploy ClamAV to internet-facing hosts
  hosts: internet_facing  # dca, majorlinux, teelia, tttpod, majortoot, majormail
  become: true
  tasks:
    - name: Install ClamAV packages
      ansible.builtin.package:
        name:
          - clamav
          - clamav-update
        state: present

    - name: Enable and start freshclam
      ansible.builtin.service:
        name: clamav-freshclam
        enabled: true
        state: started

    - name: Create quarantine directory
      ansible.builtin.file:
        path: /var/lib/clamav/quarantine
        state: directory
        owner: root
        group: root
        mode: '0700'

    # clamscan's --log fails if the log directory does not exist;
    # no owner/mode given, so an existing directory is left untouched.
    - name: Ensure scan log directory exists
      ansible.builtin.file:
        path: /var/log/clamav
        state: directory

    - name: Set SELinux context on quarantine dir (Fedora/RHEL)
      # chcon is not persistent: restorecon or a full relabel reverts it.
      ansible.builtin.command:
        cmd: chcon -t var_t /var/lib/clamav/quarantine
      when: ansible_os_family == "RedHat"
      changed_when: false

    - name: Deploy weekly clamscan cron job
      ansible.builtin.cron:
        name: "Weekly ClamAV scan"
        user: root
        weekday: "0"  # Sunday
        hour: "3"
        minute: "0"
        job: >-
          nice -n 19 ionice -c 3
          clamscan -r /
          --exclude-dir=^/proc
          --exclude-dir=^/sys
          --exclude-dir=^/dev
          --exclude-dir=^/run
          --move=/var/lib/clamav/quarantine
          --log=/var/log/clamav/scan.log
          --quiet
          2>&1 | logger -t clamscan
```
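To confirm the cron task actually landed on a host, you can look for the marker comment that `ansible.builtin.cron` writes above each job it manages (`#Ansible: <name>`). A minimal sketch, assuming you can read root's crontab on the target; `has_managed_job` is a hypothetical helper name, not part of Ansible.

```bash
#!/usr/bin/env bash
# has_managed_job NAME: read crontab text on stdin and succeed only if a
# "#Ansible: NAME" marker line is directly followed by a non-empty,
# non-comment line (the job itself). Hypothetical helper for illustration.
has_managed_job() {
  awk -v marker="#Ansible: $1" '
    found && NF > 0 && $0 !~ /^#/ { ok = 1; exit }  # job line after marker
    { found = ($0 == marker) }                      # remember if this was the marker
    END { exit (ok ? 0 : 1) }
  '
}

# Usage on a host (as root):
#   crontab -l -u root | has_managed_job "Weekly ClamAV scan" && echo "cron job present"
```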
The nice/ionice Flags
Without throttling, clamscan -r / will peg a CPU core for 30–90 minutes depending on disk size and file count. On production hosts this causes Netdata alerts and visible service degradation.
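| Flag | Setting | Effect |
|---|---|---|
| `nice -n 19` | Lowest CPU priority | The scheduler gives this process only a small CPU share whenever normal-priority tasks are runnable |
| `ionice -c 3` | Idle I/O class | Disk I/O only runs when no other process needs the disk (on I/O schedulers that honor priorities, such as BFQ) |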
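With both flags set, clamscan stops competing with real workloads: other processes see essentially no latency impact under normal load, although hypervisor-level CPU graphs still show the scan (see the DigitalOcean caveat below). The scan takes longer (possibly 2–4× on busy disks), but this is acceptable for a weekly background job.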
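`ionice` classes are only enforced by I/O schedulers that implement priorities. BFQ does; the `none` scheduler, common on virtio and NVMe devices, does not, so there `-c 3` has little effect and only the `nice` half of the throttling applies. A small sketch for checking which scheduler is active (the one shown in brackets in `/sys/block/<dev>/queue/scheduler`); `active_scheduler` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# active_scheduler FILE: print the scheduler shown in [brackets] in a
# /sys/block/<dev>/queue/scheduler file, e.g. "mq-deadline kyber [bfq] none".
# Prints nothing if the file has no bracketed entry.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Usage on a host:
#   for f in /sys/block/*/queue/scheduler; do
#     dev=${f#/sys/block/}; dev=${dev%%/*}
#     printf '%s: %s\n' "$dev" "$(active_scheduler "$f")"
#   done
```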
SELinux on Fedora/RHEL: `ionice` may trigger AVC denials under SELinux Enforcing. If scans silently fail on Fedora hosts, check `ausearch -m avc -ts this-week -c clamscan` for `clamscan` denials (`-ts recent` only covers the last 10 minutes, which misses a 03:00 run). See selinux-fail2ban-execmem-fix for the pattern.
Excluded Paths
Always exclude virtual/pseudo filesystems — scanning them wastes time and can trigger false positives or kernel errors:
```
--exclude-dir=^/proc   # Process info (not real files)
--exclude-dir=^/sys    # Kernel interfaces
--exclude-dir=^/dev    # Device nodes
--exclude-dir=^/run    # Runtime tmpfs
```
You may also want to exclude large data directories (/var/lib/docker, backup volumes, media stores) if scan time is a concern. These are lower-risk targets anyway.
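In a wrapper script (rather than the single cron line), a bash array keeps the exclude list in one place and avoids word-splitting surprises. A sketch under that assumption: the `extra_excludes` entry is only an example. Note that `--exclude-dir` takes a regular expression, so `^/run` is a prefix match and would also skip a directory such as `/runner`.

```bash
#!/usr/bin/env bash
# Pseudo-filesystems that should never be scanned (from the list above).
base_excludes=(/proc /sys /dev /run)
# Example site-specific additions (hypothetical); adjust or leave empty.
extra_excludes=(/var/lib/docker)

excludes=()
for dir in "${base_excludes[@]}" "${extra_excludes[@]}"; do
  excludes+=("--exclude-dir=^${dir}")   # anchored regex, one argument each
done

# Usage inside the wrapper:
#   nice -n 19 ionice -c 3 clamscan -r / "${excludes[@]}" \
#     --move=/var/lib/clamav/quarantine --log=/var/log/clamav/scan.log --quiet
```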
Quarantine vs Delete
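`--move=/var/lib/clamav/quarantine` moves detected files rather than deleting them. This is safer than `--remove`: you can inspect and restore false positives. Review the quarantine directory periodically (as root, since it is mode `0700`):

```bash
ls -la /var/lib/clamav/quarantine/
```

If a file is a confirmed false positive, restore it and add an allow-list entry. ClamAV loads these from its database directory (`/var/lib/clamav` by default), not from `/etc/clamav`: use a `.fp`/`.sfp` file to allow a specific file by hash, or an `.ign2` file to ignore a signature by name.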
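To make "review periodically" harder to forget, a weekly check can simply count what is sitting in quarantine and log a reminder. A minimal sketch, assuming GNU `find` (for `-printf`) and root access to the directory; `quarantine_count` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# quarantine_count [DIR]: number of regular files directly inside DIR.
# find -printf '.' emits one byte per file, so filenames containing
# spaces or newlines cannot inflate the count.
quarantine_count() {
  local dir="${1:-/var/lib/clamav/quarantine}" n
  n=$(find "$dir" -maxdepth 1 -type f -printf '.' | wc -c)
  echo $(( n ))   # arithmetic strips any padding wc might add
}

# Usage (as root), e.g. from a weekly cron job:
#   n=$(quarantine_count)
#   [ "$n" -gt 0 ] && logger -t clamscan "quarantine holds $n file(s) to review"
```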
Checking Scan Results
```bash
# View last scan log
cat /var/log/clamav/scan.log

# Summary line from the log
grep -E "^Infected|^Scanned" /var/log/clamav/scan.log | tail -5

# Check freshclam is keeping definitions current
systemctl status clamav-freshclam
freshclam --version
```
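Because `--log` appends, `scan.log` accumulates one summary block per run. For alerting or a dashboard, it helps to extract just the latest `Infected files:` count. A sketch, assuming the summary lines in your log use the `Infected files: N` form that the grep above matches; `last_infected_count` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# last_infected_count LOG: print N from the last "Infected files: N" line.
# Exits non-zero if the log contains no such line (e.g. no completed scan yet).
last_infected_count() {
  awk -F': ' '
    /^Infected files: [0-9]+$/ { n = $2 }   # later summaries overwrite earlier ones
    END { if (n == "") exit 1; print n }
  ' "$1"
}

# Usage:
#   n=$(last_infected_count /var/log/clamav/scan.log) && [ "$n" -gt 0 ] \
#     && echo "last scan found $n infected file(s)"
```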
Verifying Deployment
Test that ClamAV can detect malware using the EICAR test file (a harmless string that all AV tools recognize as test malware):
```bash
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
  > /tmp/eicar-test.txt
clamscan /tmp/eicar-test.txt
# Expected: /tmp/eicar-test.txt: Eicar-Signature FOUND
rm /tmp/eicar-test.txt
```
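Matching on the printed signature name is brittle across database versions; `clamscan`'s exit status is a steadier check (0 = no virus found, 1 = virus found, 2 = an error occurred). A small sketch mapping those codes; `describe_clamscan_rc` is a hypothetical helper name. One design note: in the cron line above, the status of `clamscan ... | logger` is `logger`'s, so a wrapper that wants to react to detections needs to capture `clamscan`'s own status (for example via bash's `PIPESTATUS`).

```bash
#!/usr/bin/env bash
# describe_clamscan_rc CODE: translate a clamscan exit status into a word.
# clamscan returns 0 (no virus found), 1 (virus found) or 2 (error).
describe_clamscan_rc() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "infected" ;;
    *) echo "error" ;;
  esac
}

# Usage with the EICAR test file:
#   clamscan --no-summary /tmp/eicar-test.txt; rc=$?
#   echo "EICAR check: $(describe_clamscan_rc "$rc")"   # "infected" = detection works
```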
DigitalOcean Monitoring Caveat (1 vCPU droplets)
`nice -n 19 ionice -c 3` plus `MemoryMax`/`MemorySwapMax` cgroup limits make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. immediately. But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness; it sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default >85%/5m CPU alert every week, even though the scan genuinely steps aside for real traffic.
Symptoms:
- A weekly `[ALERT] CPU is running high` email from DO at the same time and day every week
- The alert clears within 10–60 min (when the scan finishes)
- No actual user-visible service degradation
- Netdata shows CPU at 80–100%, but PHP-FPM/MySQL response times barely move
Fix: per-droplet alert scoping. Two changes via the DO API:
- Scope the existing fleet-wide CPU alert to exclude the affected 1 vCPU droplets by setting `entities` to an explicit array of all *other* droplet IDs. An explicit list does not grow by itself, so droplets created later must be added to it (or the alert scoped by `tags` instead).
- Add a new alert scoped to just the affected droplet(s) with a relaxed threshold:
  - `value: 95`
  - `window: "30m"`
  - `entities: ["<droplet_id>"]`

The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation, but ignores the weekly polite scan.
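Building the "all other droplets" list by hand is error-prone. A sketch that turns a newline-separated list of droplet IDs into the JSON string array used for `entities`, skipping the IDs passed as arguments. It assumes the IDs come from something like `doctl compute droplet list --format ID --no-header`; the ID `12345678` below is hypothetical, and `entities_excluding` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# entities_excluding SKIP_ID...: read droplet IDs (one per line) on stdin and
# print them as a JSON array of strings, omitting any ID given as an argument.
entities_excluding() {
  local id skip keep first=1
  printf '['
  while IFS= read -r id; do
    [ -n "$id" ] || continue              # ignore blank lines
    keep=1
    for skip in "$@"; do
      [ "$id" = "$skip" ] && keep=0
    done
    [ "$keep" -eq 1 ] || continue
    if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
    printf '"%s"' "$id"                   # entities are strings in the alert spec
  done
  printf ']\n'
}

# Usage (12345678 = the hypothetical small droplet to exclude):
#   doctl compute droplet list --format ID --no-header | entities_excluding 12345678
```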
Apply via DO API
```bash
TOKEN="<your DigitalOcean PAT>"

# 1. Scope existing CPU alert (PUT requires the full alert spec)
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
    "enabled": true,
    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 85,
    "window": "5m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"

# 2. Create a relaxed alert for the small box
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "<host> CPU sustained high (clamscan-aware)",
    "enabled": true,
    "entities": ["<small_droplet_id>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 95,
    "window": "30m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts"
```
To list current alerts (find UUIDs and current entities):
```bash
curl -sS -H "Authorization: Bearer $TOKEN" \
  "https://api.digitalocean.com/v2/monitoring/alerts" \
  | jq '.policies[] | {uuid, description, entities}'
```
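Since the PUT needs the full spec every time, the two JSON bodies above differ only in a few fields. A sketch of a helper that fills those in, so the spec is not retyped per call. `alert_body` is a hypothetical name; it performs no JSON escaping, so it assumes the description and email contain no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# alert_body VALUE WINDOW ENTITIES_JSON DESCRIPTION EMAIL
# Print a DO CPU alert spec with the same fields as the curl examples above.
# No JSON escaping: DESCRIPTION and EMAIL must not contain " or \.
alert_body() {
  printf '{"alerts":{"email":["%s"],"slack":[]},"compare":"GreaterThan","description":"%s","enabled":true,"entities":%s,"tags":[],"type":"v1/insights/droplet/cpu","value":%s,"window":"%s"}\n' \
    "$5" "$4" "$3" "$1" "$2"
}

# Usage (placeholders as in the examples above):
#   curl -sS -X POST \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#     -d "$(alert_body 95 30m '["<small_droplet_id>"]' '<host> CPU sustained high' you@example.com)" \
#     "https://api.digitalocean.com/v2/monitoring/alerts"
```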
When not to do this: If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
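The vCPU rule of thumb is simple arithmetic: assuming, as above, that the alert metric is a percentage of the droplet's total CPU, one fully busy single-threaded scan accounts for 100/N percent on an N-vCPU droplet. A sketch that adds an estimated baseline load and compares against the threshold; both helper names are hypothetical, and the baseline is an input you would read off your own CPU graphs.

```bash
#!/usr/bin/env bash
# scan_share_pct VCPUS: percent of total droplet CPU that one fully busy,
# single-threaded process accounts for (integer math).
scan_share_pct() { echo $(( 100 / $1 )); }

# would_trip THRESHOLD VCPUS [BASELINE_PCT]: succeed if a single-threaded scan
# plus the baseline load would exceed the alert threshold.
would_trip() {
  local threshold="$1" vcpus="$2" baseline="${3:-0}"
  [ $(( $(scan_share_pct "$vcpus") + baseline )) -gt "$threshold" ]
}

# would_trip 85 1     -> trips (100% > 85%)
# would_trip 85 2     -> does not trip (50%)
# would_trip 85 2 40  -> trips: a busy 2 vCPU box can still cross 85%
```

By the same arithmetic, a 1 vCPU scan that truly holds 100% for the full 30-minute window would also exceed 95%, so it is worth confirming on the droplet's CPU graph for one scan run that utilization does not stay above 95% for the whole window.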
Alternative considered: switch to `clamdscan`, which hands files to a resident `clamd` daemon so signatures stay loaded; scans finish ~10× faster with much less CPU/RAM per run. It's the better long-term answer, but it requires running `clamd` continuously, which keeps the full signature database resident on small boxes (~250 MB was the estimate here; current databases can need considerably more), versus the cron approach, which only holds that RAM during the scan. A trade-off, not strictly better.
See Also
- clamscan-cpu-spike-nice-ionice — troubleshooting CPU spikes from unthrottled scans
- linux-server-hardening-checklist
- ssh-hardening-ansible-fleet