| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| ClamAV Fleet Deployment with Ansible | selfhosting | security | | published | 2026-04-18 | 2026-05-10T01:50 |
ClamAV Fleet Deployment with Ansible
Overview
ClamAV is the standard open-source antivirus for Linux servers. For internet-facing hosts, a weekly scan with fresh definitions catches known malware, web shells, and suspicious files before they cause damage. The key operational concern is CPU impact — an unthrottled clamscan will saturate a core for hours on a busy host. The solution is nice and ionice wrappers.
This guide covers deployment to internet-facing hosts. Internal-only hosts (storage, inference, gaming) are lower priority and can be skipped.
What Gets Deployed
- clamav + clamav-update packages (provides clamscan + freshclam)
- freshclam service enabled for automatic definition updates
- A quarantine directory at /var/lib/clamav/quarantine/
- A weekly clamscan cron job, niced to background priority
- SELinux context set on the quarantine directory (Fedora hosts)
Ansible Playbook
- name: Deploy ClamAV to internet-facing hosts
  hosts: internet_facing  # dca, majorlinux, teelia, tttpod, majortoot, majormail
  become: true
  tasks:
    - name: Install ClamAV packages
      ansible.builtin.package:
        name:
          - clamav
          - clamav-update
        state: present

    - name: Enable and start freshclam
      ansible.builtin.service:
        name: clamav-freshclam
        enabled: true
        state: started

    - name: Create quarantine directory
      ansible.builtin.file:
        path: /var/lib/clamav/quarantine
        state: directory
        owner: root
        group: root
        mode: '0700'

    - name: Set SELinux context on quarantine dir (Fedora/RHEL)
      ansible.builtin.command:
        cmd: chcon -t var_t /var/lib/clamav/quarantine
      when: ansible_os_family == "RedHat"
      changed_when: false

    - name: Deploy weekly clamscan cron job
      ansible.builtin.cron:
        name: "Weekly ClamAV scan"
        user: root
        weekday: "0"  # Sunday
        hour: "3"
        minute: "0"
        job: >-
          nice -n 19 ionice -c 3
          clamscan -r /
          --exclude-dir=^/proc
          --exclude-dir=^/sys
          --exclude-dir=^/dev
          --exclude-dir=^/run
          --move=/var/lib/clamav/quarantine
          --log=/var/log/clamav/scan.log
          --quiet
          2>&1 | logger -t clamscan
The nice/ionice Flags
Without throttling, clamscan -r / will peg a CPU core for 30–90 minutes depending on disk size and file count. On production hosts this causes Netdata alerts and visible service degradation.
| Flag | Value | Meaning |
|---|---|---|
| nice -n 19 | Lowest CPU priority | Kernel will preempt this process for anything else |
| ionice -c 3 | Idle I/O class | Disk I/O only runs when no other process needs the disk |
With both flags set, clamscan becomes essentially invisible under normal load. The scan takes longer (possibly 2–4× on busy disks), but this is acceptable for a weekly background job.
SELinux on Fedora: ionice may trigger AVC denials under SELinux Enforcing. If scans silently fail on Fedora hosts, check ausearch -m avc -ts recent for clamscan denials. See selinux-fail2ban-execmem-fix for the pattern.
Excluded Paths
Always exclude virtual/pseudo filesystems — scanning them wastes time and can trigger false positives or kernel errors:
--exclude-dir=^/proc # Process info (not real files)
--exclude-dir=^/sys # Kernel interfaces
--exclude-dir=^/dev # Device nodes
--exclude-dir=^/run # Runtime tmpfs
You may also want to exclude large data directories (/var/lib/docker, backup volumes, media stores) if scan time is a concern. These are lower-risk targets anyway.
Quarantine vs Delete
--move=/var/lib/clamav/quarantine moves detected files rather than deleting them. This is safer than --remove — you can inspect and restore false positives. Review the quarantine directory periodically:
ls -la /var/lib/clamav/quarantine/
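If a file is a confirmed false positive, restore it and add the name of the signature that matched (not the file path) to a whitelist.ign2 ignore file. ClamAV reads .ign2 files from its database directory (typically /var/lib/clamav/), so check that location if an entry under /etc/clamav/ has no effect.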
Checking Scan Results
# View last scan log
cat /var/log/clamav/scan.log
# Summary line from the log
grep -E "^Infected|^Scanned" /var/log/clamav/scan.log | tail -5
# Check freshclam is keeping definitions current
systemctl status clamav-freshclam
freshclam --version
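For a cron follow-up or a monitoring check, the summary grep above can be wrapped in a small helper. This is a minimal sketch, assuming the log contains clamscan's usual "Infected files: N" summary line; infected_count is a hypothetical helper name, not something ClamAV ships.

```shell
# infected_count LOGFILE
# Prints the number from the last "Infected files:" summary line in a
# clamscan log. Exits non-zero if the log has no such line, which usually
# means the scan never finished writing its summary.
# (Hypothetical helper for illustration only.)
infected_count() {
    awk -F': *' '
        /^Infected files:/ { n = $2 }
        END {
            if (n == "") exit 1
            print n + 0
        }' "$1"
}

# Example use (left commented so sourcing this file has no side effects):
#   count=$(infected_count /var/log/clamav/scan.log) || echo "no scan summary found"
#   [ "${count:-0}" -gt 0 ] && echo "clamscan quarantined $count file(s)"
```

Reading the last matching line means a log that accumulates several weekly runs reports only the most recent scan.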
Verifying Deployment
Test that ClamAV can detect malware using the EICAR test file (a harmless string that all AV tools recognize as test malware):
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
> /tmp/eicar-test.txt
clamscan /tmp/eicar-test.txt
# Expected: /tmp/eicar-test.txt: Eicar-Signature FOUND
rm /tmp/eicar-test.txt
DigitalOcean Monitoring Caveat (1 vCPU droplets)
nice -n 19 ionice -c 3 plus MemoryMax/MemorySwapMax cgroup limits make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. instantly. But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness. It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default >85%/5m CPU alert every week, even though real traffic is fully insulated from the scan.
Symptoms:
- Weekly [ALERT] CPU is running high email from DO at the same time/day every week
- The alert clears within 10–60 min (when scan finishes)
- No actual user-visible service degradation
- Netdata shows CPU 80–100% but PHP-FPM/MySQL response times barely move
Fix: per-droplet alert scoping. Two changes via the DO API:
- Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets by setting entities to an explicit array of all other droplet IDs.
- Add a new alert scoped to just the affected droplet(s) with a relaxed threshold:
  - value: 95
  - window: "30m"
  - entities: [<droplet_id>]
The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation, and it is meant to ignore the weekly polite scan as long as the scan finishes inside the 30-minute window.
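Because the fleet-wide alert has to name every droplet except the exempted ones, the entities array is easy to get wrong by hand. The sketch below builds it from a list of droplet IDs, one per line on stdin. entities_json is a hypothetical helper, and where the ID list comes from (for example a saved export of your droplet inventory) is an assumption left to the reader.

```shell
# entities_json EXCLUDED_ID... < ids.txt
# Reads droplet IDs (one per line) on stdin and prints a JSON array of
# every ID that is not listed as an argument.
# (Hypothetical helper; the body runs in a subshell so no variables leak.)
entities_json() (
    out=""
    while IFS= read -r id; do
        [ -n "$id" ] || continue          # skip blank lines
        skip=no
        for excluded in "$@"; do
            if [ "$id" = "$excluded" ]; then
                skip=yes
            fi
        done
        if [ "$skip" = no ]; then
            out="${out:+$out, }\"$id\""   # IDs are quoted as JSON strings
        fi
    done
    printf '[%s]\n' "$out"
)

# Example use (IDs are placeholders):
#   printf '%s\n' 111 222 333 | entities_json 222
```

The output can be pasted into the "entities" field of the PUT body for the fleet-wide alert.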
Apply via DO API
TOKEN="<your DigitalOcean PAT>"

# 1. Scope existing CPU alert (PUT requires the full alert spec)
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
    "enabled": true,
    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 85,
    "window": "5m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"

# 2. Create a relaxed alert for the small box
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "<host> CPU sustained high (clamscan-aware)",
    "enabled": true,
    "entities": ["<small_droplet_id>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 95,
    "window": "30m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts"
To list current alerts (find UUIDs and current entities):
curl -sS -H "Authorization: Bearer $TOKEN" \
"https://api.digitalocean.com/v2/monitoring/alerts" | jq
When not to do this: If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
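The vCPU arithmetic behind this can be sketched as follows. It assumes a single-threaded clamscan can occupy at most one vCPU, so on an N vCPU droplet it contributes at most 100/N percent of droplet-wide CPU on top of whatever else is running. scan_ceiling is a hypothetical helper, and the optional baseline argument (existing load in percent) is an illustrative assumption.

```shell
# scan_ceiling VCPUS THRESHOLD [BASELINE]
# Estimates the highest droplet-wide CPU percentage a single-threaded scan
# can produce and whether that exceeds a "GreaterThan" alert threshold.
# (Hypothetical helper; integer arithmetic, so 3 vCPUs gives 33.)
scan_ceiling() (
    vcpus=$1
    threshold=$2
    baseline=${3:-0}
    ceiling=$(( 100 / vcpus + baseline ))
    if [ "$ceiling" -gt 100 ]; then
        ceiling=100                      # utilization cannot exceed 100%
    fi
    if [ "$ceiling" -gt "$threshold" ]; then
        trips=yes
    else
        trips=no
    fi
    printf 'ceiling=%s%% trips=%s\n' "$ceiling" "$trips"
)

# Example use:
#   scan_ceiling 2 85      # 2 vCPUs, default 85% alert
#   scan_ceiling 1 85      # 1 vCPU, default 85% alert
```

With 2 vCPUs the scan alone tops out at 50%, well under 85%; with 1 vCPU the ceiling is 100%, which exceeds any threshold below 100.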
When the per-droplet relaxed alert also trips (and what to do): On a 1 vCPU droplet during low-traffic hours (e.g., the default Sunday-morning weekly cron window), clamscan has nothing real to yield to — nice 19 only matters when something else wants the CPU. The kernel correctly schedules clamscan as nice/idle (iostat shows %nice ~94, %idle 0) but DO sees 100% - 0% idle = 100% CPU and trips even the 95%/30m threshold for the duration of the scan (~30–50 min on small webserver boxes). At that point the realistic options are:
- Accept the weekly page as expected noise: simplest, no further engineering
- Switch to clamdscan (daemon-backed): scans finish ~3–5× faster and fit in a 30m window, but clamd adds ~250 MB resident memory continuously
- Disable the per-droplet CPU alert entirely for that host and rely on Netdata for the real signal
The "polite CPU is invisible to DO" trick stops working once the box is small enough that the polite work fills the entire core unopposed. There is no DO threshold that distinguishes "polite scan filling idle CPU" from "runaway process pinning the vCPU" — that distinction lives in iostat's %nice vs %user split, which DO doesn't expose.
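The %nice versus %user split can also be read directly on the host from two samples of the aggregate cpu line in /proc/stat. This is a minimal sketch, not a replacement for iostat or Netdata: cpu_split is a hypothetical helper, and grouping iowait with idle is a simplifying assumption.

```shell
# cpu_split BEFORE AFTER
# BEFORE and AFTER are aggregate "cpu" lines from /proc/stat, taken some
# seconds apart. Prints the share of elapsed CPU time that was niced work,
# non-niced busy work, and idle (idle + iowait), as whole percentages.
# Fields summed: user nice system idle iowait irq softirq steal
# (guest time is already counted inside user/nice, so it is not added).
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - before[i]; total += d[i] }
            if (total <= 0) exit 1       # samples identical or out of order
            nice = d[3]
            idle = d[5] + d[6]
            busy = total - nice - idle
            printf "nice=%d busy=%d idle=%d\n",
                   100 * nice / total, 100 * busy / total, 100 * idle / total
        }'
}

# Example use during a scan (commented out because it sleeps for a minute):
#   before=$(grep '^cpu ' /proc/stat); sleep 60; after=$(grep '^cpu ' /proc/stat)
#   cpu_split "$before" "$after"
```

A result like nice=94 busy=6 idle=0 is the polite-scan signature described above: the vCPU is fully used, but almost all of it is work that any other process could preempt.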
Alternative considered: switch to clamdscan. It uses a resident clamd daemon, so signatures stay loaded and each scan finishes several times faster (estimates in this guide range from ~3× to ~10×, depending on the host) with less CPU per scan. It is the better long-term answer, but it requires running clamd continuously: about 250 MB resident on small boxes, whereas the cron approach only holds RAM during the scan. It is a trade-off, not strictly better.
See Also
- clamscan-cpu-spike-nice-ionice — troubleshooting CPU spikes from unthrottled scans
- linux-server-hardening-checklist
- ssh-hardening-ansible-fleet