majorwiki/02-selfhosting/security/clamav-fleet-deployment.md
MajorLinux af14e36caf ClamAV fleet article: add DigitalOcean monitoring caveat for 1vCPU droplets
DO's hypervisor-level CPU metric doesn't know about nice/ionice — a
"polite" weekly clamscan on a 1 vCPU droplet still reads 100% utilization
and trips a default >85%/5m alert. Adds a new section explaining the
trade-off and providing the DO API recipe (PUT existing alert with
explicit entities, POST a new relaxed alert scoped to the small
droplet) plus when not to bother (2+ vCPU boxes won't trip).

Triggered by the 2026-05-10 teelia incident where the weekly cron fired
the fleet-wide CPU alert despite the cron script already wrapping
clamscan in nice 19 + ionice idle + cgroup memory limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:24:17 -04:00

---
title: ClamAV Fleet Deployment with Ansible
domain: selfhosting
category: security
tags:
- clamav
- antivirus
- security
- ansible
- fleet
- cron
status: published
created: 2026-04-18
updated: 2026-05-10T01:50
---
# ClamAV Fleet Deployment with Ansible
## Overview
ClamAV is the standard open-source antivirus for Linux servers. For internet-facing hosts, a weekly scan with fresh definitions catches known malware, web shells, and suspicious files before they cause damage. The key operational concern is CPU impact — an unthrottled `clamscan` will saturate a core for hours on a busy host. The solution is `nice` and `ionice` wrappers.
> This guide covers deployment to internet-facing hosts. Internal-only hosts (storage, inference, gaming) are lower priority and can be skipped.
## What Gets Deployed
- `clamav` + `clamav-update` packages (provide `clamscan` and `freshclam`)
- `freshclam` service enabled for automatic definition updates
- A quarantine directory at `/var/lib/clamav/quarantine/`
- A weekly `clamscan` cron job, niced to background priority
- SELinux context set on the quarantine directory (Fedora hosts)
## Ansible Playbook
```yaml
- name: Deploy ClamAV to internet-facing hosts
  hosts: internet_facing  # dca, majorlinux, teelia, tttpod, majortoot, majormail
  become: true

  tasks:
    - name: Install ClamAV packages
      ansible.builtin.package:
        name:
          - clamav
          - clamav-update
        state: present

    - name: Enable and start freshclam
      ansible.builtin.service:
        name: clamav-freshclam
        enabled: true
        state: started

    - name: Create quarantine directory
      ansible.builtin.file:
        path: /var/lib/clamav/quarantine
        state: directory
        owner: root
        group: root
        mode: '0700'

    - name: Set SELinux context on quarantine dir (Fedora/RHEL)
      ansible.builtin.command:
        cmd: chcon -t var_t /var/lib/clamav/quarantine
      when: ansible_os_family == "RedHat"
      changed_when: false

    - name: Deploy weekly clamscan cron job
      ansible.builtin.cron:
        name: "Weekly ClamAV scan"
        user: root
        weekday: "0"  # Sunday
        hour: "3"
        minute: "0"
        # Folded scalar: the lines below are joined into a single cron command
        job: >-
          nice -n 19 ionice -c 3
          clamscan -r /
          --exclude-dir=^/proc
          --exclude-dir=^/sys
          --exclude-dir=^/dev
          --exclude-dir=^/run
          --move=/var/lib/clamav/quarantine
          --log=/var/log/clamav/scan.log
          --quiet
          2>&1 | logger -t clamscan
```
## The nice/ionice Flags
Without throttling, `clamscan -r /` will peg a CPU core for 30 to 90 minutes depending on disk size and file count. On production hosts this causes Netdata alerts and visible service degradation.
| Flag | Value | Meaning |
|------|-------|---------|
| `nice -n 19` | Lowest CPU priority | Kernel will preempt this process for anything else |
| `ionice -c 3` | Idle I/O class | Disk I/O only runs when no other process needs the disk |
With both flags set, `clamscan` becomes essentially invisible under normal load. The scan takes longer (possibly 2 to 4× on busy disks), but this is acceptable for a weekly background job.
> **SELinux on Fedora/RHEL:** `ionice` may trigger AVC denials under SELinux Enforcing. If scans silently fail on these hosts, check `ausearch -m avc -ts recent` for `clamscan` denials. See [selinux-fail2ban-execmem-fix](../../05-troubleshooting/selinux-fail2ban-execmem-fix.md) for the pattern.
## Excluded Paths
Always exclude virtual/pseudo filesystems — scanning them wastes time and can trigger false positives or kernel errors:
```
--exclude-dir=^/proc # Process info (not real files)
--exclude-dir=^/sys # Kernel interfaces
--exclude-dir=^/dev # Device nodes
--exclude-dir=^/run # Runtime tmpfs
```
You may also want to exclude large data directories (`/var/lib/docker`, backup volumes, media stores) if scan time is a concern. These are lower-risk targets anyway.
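When the list of excluded directories differs per host, generating the flags from a plain list keeps the cron line manageable. The following is a minimal sketch, not part of the deployed playbook: `build_excludes` is a hypothetical helper name, and `/mnt/backups` is an illustrative path only.

```bash
# Print one --exclude-dir flag per directory passed as an argument.
# Each path is anchored with ^ so it only matches from the filesystem root.
build_excludes() {
    for dir in "$@"; do
        printf '%s\n' "--exclude-dir=^${dir}"
    done
}

# The four pseudo filesystems from above, plus two example data directories.
# /mnt/backups is a made-up path; substitute whatever applies to the host.
build_excludes /proc /sys /dev /run /var/lib/docker /mnt/backups
```

Because the paths contain no spaces, the output can be passed straight to the scanner with `clamscan -r / $(build_excludes /proc /sys /dev /run)`. Note that `--exclude-dir` takes a regular expression, so `^/var/lib/docker` would also match a sibling such as `/var/lib/docker-old`.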
## Quarantine vs Delete
`--move=/var/lib/clamav/quarantine` moves detected files rather than deleting them. This is safer than `--remove` — you can inspect and restore false positives. Review the quarantine directory periodically:
```bash
ls -la /var/lib/clamav/quarantine/
```
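If a file is a confirmed false positive, restore it and add the signature name reported by `clamscan` to a `.ign2` ignore list such as `/var/lib/clamav/whitelist.ign2`. ClamAV loads `.ign2` files from its database directory, and each line names a signature to skip rather than a file path.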
## Checking Scan Results
```bash
# View last scan log
cat /var/log/clamav/scan.log
# Summary line from the log
grep -E "^Infected|^Scanned" /var/log/clamav/scan.log | tail -5
# Check freshclam is keeping definitions current
systemctl status clamav-freshclam
freshclam --version
```
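The summary check above can be turned into an exit status so that a wrapper script or monitoring check reacts to infections without anyone reading the log. This is a rough sketch under two assumptions: the log contains the standard `Infected files: N` summary line, and the most recent summary is the last one in the file. The function names `infected_count` and `check_last_scan` are hypothetical.

```bash
# Print the "Infected files" count from the last summary in a clamscan log.
# Prints 0 if the log contains no summary line at all.
infected_count() {
    awk -F': *' '/^Infected files:/ { n = $2 } END { print n + 0 }' "$1"
}

# Return 0 if the last scan was clean, 1 if it found something,
# and 2 if the log cannot be read (so a missing log is not mistaken for clean).
check_last_scan() {
    log=${1:-/var/log/clamav/scan.log}
    [ -r "$log" ] || return 2
    [ "$(infected_count "$log")" -eq 0 ]
}

# Example use (not executed here):
#   check_last_scan || echo "ClamAV: review /var/lib/clamav/quarantine" | logger -t clamscan
```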
## Verifying Deployment
Test that ClamAV can detect malware using the EICAR test file (a harmless string that all AV tools recognize as test malware):
```bash
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
> /tmp/eicar-test.txt
clamscan /tmp/eicar-test.txt
# Expected: /tmp/eicar-test.txt: Eicar-Signature FOUND
rm /tmp/eicar-test.txt
```
## DigitalOcean Monitoring Caveat (1 vCPU droplets)
`nice -n 19 ionice -c 3` plus `MemoryMax`/`MemorySwapMax` cgroup limits make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. almost immediately. **But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness.** It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can use 100% of the vCPU on its own, tripping a default `>85%/5m` CPU alert every week, even though the scan gives way to real traffic whenever that traffic needs the CPU.
**Symptoms:**
- Weekly `[ALERT] CPU is running high` email from DO at the same time/day every week
- The alert clears within 10 to 60 min (when scan finishes)
- No actual user-visible service degradation
- Netdata shows CPU 80 to 100% but PHP-FPM/MySQL response times barely move
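To confirm from inside the guest that the "high CPU" is niced work, compare two samples of the aggregate `cpu` line in `/proc/stat`. The kernel accounts niced time in its own column, whereas an outside observer only sees busy versus idle. The sketch below is illustrative; `cpu_breakdown` is a hypothetical helper, and it treats idle plus iowait as the non-busy share.

```bash
# Given two "cpu ..." lines from /proc/stat, print the busy share of the
# interval and how much of the interval was niced CPU time.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
        NR == 2 {
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) { print "busy=0% niced=0%"; exit }
            idle = d[5] + d[6]   # idle + iowait
            printf "busy=%d%% niced=%d%%\n", (total - idle) * 100 / total, d[3] * 100 / total
        }'
}

# Example use during a scan (not executed here):
#   s1=$(head -n 1 /proc/stat); sleep 10; s2=$(head -n 1 /proc/stat)
#   cpu_breakdown "$s1" "$s2"
```

If `busy` is near 100% and `niced` accounts for almost all of it, the alert is being driven by the scan rather than by real workload.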
**Fix: per-droplet alert scoping.** Two changes via the DO API:
1. **Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets** by setting `entities` to an explicit array of *all other* droplet IDs.
2. **Add a new alert scoped to just the affected droplet(s)** with a relaxed threshold:
   - `value: 95`
   - `window: "30m"`
   - `entities: [<droplet_id>]`
The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation — but ignores the weekly polite scan.
### Apply via DO API
```bash
TOKEN="<your DigitalOcean PAT>"

# 1. Scope existing CPU alert (PUT requires the full alert spec)
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
    "enabled": true,
    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 85,
    "window": "5m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"

# 2. Create a relaxed alert for the small box
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "<host> CPU sustained high (clamscan-aware)",
    "enabled": true,
    "entities": ["<small_droplet_id>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 95,
    "window": "30m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts"
```
To list current alerts (find UUIDs and current `entities`):
```bash
curl -sS -H "Authorization: Bearer $TOKEN" \
"https://api.digitalocean.com/v2/monitoring/alerts" | jq
```
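The explicit `entities` array for the scoped fleet-wide alert can be derived from the droplet list instead of typed by hand. The sketch below assumes `jq` is installed and that the input is a `GET /v2/droplets` response with a top-level `droplets` array whose items carry a numeric `id`. `other_droplet_entities` is a hypothetical helper name, and it excludes a single droplet ID only.

```bash
# Read a /v2/droplets JSON response on stdin and print a compact JSON array
# of all droplet IDs (as strings) except the one given as the first argument.
other_droplet_entities() {
    jq -c --arg skip "$1" '[.droplets[].id | tostring | select(. != $skip)]'
}

# Example use (not executed here; the list endpoint is paginated, so a
# larger fleet may need per_page or multiple requests):
#   curl -sS -H "Authorization: Bearer $TOKEN" \
#     "https://api.digitalocean.com/v2/droplets?per_page=200" \
#     | other_droplet_entities "<small_droplet_id>"
```

The printed array can be pasted into the `entities` field of the PUT body. New droplets are not added automatically, so the array has to be regenerated whenever the fleet changes.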
**When *not* to do this:** If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.
**Alternative considered: switch to `clamdscan`** — uses a resident `clamd` daemon, signatures stay loaded, scan finishes ~10× faster with much less CPU/RAM. Better long-term answer, but requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident vs the cron approach which only holds RAM during scan). Trade-off, not strictly better.
## See Also
- [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) — troubleshooting CPU spikes from unthrottled scans
- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
- [ssh-hardening-ansible-fleet](ssh-hardening-ansible-fleet.md)