---
title: ClamAV Fleet Deployment with Ansible
domain: selfhosting
category: security
tags:
  - clamav
  - antivirus
  - security
  - ansible
  - fleet
  - cron
status: published
created: 2026-04-18
updated: 2026-05-15T03:00
---
# ClamAV Fleet Deployment with Ansible

## Overview

ClamAV is the standard open-source antivirus for Linux servers. For internet-facing hosts, a weekly scan with fresh definitions catches known malware, web shells, and suspicious files before they cause damage. The key operational concern is CPU impact: an unthrottled `clamscan` will saturate a core for hours on a busy host. The solution is `nice` and `ionice` wrappers.

> This guide covers deployment to internet-facing hosts. Internal-only hosts (storage, inference, gaming) are lower priority and can be skipped.

## What Gets Deployed

- `clamav` + `clamav-update` packages (provides `clamscan` + `freshclam`)
- `freshclam` service enabled for automatic definition updates
- A quarantine directory at `/var/lib/clamav/quarantine/`
- A weekly `clamscan` cron job, niced to background priority
- SELinux context set on the quarantine directory (Fedora hosts)

## Ansible Playbook

> On the MajorsHouse fleet this is packaged as the **`clamav` role** (`roles/clamav/`,
> tasks split install → service → scan → verify) and run via `clamav.yml` or `site.yml`.
> The standalone playbook below is the illustrative equivalent.

```yaml
- name: Deploy ClamAV to internet-facing hosts
  hosts: internet_facing  # dca, majorlinux, teelia, tttpod, majortoot, majormail
  become: true

  tasks:

    - name: Install ClamAV packages
      ansible.builtin.package:
        name:
          - clamav
          - clamav-update
        state: present

    - name: Enable and start freshclam
      ansible.builtin.service:
        name: clamav-freshclam
        enabled: true
        state: started

    - name: Create quarantine directory
      ansible.builtin.file:
        path: /var/lib/clamav/quarantine
        state: directory
        owner: root
        group: root
        mode: '0700'

    - name: Set SELinux context on quarantine dir (Fedora/RHEL)
      ansible.builtin.command:
        cmd: chcon -t var_t /var/lib/clamav/quarantine
      when: ansible_os_family == "RedHat"
      changed_when: false

    - name: Deploy weekly clamscan cron job
      ansible.builtin.cron:
        name: "Weekly ClamAV scan"
        user: root
        weekday: "0"  # Sunday
        hour: "3"
        minute: "0"
        job: >-
          nice -n 19 ionice -c 3
          clamscan -r /
          --exclude-dir=^/proc
          --exclude-dir=^/sys
          --exclude-dir=^/dev
          --exclude-dir=^/run
          --move=/var/lib/clamav/quarantine
          --log=/var/log/clamav/scan.log
          --quiet
          2>&1 | logger -t clamscan
```

## The nice/ionice Flags

Without throttling, `clamscan -r /` will peg a CPU core for 30–90 minutes depending on disk size and file count. On production hosts this causes Netdata alerts and visible service degradation.

| Flag | Value | Meaning |
|------|-------|---------|
| `nice -n 19` | Lowest CPU priority | Kernel will preempt this process for anything else |
| `ionice -c 3` | Idle I/O class | Disk I/O only runs when no other process needs the disk |

With both flags set, `clamscan` becomes essentially invisible under normal load. The scan takes longer (possibly 2–4× on busy disks), but this is acceptable for a weekly background job.

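To confirm that the wrapper really lowers priority, a quick check like the one below can help. It is a minimal sketch: `run_polite` is a hypothetical helper name, not part of the role, and the `-t` flag (ignore a failure to set the I/O class) is an assumption added here so the command still runs in containers where I/O priorities cannot be changed.

```bash
#!/usr/bin/env bash
# run_polite is a hypothetical helper, not part of the clamav role.
# It runs any command at the lowest CPU priority and, where ionice
# is available, in the idle I/O class.
run_polite() {
  if command -v ionice >/dev/null 2>&1; then
    # -t: carry on even if the I/O class cannot be set
    nice -n 19 ionice -c 3 -t "$@"
  else
    nice -n 19 "$@"
  fi
}

# Field 19 of /proc/<pid>/stat is the process nice value, so a child
# started through the wrapper should report the lowest priority.
run_polite awk '{print $19}' /proc/self/stat
```

For a scan that is already running, `ps -o pid,ni,comm -C clamscan` and `ionice -p <pid>` give the same information.
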
> **SELinux on Fedora/RHEL:** `ionice` may trigger AVC denials under SELinux Enforcing. If scans silently fail on Fedora hosts, check `ausearch -m avc -ts recent` for `clamscan` denials. See [selinux-fail2ban-execmem-fix](../../05-troubleshooting/selinux-fail2ban-execmem-fix.md) for the pattern.

## Excluded Paths

Always exclude virtual/pseudo filesystems. Scanning them wastes time and can trigger false positives or kernel errors:

```
--exclude-dir=^/proc   # Process info (not real files)
--exclude-dir=^/sys    # Kernel interfaces
--exclude-dir=^/dev    # Device nodes
--exclude-dir=^/run    # Runtime tmpfs
```

You may also want to exclude large data directories (`/var/lib/docker`, backup volumes, media stores) if scan time is a concern. These are lower-risk targets anyway.

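If the exclusion list grows, generating the flags from a list of paths keeps the cron line manageable. A minimal sketch, assuming a hypothetical helper called `build_excludes` (the name and the choice of extra path are illustrative only):

```bash
#!/usr/bin/env bash
# build_excludes is a hypothetical helper: it turns a list of absolute
# paths into anchored --exclude-dir flags for clamscan.
build_excludes() {
  local out="" path
  for path in "$@"; do
    out="${out:+$out }--exclude-dir=^${path}"
  done
  printf '%s\n' "$out"
}

# Pseudo filesystems from this guide plus one optional data directory.
build_excludes /proc /sys /dev /run /var/lib/docker
```

The output can be spliced into the command as `clamscan -r / $(build_excludes /proc /sys)`. That relies on shell word splitting, so it only suits paths without spaces.
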
## Quarantine vs Delete

`--move=/var/lib/clamav/quarantine` moves detected files rather than deleting them. This is safer than `--remove`: you can inspect and restore false positives. Review the quarantine directory periodically:

```bash
ls -la /var/lib/clamav/quarantine/
```

If a file is a confirmed false positive, restore it and add the name of the signature that matched it to `/etc/clamav/whitelist.ign2`. ClamAV only loads `.ign2` files from its database directory (commonly `/var/lib/clamav/`), so place the file there if `/etc/clamav/` is not the database directory on your hosts.

## Checking Scan Results

```bash
# View last scan log
cat /var/log/clamav/scan.log

# Summary line from the log
grep -E "^Infected|^Scanned" /var/log/clamav/scan.log | tail -5

# Check freshclam is keeping definitions current
systemctl status clamav-freshclam
freshclam --version
```

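For unattended checks, the summary can be reduced to a single number. A minimal sketch, assuming the log contains clamscan's usual `Infected files: N` summary line; `infected_count` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# infected_count is a hypothetical helper: print the count from the most
# recent "Infected files:" summary line in a clamscan log.
# Prints nothing and returns 1 if the log has no summary line.
infected_count() {
  local log="${1:-/var/log/clamav/scan.log}" count
  count=$(grep -E '^Infected files:' "$log" 2>/dev/null | tail -n 1 | awk '{print $3}')
  [ -n "$count" ] || return 1
  printf '%s\n' "$count"
}
```

A wrapper script could then alert only on a non-zero result, for example `n=$(infected_count) && [ "$n" -gt 0 ] && echo "clamscan found $n infected file(s)"`.
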
## Verifying Deployment

Test that ClamAV can detect malware using the EICAR test file (a harmless string that all AV tools recognize as test malware):

```bash
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
  > /tmp/eicar-test.txt
clamscan /tmp/eicar-test.txt
# Expected: /tmp/eicar-test.txt: Eicar-Signature FOUND
rm /tmp/eicar-test.txt
```

## DigitalOcean Monitoring Caveat (1 vCPU droplets)

`nice -n 19 ionice -c 3` plus `MemoryMax`/`MemorySwapMax` cgroups make clamscan "polite" to the Linux scheduler: it yields to PHP-FPM, MySQL, etc. instantly. **But hypervisor-level CPU monitoring (DigitalOcean, Linode, Hetzner) doesn't know about niceness.** It sees raw CPU utilization. On a 1 vCPU droplet during quiet hours, a single-threaded clamscan can fill 100% of the vCPU on its own, tripping a default `>85%/5m` CPU alert every week, even though real traffic is genuinely insulated from the scan.

**Symptoms:**
- Weekly `[ALERT] CPU is running high` email from DO at the same time/day every week
- The alert clears within 10–60 min (when scan finishes)
- No actual user-visible service degradation
- Netdata shows CPU 80–100% but PHP-FPM/MySQL response times barely move

**Fix: per-droplet alert scoping.** Two changes via the DO API:

1. **Scope the existing fleet-wide CPU alert to exclude affected 1 vCPU droplets** by setting `entities` to an explicit array of *all other* droplet IDs.
2. **Add a new alert scoped to just the affected droplet(s)** with a relaxed threshold:
   - `value: 95`
   - `window: "30m"`
   - `entities: [<droplet_id>]`

The relaxed threshold still catches runaway PHP loops, mining trojans, and actual sustained saturation, but ignores the weekly polite scan.

### Apply via DO API

```bash
TOKEN="<your DigitalOcean PAT>"

# 1. Scope existing CPU alert (PUT requires the full alert spec)
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "CPU is running high (excludes 1vCPU clamscan boxes)",
    "enabled": true,
    "entities": ["<droplet_id_1>", "<droplet_id_2>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 85,
    "window": "5m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts/<existing_uuid>"

# 2. Create a relaxed alert for the small box
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": {"email": ["you@example.com"], "slack": []},
    "compare": "GreaterThan",
    "description": "<host> CPU sustained high (clamscan-aware)",
    "enabled": true,
    "entities": ["<small_droplet_id>"],
    "tags": [],
    "type": "v1/insights/droplet/cpu",
    "value": 95,
    "window": "30m"
  }' \
  "https://api.digitalocean.com/v2/monitoring/alerts"
```

To list current alerts (find UUIDs and current `entities`):

```bash
curl -sS -H "Authorization: Bearer $TOKEN" \
  "https://api.digitalocean.com/v2/monitoring/alerts" | jq
```

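Building the `entities` array for step 1 by hand is error-prone on a larger fleet. A minimal sketch of the filtering step in plain shell, assuming you already have the droplet IDs as a list; `entities_json` is a hypothetical helper and the IDs shown are placeholders:

```bash
#!/usr/bin/env bash
# entities_json is a hypothetical helper: the first argument is the droplet
# ID to leave out, the remaining arguments are all droplet IDs in the fleet.
# Prints a JSON array of strings suitable for the "entities" field.
entities_json() {
  local exclude="$1" out="" id
  shift
  for id in "$@"; do
    [ "$id" = "$exclude" ] && continue
    out="${out:+$out,}\"$id\""
  done
  printf '[%s]\n' "$out"
}

# Placeholder IDs: leave 222 out of a three-droplet fleet.
entities_json 222 111 222 333
```

The printed array can be pasted into the PUT body above. Excluding more than one small droplet would need a second pass or a small extension of the loop.
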
**When *not* to do this:** If your droplet has 2+ vCPUs and clamscan only consumes ~50% of total, you probably won't trip an 85% alert in the first place. The per-droplet exemption is mainly for 1 vCPU boxes.

**When the per-droplet relaxed alert *also* trips (and what to do):** On a 1 vCPU droplet during low-traffic hours (e.g., the default Sunday-morning weekly cron window), clamscan has *nothing real to yield to*: `nice 19` only matters when something else wants the CPU. The kernel correctly schedules clamscan as nice/idle (`iostat` shows `%nice ~94, %idle 0`) but DO sees `100% - 0% idle = 100% CPU` and trips even the 95%/30m threshold for the duration of the scan (~30–50 min on small webserver boxes). At that point the realistic options are:

1. **Accept the weekly page** as expected noise: simplest, no further engineering
2. **Switch to `clamdscan`** (daemon-backed): scans finish ~3–5× faster and fit in a 30m window, but `clamd` adds ~250 MB resident memory continuously
3. **Disable the per-droplet CPU alert entirely** for that host and rely on Netdata for the real signal

The "polite CPU is invisible to DO" trick stops working once the box is small enough that the polite work fills the entire core unopposed. There is no DO threshold that distinguishes "polite scan filling idle CPU" from "runaway process pinning the vCPU"; that distinction lives in `iostat`'s `%nice` vs `%user` split, which DO doesn't expose.

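The `%nice` versus `%user` split can also be read straight from `/proc/stat` when `iostat` is not installed. A minimal sketch, assuming the standard Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal); `cpu_split` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# cpu_split is a hypothetical helper: given two snapshots of the aggregate
# "cpu" line from /proc/stat, print the share of elapsed CPU time spent in
# user, nice and idle (whole percentages, rounded down).
cpu_split() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
    NR == 2 {
      total = 0
      for (i = 2; i <= 9; i++) total += $i - first[i]   # user..steal
      if (total <= 0) { print "no CPU time elapsed between snapshots"; exit 1 }
      printf "user=%d%% nice=%d%% idle=%d%%\n",
        ($2 - first[2]) * 100 / total,
        ($3 - first[3]) * 100 / total,
        ($5 - first[5]) * 100 / total
    }'
}

# Typical use during a scan window (commented out because it sleeps):
#   a=$(head -n 1 /proc/stat); sleep 10; b=$(head -n 1 /proc/stat)
#   cpu_split "$a" "$b"
```

A high `nice` share with `idle` near zero matches the polite-scan case described above, while a high `user` share points at a genuinely busy or runaway process.
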
**Alternative considered: switch to `clamdscan`.** It uses a resident `clamd` daemon, so signatures stay loaded and the scan finishes ~10× faster with much less CPU/RAM. This is the better long-term answer, but it requires running `clamd` continuously (memory cost on small boxes is ~250 MB resident, whereas the cron approach only holds RAM during the scan). It is a trade-off, not strictly better.

## Daemonless Mode on Memory-Constrained Hosts

On hosts with ≤2 GB RAM, running `clamd` continuously is often counterproductive. The daemon loads its full signature database (~950 MB RSS) into memory and keeps it resident. On small VMs this crowds out MySQL, PHP-FPM, and other services, often pushing the whole system into swap rather than preventing anything.

**Affected hosts (fleet history):**

| Host | RAM | Incident | Resolution |
|------|-----|----------|------------|
| teelia | 1.9 GB | 2026-04-27: clamd 728 MB RSS, 94% RAM alert | daemonless |
| dcaprod | 3.8 GB | 2026-04-30: clamd OOM thrash after 512M cgroup cap | daemonless |
| majorlinux | 2.0 GB | 2026-05-15: clamd 980 MB swap, mysqld swapping 293 MB | daemonless |

**The fix: `clamav_use_daemon: false` host_var**

The `clamav` role supports a per-host override. Add to the host's `host_vars/<hostname>/vars.yml`:

```yaml
clamav_use_daemon: false
```

Then re-run the role:

```bash
ansible-playbook clamav.yml --limit <hostname>
```

This will:
- Stop and disable `clamav-daemon.service` and `clamav-daemon.socket`
- Deploy the weekly scan template using `clamscan` (daemonless, loads DB per run)
- Leave `clamav-freshclam` active so definitions stay current

**Trade-off:** Each weekly scan loads the signature DB fresh (~950 MB peak RAM for the scan duration, then freed). The scan takes roughly 3–5× longer than `clamdscan` against a warm daemon, but this is acceptable for a weekly background job. The `systemd-run MemoryMax` cgroup wrapper in the scan template caps peak usage so the scan can't OOM the host.

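The scan template itself is not reproduced in this guide, so the following is only a sketch of what a memory-capped invocation can look like. The helper name `scan_command`, the `1G` default cap, and `MemorySwapMax=0` are assumptions for illustration; the function only prints the command so it can be reviewed before it goes anywhere near cron.

```bash
#!/usr/bin/env bash
# scan_command is a hypothetical helper: print (not run) a clamscan command
# wrapped in a transient systemd scope with a memory cap.
# $1 = MemoryMax value (assumed default 1G), $2 = path to scan (default /).
scan_command() {
  local mem_max="${1:-1G}" target="${2:-/}"
  local cmd="systemd-run --scope --quiet -p MemoryMax=${mem_max} -p MemorySwapMax=0"
  cmd="$cmd nice -n 19 ionice -c 3 clamscan -r ${target}"
  cmd="$cmd --exclude-dir=^/proc --exclude-dir=^/sys --exclude-dir=^/dev --exclude-dir=^/run"
  cmd="$cmd --move=/var/lib/clamav/quarantine --log=/var/log/clamav/scan.log --quiet"
  printf '%s\n' "$cmd"
}

scan_command 1G /
```

The fleet history above (the 512M cap on dcaprod) suggests that a cap well below the size of the signature database leads to thrashing rather than protection, so the value is worth choosing with the ~950 MB peak in mind.
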
**Rule of thumb:** Use daemon mode (`clamav_use_daemon: true` or unset) on hosts with ≥4 GB RAM where scan speed matters (mail servers, upload handlers). Use daemonless on webservers and small VMs where continuous memory residency is the bigger risk.

## See Also

- [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md): troubleshooting CPU spikes from unthrottled scans
- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
- [ssh-hardening-ansible-fleet](ssh-hardening-ansible-fleet.md)