wiki: add ClamAV safe scheduling article; update Netdata new server setup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:36:49 -04:00
parent d1e9571761
commit 0e640a3fff
3 changed files with 125 additions and 7 deletions
--- a/02-selfhosting/monitoring/netdata-new-server-setup.md
+++ b/02-selfhosting/monitoring/netdata-new-server-setup.md
@@ -2,17 +2,31 @@
 title: "Deploying Netdata to a New Server"
 domain: selfhosting
 category: monitoring
-tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian]
+tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-22
 ---
 # Deploying Netdata to a New Server
-This covers the full Netdata setup for a new server in the fleet: install, email notification config, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
+This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
-## 1. Install
+## 1. Install Prerequisites
 Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**
 ```bash
 apt install -y jq
 ```
 Verify:
 ```bash
 jq --version
 ```
 ## 2. Install Netdata
 Use the official kickstart script:
@@ -28,7 +42,7 @@ systemctl is-active netdata
 curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
 ```
-## 2. Configure Email Notifications
+## 3. Configure Email Notifications
 Copy the default config and set the three required values:
@@ -64,7 +78,23 @@ You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) an
 > [!note] Delivery via local Postfix
 > Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
-## 3. Claim to Netdata Cloud
+## 4. Configure n8n Webhook Notifications
 Copy `health_alarm_notify.conf` from an existing server (e.g. majormail) whose copy contains the `custom_sender()` function. That function sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.
 > [!warning] jq required
 > The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).
 After deploying the config, run a test to confirm the webhook fires correctly:
 ```bash
 systemctl restart netdata
 /usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
 ```
 Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.
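The `jq` dependency can be illustrated with a minimal sketch of how such a payload might be built. This is not the fleet's actual `custom_sender()`, which is not reproduced here: `build_alert_payload` is a hypothetical helper, and only the field names `hostname`, `alarm`, and `status` come from the verification step above.

```bash
# Hypothetical helper, not the real custom_sender(): build the JSON body with jq.
build_alert_payload() {
  # Fail loudly instead of letting an empty body reach the webhook.
  command -v jq >/dev/null 2>&1 || {
    echo "jq not installed; refusing to build an empty payload" >&2
    return 1
  }
  # --arg passes each value as a properly escaped JSON string.
  jq -n -c \
    --arg hostname "$1" \
    --arg alarm "$2" \
    --arg status "$3" \
    '{hostname: $hostname, alarm: $alarm, status: $status}'
}

# Example call with illustrative values.
build_alert_payload "majormail" "10min_cpu_usage" "WARNING" || echo "payload not built" >&2
```

Checking for `jq` inside the function turns the silent "empty body" failure described in the warning into an explicit error at send time.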
 ## 5. Claim to Netdata Cloud
 Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
@@ -84,7 +114,7 @@ cat /var/lib/netdata/cloud.d/claimed_id
 A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
-## 4. Verify Alerts
+## 6. Verify Alerts
 Check that no unexpected alerts are active after setup:
@@ -111,6 +141,20 @@ for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpo
 done
 ```
 ## Fleet-wide jq Audit
 To check that all servers with `custom_sender` have `jq` installed:
 ```bash
 for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
  echo -n "=== $host: "
  ssh -o ConnectTimeout=5 root@$host \
    'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(command -v jq >/dev/null 2>&1 && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
 done
 ```
 Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.
 ## Related
 - [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
--- a/05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md
+++ b/05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md
@@ -0,0 +1,73 @@
 # ClamAV Safe Scheduling on Live Servers
 Running `clamscan` unthrottled on a live server will peg CPU until completion. On a small VPS (1 vCPU), a full recursive scan can sustain 70–100% CPU for an hour or more, degrading or taking down hosted services.
 ## The Problem
 A common out-of-the-box ClamAV cron setup looks like this:
 ```cron
 0 1 * * 0 clamscan --infected --recursive / --exclude=/sys
 ```
 This runs at Linux's default scheduling priority (`nice 0`) with normal I/O priority. On a live server it will:
 - Monopolize the CPU for the scan duration
 - Cause high I/O wait, degrading web serving, databases, and other services
 - Trigger monitoring alerts (e.g., Netdata `10min_cpu_usage`)
 ## The Fix
 Throttle the scan with `nice` and `ionice`:
 ```cron
 0 1 * * 0 nice -n 19 ionice -c 3 clamscan --infected --recursive / --exclude=/sys
 ```
 | Flag | Meaning |
 |------|---------|
 | `nice -n 19` | Lowest CPU scheduling priority (range: -20 to 19) |
 | `ionice -c 3` | Idle I/O class — only uses disk when no other process needs it |
 The scan takes longer, but it gives way to other processes whenever they need CPU or disk, so the impact on hosted services is much smaller.
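As a sketch, the same prefix can live in a small wrapper function inside a script that cron calls, so every scan gets it. `run_throttled` is a hypothetical name, and the fallback to `nice` alone when `ionice` is unavailable is an assumption made for illustration.

```bash
# Hypothetical wrapper: run any command at the lowest CPU and I/O priority.
run_throttled() {
  # Use the idle I/O class only if ionice exists and the kernel accepts it.
  if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
    nice -n 19 ionice -c 3 "$@"
  else
    # Fall back to CPU throttling alone.
    nice -n 19 "$@"
  fi
}

# With no arguments, `nice` prints the niceness it is running at,
# which confirms that the wrapper took effect.
run_throttled nice
```

The last line should print `19`. A scan would then be started as `run_throttled clamscan --infected --recursive / --exclude=/sys`.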
 ## Applying the Fix
 Edit root's crontab:
 ```bash
 crontab -e
 ```
 Or apply non-interactively:
 ```bash
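 # Skip lines that are already throttled, so re-running does not stack prefixes
 crontab -l | sed '/nice -n 19 ionice -c 3 clamscan/!s|clamscan|nice -n 19 ionice -c 3 clamscan|' | crontab -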
 ```
 Verify:
 ```bash
 crontab -l | grep clam
 ```
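Before rewriting a live crontab, the substitution can be tried on a sample line. The sketch below wraps it in a filter; `throttle_clamscan_lines` is a hypothetical name, and the filter skips lines that already carry the prefix so that running it twice is harmless.

```bash
# Hypothetical filter: prefix clamscan with nice/ionice on lines not already throttled.
throttle_clamscan_lines() {
  sed '/nice -n 19 ionice -c 3 clamscan/!s|clamscan|nice -n 19 ionice -c 3 clamscan|'
}

# Dry run on a sample entry; nothing is written to the crontab.
printf '%s\n' '0 1 * * 0 clamscan --infected --recursive / --exclude=/sys' | throttle_clamscan_lines
```

Piping `crontab -l` through the function, without the final `| crontab -`, shows what would change on a given server.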
 ## Diagnosing a Runaway Scan
 If CPU is already pegged, identify and kill the process:
 ```bash
 ps aux --sort=-%cpu | head -15
 # Look for clamscan
 kill <PID>
 ```
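If the scan should finish rather than be killed, an already-running process can be throttled in place. A minimal sketch, assuming `renice`, `ionice`, and `pgrep` are installed; `throttle_running` is a hypothetical helper name.

```bash
# Hypothetical helper: lower the priority of a running process instead of killing it.
throttle_running() {
  local target="$1"
  # Drop to the lowest CPU priority; renice reports the change on stdout.
  renice -n 19 -p "$target" >/dev/null || return 1
  # Move to the idle I/O class; tolerate kernels or sandboxes that refuse it.
  ionice -c 3 -p "$target" 2>/dev/null || echo "ionice not applied to $target" >&2
}

# Apply to every running clamscan, if any (pgrep -x matches the exact process name).
for pid in $(pgrep -x clamscan 2>/dev/null); do
  throttle_running "$pid"
done
```

The scan keeps its progress, and `kill <PID>` remains available if services are still degraded afterwards.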
 ## Notes
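 - `ionice -c 3` (Idle) requires Linux kernel ≥ 2.6.13 and an I/O scheduler that honors I/O priorities: CFQ on older kernels (it was removed in Linux 5.0) or BFQ on newer ones. Under `mq-deadline` or `none`, which are common defaults on current distributions, the idle class may have little or no effect. The active scheduler is listed in `/sys/block/<device>/queue/scheduler`.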
 - On multi-core servers, consider also using `cpulimit` for a hard cap: `cpulimit -l 30 -- clamscan ...`
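 - Always keep `--exclude=/sys` (and optionally `--exclude=/proc`, `--exclude=/dev`) to avoid scanning virtual filesystems. `--exclude` takes a regular expression, so an unanchored `/sys` can also match paths such as `/usr/lib/systemd`; `--exclude-dir="^/sys"` is the stricter, directory-level form.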
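Whether the idle I/O class has any effect depends on the active I/O scheduler, which the kernel marks with square brackets in `/sys/block/<device>/queue/scheduler`. A small sketch for listing it per device; `active_scheduler` is a hypothetical helper name.

```bash
# Hypothetical helper: extract the bracketed (active) entry from a scheduler line,
# e.g. a line of the form "mq-deadline kyber [bfq] none".
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for each block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}
  echo "${dev%%/*}: $(active_scheduler < "$f")"
done
```

A device reporting `bfq` honors the idle class. On other schedulers, treat `nice` as the part of the fix that is certain to apply.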
 ## Related
 - [ClamAV Documentation](https://docs.clamav.net/)
 - [[02-selfhosting/security/linux-server-hardening-checklist|Linux Server Hardening Checklist]]
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -53,3 +53,4 @@
    * [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
    * [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
    * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)