wiki: add ClamAV safe scheduling article; update Netdata new server setup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -2,17 +2,31 @@
|
|||||||
title: "Deploying Netdata to a New Server"
|
title: "Deploying Netdata to a New Server"
|
||||||
domain: selfhosting
|
domain: selfhosting
|
||||||
category: monitoring
|
category: monitoring
|
||||||
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian]
|
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
|
||||||
status: published
|
status: published
|
||||||
created: 2026-03-18
|
created: 2026-03-18
|
||||||
updated: 2026-03-18
|
updated: 2026-03-22
|
||||||
---
|
---
|
||||||
|
|
||||||
# Deploying Netdata to a New Server
|
# Deploying Netdata to a New Server
|
||||||
|
|
||||||
This covers the full Netdata setup for a new server in the fleet: install, email notification config, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
|
This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
|
||||||
|
|
||||||
## 1. Install
|
## 1. Install Prerequisites
|
||||||
|
|
||||||
|
Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
apt install -y jq
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
jq --version
|
||||||
|
```
|
||||||
|
|
||||||
|
## 2. Install Netdata
|
||||||
|
|
||||||
Use the official kickstart script:
|
Use the official kickstart script:
|
||||||
|
|
||||||
@@ -28,7 +42,7 @@ systemctl is-active netdata
|
|||||||
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
|
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
|
||||||
```
|
```
|
||||||
|
|
||||||
## 2. Configure Email Notifications
|
## 3. Configure Email Notifications
|
||||||
|
|
||||||
Copy the default config and set the three required values:
|
Copy the default config and set the three required values:
|
||||||
|
|
||||||
@@ -64,7 +78,23 @@ You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) an
|
|||||||
> [!note] Delivery via local Postfix
|
> [!note] Delivery via local Postfix
|
||||||
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
|
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
|
||||||
|
|
||||||
## 3. Claim to Netdata Cloud
|
## 4. Configure n8n Webhook Notifications
|
||||||
|
|
||||||
|
Copy `health_alarm_notify.conf` from an existing server (e.g. majormail) whose copy already contains the `custom_sender()` function. That function sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.
|
||||||
|
|
||||||
|
> [!warning] jq required
|
||||||
|
> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).
|
||||||
|
|
||||||
|
After deploying the config, run a test to confirm the webhook fires correctly:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart netdata
|
||||||
|
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.
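The dependency on `jq` can also be made explicit in the sender itself. The sketch below is not the fleet's actual `custom_sender()`; `build_payload` is a hypothetical helper that assumes only the three fields named above (`hostname`, `alarm`, `status`) and fails loudly instead of producing an empty body when `jq` is absent.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the real custom_sender(): builds the alert JSON
# with jq and refuses to continue when jq is missing.
build_payload() {
  local host="$1" alarm="$2" status="$3"
  if ! command -v jq >/dev/null 2>&1; then
    echo "jq not found; refusing to send an empty payload" >&2
    return 1
  fi
  # --arg passes each value as a JSON string, so quoting is handled by jq.
  jq -cn --arg hostname "$host" --arg alarm "$alarm" --arg status "$status" \
    '{hostname: $hostname, alarm: $alarm, status: $status}'
}
```

Inside a real `custom_sender()`, the result would typically be handed to `curl -d "$payload"` only when `build_payload` succeeds, so a missing `jq` shows up as an error in the Netdata log rather than as a blank alert email.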
|
||||||
|
|
||||||
|
## 5. Claim to Netdata Cloud
|
||||||
|
|
||||||
Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
|
Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
|
||||||
|
|
||||||
@@ -84,7 +114,7 @@ cat /var/lib/netdata/cloud.d/claimed_id
|
|||||||
|
|
||||||
A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
|
A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
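For scripted provisioning, the same check can be expressed as a function. This is a minimal sketch: `is_claimed` is a hypothetical name, and it assumes the claim ID is stored as a plain UUID in the file shown above.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the given file contains a UUID line.
is_claimed() {
  local file="${1:-/var/lib/netdata/cloud.d/claimed_id}"
  [ -r "$file" ] || return 1
  grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' "$file"
}
```

Usage: `is_claimed && echo "claimed" || echo "NOT claimed"`.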
|
||||||
|
|
||||||
## 4. Verify Alerts
|
## 6. Verify Alerts
|
||||||
|
|
||||||
Check that no unexpected alerts are active after setup:
|
Check that no unexpected alerts are active after setup:
|
||||||
|
|
||||||
@@ -111,6 +141,20 @@ for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpo
|
|||||||
done
|
done
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Fleet-wide jq Audit
|
||||||
|
|
||||||
|
To check that all servers with `custom_sender` have `jq` installed:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
|
||||||
|
echo -n "=== $host: "
|
||||||
|
ssh -o ConnectTimeout=5 root@$host \
|
||||||
|
'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(command -v jq >/dev/null 2>&1 && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.
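With more than a handful of hosts, the audit output can be filtered down to just the hosts that need attention. The helper below is a sketch; `hosts_missing_jq` is a hypothetical name, and it assumes the `=== host: custom_sender=N jq=...` line format printed by the loop above.

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads audit lines on stdin and prints the hosts
# that have custom_sender configured but no jq.
hosts_missing_jq() {
  awk '/jq=NO/ {
    for (i = 2; i <= NF; i++)
      if ($i == "custom_sender=1") {
        host = $(i - 1)          # the field before it is "hostname:"
        sub(/:$/, "", host)      # drop the trailing colon
        print host
      }
  }'
}
```

Usage: save the audit loop's output to a file (for example `audit.txt`, an assumed name) and run `hosts_missing_jq < audit.txt`.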
|
||||||
|
|
||||||
## Related
|
## Related
|
||||||
|
|
||||||
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
|
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
|
||||||
|
|||||||
@@ -0,0 +1,73 @@
|
|||||||
|
# ClamAV Safe Scheduling on Live Servers
|
||||||
|
|
||||||
|
Running `clamscan` unthrottled on a live server will peg CPU until completion. On a small VPS (1 vCPU), a full recursive scan can sustain 70–100% CPU for an hour or more, degrading or taking down hosted services.
|
||||||
|
|
||||||
|
## The Problem
|
||||||
|
|
||||||
|
A common out-of-the-box ClamAV cron setup looks like this:
|
||||||
|
|
||||||
|
```cron
|
||||||
|
0 1 * * 0 clamscan --infected --recursive / --exclude=/sys
|
||||||
|
```
|
||||||
|
|
||||||
|
This runs at Linux's default scheduling priority (`nice 0`) with normal I/O priority. On a live server it will:
|
||||||
|
|
||||||
|
- Monopolize the CPU for the scan duration
|
||||||
|
- Cause high I/O wait, degrading web serving, databases, and other services
|
||||||
|
- Trigger monitoring alerts (e.g., Netdata `10min_cpu_usage`)
|
||||||
|
|
||||||
|
## The Fix
|
||||||
|
|
||||||
|
Throttle the scan with `nice` and `ionice`:
|
||||||
|
|
||||||
|
```cron
|
||||||
|
0 1 * * 0 nice -n 19 ionice -c 3 clamscan --infected --recursive / --exclude=/sys
|
||||||
|
```
|
||||||
|
|
||||||
|
| Flag | Meaning |
|
||||||
|
|------|---------|
|
||||||
|
| `nice -n 19` | Lowest CPU scheduling priority (range: -20 to 19) |
|
||||||
|
| `ionice -c 3` | Idle I/O class — only uses disk when no other process needs it |
|
||||||
|
|
||||||
|
The scan will take longer, but it yields CPU and disk to every other process, so hosted services should stay responsive while it runs.
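A quick way to confirm that the prefix takes effect is to let the tools report on themselves: run with no arguments, `nice` prints the niceness it inherited and `ionice` prints its inherited I/O class. `show_throttle` is a hypothetical helper name used only for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: shows what a command launched with the same prefix
# as the cron entry would inherit.
show_throttle() {
  # `nice` with no command prints the current niceness.
  echo "niceness: $(nice -n 19 nice)"
  # `ionice` with no options prints the current I/O class; skip if absent.
  if command -v ionice >/dev/null 2>&1; then
    echo "io class: $(nice -n 19 ionice -c 3 ionice)"
  fi
}

show_throttle
```

On a host where the idle class is available, the second line should report `idle`.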
|
||||||
|
|
||||||
|
## Applying the Fix
|
||||||
|
|
||||||
|
Edit root's crontab:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
crontab -e
|
||||||
|
```
|
||||||
|
|
||||||
|
Or apply non-interactively:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
crontab -l | sed '/nice -n 19/!s|clamscan|nice -n 19 ionice -c 3 clamscan|' | crontab -
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
crontab -l | grep clam
|
||||||
|
```
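Before piping a rewritten crontab back into `crontab -`, it can help to preview the change. The sketch below uses a hypothetical helper, `preview_throttle`, which reads crontab text on stdin and prints a unified diff of what the substitution would do; lines that already contain `nice -n 19` are left alone, so an empty diff means there is nothing to change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: previews the clamscan throttling edit as a diff.
preview_throttle() {
  local before after
  before=$(mktemp)
  after=$(mktemp)
  cat > "$before"
  # Only touch lines that are not already throttled.
  sed '/nice -n 19/!s|clamscan|nice -n 19 ionice -c 3 clamscan|' "$before" > "$after"
  # diff exits 1 when the files differ, which is the expected case here.
  diff -u "$before" "$after" || true
  rm -f "$before" "$after"
}
```

Usage: `crontab -l | preview_throttle`.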
|
||||||
|
|
||||||
|
## Diagnosing a Runaway Scan
|
||||||
|
|
||||||
|
If CPU is already pegged, identify and kill the process:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ps aux --sort=-%cpu | head -15
|
||||||
|
# Look for clamscan
|
||||||
|
kill <PID>
|
||||||
|
```
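If killing the scan is undesirable (for example, it is most of the way through), an alternative is to lower the priority of the running process instead. This is a sketch rather than part of the documented procedure: `deprioritize` is a hypothetical helper, and `DRY_RUN` is an assumed switch that prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies the same throttling to already-running PIDs.
deprioritize() {
  local pid
  for pid in "$@"; do
    # With DRY_RUN set, each command is echoed rather than executed.
    ${DRY_RUN:+echo} renice -n 19 -p "$pid"
    ${DRY_RUN:+echo} ionice -c 3 -p "$pid"
  done
}
```

Usage: `DRY_RUN=1 deprioritize <PID>` to preview, then `deprioritize <PID>` to apply.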
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
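- `ionice -c 3` (Idle) requires Linux kernel ≥ 2.6.13 and an I/O scheduler that honors I/O priorities, such as CFQ or BFQ. Under other schedulers (for example `none`) the idle class may have little or no effect; the active scheduler is the bracketed entry in `/sys/block/<device>/queue/scheduler`.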
|
||||||
|
- On multi-core servers, consider also using `cpulimit` for a hard cap: `cpulimit -l 30 -- clamscan ...`
|
||||||
|
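- Always keep `--exclude=/sys` (and optionally `--exclude=/proc`, `--exclude=/dev`) to avoid scanning virtual filesystems. To skip whole directories, `clamscan` also accepts `--exclude-dir=REGEX` (for example `--exclude-dir='^/sys'`).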
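Because the effect of `ionice` depends on the I/O scheduler, it is worth checking which one a disk is using. The kernel lists the available schedulers in `/sys/block/<device>/queue/scheduler` and marks the active one with square brackets. `active_scheduler` below is a hypothetical helper that extracts that entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the bracketed (active) scheduler from a line
# such as "none mq-deadline [bfq]".
active_scheduler() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

Usage, assuming the disk is `sda`: `active_scheduler "$(cat /sys/block/sda/queue/scheduler)"`.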
|
||||||
|
|
||||||
|
## Related
|
||||||
|
|
||||||
|
- [ClamAV Documentation](https://docs.clamav.net/)
|
||||||
|
- [[02-selfhosting/security/linux-server-hardening-checklist|Linux Server Hardening Checklist]]
|
||||||
@@ -53,3 +53,4 @@
|
|||||||
* [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
|
* [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
|
||||||
* [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
|
* [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
|
||||||
* [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
|
* [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
|
||||||
|
* [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
|
||||||
|
|||||||