wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
29
02-selfhosting/dns-networking/network-overview.md
Normal file
@@ -0,0 +1,29 @@
# 🌐 Network Overview

The **[[MajorInfrastructure|MajorsHouse]]** infrastructure is connected via a private **[[Network Overview#Tailscale|Tailscale]]** mesh network. This allows secure, peer-to-peer communication between devices across different geographic locations (US and UK) without exposing services to the public internet.

## 🏛️ Infrastructure Summary

- **Address Space:** 100.64.0.0/10 (Tailscale CGNAT range; node addresses appear as 100.x.x.x)
- **Management:** Centralized via **[[Network Overview#Ansible|Ansible]]** (`MajorAnsible` repo)
- **Host Groupings:** Functional (web, mail, homelab, bots), OS (Fedora, Ubuntu), and Location (US, UK).

## 🌍 Geographic Nodes

| Host | Location | IP | OS |
|---|---|---|---|
| `[[dca]]` | 🇺🇸 US | 100.104.11.146 | Ubuntu 24.04 |
| `[[majortoot]]` | 🇺🇸 US | 100.110.197.17 | Ubuntu 24.04 |
| `[[majorhome]]` | 🇺🇸 US | 100.120.209.106 | Fedora 43 |
| `[[teelia]]` | 🇬🇧 UK | 100.120.32.69 | Ubuntu 24.04 |

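As a quick sanity check on the table above, every node address should fall inside the CGNAT block that Tailscale allocates from (100.64.0.0/10). The following is a minimal Bash sketch under that assumption; `is_tailscale_ip` is a hypothetical helper name and is not part of Tailscale or the `MajorAnsible` repo.

```bash
#!/usr/bin/env bash
# Return 0 if the IPv4 address is inside 100.64.0.0/10 (the CGNAT block
# Tailscale assigns node addresses from), 1 otherwise.
is_tailscale_ip() {
  local ip="$1" a b c d octet
  IFS=. read -r a b c d <<< "$ip"
  # Reject anything that is not four decimal octets in the 0-255 range.
  for octet in "$a" "$b" "$c" "$d"; do
    [[ "$octet" =~ ^[0-9]+$ ]] && (( 10#$octet <= 255 )) || return 1
  done
  # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
  (( 10#$a == 100 && 10#$b >= 64 && 10#$b <= 127 ))
}

is_tailscale_ip 100.104.11.146 && echo "100.104.11.146 is in the tailnet range"
```

An address such as `100.128.0.1` starts with `100.` but sits outside the /10, which is why the check compares the second octet rather than only the first.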

## 🔗 Tailscale Setup

Tailscale is configured as a persistent service on all nodes. Key features used include:

- **Tailscale SSH:** Enabled for secure management via Ansible.
- **MagicDNS:** Used for internal hostname resolution (e.g., `majorlab.tailscale.net`).
- **ACLs:** Managed via the Tailscale admin console to restrict cross-group communication where necessary.

---
*Last updated: 2026-03-04*
157
02-selfhosting/docker/docker-healthchecks.md
Normal file
@@ -0,0 +1,157 @@
---
title: "Docker Healthchecks"
domain: selfhosting
category: docker
tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
status: published
created: 2026-03-23
updated: 2026-03-23
---

# Docker Healthchecks

A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working — not just running. Without one, a container shows as `Up` even if the app inside is crashed, deadlocked, or waiting on a dependency.

## Why It Matters

Tools like Uptime Kuma report containers without healthchecks as:

> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.

A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.

## Basic Syntax (docker-compose)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

| Field | Description |
|---|---|
| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
| `interval` | How often to run the check. |
| `timeout` | How long to wait before marking as failed. |
| `retries` | Failures before marking `unhealthy`. |
| `start_period` | Grace period on startup before failures count. |

## Common Patterns

### HTTP service (wget — available in Alpine)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### HTTP service (curl)

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### MySQL / MariaDB

```yaml
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s
```

### PostgreSQL

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5
```

### Redis

```yaml
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 10s
  timeout: 5s
  retries: 3
```

### TCP port check (no curl/wget available)

```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```

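If an image ships Bash but none of `nc`, `curl`, or `wget`, the same TCP probe can be sketched with Bash's built-in `/dev/tcp` redirection. This is a fallback under stated assumptions: it needs `bash` and a `timeout` command inside the image (stock Alpine does not include Bash), and `check_tcp` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Probe a TCP port using Bash's /dev/tcp pseudo-device.
# Returns 0 if the connection opens within the time limit and non-zero
# otherwise, which matches the exit-code contract of a healthcheck test.
check_tcp() {
  local host="$1" port="$2" limit="${3:-5}"
  timeout "$limit" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: report whether something is listening on localhost:8080.
if check_tcp localhost 8080 2; then
  echo "port 8080 is accepting connections"
else
  echo "port 8080 is not reachable"
fi
```

A port that accepts connections is a weaker signal than an HTTP 200 from a health endpoint, so this is best kept for services that expose no better probe.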

## Using Healthchecks with `depends_on`

Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:

```yaml
services:
  app:
    depends_on:
      db:
        condition: service_healthy

  db:
    image: mysql:8.0
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s
```

This prevents the classic race condition where the app starts before the database is ready to accept connections.

## Checking Health Status

```bash
# See health status in container list
docker ps

# Get detailed health info including last check output
docker inspect --format='{{json .State.Health}}' <container> | jq
```

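For scripts that run outside Compose (deploy hooks, backups), the same wait-until-healthy behaviour can be approximated by polling `docker inspect`. The sketch below is one way to do it, not an official Docker feature; `wait_healthy` and the `DOCKER_BIN` override are hypothetical names introduced here so the loop can be exercised without a running daemon.

```bash
#!/usr/bin/env bash
# Poll a container's health status until it is "healthy" or a deadline passes.
# DOCKER_BIN is a hypothetical override hook; it defaults to the docker CLI.
wait_healthy() {
  local container="$1" timeout_s="${2:-60}" interval_s="${3:-2}"
  local waited=0 status
  while (( waited <= timeout_s )); do
    status=$("${DOCKER_BIN:-docker}" inspect \
      --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;   # retries exhausted, stop waiting
    esac
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  return 2                     # still starting, or no healthcheck, at deadline
}
```

A call such as `wait_healthy db 120 5 && ./run-migrations.sh` (the script name is a placeholder) would then block for at most roughly two minutes before giving up.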

## Ghost Example

Ghost (Alpine-based) uses `wget` rather than `curl`:

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

## Gotchas & Notes

- **Alpine images** don't have `curl` by default — use `wget` or install curl in the image.
- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
- **`CMD` vs `CMD-SHELL`** — use `CMD` for direct exec (no shell needed), `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
- **Uptime Kuma** will pick up Docker healthcheck status automatically when monitoring via the Docker socket — no extra config needed.

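The `CMD` versus `CMD-SHELL` distinction can be reproduced outside Docker. In this sketch, `run_exec_form` and `run_shell_form` are hypothetical stand-ins for the two modes: exec form runs the argument vector directly, while shell form hands a single string to `/bin/sh -c`, which is why only the latter understands `||`, `&&`, and pipes.

```bash
#!/usr/bin/env bash
# Exec form: the arguments are run as-is, with no shell in between.
run_exec_form() { "$@"; }

# Shell form: one string, interpreted by /bin/sh.
run_shell_form() { /bin/sh -c "$1"; }

# With exec form, "||" is just another literal argument passed to echo.
run_exec_form echo ok '||' exit 1        # prints: ok || exit 1

# With shell form, "||" is an operator, so the fallback actually runs.
run_shell_form 'false || exit 1' || echo "shell form exit status: $?"
```

The second call prints `shell form exit status: 1`, which is the non-zero result a healthcheck needs in order to count as a failure.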

## See Also

- [[debugging-broken-docker-containers]]
- [[netdata-docker-health-alarm-tuning]]
@@ -24,6 +24,7 @@ Guides for running your own services at home, including Docker, reverse proxies,

 - [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
 - [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
+- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)

 ## Security

@@ -5,7 +5,7 @@ category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-22
 ---

 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
 every: 30s
 lookup: average -5m of unhealthy
 warn: $this > 0
-delay: down 5m multiplier 1.5 max 30m
+delay: up 3m down 5m multiplier 1.5 max 30m
 summary: Docker container ${label:container_name} health
 info: ${label:container_name} docker container health status is unhealthy
 to: sysadmin
@@ -49,10 +49,38 @@ component: Docker
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |

-A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing — absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.
+
+## Also: Suppress `docker_container_down` for Normally-Exiting Containers
+
+Nextcloud AIO runs `borgbackup` (scheduled backups) and `watchtower` (auto-updates) as containers that exit with code 0 after completing their work. The stock `docker_container_down` alarm fires on any exited container, generating false alerts after every nightly cycle.
+
+Add a second override to the same file using `chart labels` to exclude them:
+
+```ini
+# Suppress docker_container_down for Nextcloud AIO containers that exit normally
+# (borgbackup runs on schedule then exits; watchtower does updates then exits)
+template: docker_container_down
+on: docker.container_running_state
+class: Errors
+type: Containers
+component: Docker
+units: status
+every: 30s
+lookup: average -5m of down
+chart labels: container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *
+warn: $this > 0
+delay: up 3m down 5m multiplier 1.5 max 30m
+summary: Docker container ${label:container_name} down
+info: ${label:container_name} docker container is down
+to: sysadmin
+```
+
+The `chart labels` line uses Netdata's simple pattern syntax — `!` prefix excludes a container, `*` matches everything else. All other exited containers still alert normally.

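The matching rule on the `chart labels` line can be tried out with a small Bash emulation. This is a simplified sketch of the behaviour described above (space-separated patterns, left-to-right evaluation, first match decides), not Netdata's own implementation; `simple_pattern_match` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Emulate the matching rule: patterns are space-separated, evaluated left to
# right, "*" is a wildcard, and a leading "!" makes the pattern negative.
# The first pattern that matches decides the result.
simple_pattern_match() {
  local patterns="$1" value="$2" p
  local -a list
  read -ra list <<< "$patterns"      # split on spaces without glob expansion
  for p in "${list[@]}"; do
    if [[ "$p" == '!'* ]]; then
      # Negative pattern: a hit means the value is excluded.
      [[ "$value" == ${p#!} ]] && return 1
    else
      [[ "$value" == $p ]] && return 0
    fi
  done
  return 1                           # nothing matched
}

filter='!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *'
for name in caddy nextcloud-aio-borgbackup nextcloud-aio-nextcloud; do
  if simple_pattern_match "$filter" "$name"; then
    echo "$name: alarm applies"
  else
    echo "$name: excluded"
  fi
done
```

Order matters in this scheme: placing `*` first would match every container before the exclusions are ever considered.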

 ## Applying the Config

@@ -74,7 +102,7 @@ In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `dock

 ## Notes

-- This only overrides the `docker_container_unhealthy` alarm. The `docker_container_down` alarm (for exited containers) is left at its default — it already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
+- Both `docker_container_unhealthy` and `docker_container_down` are overridden in this config. Any container not explicitly excluded in the `chart labels` filter will still alert normally.
 - If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
 - Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

159
02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md
Normal file
@@ -0,0 +1,159 @@
# Netdata → n8n Enriched Alert Emails

**Status:** Live across all MajorsHouse fleet servers as of 2026-03-21

Replaces Netdata's plain-text alert emails with rich HTML emails that include a plain-English explanation, a suggested remediation command, and a direct link to the relevant MajorWiki article.

---

## How It Works

```
Netdata alarm fires
→ custom_sender() in health_alarm_notify.conf
→ POST JSON payload to n8n webhook
→ Code node enriches with suggestion + wiki link
→ Send Email node sends HTML email via SMTP
→ Respond node returns 200 OK
```

---

## n8n Workflow

**Name:** Netdata Enriched Alerts
**URL:** https://n8n.majorshouse.com
**Webhook endpoint:** `POST https://n8n.majorshouse.com/webhook/netdata-alert`
**Workflow ID:** `a1b2c3d4-aaaa-bbbb-cccc-000000000001`

### Nodes

1. **Netdata Webhook** — receives POST from Netdata's `custom_sender()`
2. **Enrich Alert** — Code node; matches alarm/chart/family to enrichment table, builds HTML email body in `$json.emailBody`
3. **Send Enriched Email** — sends via SMTP port 465 (SMTP account 2), from `netdata@majorshouse.com` to `marcus@majorshouse.com`
4. **Respond OK** — returns `ok` with HTTP 200 to Netdata

### Enrichment Keys

The Code node matches on `alarm`, `chart`, or `family` field (case-insensitive substring):

| Key | Title | Wiki Article | Notes |
|-----|-------|-------------|-------|
| `disk_space` | Disk Space Alert | snapraid-mergerfs-setup | |
| `ram` | Memory Alert | managing-linux-services-systemd-ansible | |
| `cpu` | CPU Alert | managing-linux-services-systemd-ansible | |
| `load` | Load Average Alert | managing-linux-services-systemd-ansible | |
| `net` | Network Alert | tailscale-homelab-remote-access | |
| `docker` | Docker Container Alert | debugging-broken-docker-containers | |
| `web_log` | Web Log Alert | tuning-netdata-web-log-alerts | Hostname-aware suggestion (see below) |
| `health` | Docker Health Alarm | netdata-docker-health-alarm-tuning | |
| `mdstat` | RAID Array Alert | mdadm-usb-hub-disconnect-recovery | |
| `systemd` | Systemd Service Alert | docker-caddy-selinux-post-reboot-recovery | |
| _(no match)_ | Server Alert | netdata-new-server-setup | |

> [!info] web_log hostname-aware suggestion (updated 2026-03-24)
> The `web_log` suggestion branches on `hostname` in the Code node:
> - **`majorlab`** → Check `docker logs caddy` (Caddy reverse proxy)
> - **`teelia`, `majorlinux`, `dca`** → Check Apache logs + Fail2ban jail status
> - **other** → Generic web server log guidance

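The key lookup can be illustrated in Bash. This is a sketch of the matching idea only, not the code that runs in the n8n Code node: `enrichment_key` is a hypothetical function name, and the evaluation order (table order, first hit wins) is an assumption, since the node's actual order is not shown here.

```bash
#!/usr/bin/env bash
# Keys in the order they appear in the table; the first substring hit wins.
# The real Code node's evaluation order is an assumption in this sketch.
ENRICH_KEYS=(disk_space ram cpu load net docker web_log health mdstat systemd)

enrichment_key() {
  # Join alarm, chart and family, then lower-case for case-insensitive matching.
  local haystack key
  haystack=$(printf '%s %s %s' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]')
  for key in "${ENRICH_KEYS[@]}"; do
    if [[ "$haystack" == *"$key"* ]]; then
      printf '%s\n' "$key"
      return 0
    fi
  done
  printf 'no match\n'    # falls back to the generic "Server Alert" entry
}

enrichment_key 'disk_space._' 'disk_space._' '/'
enrichment_key 'docker_container_unhealthy' 'docker.container_health_status' ''
```

Substring matching makes ordering significant: `docker_container_unhealthy` contains both `docker` and `health`, so in this sketch it resolves to `docker` because that key is checked first.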

---

## Netdata Configuration

### Config File Locations

| Server | Path |
|--------|------|
| majorhome, majormail, majordiscord, tttpod, teelia | `/etc/netdata/health_alarm_notify.conf` |
| majorlinux, majortoot, dca | `/usr/lib/netdata/conf.d/health_alarm_notify.conf` |

### Required Settings

```bash
DEFAULT_RECIPIENT_CUSTOM="n8n"
role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"
```

### custom_sender() Function

```bash
custom_sender() {
    local to="${1}"
    local payload
    payload=$(jq -n \
        --arg hostname "${host}" \
        --arg alarm "${name}" \
        --arg chart "${chart}" \
        --arg family "${family}" \
        --arg status "${status}" \
        --arg old_status "${old_status}" \
        --arg value "${value_string}" \
        --arg units "${units}" \
        --arg info "${info}" \
        --arg alert_url "${goto_url}" \
        --arg severity "${severity}" \
        --arg raised_for "${raised_for}" \
        --arg total_warnings "${total_warnings}" \
        --arg total_critical "${total_critical}" \
        '{hostname:$hostname,alarm:$alarm,chart:$chart,family:$family,status:$status,old_status:$old_status,value:$value,units:$units,info:$info,alert_url:$alert_url,severity:$severity,raised_for:$raised_for,total_warnings:$total_warnings,total_critical:$total_critical}')
    local httpcode
    httpcode=$(docurl -s -o /dev/null -w "%{http_code}" \
        -X POST \
        -H "Content-Type: application/json" \
        -d "${payload}" \
        "https://n8n.majorshouse.com/webhook/netdata-alert")
    if [ "${httpcode}" = "200" ]; then
        info "sent enriched notification to n8n for ${status} of ${host}.${name}"
        sent=$((sent + 1))
    else
        error "failed to send notification to n8n, HTTP code: ${httpcode}"
    fi
}
```

!!! note "jq required"
    The `custom_sender()` function requires `jq` to be installed. Verify with `which jq` on each server.

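Because a missing `jq` fails quietly, a small guard can make the problem visible before any payload is built. The helper below is a sketch; `require_cmds` is a hypothetical name and is not part of Netdata's `alarm-notify.sh` or the deployed `custom_sender()`.

```bash
#!/usr/bin/env bash
# Return 0 if every named command is on PATH; otherwise name the first
# missing one on stderr and return 1.
require_cmds() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing required command: %s\n' "$cmd" >&2
      return 1
    fi
  done
}

# Example: check the tools this article relies on.
require_cmds jq curl && echo "all prerequisites present"
```

One possible use is a first line of `require_cmds jq || return 1` inside `custom_sender()`, so a server without `jq` logs a clear error instead of posting an empty body.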

---

## Deploying to a New Server

```bash
# 1. Find the config file
find /etc/netdata /usr/lib/netdata -name health_alarm_notify.conf 2>/dev/null

# 2. Edit it — add the two lines and the custom_sender() function above

# 3. Test connectivity from the server
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://n8n.majorshouse.com/webhook/netdata-alert \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
# Expected: 200

# 4. Restart Netdata
systemctl restart netdata

# 5. Send a test alarm
/usr/libexec/netdata/plugins.d/alarm-notify.sh test custom
```


---

## Troubleshooting

**Emails not arriving — check n8n execution log:**
Go to https://n8n.majorshouse.com → open "Netdata Enriched Alerts" → Executions tab. Look for `error` status entries.

**Email body empty:**
The Send Email node's HTML field must be `={{ $json.emailBody }}`. Shell variable expansion can silently strip `$json` if the workflow is patched via inline SSH commands — always use a Python script file.

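The way `$json` disappears is plain shell quoting, and it can be reproduced locally. This sketch only demonstrates the expansion; it does not touch n8n, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Stand-in for the remote shell, where no variable named "json" has a value.
json=""

# Double quotes: the shell expands $json (empty here) before the string
# ever reaches n8n.
double_quoted="={{ $json.emailBody }}"

# Single quotes: the text is passed through untouched.
single_quoted='={{ $json.emailBody }}'

printf 'double: %s\n' "$double_quoted"   # double: ={{ .emailBody }}
printf 'single: %s\n' "$single_quoted"   # single: ={{ $json.emailBody }}
```

An inline `ssh host "..."` command adds another layer of double quotes on top, which is why keeping the expression in a script file avoids the problem.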

**`000` curl response from a server:**
Usually a timeout, not a DNS or connection failure. Re-test with `--max-time 30`.

**`custom_sender()` syntax error in Netdata logs:**
Bash heredocs don't work inside sourced config files. Use `jq -n --arg ...` as shown above — no heredocs.

**n8n `N8N_TRUST_PROXY` must be set:**
Without `N8N_TRUST_PROXY=true` in the Docker environment, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to abort requests before parsing the body. Set in `/opt/n8n/compose.yml`.
161
02-selfhosting/monitoring/netdata-new-server-setup.md
Normal file
@@ -0,0 +1,161 @@
---
title: "Deploying Netdata to a New Server"
domain: selfhosting
category: monitoring
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
status: published
created: 2026-03-18
updated: 2026-03-22
---

# Deploying Netdata to a New Server

This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.

## 1. Install Prerequisites

Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**

```bash
apt install -y jq
```

Verify:

```bash
jq --version
```

## 2. Install Netdata

Use the official kickstart script:

```bash
wget -O /tmp/netdata-install.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-install.sh --non-interactive --stable-channel --disable-telemetry
```

Verify it's running:

```bash
systemctl is-active netdata
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
```

## 3. Configure Email Notifications

Copy the default config and set the three required values:

```bash
cp /usr/lib/netdata/conf.d/health_alarm_notify.conf /etc/netdata/health_alarm_notify.conf
```

Edit `/etc/netdata/health_alarm_notify.conf`:

```ini
EMAIL_SENDER="netdata@majorshouse.com"
SEND_EMAIL="YES"
DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"
```

Or apply with `sed` in one shot:

```bash
sed -i 's/^#\?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?SEND_EMAIL=.*/SEND_EMAIL="YES"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
```

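Before running `sed -i` against the live config, the substitutions can be rehearsed on a throwaway file. The sketch below uses `sed -E` (extended regex, so `#?` replaces the `#\?` form) and made-up sample lines; it is a dry run for illustration, not Netdata's stock config.

```bash
#!/usr/bin/env bash
# Apply the three substitutions to a temporary copy to see what they match.
# The sample lines are invented for this demo.
conf=$(mktemp)
cat > "$conf" <<'EOF'
#EMAIL_SENDER=""
SEND_EMAIL="NO"
DEFAULT_RECIPIENT_EMAIL="root"
role_recipients_email[sysadmin]="${DEFAULT_RECIPIENT_EMAIL}"
EOF

# "^#?" matches the key whether or not the line is commented out, and the
# "^" anchor keeps lines that merely mention the key from being rewritten.
sed -E \
  -e 's/^#?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' \
  -e 's/^#?SEND_EMAIL=.*/SEND_EMAIL="YES"/' \
  -e 's/^#?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' \
  "$conf" > "$conf.new"

cat "$conf.new"
```

The `role_recipients_email[...]` line survives unchanged because the key it mentions is not at the start of the line.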

Restart and test:

```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(OK|FAILED|email)'
```

You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) and confirmation that email was sent to `marcus@majorshouse.com`.

> [!note] Delivery via local Postfix
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.

## 4. Configure n8n Webhook Notifications

Copy the `health_alarm_notify.conf` from an existing server (e.g. majormail) which contains the `custom_sender()` function. This sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.

> [!warning] jq required
> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).

After deploying the config, run a test to confirm the webhook fires correctly:

```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
```

Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.

## 5. Claim to Netdata Cloud

Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:

```bash
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel \
  --claim-token <token> \
  --claim-rooms <room-id> \
  --claim-url https://app.netdata.cloud
```

Verify the claim was accepted:

```bash
cat /var/lib/netdata/cloud.d/claimed_id
```

A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.

## 6. Verify Alerts

Check that no unexpected alerts are active after setup:

```bash
curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c "
import sys, json
d = json.load(sys.stdin)
active = [v for v in d.get('alarms', {}).values() if v.get('status') not in ('CLEAR', 'UNINITIALIZED', 'UNDEFINED')]
print(f'{len(active)} active alert(s)')
for v in active:
    print(f' [{v[\"status\"]}] {v[\"name\"]} on {v[\"chart\"]}')
"
```

## Fleet-wide Alert Check

To audit all servers at once (requires Tailscale SSH access):

```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
  echo "=== $host ==="
  ssh root@$host "curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c \
    \"import sys,json; d=json.load(sys.stdin); active=[v for v in d.get('alarms',{}).values() if v.get('status') not in ('CLEAR','UNINITIALIZED','UNDEFINED')]; print(str(len(active))+' active')\""
done
```

## Fleet-wide jq Audit

To check that all servers with `custom_sender` have `jq` installed:

```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
  echo -n "=== $host: "
  # command -v output is discarded so has_jq holds only "yes" or "NO"
  ssh -o ConnectTimeout=5 root@$host \
    'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(command -v jq >/dev/null 2>&1 && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
done
```

Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.

## Related

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
137
02-selfhosting/monitoring/netdata-selinux-avc-chart.md
Normal file
@@ -0,0 +1,137 @@
---
title: "Netdata SELinux AVC Denial Monitoring"
domain: selfhosting
category: monitoring
tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
status: published
created: 2026-03-27
updated: 2026-03-27
---

# Netdata SELinux AVC Denial Monitoring

A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.

## What It Does

The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes. This gives a real-time chart in Netdata Cloud showing SELinux denial spikes — useful for catching misconfigurations after service changes or package updates.

## Where It's Deployed

| Host | OS | SELinux | Chart Installed |
|------|----|---------|-----------------|
| majorhome | Fedora 43 | Enforcing | Yes |
| majorlab | Fedora 43 | Enforcing | Yes |
| majormail | Fedora 43 | Enforcing | Yes |
| majordiscord | Fedora 43 | Enforcing | Yes |

Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.

## Installation

### 1. Create the Chart Plugin

Create `/etc/netdata/charts.d/selinux.chart.sh`:

```bash
cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
# SELinux AVC denial counter for Netdata charts.d
selinux_update_every=60
selinux_priority=90000

selinux_check() {
    which ausearch >/dev/null 2>&1 || return 1
    return 0
}

selinux_create() {
    cat <<CHART
CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
DIMENSION denials '' absolute 1 1
CHART
    return 0
}

selinux_update() {
    local count
    count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
    echo "BEGIN selinux.avc_denials $1"
    echo "SET denials = ${count}"
    echo "END"
    return 0
}
EOF
```

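The counting step inside `selinux_update()` can be tried without root or a real audit log. In this sketch, `count_avc` is a hypothetical helper and the sample records are made up for illustration; only the `grep -c "type=AVC"` technique comes from the plugin above.

```bash
#!/usr/bin/env bash
# Count AVC records in ausearch-style output read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the function's status at 0 for an empty result.
count_avc() {
  grep -c 'type=AVC' || true
}

# Made-up sample records for illustration only.
sample='----
type=AVC msg=audit(1700000000.123:456): avc:  denied  { read } for pid=1 comm="demo"
----
type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=257
----
type=AVC msg=audit(1700000001.456:457): avc:  denied  { write } for pid=2 comm="demo"'

printf '%s\n' "$sample" | count_avc    # prints 2
```

Because `grep -c` still prints `0` on empty input, the chart receives a numeric value even when `ausearch` finds no denials in the window.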
### 2. Grant Netdata Sudo Access to ausearch

`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:

```bash
echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
chmod 440 /etc/sudoers.d/netdata-selinux
visudo -c
```

The `visudo -c` validates syntax. If it reports errors, fix the file before proceeding — a broken sudoers file can lock out sudo entirely.

### 3. Restart Netdata

```bash
systemctl restart netdata
```

### 4. Verify

Check that the chart is collecting data:

```bash
curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f'Chart: {d[\"id\"]}')
print(f'Update every: {d[\"update_every\"]}s')
print(f'Type: {d[\"chart_type\"]}')
"
```

If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.

## Known Side Effect: pam_systemd Log Noise

Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:

```
pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
```

This is cosmetic. The `sudo` command succeeds — `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).

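The daily volume follows directly from the collection interval: one `sudo` call, and so one log line, per update. A small arithmetic sketch (the function name `messages_per_day` is illustrative):

```bash
#!/usr/bin/env bash
# One sudo call, and so one pam_systemd line, per collection interval.
messages_per_day() {
  local interval_s="$1"
  echo $(( 86400 / interval_s ))
}

messages_per_day 60     # 1440
messages_per_day 300    # 288
```

Raising `selinux_update_every` would lower the volume in proportion, at the cost of a coarser chart.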
**No PAM change is needed to suppress it.** On Fedora, the `system-auth` PAM config already lists `pam_systemd.so` as `-session optional`: `optional` means a module error does not fail the session, and the `-` prefix means PAM stays quiet if the module is not installed at all. The messages are informational log noise, not actual failures.

If the log volume is a concern for log analysis or monitoring, filter it at the rsyslog level:

```ini
# /etc/rsyslog.d/suppress-pam-systemd.conf
:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
```

Or in Netdata's log alert config, exclude the pattern from any log-based alerts.

## Fleet Audit

To verify the chart is deployed and functioning on all Fedora hosts:

```bash
for host in majorhome majorlab majormail majordiscord; do
  echo -n "=== $host: "
  ssh root@$host "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
done
```

## Related

- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
- [SELinux: Fixing Dovecot Mail Spool Context](/05-troubleshooting/selinux-dovecot-vmail-context.md)