wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:34:33 -04:00
parent 38fe720e63
commit fb2e3f6168
18 changed files with 881 additions and 15 deletions
--- a/02-selfhosting/dns-networking/network-overview.md
+++ b/02-selfhosting/dns-networking/network-overview.md
@@ -0,0 +1,29 @@
+# 🌐 Network Overview
+
+The **[[MajorInfrastructure|MajorsHouse]]** infrastructure is connected via a private **[[Network Overview#Tailscale|Tailscale]]** mesh network. This allows secure, peer-to-peer communication between devices across different geographic locations (US and UK) without exposing services to the public internet.
+
+## 🏛️ Infrastructure Summary
+
+- **Address Space:** 100.64.0.0/10 (the CGNAT range Tailscale assigns node addresses from)
+- **Management:** Centralized via **[[Network Overview#Ansible|Ansible]]** (`MajorAnsible` repo)
+- **Host Groupings:** Functional (web, mail, homelab, bots), OS (Fedora, Ubuntu), and Location (US, UK).
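+
+As a rough illustration of the OS and location groupings, an Ansible YAML inventory could be laid out as below. This is a sketch only: the group names (`ubuntu`, `fedora`, `us`, `uk`) are assumptions, the real `MajorAnsible` inventory may be organized differently, and only the four hosts listed under Geographic Nodes are shown.
+
+```yaml
+# Hypothetical inventory sketch; group names are assumed, not taken from MajorAnsible.
+all:
+  children:
+    ubuntu:
+      hosts:
+        dca:
+        majortoot:
+        teelia:
+    fedora:
+      hosts:
+        majorhome:
+    us:
+      hosts:
+        dca:
+        majortoot:
+        majorhome:
+    uk:
+      hosts:
+        teelia:
+```
+
+Keeping OS and location as separate groups lets a play target an intersection with a host pattern such as `ubuntu:&uk`.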
+
+## 🌍 Geographic Nodes
+
+| Host | Location | IP | OS |
+|---|---|---|---|
+| [[dca]] | 🇺🇸 US | 100.104.11.146 | Ubuntu 24.04 |
+| [[majortoot]] | 🇺🇸 US | 100.110.197.17 | Ubuntu 24.04 |
+| [[majorhome]] | 🇺🇸 US | 100.120.209.106 | Fedora 43 |
+| [[teelia]] | 🇬🇧 UK | 100.120.32.69 | Ubuntu 24.04 |
+
+## 🔗 Tailscale Setup
+
+Tailscale is configured as a persistent service on all nodes. Key features used include:
+
+- **Tailscale SSH:** Enabled for secure management via Ansible.
+- **MagicDNS:** Used for internal hostname resolution (e.g., `majorlab`, or its full name under the tailnet's `ts.net` domain).
+- **ACLs:** Managed via the Tailscale admin console to restrict cross-group communication where necessary.
+
+---
+*Last updated: 2026-03-04*
--- a/02-selfhosting/docker/docker-healthchecks.md
+++ b/02-selfhosting/docker/docker-healthchecks.md
@@ -0,0 +1,157 @@
+---
+title: "Docker Healthchecks"
+domain: selfhosting
+category: docker
+tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
+status: published
+created: 2026-03-23
+updated: 2026-03-23
+---
+
+# Docker Healthchecks
+
+A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working — not just running. Without one, a container shows as `Up` even if the app inside is crashed, deadlocked, or waiting on a dependency.
+
+## Why It Matters
+
+Tools like Uptime Kuma report containers without healthchecks as:
+
+> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.
+
+A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.
+
+## Basic Syntax (docker-compose)
+
+```yaml
+healthcheck:
+  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
+  interval: 30s
+  timeout: 10s
+  retries: 3
+  start_period: 30s
+```
+
+| Field | Description |
+|---|---|
+| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
+| `interval` | How often to run the check. |
+| `timeout` | How long a single check may run before it counts as a failure. |
+| `retries` | Consecutive failures required before the container is marked `unhealthy`. |
+| `start_period` | Grace period on startup before failures count. |
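+
+To see how the fields combine, here is the same healthcheck placed inside a full service definition, with the timing spelled out in comments. The service name `web`, the image, and the `/health` endpoint are placeholder assumptions. With these values, a container that starts failing after its grace period is marked `unhealthy` after roughly `retries × interval` (about 90 seconds), plus up to `timeout` for each check that hangs.
+
+```yaml
+# Sketch only: `web`, the image tag, and port 8080 are placeholders.
+services:
+  web:
+    image: example/web:latest
+    healthcheck:
+      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
+      interval: 30s      # one check every 30 seconds
+      timeout: 10s       # a check that hangs longer than 10 seconds counts as a failure
+      retries: 3         # 3 consecutive failures flip the status to unhealthy
+      start_period: 30s  # failures in the first 30 seconds are not counted
+```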
+
+## Common Patterns
+
+### HTTP service (wget — available in Alpine)
+```yaml
+healthcheck:
+  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
+  interval: 30s
+  timeout: 10s
+  retries: 3
+  start_period: 30s
+```
+
+### HTTP service (curl)
+```yaml
+healthcheck:
+  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+  interval: 30s
+  timeout: 10s
+  retries: 3
+  start_period: 30s
+```
+
+### MySQL / MariaDB
+```yaml
+healthcheck:
+  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
+  interval: 10s
+  timeout: 5s
+  retries: 3
+  start_period: 20s
+```
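+
+The pattern above hardcodes the root password in the compose file. One possible variant, assuming the container already defines `MYSQL_ROOT_PASSWORD` in its environment (an assumption about your setup), reads the password from that variable instead:
+
+```yaml
+# Sketch: assumes MYSQL_ROOT_PASSWORD is set in the container's environment.
+healthcheck:
+  # CMD-SHELL runs the string through the container's shell, which expands the
+  # variable at check time; "$$" stops Compose from interpolating it first.
+  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p\"$$MYSQL_ROOT_PASSWORD\""]
+  interval: 10s
+  timeout: 5s
+  retries: 3
+  start_period: 20s
+```
+
+This keeps the secret out of the `test` line, though it is still visible to anyone who can inspect the container's environment.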
+
+### PostgreSQL
+```yaml
+healthcheck:
+  test: ["CMD-SHELL", "pg_isready -U postgres"]
+  interval: 10s
+  timeout: 5s
+  retries: 5
+```
+
+### Redis
+```yaml
+healthcheck:
+  test: ["CMD", "redis-cli", "ping"]
+  interval: 10s
+  timeout: 5s
+  retries: 3
+```
+
+### TCP port check (no curl/wget available)
+```yaml
+healthcheck:
+  test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
+  interval: 30s
+  timeout: 5s
+  retries: 3
+```
+
+## Using Healthchecks with `depends_on`
+
+Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:
+
+```yaml
+services:
+  app:
+    depends_on:
+      db:
+        condition: service_healthy
+
+  db:
+    image: mysql:8.0
+    healthcheck:
+      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
+      start_period: 20s
+```
+
+This prevents the classic race condition where the app starts before the database is ready to accept connections.
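+
+The long form of `depends_on` accepts further options besides `condition`. As a hedged sketch, the Compose specification describes a `restart: true` flag (available in Docker Compose 2.17 and later, so check your version) that restarts the dependent service when Compose updates or restarts its dependency. The `app` image below is a placeholder:
+
+```yaml
+# Sketch: the image name is a placeholder; `restart: true` needs a recent Compose release.
+services:
+  app:
+    image: example/app:latest
+    depends_on:
+      db:
+        condition: service_healthy  # wait for the db healthcheck to pass
+        restart: true               # restart app when Compose restarts db
+```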
+
+## Checking Health Status
+
+```bash
+# See health status in container list
+docker ps
+
+# Get detailed health info including last check output
+docker inspect --format='{{json .State.Health}}' <container> | jq
+```
+
+## Ghost Example
+
+Ghost (Alpine-based) uses `wget` rather than `curl`:
+
+```yaml
+healthcheck:
+  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
+  interval: 30s
+  timeout: 10s
+  retries: 3
+  start_period: 30s
+```
+
+## Gotchas & Notes
+
+- **Alpine images** don't have `curl` by default — use `wget` or install curl in the image.
+- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
+- **`CMD` vs `CMD-SHELL`** — use `CMD` for direct exec (no shell needed), `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
+- **Uptime Kuma** will pick up Docker healthcheck status automatically when monitoring via the Docker socket — no extra config needed.
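+
+The `CMD` / `CMD-SHELL` distinction can be sketched with one check written both ways. The service names, image, and `/health` endpoint on port 8080 are placeholders for illustration, not services from this wiki:
+
+```yaml
+services:
+  exec-form:
+    image: example/app:latest
+    healthcheck:
+      # Exec form: curl is run directly, with no shell in between.
+      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+  shell-form:
+    image: example/app:latest
+    healthcheck:
+      # Shell form: the string goes through the container's shell, so "||" works.
+      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
+```
+
+The exec form still needs the binary itself (`curl` here) inside the image; it only skips the shell.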
+
+## See Also
+
+- [[debugging-broken-docker-containers]]
+- [[netdata-docker-health-alarm-tuning]]
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -24,6 +24,7 @@ Guides for running your own services at home, including Docker, reverse proxies,

 - [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
 - [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
+- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)

 ## Security

--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@@ -5,7 +5,7 @@ category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-22
 ---

 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
    every: 30s
   lookup: average -5m of unhealthy
     warn: $this > 0
-    delay: down 5m multiplier 1.5 max 30m
+    delay: up 3m down 5m multiplier 1.5 max 30m
  summary: Docker container ${label:container_name} health
     info: ${label:container_name} docker container health status is unhealthy
       to: sysadmin
@@ -49,10 +49,38 @@ component: Docker
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |

-A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing — absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.
+
+## Also: Suppress `docker_container_down` for Normally-Exiting Containers
+
+Nextcloud AIO runs `borgbackup` (scheduled backups) and `watchtower` (auto-updates) as containers that exit with code 0 after completing their work. The stock `docker_container_down` alarm fires on any exited container, generating false alerts after every nightly cycle.
+
+Add a second override to the same file using `chart labels` to exclude them:
+
+```ini
+# Suppress docker_container_down for Nextcloud AIO containers that exit normally
+# (borgbackup runs on schedule then exits; watchtower does updates then exits)
+template: docker_container_down
+       on: docker.container_running_state
+    class: Errors
+     type: Containers
+component: Docker
+    units: status
+    every: 30s
+   lookup: average -5m of down
+chart labels: container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *
+     warn: $this > 0
+    delay: up 3m down 5m multiplier 1.5 max 30m
+  summary: Docker container ${label:container_name} down
+     info: ${label:container_name} docker container is down
+       to: sysadmin
+```
+
+The `chart labels` line uses Netdata's simple pattern syntax — `!` prefix excludes a container, `*` matches everything else. All other exited containers still alert normally.

 ## Applying the Config

@@ -74,7 +102,7 @@ In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `dock

 ## Notes

- This only overrides the `docker_container_unhealthy` alarm. The `docker_container_down` alarm (for exited containers) is left at its default — it already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
+- Both `docker_container_unhealthy` and `docker_container_down` are overridden in this config. Any container not explicitly excluded in the `chart labels` filter will still alert normally.
 - If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
 - Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

--- a/02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md
+++ b/02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md
@@ -0,0 +1,159 @@
+# Netdata → n8n Enriched Alert Emails
+
+**Status:** Live across all MajorsHouse fleet servers as of 2026-03-21
+
+Replaces Netdata's plain-text alert emails with rich HTML emails that include a plain-English explanation, a suggested remediation command, and a direct link to the relevant MajorWiki article.
+
+---
+
+## How It Works
+
+```
+Netdata alarm fires
+  → custom_sender() in health_alarm_notify.conf
+    → POST JSON payload to n8n webhook
+      → Code node enriches with suggestion + wiki link
+        → Send Email node sends HTML email via SMTP
+          → Respond node returns 200 OK
+```
+
+---
+
+## n8n Workflow
+
+**Name:** Netdata Enriched Alerts  
+**URL:** https://n8n.majorshouse.com  
+**Webhook endpoint:** `POST https://n8n.majorshouse.com/webhook/netdata-alert`  
+**Workflow ID:** `a1b2c3d4-aaaa-bbbb-cccc-000000000001`
+
+### Nodes
+
+1. **Netdata Webhook** — receives POST from Netdata's `custom_sender()`
+2. **Enrich Alert** — Code node; matches alarm/chart/family to enrichment table, builds HTML email body in `$json.emailBody`
+3. **Send Enriched Email** — sends via SMTP port 465 (SMTP account 2), from `netdata@majorshouse.com` to `marcus@majorshouse.com`
+4. **Respond OK** — returns `ok` with HTTP 200 to Netdata
+
+### Enrichment Keys
+
+The Code node matches on `alarm`, `chart`, or `family` field (case-insensitive substring):
+
+| Key | Title | Wiki Article | Notes |
+|-----|-------|-------------|-------|
+| `disk_space` | Disk Space Alert | snapraid-mergerfs-setup | |
+| `ram` | Memory Alert | managing-linux-services-systemd-ansible | |
+| `cpu` | CPU Alert | managing-linux-services-systemd-ansible | |
+| `load` | Load Average Alert | managing-linux-services-systemd-ansible | |
+| `net` | Network Alert | tailscale-homelab-remote-access | |
+| `docker` | Docker Container Alert | debugging-broken-docker-containers | |
+| `web_log` | Web Log Alert | tuning-netdata-web-log-alerts | Hostname-aware suggestion (see below) |
+| `health` | Docker Health Alarm | netdata-docker-health-alarm-tuning | |
+| `mdstat` | RAID Array Alert | mdadm-usb-hub-disconnect-recovery | |
+| `systemd` | Systemd Service Alert | docker-caddy-selinux-post-reboot-recovery | |
+| _(no match)_ | Server Alert | netdata-new-server-setup | |
+
+> [!info] web_log hostname-aware suggestion (updated 2026-03-24)
+> The `web_log` suggestion branches on `hostname` in the Code node:
+> - **`majorlab`** → Check `docker logs caddy` (Caddy reverse proxy)
+> - **`teelia`, `majorlinux`, `dca`** → Check Apache logs + Fail2ban jail status
+> - **other** → Generic web server log guidance
+
+---
+
+## Netdata Configuration
+
+### Config File Locations
+
+| Server | Path |
+|--------|------|
+| majorhome, majormail, majordiscord, tttpod, teelia | `/etc/netdata/health_alarm_notify.conf` |
+| majorlinux, majortoot, dca | `/usr/lib/netdata/conf.d/health_alarm_notify.conf` |
+
+### Required Settings
+
+```bash
+DEFAULT_RECIPIENT_CUSTOM="n8n"
+role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"
+```
+
+### custom_sender() Function
+
+```bash
+custom_sender() {
+    local to="${1}"
+    local payload
+    payload=$(jq -n \
+        --arg hostname "${host}" \
+        --arg alarm "${name}" \
+        --arg chart "${chart}" \
+        --arg family "${family}" \
+        --arg status "${status}" \
+        --arg old_status "${old_status}" \
+        --arg value "${value_string}" \
+        --arg units "${units}" \
+        --arg info "${info}" \
+        --arg alert_url "${goto_url}" \
+        --arg severity "${severity}" \
+        --arg raised_for "${raised_for}" \
+        --arg total_warnings "${total_warnings}" \
+        --arg total_critical "${total_critical}" \
+        '{hostname:$hostname,alarm:$alarm,chart:$chart,family:$family,status:$status,old_status:$old_status,value:$value,units:$units,info:$info,alert_url:$alert_url,severity:$severity,raised_for:$raised_for,total_warnings:$total_warnings,total_critical:$total_critical}')
+    local httpcode
+    httpcode=$(docurl -s -o /dev/null -w "%{http_code}" \
+        -X POST \
+        -H "Content-Type: application/json" \
+        -d "${payload}" \
+        "https://n8n.majorshouse.com/webhook/netdata-alert")
+    if [ "${httpcode}" = "200" ]; then
+        info "sent enriched notification to n8n for ${status} of ${host}.${name}"
+        sent=$((sent + 1))
+    else
+        error "failed to send notification to n8n, HTTP code: ${httpcode}"
+    fi
+}
+```
+
+> [!note] jq required
+> The `custom_sender()` function requires `jq` to be installed. Verify with `which jq` on each server.
+
+---
+
+## Deploying to a New Server
+
+```bash
+# 1. Find the config file
+find /etc/netdata /usr/lib/netdata -name health_alarm_notify.conf 2>/dev/null
+
+# 2. Edit it — add the two lines and the custom_sender() function above
+
+# 3. Test connectivity from the server
+curl -s -o /dev/null -w "%{http_code}" \
+  -X POST https://n8n.majorshouse.com/webhook/netdata-alert \
+  -H "Content-Type: application/json" \
+  -d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
+# Expected: 200
+
+# 4. Restart Netdata
+systemctl restart netdata
+
+# 5. Send a test alarm (the optional argument is a recipient role, not a notification method)
+/usr/libexec/netdata/plugins.d/alarm-notify.sh test sysadmin
+```
+
+---
+
+## Troubleshooting
+
+**Emails not arriving — check n8n execution log:**  
+Go to https://n8n.majorshouse.com → open "Netdata Enriched Alerts" → Executions tab. Look for `error` status entries.
+
+**Email body empty:**  
+The Send Email node's HTML field must be `={{ $json.emailBody }}`. Shell variable expansion can silently strip `$json` if the workflow is patched via inline SSH commands — always use a Python script file.
+
+**`000` curl response from a server:**  
+Usually a timeout, not a DNS or connection failure. Re-test with `--max-time 30`.
+
+**`custom_sender()` syntax error in Netdata logs:**  
+Bash heredocs don't work inside sourced config files. Use `jq -n --arg ...` as shown above — no heredocs.
+
+**n8n `N8N_TRUST_PROXY` must be set:**  
+Without `N8N_TRUST_PROXY=true` in the Docker environment, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to abort requests before parsing the body. Set in `/opt/n8n/compose.yml`.
--- a/02-selfhosting/monitoring/netdata-new-server-setup.md
+++ b/02-selfhosting/monitoring/netdata-new-server-setup.md
@@ -0,0 +1,161 @@
+---
+title: "Deploying Netdata to a New Server"
+domain: selfhosting
+category: monitoring
+tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
+status: published
+created: 2026-03-18
+updated: 2026-03-22
+---
+
+# Deploying Netdata to a New Server
+
+This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
+
+## 1. Install Prerequisites
+
+Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**
+
+```bash
+apt install -y jq
+```
+
+Verify:
+
+```bash
+jq --version
+```
+
+## 2. Install Netdata
+
+Use the official kickstart script:
+
+```bash
+wget -O /tmp/netdata-install.sh https://get.netdata.cloud/kickstart.sh
+sh /tmp/netdata-install.sh --non-interactive --stable-channel --disable-telemetry
+```
+
+Verify it's running:
+
+```bash
+systemctl is-active netdata
+curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
+```
+
+## 3. Configure Email Notifications
+
+Copy the default config and set the three required values:
+
+```bash
+cp /usr/lib/netdata/conf.d/health_alarm_notify.conf /etc/netdata/health_alarm_notify.conf
+```
+
+Edit `/etc/netdata/health_alarm_notify.conf`:
+
+```ini
+EMAIL_SENDER="netdata@majorshouse.com"
+SEND_EMAIL="YES"
+DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"
+```
+
+Or apply with `sed` in one shot:
+
+```bash
+sed -i 's/^#\?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
+sed -i 's/^#\?SEND_EMAIL=.*/SEND_EMAIL="YES"/' /etc/netdata/health_alarm_notify.conf
+sed -i 's/^#\?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
+```
+
+Restart and test:
+
+```bash
+systemctl restart netdata
+/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(OK|FAILED|email)'
+```
+
+You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) and confirmation that email was sent to `marcus@majorshouse.com`.
+
+> [!note] Delivery via local Postfix
+> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
+
+## 4. Configure n8n Webhook Notifications
+
+Copy the `health_alarm_notify.conf` from an existing server (e.g. majormail) which contains the `custom_sender()` function. This sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.
+
+> [!warning] jq required
+> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).
+
+After deploying the config, run a test to confirm the webhook fires correctly:
+
+```bash
+systemctl restart netdata
+/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
+```
+
+Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.
+
+## 5. Claim to Netdata Cloud
+
+Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
+
+```bash
+wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
+sh /tmp/netdata-kickstart.sh --stable-channel \
+  --claim-token <token> \
+  --claim-rooms <room-id> \
+  --claim-url https://app.netdata.cloud
+```
+
+Verify the claim was accepted:
+
+```bash
+cat /var/lib/netdata/cloud.d/claimed_id
+```
+
+A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
+
+## 6. Verify Alerts
+
+Check that no unexpected alerts are active after setup:
+
+```bash
+curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c "
+import sys, json
+d = json.load(sys.stdin)
+active = [v for v in d.get('alarms', {}).values() if v.get('status') not in ('CLEAR', 'UNINITIALIZED', 'UNDEFINED')]
+print(f'{len(active)} active alert(s)')
+for v in active:
+    print(f'  [{v[\"status\"]}] {v[\"name\"]} on {v[\"chart\"]}')
+"
+```
+
+## Fleet-wide Alert Check
+
+To audit all servers at once (requires Tailscale SSH access):
+
+```bash
+for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
+  echo "=== $host ==="
+  ssh root@$host "curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c \
+    \"import sys,json; d=json.load(sys.stdin); active=[v for v in d.get('alarms',{}).values() if v.get('status') not in ('CLEAR','UNINITIALIZED','UNDEFINED')]; print(str(len(active))+' active')\""
+done
+```
+
+## Fleet-wide jq Audit
+
+To check that all servers with `custom_sender` have `jq` installed:
+
+```bash
+for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
+  echo -n "=== $host: "
+  ssh -o ConnectTimeout=5 root@$host \
+    'has_cs=0; grep -qs "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf /usr/lib/netdata/conf.d/health_alarm_notify.conf && has_cs=1; has_jq=NO; command -v jq >/dev/null 2>&1 && has_jq=yes; echo "custom_sender=$has_cs jq=$has_jq"'
+done
+```
+
+Any server showing `custom_sender=1 jq=NO` needs `jq` installed immediately (`apt install -y jq` on Ubuntu, `dnf install -y jq` on Fedora).
+
+## Related
+
+- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
+- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
--- a/02-selfhosting/monitoring/netdata-selinux-avc-chart.md
+++ b/02-selfhosting/monitoring/netdata-selinux-avc-chart.md
@@ -0,0 +1,137 @@
+---
+title: "Netdata SELinux AVC Denial Monitoring"
+domain: selfhosting
+category: monitoring
+tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
+status: published
+created: 2026-03-27
+updated: 2026-03-27
+---
+
+# Netdata SELinux AVC Denial Monitoring
+
+A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.
+
+## What It Does
+
+The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes. This gives a real-time chart in Netdata Cloud showing SELinux denial spikes — useful for catching misconfigurations after service changes or package updates.
+
+## Where It's Deployed
+
+| Host | OS | SELinux | Chart Installed |
+|------|----|---------|-----------------|
+| majorhome | Fedora 43 | Enforcing | Yes |
+| majorlab | Fedora 43 | Enforcing | Yes |
+| majormail | Fedora 43 | Enforcing | Yes |
+| majordiscord | Fedora 43 | Enforcing | Yes |
+
+Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.
+
+## Installation
+
+### 1. Create the Chart Plugin
+
+Create `/etc/netdata/charts.d/selinux.chart.sh`:
+
+```bash
+cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
+# SELinux AVC denial counter for Netdata charts.d
+selinux_update_every=60
+selinux_priority=90000
+
+selinux_check() {
+    which ausearch >/dev/null 2>&1 || return 1
+    return 0
+}
+
+selinux_create() {
+    cat <<CHART
+CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
+DIMENSION denials '' absolute 1 1
+CHART
+    return 0
+}
+
+selinux_update() {
+    local count
+    count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
+    echo "BEGIN selinux.avc_denials $1"
+    echo "SET denials = ${count}"
+    echo "END"
+    return 0
+}
+EOF
+```
+
+### 2. Grant Netdata Sudo Access to ausearch
+
+`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:
+
+```bash
+echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
+chmod 440 /etc/sudoers.d/netdata-selinux
+visudo -c
+```
+
+The `visudo -c` validates syntax. If it reports errors, fix the file before proceeding — a broken sudoers file can lock out sudo entirely.
+
+### 3. Restart Netdata
+
+```bash
+systemctl restart netdata
+```
+
+### 4. Verify
+
+Check that the chart is collecting data:
+
+```bash
+curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
+import sys, json
+d = json.load(sys.stdin)
+print(f'Chart: {d[\"id\"]}')
+print(f'Update every: {d[\"update_every\"]}s')
+print(f'Type: {d[\"chart_type\"]}')
+"
+```
+
+If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.
+
+## Known Side Effect: pam_systemd Log Noise
+
+Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:
+
+```
+pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
+```
+
+This is cosmetic. The `sudo` command succeeds — `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).
+
+**No PAM changes are needed.** Fedora's `system-auth` PAM config already loads `pam_systemd.so` as `-session optional`: `optional` means the module's result does not affect whether `sudo` succeeds, and the `-` prefix tells PAM to skip the module quietly if it is not installed. The messages are informational log noise, not actual failures.
+
+If the log volume is a concern for log analysis or monitoring, drop the message in rsyslog before it reaches the text log files (the journal itself still records it):
+
+```ini
+# /etc/rsyslog.d/suppress-pam-systemd.conf
+:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
+```
+
+Or in Netdata's log alert config, exclude the pattern from any log-based alerts.
+
+## Fleet Audit
+
+To verify the chart is deployed and functioning on all Fedora hosts:
+
+```bash
+for host in majorhome majorlab majormail majordiscord; do
+  echo -n "=== $host: "
+  ssh root@$host "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
+done
+```
+
+## Related
+
+- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
+- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
+- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
+- [SELinux: Fixing Dovecot Mail Spool Context](/05-troubleshooting/selinux-dovecot-vmail-context.md)