wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-03-27 03:34:33 -04:00
parent 38fe720e63
commit fb2e3f6168
18 changed files with 881 additions and 15 deletions

View File

View File

@@ -154,7 +154,7 @@ alias majorlab='ssh root@100.86.14.126'
alias majormail='ssh root@100.84.165.52'
alias teelia='ssh root@100.120.32.69'
alias tttpod='ssh root@100.84.42.102'
-alias majorrig='ssh -p 2222 majorlinux@100.98.47.29'
+alias majorrig='ssh majorlinux@100.98.47.29' # port 2222 retired 2026-03-25, fleet uses port 22
# DNF5
alias update='sudo dnf upgrade --refresh'

View File

@@ -0,0 +1,157 @@
---
title: "Docker Healthchecks"
domain: selfhosting
category: docker
tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
status: published
created: 2026-03-23
updated: 2026-03-23
---
# Docker Healthchecks
A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working — not just running. Without one, a container shows as `Up` even if the app inside is crashed, deadlocked, or waiting on a dependency.
## Why It Matters
Tools like Uptime Kuma report containers without healthchecks as:
> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.
A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.
## Basic Syntax (docker-compose)
```yaml
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
```
| Field | Description |
|---|---|
| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
| `interval` | How often to run the check. |
| `timeout` | How long to wait before marking as failed. |
| `retries` | Failures before marking `unhealthy`. |
| `start_period` | Grace period on startup before failures count. |
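As a rough worked example of how these fields interact (a simplified model, not Docker's exact scheduler): once a running app breaks, the daemon needs `retries` consecutive failed checks before flipping to `unhealthy`, so detection takes on the order of `interval × retries` plus one `timeout`:

```python
# Rough worst-case seconds from "app breaks" to "(unhealthy)" for a
# container that is already past start_period. Simplified model: one
# check per interval, each failing check may take up to `timeout`
# before being declared failed.
def time_to_unhealthy(interval: int, retries: int, timeout: int) -> int:
    return interval * retries + timeout

# With the settings above (interval 30s, retries 3, timeout 10s):
print(time_to_unhealthy(interval=30, retries=3, timeout=10))  # 100
```

Tighter intervals detect failures faster at the cost of more frequent check executions inside the container.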
## Common Patterns
### HTTP service (wget — available in Alpine)
```yaml
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
```
### HTTP service (curl)
```yaml
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
```
### MySQL / MariaDB
```yaml
healthcheck:
test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
interval: 10s
timeout: 5s
retries: 3
start_period: 20s
```
### PostgreSQL
```yaml
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
```
### Redis
```yaml
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 3
```
### TCP port check (no curl/wget available)
```yaml
healthcheck:
test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
interval: 30s
timeout: 5s
retries: 3
```
## Using Healthchecks with `depends_on`
Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:
```yaml
services:
app:
depends_on:
db:
condition: service_healthy
db:
image: mysql:8.0
healthcheck:
test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
interval: 10s
timeout: 5s
retries: 3
start_period: 20s
```
This prevents the classic race condition where the app starts before the database is ready to accept connections.
## Checking Health Status
```bash
# See health status in container list
docker ps
# Get detailed health info including last check output
docker inspect --format='{{json .State.Health}}' <container> | jq
```
## Ghost Example
Ghost (Alpine-based) uses `wget` rather than `curl`:
```yaml
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
```
## Gotchas & Notes
- **Alpine images** don't have `curl` by default — use `wget` or install curl in the image.
- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
- **`CMD` vs `CMD-SHELL`** — use `CMD` for direct exec (no shell needed), `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
- **Uptime Kuma** will pick up Docker healthcheck status automatically when monitoring via the Docker socket — no extra config needed.
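To illustrate the `CMD-SHELL` case: shell form lets one check combine conditions with `&&` or pipes. This is a hypothetical fragment (the port and PID file path are made up), not a config for any service above:

```yaml
healthcheck:
  # CMD-SHELL runs through /bin/sh, so && and pipes work.
  # Healthy only if the port answers AND the PID file exists.
  test: ["CMD-SHELL", "nc -z localhost 8080 && test -f /run/app.pid"]
  interval: 30s
  timeout: 5s
  retries: 3
```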
## See Also
- [[debugging-broken-docker-containers]]
- [[netdata-docker-health-alarm-tuning]]

View File

@@ -24,6 +24,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
- [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)
## Security

View File

@@ -5,7 +5,7 @@ category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-22
---
# Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
every: 30s
lookup: average -5m of unhealthy
warn: $this > 0
-delay: down 5m multiplier 1.5 max 30m
+delay: up 3m down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} health
info: ${label:container_name} docker container health status is unhealthy
to: sysadmin
@@ -49,10 +49,38 @@ component: Docker
| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |
-A typical Nextcloud AIO update cycle (30-90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2-3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing — absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.
## Also: Suppress `docker_container_down` for Normally-Exiting Containers
Nextcloud AIO runs `borgbackup` (scheduled backups) and `watchtower` (auto-updates) as containers that exit with code 0 after completing their work. The stock `docker_container_down` alarm fires on any exited container, generating false alerts after every nightly cycle.
Add a second override to the same file using `chart labels` to exclude them:
```ini
# Suppress docker_container_down for Nextcloud AIO containers that exit normally
# (borgbackup runs on schedule then exits; watchtower does updates then exits)
template: docker_container_down
on: docker.container_running_state
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -5m of down
chart labels: container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *
warn: $this > 0
delay: up 3m down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} down
info: ${label:container_name} docker container is down
to: sysadmin
```
The `chart labels` line uses Netdata's simple pattern syntax — `!` prefix excludes a container, `*` matches everything else. All other exited containers still alert normally.
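The first-match-wins behavior of that pattern list can be sketched as follows. This is an illustrative Python model of the semantics, not Netdata's actual implementation:

```python
from fnmatch import fnmatch

def matches(name: str, patterns: str) -> bool:
    """Model of Netdata simple patterns: space-separated globs checked
    left to right; first hit wins; a leading '!' makes it an exclusion."""
    for pat in patterns.split():
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        if fnmatch(name, pat):
            return not negate
    return False  # nothing matched

labels = "!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *"
print(matches("nextcloud-aio-borgbackup", labels))  # False (excluded)
print(matches("ghost", labels))                     # True (caught by *)
```

Because evaluation stops at the first match, the `*` catch-all must come last or it would swallow the exclusions.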
## Applying the Config
@@ -74,7 +102,7 @@ In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `dock
## Notes
-- This only overrides the `docker_container_unhealthy` alarm. The `docker_container_down` alarm (for exited containers) is left at its default — it already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
+- Both `docker_container_unhealthy` and `docker_container_down` are overridden in this config. Any container not explicitly excluded in the `chart labels` filter will still alert normally.
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/` - Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

View File

@@ -0,0 +1,159 @@
# Netdata → n8n Enriched Alert Emails
**Status:** Live across all MajorsHouse fleet servers as of 2026-03-21
Replaces Netdata's plain-text alert emails with rich HTML emails that include a plain-English explanation, a suggested remediation command, and a direct link to the relevant MajorWiki article.
---
## How It Works
```
Netdata alarm fires
→ custom_sender() in health_alarm_notify.conf
→ POST JSON payload to n8n webhook
→ Code node enriches with suggestion + wiki link
→ Send Email node sends HTML email via SMTP
→ Respond node returns 200 OK
```
---
## n8n Workflow
**Name:** Netdata Enriched Alerts
**URL:** https://n8n.majorshouse.com
**Webhook endpoint:** `POST https://n8n.majorshouse.com/webhook/netdata-alert`
**Workflow ID:** `a1b2c3d4-aaaa-bbbb-cccc-000000000001`
### Nodes
1. **Netdata Webhook** — receives POST from Netdata's `custom_sender()`
2. **Enrich Alert** — Code node; matches alarm/chart/family to enrichment table, builds HTML email body in `$json.emailBody`
3. **Send Enriched Email** — sends via SMTP port 465 (SMTP account 2), from `netdata@majorshouse.com` to `marcus@majorshouse.com`
4. **Respond OK** — returns `ok` with HTTP 200 to Netdata
### Enrichment Keys
The Code node matches on `alarm`, `chart`, or `family` field (case-insensitive substring):
| Key | Title | Wiki Article | Notes |
|-----|-------|-------------|-------|
| `disk_space` | Disk Space Alert | snapraid-mergerfs-setup | |
| `ram` | Memory Alert | managing-linux-services-systemd-ansible | |
| `cpu` | CPU Alert | managing-linux-services-systemd-ansible | |
| `load` | Load Average Alert | managing-linux-services-systemd-ansible | |
| `net` | Network Alert | tailscale-homelab-remote-access | |
| `docker` | Docker Container Alert | debugging-broken-docker-containers | |
| `web_log` | Web Log Alert | tuning-netdata-web-log-alerts | Hostname-aware suggestion (see below) |
| `health` | Docker Health Alarm | netdata-docker-health-alarm-tuning | |
| `mdstat` | RAID Array Alert | mdadm-usb-hub-disconnect-recovery | |
| `systemd` | Systemd Service Alert | docker-caddy-selinux-post-reboot-recovery | |
| _(no match)_ | Server Alert | netdata-new-server-setup | |
> [!info] web_log hostname-aware suggestion (updated 2026-03-24)
> The `web_log` suggestion branches on `hostname` in the Code node:
> - **`majorlab`** → Check `docker logs caddy` (Caddy reverse proxy)
> - **`teelia`, `majorlinux`, `dca`** → Check Apache logs + Fail2ban jail status
> - **other** → Generic web server log guidance
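The lookup itself is simple: first enrichment key found as a case-insensitive substring of the alarm, chart, or family wins, with a generic fallback. A simplified Python model (the real Code node is JavaScript; only a few table rows are shown here):

```python
# Simplified model of the n8n Code node's matching. Order matters:
# keys are tried in table order and the first substring hit wins.
ENRICHMENT = {
    "disk_space": ("Disk Space Alert", "snapraid-mergerfs-setup"),
    "docker":     ("Docker Container Alert", "debugging-broken-docker-containers"),
    "web_log":    ("Web Log Alert", "tuning-netdata-web-log-alerts"),
}
FALLBACK = ("Server Alert", "netdata-new-server-setup")

def enrich(alert: dict) -> tuple:
    haystack = " ".join(
        str(alert.get(k, "")) for k in ("alarm", "chart", "family")
    ).lower()
    for key, entry in ENRICHMENT.items():
        if key in haystack:
            return entry
    return FALLBACK

print(enrich({"alarm": "disk_space._", "chart": "disk_space./"}))
# ('Disk Space Alert', 'snapraid-mergerfs-setup')
```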
---
## Netdata Configuration
### Config File Locations
| Server | Path |
|--------|------|
| majorhome, majormail, majordiscord, tttpod, teelia | `/etc/netdata/health_alarm_notify.conf` |
| majorlinux, majortoot, dca | `/usr/lib/netdata/conf.d/health_alarm_notify.conf` |
### Required Settings
```bash
DEFAULT_RECIPIENT_CUSTOM="n8n"
role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"
```
### custom_sender() Function
```bash
custom_sender() {
local to="${1}"
local payload
payload=$(jq -n \
--arg hostname "${host}" \
--arg alarm "${name}" \
--arg chart "${chart}" \
--arg family "${family}" \
--arg status "${status}" \
--arg old_status "${old_status}" \
--arg value "${value_string}" \
--arg units "${units}" \
--arg info "${info}" \
--arg alert_url "${goto_url}" \
--arg severity "${severity}" \
--arg raised_for "${raised_for}" \
--arg total_warnings "${total_warnings}" \
--arg total_critical "${total_critical}" \
'{hostname:$hostname,alarm:$alarm,chart:$chart,family:$family,status:$status,old_status:$old_status,value:$value,units:$units,info:$info,alert_url:$alert_url,severity:$severity,raised_for:$raised_for,total_warnings:$total_warnings,total_critical:$total_critical}')
local httpcode
httpcode=$(docurl -s -o /dev/null -w "%{http_code}" \
-X POST \
-H "Content-Type: application/json" \
-d "${payload}" \
"https://n8n.majorshouse.com/webhook/netdata-alert")
if [ "${httpcode}" = "200" ]; then
info "sent enriched notification to n8n for ${status} of ${host}.${name}"
sent=$((sent + 1))
else
error "failed to send notification to n8n, HTTP code: ${httpcode}"
fi
}
```
> [!note] jq required
> The `custom_sender()` function requires `jq` to be installed. Verify with `which jq` on each server.
---
## Deploying to a New Server
```bash
# 1. Find the config file
find /etc/netdata /usr/lib/netdata -name health_alarm_notify.conf 2>/dev/null
# 2. Edit it — add the two lines and the custom_sender() function above
# 3. Test connectivity from the server
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://n8n.majorshouse.com/webhook/netdata-alert \
-H "Content-Type: application/json" \
-d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
# Expected: 200
# 4. Restart Netdata
systemctl restart netdata
# 5. Send a test alarm
/usr/libexec/netdata/plugins.d/alarm-notify.sh test custom
```
---
## Troubleshooting
**Emails not arriving — check n8n execution log:**
Go to https://n8n.majorshouse.com → open "Netdata Enriched Alerts" → Executions tab. Look for `error` status entries.
**Email body empty:**
The Send Email node's HTML field must be `={{ $json.emailBody }}`. Shell variable expansion can silently strip `$json` if the workflow is patched via inline SSH commands — always use a Python script file.
**`000` curl response from a server:**
Usually a timeout, not a DNS or connection failure. Re-test with `--max-time 30`.
**`custom_sender()` syntax error in Netdata logs:**
Bash heredocs don't work inside sourced config files. Use `jq -n --arg ...` as shown above — no heredocs.
**n8n `N8N_TRUST_PROXY` must be set:**
Without `N8N_TRUST_PROXY=true` in the Docker environment, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to abort requests before parsing the body. Set in `/opt/n8n/compose.yml`.

View File

@@ -0,0 +1,161 @@
---
title: "Deploying Netdata to a New Server"
domain: selfhosting
category: monitoring
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
status: published
created: 2026-03-18
updated: 2026-03-22
---
# Deploying Netdata to a New Server
This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
## 1. Install Prerequisites
Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**
```bash
apt install -y jq
```
Verify:
```bash
jq --version
```
## 2. Install Netdata
Use the official kickstart script:
```bash
wget -O /tmp/netdata-install.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-install.sh --non-interactive --stable-channel --disable-telemetry
```
Verify it's running:
```bash
systemctl is-active netdata
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
```
## 3. Configure Email Notifications
Copy the default config and set the three required values:
```bash
cp /usr/lib/netdata/conf.d/health_alarm_notify.conf /etc/netdata/health_alarm_notify.conf
```
Edit `/etc/netdata/health_alarm_notify.conf`:
```ini
EMAIL_SENDER="netdata@majorshouse.com"
SEND_EMAIL="YES"
DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"
```
Or apply with `sed` in one shot:
```bash
sed -i 's/^#\?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?SEND_EMAIL=.*/SEND_EMAIL="YES"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
```
Restart and test:
```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(OK|FAILED|email)'
```
You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) and confirmation that email was sent to `marcus@majorshouse.com`.
> [!note] Delivery via local Postfix
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
## 4. Configure n8n Webhook Notifications
Copy the `health_alarm_notify.conf` from an existing server (e.g. majormail) which contains the `custom_sender()` function. This sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.
> [!warning] jq required
> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).
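This failure mode is easy to reproduce: command substitution of a missing binary yields an empty string rather than aborting the function, so the script carries on with a zero-length payload. A sketch (the binary name is deliberately fake):

```shell
# If the binary doesn't exist, $(...) captures nothing; without
# `set -e` the script continues with an empty variable.
payload=$(definitely-not-installed-jq -n '{}' 2>/dev/null)
echo "payload length: ${#payload}"   # payload length: 0
```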
After deploying the config, run a test to confirm the webhook fires correctly:
```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
```
Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.
## 5. Claim to Netdata Cloud
Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
```bash
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel \
--claim-token <token> \
--claim-rooms <room-id> \
--claim-url https://app.netdata.cloud
```
Verify the claim was accepted:
```bash
cat /var/lib/netdata/cloud.d/claimed_id
```
A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
## 6. Verify Alerts
Check that no unexpected alerts are active after setup:
```bash
curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c "
import sys, json
d = json.load(sys.stdin)
active = [v for v in d.get('alarms', {}).values() if v.get('status') not in ('CLEAR', 'UNINITIALIZED', 'UNDEFINED')]
print(f'{len(active)} active alert(s)')
for v in active:
print(f' [{v[\"status\"]}] {v[\"name\"]} on {v[\"chart\"]}')
"
```
## Fleet-wide Alert Check
To audit all servers at once (requires Tailscale SSH access):
```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
echo "=== $host ==="
ssh root@$host "curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c \
\"import sys,json; d=json.load(sys.stdin); active=[v for v in d.get('alarms',{}).values() if v.get('status') not in ('CLEAR','UNINITIALIZED','UNDEFINED')]; print(str(len(active))+' active')\""
done
```
## Fleet-wide jq Audit
To check that all servers with `custom_sender` have `jq` installed:
```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
echo -n "=== $host: "
ssh -o ConnectTimeout=5 root@$host \
'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(command -v jq >/dev/null 2>&1 && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
done
```
Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.
## Related
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)

View File

@@ -0,0 +1,137 @@
---
title: "Netdata SELinux AVC Denial Monitoring"
domain: selfhosting
category: monitoring
tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
status: published
created: 2026-03-27
updated: 2026-03-27
---
# Netdata SELinux AVC Denial Monitoring
A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.
## What It Does
The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes. This gives a real-time chart in Netdata Cloud showing SELinux denial spikes — useful for catching misconfigurations after service changes or package updates.
## Where It's Deployed
| Host | OS | SELinux | Chart Installed |
|------|----|---------|-----------------|
| majorhome | Fedora 43 | Enforcing | Yes |
| majorlab | Fedora 43 | Enforcing | Yes |
| majormail | Fedora 43 | Enforcing | Yes |
| majordiscord | Fedora 43 | Enforcing | Yes |
Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.
## Installation
### 1. Create the Chart Plugin
Create `/etc/netdata/charts.d/selinux.chart.sh`:
```bash
cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
# SELinux AVC denial counter for Netdata charts.d
selinux_update_every=60
selinux_priority=90000
selinux_check() {
which ausearch >/dev/null 2>&1 || return 1
return 0
}
selinux_create() {
cat <<CHART
CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
DIMENSION denials '' absolute 1 1
CHART
return 0
}
selinux_update() {
local count
count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
echo "BEGIN selinux.avc_denials $1"
echo "SET denials = ${count}"
echo "END"
return 0
}
EOF
```
### 2. Grant Netdata Sudo Access to ausearch
`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:
```bash
echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
chmod 440 /etc/sudoers.d/netdata-selinux
visudo -c
```
The `visudo -c` validates syntax. If it reports errors, fix the file before proceeding — a broken sudoers file can lock out sudo entirely.
### 3. Restart Netdata
```bash
systemctl restart netdata
```
### 4. Verify
Check that the chart is collecting data:
```bash
curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f'Chart: {d[\"id\"]}')
print(f'Update every: {d[\"update_every\"]}s')
print(f'Type: {d[\"chart_type\"]}')
"
```
If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.
## Known Side Effect: pam_systemd Log Noise
Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:
```
pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
```
This is cosmetic. The `sudo` command succeeds — `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).
**To suppress it**, the `system-auth` PAM config on Fedora already marks `pam_systemd.so` as `-session optional` (the `-` prefix means "don't fail if the module errors"). The messages are informational log noise, not actual failures. No PAM changes are needed.
If the log volume is a concern for log analysis or monitoring, filter it with an rsyslog rule:
```ini
# /etc/rsyslog.d/suppress-pam-systemd.conf
:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
```
Or in Netdata's log alert config, exclude the pattern from any log-based alerts.
## Fleet Audit
To verify the chart is deployed and functioning on all Fedora hosts:
```bash
for host in majorhome majorlab majormail majordiscord; do
echo -n "=== $host: "
ssh root@$host "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
done
```
## Related
- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
- [SELinux: Fixing Dovecot Mail Spool Context](/05-troubleshooting/selinux-dovecot-vmail-context.md)

View File

@@ -0,0 +1,59 @@
# Ansible: Vault Password File Not Found
## Error
```
[WARNING]: Error getting vault password file (default): The vault password file /Users/majorlinux/.ansible/vault_pass was not found
[ERROR]: The vault password file /Users/majorlinux/.ansible/vault_pass was not found
```
## Cause
Ansible is configured to look for a vault password file at `~/.ansible/vault_pass`, but the file does not exist. This is typically set in `ansible.cfg` via the `vault_password_file` directive.
## Solutions
### Option 1: Remove the vault config (if you're not using Vault)
Check your `ansible.cfg` for this line and remove it if Vault is not needed:
```ini
[defaults]
vault_password_file = ~/.ansible/vault_pass
```
### Option 2: Create the vault password file
```bash
echo 'your_vault_password' > ~/.ansible/vault_pass
chmod 600 ~/.ansible/vault_pass
```
> **Security note:** Keep permissions tight (`600`) so only your user can read the file. The actual vault password is stored in Bitwarden under the "Ansible Vault Password" entry.
### Option 3: Pass the password at runtime (no file needed)
```bash
ansible-playbook test.yml --ask-vault-pass
```
## Diagnosing the Source of the Config
To find which config file is setting `vault_password_file`, run:
```bash
ansible-config dump --only-changed
```
This shows all non-default config values and their source files. Config is loaded in this order of precedence:
1. `ANSIBLE_CONFIG` environment variable
2. `./ansible.cfg` (current directory)
3. `~/.ansible.cfg`
4. `/etc/ansible/ansible.cfg`
## Related
- [Ansible Getting Started](../01-linux/shell-scripting/ansible-getting-started.md)
- Vault password is stored in Bitwarden under **"Ansible Vault Password"**
- Ansible playbooks live at `~/MajorAnsible` on MajorAir/MajorMac

View File

@@ -9,6 +9,7 @@ Practical fixes for common Linux, networking, and application problems.
- [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md) - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md) - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
- [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md) - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
- [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md) - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
- [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md) - [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)

View File

@@ -0,0 +1,66 @@
# Tailscale SSH: Unexpected Re-Authentication Prompt
If a Tailscale SSH connection unexpectedly presents a browser authentication URL mid-session, the first instinct is to check the ACL policy. However, this is often a one-off Tailscale hiccup rather than a misconfiguration.
## Symptoms
- SSH connection to a fleet node displays a Tailscale auth URL:
```
To authenticate, visit: https://login.tailscale.com/a/xxxxxxxx
```
- The prompt appears even though the node worked fine previously
- Other nodes in the fleet connect without prompting
## What Causes It
Tailscale SSH supports two ACL `action` values:
| Action | Behavior |
|---|---|
| `accept` | Trusts Tailscale identity — no additional auth required |
| `check` | Requires periodic browser-based re-authentication |
If `action: "check"` is set, every session (or after token expiry) will prompt for browser auth. However, even with `action: "accept"`, a one-off prompt can appear due to a Tailscale daemon glitch or key refresh event.
## How to Diagnose
### 1. Verify the ACL policy
In the Tailscale admin console (or via `tailscale debug acl`), inspect the SSH rules. For a trusted homelab fleet, the rule should use `accept`:
```json
{
"src": ["autogroup:member"],
"dst": ["autogroup:self"],
"users": ["autogroup:nonroot", "root"],
"action": "accept",
}
```
If `action` is `check`, that is the root cause — change it to `accept` for trusted source/destination pairs.
### 2. Confirm it was a one-off
If the ACL already shows `accept`, the prompt was transient. Test with:
```bash
ssh <hostname> "echo ok"
```
No auth prompt + `ok` output = resolved. Note that this test is only meaningful if the previous session's auth token has expired, or you test from a different device that hasn't recently authenticated.
## Fix
**If ACL shows `check`:** Change to `accept` in the Tailscale admin console under Access Controls. Takes effect immediately — no server changes needed.
**If ACL already shows `accept`:** No action required. The prompt was a one-off Tailscale event (daemon restart, key refresh, etc.). Monitor for recurrence.
## Notes
- ~~Port 2222 on **MajorRig** previously existed as a hard bypass for Tailscale SSH browser auth. This workaround was retired on 2026-03-25 after the Tailscale SSH authentication issue was resolved. The entire fleet now uses port 22 uniformly.~~
- The `autogroup:self` destination means the rule applies when connecting from your own devices to your own devices — appropriate for a personal homelab fleet.
## Related
- [[Network Overview]] — Tailscale fleet inventory and SSH access model
- [[SSH-Aliases]] — Fleet SSH access shortcuts

View File

@@ -48,7 +48,7 @@ The Windows OpenSSH Server is installed as a Windows Feature (`Add-WindowsCapabi
- **This is a Windows-side issue** — WSL2 itself is unaffected. The service must be started and configured from Windows, not from within WSL2. - **This is a Windows-side issue** — WSL2 itself is unaffected. The service must be started and configured from Windows, not from within WSL2.
- **Elevated PowerShell required** — `Start-Service` and `Set-Service` for sshd will return "Access is denied" if run without Administrator privileges. - **Elevated PowerShell required** — `Start-Service` and `Set-Service` for sshd will return "Access is denied" if run without Administrator privileges.
- **Port 2222 is also affected** — both the standard port 22 and the bypass port 2222 on MajorRig are served by the same `sshd` service. - **Port 2222 was retired (2026-03-25)** — the bypass port 2222 on MajorRig is no longer in use. The entire fleet now uses port 22 uniformly after the Tailscale SSH auth fix. Only port 22 needs to be verified when troubleshooting sshd.
- **Default shell still works once fixed** — MajorRig's sshd is configured to use `C:\Windows\System32\wsl.exe` as the default shell, dropping SSH sessions directly into WSL2/Bash. This config is preserved across service restarts. - **Default shell still works once fixed** — MajorRig's sshd is configured to use `C:\Windows\System32\wsl.exe` as the default shell, dropping SSH sessions directly into WSL2/Bash. This config is preserved across service restarts.
--- ---

View File

@@ -0,0 +1,73 @@
# ClamAV Safe Scheduling on Live Servers
Running `clamscan` unthrottled on a live server will peg CPU until completion. On a small VPS (1 vCPU), a full recursive scan can sustain 70-100% CPU for an hour or more, degrading or taking down hosted services.
## The Problem
A common out-of-the-box ClamAV cron setup looks like this:
```cron
0 1 * * 0 clamscan --infected --recursive / --exclude-dir="^/sys"
```
This runs at Linux's default scheduling priority (`nice 0`) with normal I/O priority. On a live server it will:
- Monopolize the CPU for the scan duration
- Cause high I/O wait, degrading web serving, databases, and other services
- Trigger monitoring alerts (e.g., Netdata `10min_cpu_usage`)
## The Fix
Throttle the scan with `nice` and `ionice`:
```cron
0 1 * * 0 nice -n 19 ionice -c 3 clamscan --infected --recursive / --exclude-dir="^/sys"
```
| Flag | Meaning |
|------|---------|
| `nice -n 19` | Lowest CPU scheduling priority (range: -20 to 19) |
| `ionice -c 3` | Idle I/O class — only uses disk when no other process needs it |
The scan will take longer but will not impact server performance.
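Because idle-class I/O makes runtimes unpredictable, it's worth guarding against a scan still running when the next one fires. `flock -n` skips the new run instead of stacking a second scan (illustrative; the lock file path is an assumption):

```cron
0 1 * * 0 flock -n /run/clamscan.lock nice -n 19 ionice -c 3 clamscan --infected --recursive / --exclude-dir="^/sys"
```

`-n` makes `flock` fail immediately rather than queue, so overlapping weeks simply drop the extra run.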
## Applying the Fix
Edit root's crontab:
```bash
crontab -e
```
Or apply non-interactively (the address guard keeps a re-run from double-prefixing the entry):
```bash
crontab -l | sed '/ionice -c 3 clamscan/!s|clamscan|nice -n 19 ionice -c 3 clamscan|' | crontab -
```
Verify:
```bash
crontab -l | grep clam
```
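You can also confirm the kernel actually applied both priorities. A quick sketch using a stand-in process — the same two checks work against the real `clamscan` PID once a scan is running:

```shell
# Start a throttled stand-in the same way the cron entry launches clamscan.
nice -n 19 ionice -c 3 sleep 30 &
pid=$!

ps -o pid,ni,comm -p "$pid"   # NI column should read 19
ionice -p "$pid"              # should report the idle scheduling class

kill "$pid"
```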
## Diagnosing a Runaway Scan
If CPU is already pegged, identify and kill the process:
```bash
ps aux --sort=-%cpu | head -15
# Look for clamscan
kill <PID>
```
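Killing the scan throws away its progress. If the box is still responsive enough, the running process can instead be throttled in place — `renice` and `ionice` both accept a PID (a sketch; assumes `pgrep` from procps is available):

```shell
# Throttle an already-running clamscan instead of killing it.
pid=$(pgrep -n clamscan || true)
if [ -n "$pid" ]; then
  renice -n 19 -p "$pid"   # lower the live process's CPU priority
  ionice -c 3 -p "$pid"    # move its disk access to the idle class
else
  echo "no clamscan process found"
fi
```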
## Notes
- `ionice -c 3` (Idle) is only honored by the CFQ/BFQ I/O schedulers (kernel ≥ 2.6.13). On disks using `mq-deadline` or `none` (common defaults for NVMe), `ionice` is effectively a no-op and `nice` does the throttling.
- On multi-core servers, consider also using `cpulimit` for a hard cap: `cpulimit -l 30 -- clamscan ...`
- Always exclude virtual filesystems with `--exclude-dir` (e.g. `--exclude-dir="^/sys"`, and optionally `--exclude-dir="^/proc"`, `--exclude-dir="^/dev"`) — `--exclude` matches file paths, while `--exclude-dir` prunes whole directory trees.
## Related
- [ClamAV Documentation](https://docs.clamav.net/)
- [[02-selfhosting/security/linux-server-hardening-checklist|Linux Server Hardening Checklist]]

View File

@@ -128,7 +128,7 @@ Every time a new article is added, the following **MUST** be updated to maintain
**Updated:** `updated: 2026-03-17` **Updated:** `updated: 2026-03-17`
## Session Update — 2026-03-18 ## Session Update — 2026-03-18 (morning)
**Article count:** 48 (was 47) **Article count:** 48 (was 47)
@@ -136,3 +136,12 @@ Every time a new article is added, the following **MUST** be updated to maintain
- `02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md` — tuning docker_container_unhealthy alarm to prevent flapping during Nextcloud AIO updates - `02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md` — tuning docker_container_unhealthy alarm to prevent flapping during Nextcloud AIO updates
**Updated:** `updated: 2026-03-18` **Updated:** `updated: 2026-03-18`
## Session Update — 2026-03-18 (afternoon)
**Article count:** 49 (was 48)
**New articles added:**
- `02-selfhosting/monitoring/netdata-new-server-setup.md` — full Netdata deployment guide: install via kickstart.sh, email notification config, Netdata Cloud claim
**Updated:** `updated: 2026-03-18`

View File

@@ -3,14 +3,14 @@
> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin. > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
> >
**Last updated:** 2026-03-18 **Last updated:** 2026-03-18
**Article count:** 48 **Article count:** 49
## Domains ## Domains
| Domain | Folder | Articles | | Domain | Folder | Articles |
|---|---|---| |---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 11 | | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 | | 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 11 |
| 🔓 Open Source Tools | `03-opensource/` | 9 | | 🔓 Open Source Tools | `03-opensource/` | 9 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 | | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 16 | | 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
@@ -65,6 +65,7 @@
### Monitoring ### Monitoring
- [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles - [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
- [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) — install, email notifications, and Netdata Cloud claim for Ubuntu/Debian servers
### Security ### Security
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -129,6 +130,7 @@
| Date | Article | Domain | | Date | Article | Domain |
|---|---|---| |---|---|---|
| 2026-03-18 | [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) | Self-Hosting |
| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting | | 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
| 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting | | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
| 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting | | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |

View File

@@ -15,11 +15,13 @@
* [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md) * [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md)
* [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md) * [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
* [Debugging Broken Docker Containers](02-selfhosting/docker/debugging-broken-docker-containers.md) * [Debugging Broken Docker Containers](02-selfhosting/docker/debugging-broken-docker-containers.md)
* [Docker Healthchecks](02-selfhosting/docker/docker-healthchecks.md)
* [Setting Up Caddy as a Reverse Proxy](02-selfhosting/reverse-proxy/setting-up-caddy-reverse-proxy.md) * [Setting Up Caddy as a Reverse Proxy](02-selfhosting/reverse-proxy/setting-up-caddy-reverse-proxy.md)
* [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md) * [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
* [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md) * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
* [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
* [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
* [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)
* [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) * [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
* [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) * [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
* [Open Source & Alternatives](03-opensource/index.md) * [Open Source & Alternatives](03-opensource/index.md)
@@ -39,6 +41,7 @@
* [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md) * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
* [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
* [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
* [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
* [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md) * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
* [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md) * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
* [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md) * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)
@@ -51,3 +54,6 @@
* [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) * [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
* [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) * [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
* [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
* [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
* [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)

View File

@@ -2,18 +2,20 @@
> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin. > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
> >
> **Last updated:** 2026-03-18 > **Last updated:** 2026-03-23
> **Article count:** 48 > **Article count:** 50
## Domains ## Domains
| Domain | Folder | Articles | | Domain | Folder | Articles |
|---|---|---| |---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 11 | | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 | | 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 11 |
| 🔓 Open Source Tools | `03-opensource/` | 9 | | 🔓 Open Source Tools | `03-opensource/` | 9 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 | | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 16 | | 🔧 General Troubleshooting | `05-troubleshooting/` | 17 |
--- ---
@@ -65,6 +67,7 @@
### Monitoring ### Monitoring
- [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles - [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
- [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) — install, email notifications, and Netdata Cloud claim for Ubuntu/Debian servers
### Security ### Security
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -122,6 +125,8 @@
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout - [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type - [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
- [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) — keeping Ollama reachable over Tailscale by disabling macOS sleep on AC power - [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) — keeping Ollama reachable over Tailscale by disabling macOS sleep on AC power
- [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) — fixing the missing vault_pass file error when running ansible-playbook
--- ---
@@ -129,6 +134,8 @@
| Date | Article | Domain | | Date | Article | Domain |
|---|---|---| |---|---|---|
| 2026-03-23 | [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) | Troubleshooting |
| 2026-03-18 | [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) | Self-Hosting |
| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting | | 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
| 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting | | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
| 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting | | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |