wiki: add Netdata Docker health alarm tuning article; update indexes to 48

- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md (new)
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 00:10:36 -04:00
parent 59a5cc530e
commit 38fe720e63
6 changed files with 104 additions and 6 deletions
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -23,6 +23,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
 ## Monitoring
 - [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
 - [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
 ## Security
--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@@ -0,0 +1,83 @@
 ---
 title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
 domain: selfhosting
 category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
 updated: 2026-03-18
 ---
 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
 Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with Watchtower or a similar auto-update setup) runs its nightly update cycle, containers restart in sequence and briefly show as unhealthy, generating a flood of false alerts.
 ## The Default Alarm
 ```ini
 template: docker_container_unhealthy
       on: docker.container_health_status
    every: 10s
   lookup: average -10s of unhealthy
     warn: $this > 0
 ```
 A single container being unhealthy for 10 seconds triggers it. No grace period, no delay.
 ## The Fix
 Create a custom override at `/etc/netdata/health.d/docker.conf` (this maps to the Netdata config volume if Netdata runs in Docker). Because it has the same file name as the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`, Netdata loads it in place of the stock file.
 ```ini
 # Custom override — reduces flapping during nightly container updates.
 template: docker_container_unhealthy
       on: docker.container_health_status
    class: Errors
     type: Containers
 component: Docker
    units: status
    every: 30s
   lookup: average -5m of unhealthy
     warn: $this > 0
    delay: down 5m multiplier 1.5 max 30m
  summary: Docker container ${label:container_name} health
     info: ${label:container_name} docker container health status is unhealthy
       to: sysadmin
 ```
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
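 | `lookup` | average -10s | average -5m | Averages the `unhealthy` dimension over 5 minutes instead of 10 seconds |
 | `delay` | none | down 5m (max 30m) | Grace period after recovery before the clear notification is sent |
 A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) then shows up as one longer-lived state change instead of a burst of raise/clear notifications as containers restart in sequence. Note that with `warn: $this > 0`, any unhealthy sample inside the 5-minute window still makes the average non-zero (assuming the `unhealthy` dimension reports 0 or 1), so a brief unhealthy blip can still raise the alarm once. Requiring sustained unhealthiness would need a higher threshold, for example `warn: $this > 0.5`. A genuinely broken container will still be caught.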
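The `delay` line is easier to reason about with the numbers written out. A minimal sketch, assuming (as the Netdata health reference describes it) that the delay is multiplied each time the alarm changes state again while a notification is still being held, and that it is capped at `max`. `delay_schedule` is a hypothetical helper for illustration, not a Netdata command:

```shell
# Sketch: successive hold times for "delay: down 5m multiplier 1.5 max 30m".
# Assumption: every repeated state change multiplies the previous delay,
# and the result never exceeds the configured maximum.
delay_schedule() {
  # $1 = base delay in seconds, $2 = multiplier, $3 = max delay in seconds, $4 = steps
  awk -v d="$1" -v m="$2" -v max="$3" -v n="$4" 'BEGIN {
    for (i = 1; i <= n; i++) {
      printf "%d\n", d          # print the current delay in whole seconds
      d = d * m                 # grow the delay for the next repeated flap
      if (d > max) d = max      # cap at the configured maximum
    }
  }'
}
```

Calling `delay_schedule 300 1.5 1800 6` prints 300, 450, 675, 1012, 1518 and 1800 (one per line), so under this assumption a container that keeps flapping reaches the 30-minute cap after five repeats.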
 ## Applying the Config
 ```bash
 # If Netdata runs in Docker, write to the config volume
 sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
 # paste config here
 EOF
 # Reload health alarms without restarting the container
 sudo docker exec netdata netdatacli reload-health
 ```
 No container restart needed — `reload-health` picks up the new config immediately.
 ## Verify
 In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.
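To compare the UI against what is on disk, the tuned lines can be pulled out of the override file. A small sketch: `show_alarm_settings` is a hypothetical helper, and its default path assumes the non-Docker location used above (pass the config volume path instead when Netdata runs in Docker):

```shell
# Sketch: print the lines of a health config file that this article tunes,
# so they can be checked against the values shown in the Netdata UI.
show_alarm_settings() {
  # $1 = path to the override file (defaults to the path used in this article)
  conf="${1:-/etc/netdata/health.d/docker.conf}"
  grep -E '^[[:space:]]*(template|every|lookup|warn|delay):' "$conf"
}
```

For example, `show_alarm_settings /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf` should list the `every: 30s`, `lookup: average -5m of unhealthy` and `delay: down 5m multiplier 1.5 max 30m` lines if the override was written as shown. This only checks the file, so the UI check is still needed to confirm that `reload-health` picked it up.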
 ## Notes
 - This file defines only the `docker_container_unhealthy` alarm. Since it is loaded in place of the stock `docker.conf`, copy the stock `docker_container_down` template (for exited containers) into it if you rely on that alarm. The stock version already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
 - If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
 - Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`
 ## See Also
 - [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md) — similar tuning for web_log redirect alerts
--- a/MajorWiki-Deploy-Status.md
+++ b/MajorWiki-Deploy-Status.md
@@ -127,3 +127,12 @@ Every time a new article is added, the following **MUST** be updated to maintain
 - `05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md` — Ollama drops off Tailscale when MajorMac sleeps
 **Updated:** `updated: 2026-03-17`
 ## Session Update — 2026-03-18
 **Article count:** 48 (was 47)
 **New articles added:**
 - `02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md` — tuning docker_container_unhealthy alarm to prevent flapping during Nextcloud AIO updates
 **Updated:** `updated: 2026-03-18`
--- a/README.md
+++ b/README.md
@@ -2,15 +2,15 @@
 > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
 >
-**Last updated:** 2026-03-17
+**Last updated:** 2026-03-18
-**Article count:** 47
+**Article count:** 48
 ## Domains
 | Domain | Folder | Articles |
 |---|---|---|
 | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
-| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 9 |
+| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 |
 | 🔓 Open Source Tools | `03-opensource/` | 9 |
 | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
 | 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
@@ -64,6 +64,7 @@
 ### Monitoring
 - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
 - [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
 ### Security
 - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -128,6 +129,7 @@
 | Date | Article | Domain |
 |---|---|---|
 | 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
 | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
 | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
 | 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -19,6 +19,7 @@
    * [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
    * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
    * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
    * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
    * [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
    * [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
 * [Open Source & Alternatives](03-opensource/index.md)
--- a/index.md
+++ b/index.md
@@ -2,15 +2,15 @@
 > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
 >
-> **Last updated:** 2026-03-17
+> **Last updated:** 2026-03-18
-> **Article count:** 47
+> **Article count:** 48
 ## Domains
 | Domain | Folder | Articles |
 |---|---|---|
 | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
-| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 9 |
+| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 |
 | 🔓 Open Source Tools | `03-opensource/` | 9 |
 | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
 | 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
@@ -64,6 +64,7 @@
 ### Monitoring
 - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
 - [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
 ### Security
 - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -128,6 +129,7 @@
 | Date | Article | Domain |
 |---|---|---|
 | 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
 | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
 | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
 | 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |