wiki: add Netdata Docker health alarm tuning article; update indexes to 48

- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md: new
  - lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 00:10:36 -04:00
parent 59a5cc530e
commit 38fe720e63
6 changed files with 104 additions and 6 deletions
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -23,6 +23,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
 ## Monitoring

 - [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
+- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)

 ## Security

--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@@ -0,0 +1,83 @@
+---
+title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
+domain: selfhosting
+category: monitoring
+tags: [netdata, docker, nextcloud, alarms, health, monitoring]
+status: published
+created: 2026-03-18
+updated: 2026-03-18
+---
+
+# Tuning Netdata Docker Health Alarms to Prevent Update Flapping
+
+Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with Watchtower or a similar auto-update setup) runs its nightly update cycle, containers restart in sequence and briefly report as unhealthy, which generates a flood of false alerts.
+
+## The Default Alarm
+
+```ini
+template: docker_container_unhealthy
+       on: docker.container_health_status
+    every: 10s
+   lookup: average -10s of unhealthy
+     warn: $this > 0
+```
+
+Any unhealthy sample from a single container within the last 10 seconds makes the average nonzero and raises the warning. There is no grace period and no notification delay.
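+
+The lookup averages the `unhealthy` dimension over the window. This assumes the collector reports that dimension as 1 while a container is unhealthy and 0 otherwise. Under that assumption, the average is the fraction of the window spent unhealthy, so `$this > 0` is true as soon as a single sample is nonzero. The following is a minimal sketch of that arithmetic. It uses a hypothetical `window_average` helper and assumes one sample per second:
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+# Hypothetical helper: average 0/1 "unhealthy" samples the way
+# "lookup: average" does, assuming one sample per second (1 = unhealthy).
+window_average() {
+  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%g\n", (n ? sum / n : 0) }'
+}
+
+# A 10-second window in which the container was unhealthy for 3 seconds.
+avg=$(window_average 0 0 0 0 0 0 0 1 1 1)
+echo "average: $avg"    # prints "average: 0.3"
+
+# "warn: $this > 0" is true for any nonzero average.
+if awk -v a="$avg" 'BEGIN { exit !(a > 0) }'; then
+  echo "WARNING raised"
+fi
+```
+
+Three unhealthy seconds out of ten give an average of 0.3. That is already enough to raise the default warning.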
+
+## The Fix
+
+Create a custom override at `/etc/netdata/health.d/docker.conf`. When Netdata runs in Docker, this path maps to the config volume. Netdata then ignores the stock `/usr/lib/netdata/conf.d/health.d/docker.conf` entirely rather than merging the two files. Copy in any other templates from the stock file that you still want.
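+
+Netdata loads a same-named file from `/etc/netdata/health.d/` in place of the stock file rather than merging the two. It therefore helps to see which stock templates your override no longer defines. The sketch below is a hypothetical helper, not a Netdata tool. It assumes each definition starts with a `template:` or `alarm:` line, as in the stock health files. It also assumes you run it where both paths are visible; for a Docker install, that means inside the Netdata container.
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+# Print the names defined by "template:" or "alarm:" lines in a health file.
+list_names() {
+  sed -nE 's/^[[:space:]]*(template|alarm):[[:space:]]*([^[:space:]]+).*/\2/p' "$1" | sort -u
+}
+
+# Hypothetical helper: print stock template/alarm names missing from the override.
+missing_templates() {
+  local stock="$1" custom="$2" name
+  for name in $(list_names "$stock"); do
+    if ! list_names "$custom" | grep -qxF "$name"; then
+      printf '%s\n' "$name"
+    fi
+  done
+}
+
+# Usage (paths from this article):
+# missing_templates /usr/lib/netdata/conf.d/health.d/docker.conf \
+#                   /etc/netdata/health.d/docker.conf
+```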
+
+```ini
+# Custom override — reduces flapping during nightly container updates.
+
+template: docker_container_unhealthy
+       on: docker.container_health_status
+    class: Errors
+     type: Containers
+component: Docker
+    units: status
+    every: 30s
+   lookup: average -5m of unhealthy
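+# Fire only if the container was unhealthy for the entire 5-minute window.
+     warn: $this == 1
+    delay: down 5m multiplier 1.5 max 30m
+  summary: Docker container ${label:container_name} health
+     info: ${label:container_name} docker container health status is unhealthy
+       to: sysadmin
+```
+
+| Setting | Default | Tuned | Effect |
+|---|---|---|---|
+| `every` | 10s | 30s | Evaluates the alarm every 30 seconds instead of every 10 |
+| `lookup` | average -10s | average -5m | Averages the `unhealthy` dimension over the last 5 minutes |
+| `warn` | `$this > 0` | `$this == 1` | Warns only if the container was unhealthy for the entire window |
+| `delay` | none | down 5m (multiplier 1.5, max 30m) | Holds the recovery (clear) notification for 5 minutes, longer on repeated flaps, up to 30 minutes; the alarm status itself is not delayed |
+
+A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) keeps the 5-minute average well below 1, so no alert fires. A container that stays unhealthy for 5 minutes straight still raises the warning. Keeping `warn: $this > 0` with the longer lookup would do the opposite: a single unhealthy sample would keep the warning raised for up to 5 minutes.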
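+
+This sketch assumes the behavior described in Netdata's health configuration docs. Under that behavior, a state change while a notification is still delayed multiplies the delay by `multiplier`, and the result is capped at `max`. With the tuned `down 5m multiplier 1.5 max 30m`, the delay would grow as follows across repeated flaps. `delay_schedule` is a hypothetical helper, not part of Netdata:
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+# Hypothetical helper: print successive notification delays in seconds for
+# "delay: down BASE multiplier MULT max MAX", assuming each state change during
+# a pending delay multiplies the delay by MULT, capped at MAX.
+delay_schedule() {
+  awk -v d="$1" -v m="$2" -v max="$3" -v n="$4" 'BEGIN {
+    for (i = 1; i <= n; i++) {
+      if (d > max) d = max   # never exceed the configured maximum
+      printf "%g\n", d
+      d = d * m              # grow the delay for the next flap
+    }
+  }'
+}
+
+# Tuned values: down 5m (300s), multiplier 1.5, max 30m (1800s).
+# prints 300, 450, 675, 1012.5, 1518.75, 1800, 1800 (one per line)
+delay_schedule 300 1.5 1800 7
+```
+
+A container that keeps flapping has its recovery notification held back progressively longer, but never for more than 30 minutes.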
+
+## Applying the Config
+
+```bash
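+# If Netdata runs in Docker, write to the config volume.
+# The volume name (netdata_netdataconfig here) depends on your Compose project.
+CONF_DIR=/var/lib/docker/volumes/netdata_netdataconfig/_data/health.d
+sudo mkdir -p "$CONF_DIR"
+sudo tee "$CONF_DIR/docker.conf" > /dev/null << 'EOF'
+# paste the override config from above here
+EOF
+
+# Reload health alarms without restarting the container
+sudo docker exec netdata netdatacli reload-health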
+```
+
+No container restart needed — `reload-health` picks up the new config immediately.
+
+## Verify
+
+In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.
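+
+You can also double-check the file on disk from the shell, for example before running `reload-health`. The grep-based check below does this. `check_override` is a hypothetical helper, and the expected lines are the tuned values from this article. It only inspects the file, so it cannot confirm what Netdata actually loaded.
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+# Hypothetical helper: report tuned lines missing from an override file.
+# Checks the file on disk only, not the configuration Netdata has loaded.
+check_override() {
+  local file="$1" missing=0 want
+  for want in \
+    'template: docker_container_unhealthy' \
+    'lookup: average -5m of unhealthy' \
+    'delay: down 5m multiplier 1.5 max 30m'; do
+    # The configs in this article align keys on the colon, so allow leading whitespace.
+    if ! grep -qE "^[[:space:]]*${want}[[:space:]]*\$" "$file"; then
+      echo "missing: $want"
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+
+# Usage: check_override /etc/netdata/health.d/docker.conf && echo "override looks complete"
+```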
+
+## Notes
+
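+- This override only defines `docker_container_unhealthy`. Because it replaces the stock `docker.conf` as a whole, the `docker_container_down` alarm (for exited containers) is no longer loaded unless you copy it into the same file. In the stock config, that alarm already has `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`). Leaving it out therefore has no effect on a default setup.
+- To scope the alarm to specific containers instead of relying on a blanket delay, use a `chart labels:` filter on `container_name`. A `host labels:` filter matches host labels, so it selects hosts rather than containers.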
+- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`
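+
+This sketch assumes the value side of a `chart labels:` filter follows Netdata's simple-pattern rules. Under those rules, space-separated globs are checked left to right, the first match wins, and a leading `!` turns that match into a rejection. That is why `container_name=!*` disables the alarm for every container. The sketch approximates that matching in the shell so you can try a pattern against container names before putting it in the config. `simple_pattern_match` is hypothetical, and the container names are illustrative.
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+# Hypothetical approximation of Netdata simple-pattern matching on one value:
+# space-separated globs, checked left to right, first match wins, and a leading
+# "!" turns that match into a rejection. No match at all also means rejection.
+simple_pattern_match() {
+  local value="$1" patterns="$2" p negate
+  set -f                          # keep the shell from expanding * against files
+  for p in $patterns; do
+    negate=0
+    case "$p" in
+      '!'*) negate=1; p="${p#?}" ;;   # strip the leading "!"
+    esac
+    case "$value" in
+      $p) set +f; return "$negate" ;; # unquoted $p: match as a glob
+    esac
+  done
+  set +f
+  return 1
+}
+
+# Illustrative names: alarm on Nextcloud AIO containers except the mastercontainer.
+for name in nextcloud-aio-apache nextcloud-aio-mastercontainer netdata; do
+  if simple_pattern_match "$name" '!nextcloud-aio-mastercontainer nextcloud-aio-*'; then
+    echo "$name: alarm applies"
+  else
+    echo "$name: skipped"
+  fi
+done
+```
+
+Order matters: putting the negative pattern first is what excludes the mastercontainer. Placed after `nextcloud-aio-*`, it would never be reached.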
+
+## See Also
+
+- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md) — similar tuning for web_log redirect alerts