---
title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
domain: selfhosting
category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
updated: 2026-03-21
---

# Tuning Netdata Docker Health Alarms to Prevent Update Flapping

Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with a Watchtower-style auto-update setup) runs its nightly update cycle, containers restart in sequence and briefly report as unhealthy, generating a flood of false alerts.

## The Default Alarm

```ini
template: docker_container_unhealthy
      on: docker.container_health_status
   every: 10s
  lookup: average -10s of unhealthy
    warn: $this > 0
```

A single container being unhealthy for 10 seconds triggers it. No grace period, no delay.

## The Fix

Create a custom override at `/etc/netdata/health.d/docker.conf` (this maps to the Netdata config volume if Netdata runs in Docker). This file takes precedence over the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`.

```ini
# Custom override: reduces flapping during nightly container updates.
 template: docker_container_unhealthy
       on: docker.container_health_status
    class: Errors
     type: Containers
component: Docker
    units: status
    every: 30s
   lookup: average -5m of unhealthy
     warn: $this > 0
    delay: up 3m down 5m multiplier 1.5 max 30m
  summary: Docker container ${label:container_name} health
     info: ${label:container_name} docker container health status is unhealthy
       to: sysadmin
```

| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Evaluates the alarm every 30 seconds instead of every 10 |
| `lookup` | average -10s | average -5m | Averages the `unhealthy` dimension over a 5-minute window instead of 10 seconds |
| `delay: up 3m` | none | 3m | Notification is held until the unhealthy condition has persisted for 3 minutes |
| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before the alert clears |

The `up` delay is the critical addition.
The health check for Nextcloud AIO's `nextcloud-aio-nextcloud` container probes both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing, which absorbs the ~90-second startup window with margin to spare. A genuinely broken container will still trigger the alert.

## Applying the Config

```bash
# If Netdata runs in Docker, write to the config volume
sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
# paste config here
EOF

# Reload health alarms without restarting the container
sudo docker exec netdata netdatacli reload-health
```

No container restart is needed: `reload-health` picks up the new config immediately.

## Verify

In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.

## Notes

- This only overrides the `docker_container_unhealthy` alarm. The `docker_container_down` alarm (for exited containers) is left at its default: it already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

## See Also

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md): similar tuning for `web_log` redirect alerts
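
## Example: Checking the Override Before Reloading

Since `reload-health` applies whatever is in the file, it can help to confirm that the override on disk really contains the tuned values first. The following is a minimal sketch, not part of Netdata: `check_override` is a hypothetical helper name, and the patterns assume the exact `lookup` and `delay` values used in this article.

```shell
#!/usr/bin/env bash
# Sketch only: check_override is a hypothetical helper, not a Netdata command.
# It greps a health config file for the tuned values described in this article.

check_override() {
    local conf="$1"

    # The file must exist and be readable.
    [ -r "$conf" ] || { echo "missing: $conf"; return 1; }

    # The template name must match the stock alarm being overridden.
    grep -Eq '^[[:space:]]*template:[[:space:]]*docker_container_unhealthy[[:space:]]*$' "$conf" \
        || { echo "template line not found"; return 1; }

    # The lookup should use the 5-minute average, not the 10-second default.
    grep -Eq '^[[:space:]]*lookup:[[:space:]]*average -5m of unhealthy' "$conf" \
        || { echo "lookup is not the tuned 5m average"; return 1; }

    # The delay line is absent from the default alarm, so it must be present here.
    grep -Eq '^[[:space:]]*delay:[[:space:]]*up 3m down 5m' "$conf" \
        || { echo "delay line missing or not tuned"; return 1; }

    echo "ok: $conf"
}
```

One way to use it is to gate the reload on the check, for example `check_override /etc/netdata/health.d/docker.conf && sudo docker exec netdata netdatacli reload-health`, substituting the config volume path if Netdata runs in Docker (reading that path may itself require `sudo`).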