wiki: add Netdata Docker health alarm tuning article; update indexes to 48

- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md: new
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -23,6 +23,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
## Monitoring

- [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)

## Security

@@ -0,0 +1,83 @@
---
title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
domain: selfhosting
category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
updated: 2026-03-18
---

# Tuning Netdata Docker Health Alarms to Prevent Update Flapping

Netdata's default `docker_container_unhealthy` alarm evaluates a 10-second average and has no delay. When Nextcloud AIO (or any stack with Watchtower or another auto-update setup) runs its nightly update cycle, containers restart in sequence and briefly report as unhealthy, generating a flood of false alerts.

## The Default Alarm

```ini
template: docker_container_unhealthy
on: docker.container_health_status
every: 10s
lookup: average -10s of unhealthy
warn: $this > 0
```

Any container that reports unhealthy within the 10-second window triggers the alarm. There is no grace period and no delay.

## The Fix

Create a custom override at `/etc/netdata/health.d/docker.conf` (this maps to the Netdata config volume if Netdata runs in Docker). A file here takes precedence over the stock file of the same name, `/usr/lib/netdata/conf.d/health.d/docker.conf`, and replaces it as a whole rather than merging with it.

```ini
# Custom override: reduces flapping during nightly container updates.

template: docker_container_unhealthy
on: docker.container_health_status
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -5m of unhealthy
warn: $this > 0
delay: down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} health
info: ${label:container_name} docker container health status is unhealthy
to: sysadmin
```

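| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Evaluates the alarm less often |
| `lookup` | average -10s | average -5m | Averages the `unhealthy` dimension over a 5-minute window |
| `delay` | none | down 5m (multiplier 1.5, max 30m) | Delays the recovery notification by 5 minutes; if the alarm changes state again while a notification is pending, the delay is multiplied by 1.5, up to 30 minutes |

A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) is short compared with these windows, so the tuned alarm damps the raise-and-clear notifications that the default produces as containers restart in sequence. One caveat: with `warn: $this > 0`, any unhealthy sample inside the 5-minute window still leaves the average above zero. If short blips should never alert at all, a stricter check such as `lookup: min -5m of unhealthy` only raises when the container stays unhealthy for the whole window. A genuinely broken container is still caught either way.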
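
To see what the averaged lookup evaluates to, the arithmetic can be sketched in shell. This is only an illustration under stated assumptions: the `unhealthy` dimension is taken to be 1 while a container is unhealthy and 0 otherwise, sampled once per second, and `window_average` is a hypothetical helper, not a Netdata command.

```bash
#!/usr/bin/env bash
# Sketch: average of a 0/1 "unhealthy" dimension over a lookup window.
# Assumes one sample per second; window_average is a hypothetical helper.
window_average() {
  local unhealthy_seconds=$1 window_seconds=$2
  # awk does the floating-point division; two decimals are enough here
  awk -v u="$unhealthy_seconds" -v w="$window_seconds" \
    'BEGIN { printf "%.2f\n", u / w }'
}

window_average 90 300    # 90s unhealthy inside a 5m window -> 0.30
window_average 0 300     # healthy for the whole window     -> 0.00
window_average 300 300   # unhealthy for the whole window   -> 1.00
```

A 90-second blip averages 0.30 over five minutes, so how strictly the `warn` expression compares against `$this` decides whether a short blip raises the alarm.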
## Applying the Config

```bash
# If Netdata runs in Docker, write to the config volume
sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
# paste config here
EOF

# Reload health alarms without restarting the container
sudo docker exec netdata netdatacli reload-health
```

No container restart needed — `reload-health` picks up the new config immediately.
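
Before reloading, it can help to confirm that the override file actually contains the tuned settings. The following is a minimal sketch: `check_override` is a hypothetical helper, not part of Netdata, and it only checks that the expected keys are present, not that Netdata will accept the file.

```bash
#!/usr/bin/env bash
# Sketch: confirm an override file mentions the expected alarm settings.
# check_override is a hypothetical helper; the file path is its only argument.
check_override() {
  local conf=$1 key
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  for key in 'template: docker_container_unhealthy' 'lookup:' 'delay:'; do
    # -F: fixed-string match; -q: quiet, only the exit status matters
    grep -qF -- "$key" "$conf" || { echo "missing: $key" >&2; return 1; }
  done
  echo "ok: $conf"
}
```

One possible use is `check_override /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf` from a root shell (the Docker volume directory is usually readable only by root), followed by the `reload-health` command above if it prints `ok`.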
## Verify
In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.
## Notes

- This override only redefines `docker_container_unhealthy`. The `docker_container_down` alarm (for exited containers) is not tuned here: its stock definition already has `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`
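
The per-container scoping mentioned above could look roughly like the sketch below. It assumes that `chart labels` accepts Netdata's simple-pattern syntax for the label value (as the stock `container_name=!*` line suggests), where `!pattern` excludes matches and a trailing `*` accepts everything else. `write_scoped_override` and the `nextcloud-aio-*` pattern are illustrative assumptions, and the emitted template is trimmed to the lines that matter here.

```bash
#!/usr/bin/env bash
# Sketch: write a variant of the override that skips some containers.
# write_scoped_override is a hypothetical helper; adjust the pattern to
# your own container names.
write_scoped_override() {
  local target=$1 skip_pattern=$2
  {
    printf 'template: docker_container_unhealthy\n'
    printf 'on: docker.container_health_status\n'
    # Exclude containers matching the pattern, accept all others
    printf 'chart labels: container_name=!%s *\n' "$skip_pattern"
    printf 'every: 30s\n'
    printf 'lookup: average -5m of unhealthy\n'
    printf 'warn: $this > 0\n'
    printf 'delay: down 5m multiplier 1.5 max 30m\n'
    printf 'to: sysadmin\n'
  } > "$target"
}

# Example: write to a temporary file and show the result
example=$(mktemp)
write_scoped_override "$example" 'nextcloud-aio-*'
cat "$example"
```

Writing to a temporary file first keeps the live config untouched until the output has been reviewed; the remaining metadata lines (`class`, `type`, `summary`, and so on) can be carried over from the full override.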
## See Also
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md) — similar tuning for web_log redirect alerts