wiki: add Netdata Docker health alarm tuning article; update indexes to 48
- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md (new)
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -23,6 +23,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
 ## Monitoring

 - [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
+- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)

 ## Security

@@ -0,0 +1,83 @@
---
title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
domain: selfhosting
category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
updated: 2026-03-18
---

# Tuning Netdata Docker Health Alarms to Prevent Update Flapping

Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with Watchtower or a similar auto-update setup) runs its nightly update cycle, containers restart in sequence and briefly show as unhealthy, which generates a flood of false alerts.

## The Default Alarm

```ini
template: docker_container_unhealthy
on: docker.container_health_status
every: 10s
lookup: average -10s of unhealthy
warn: $this > 0
```

A single container being unhealthy for 10 seconds triggers it. No grace period, no delay.
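
To make the trigger condition concrete, the sketch below mimics what `lookup: average -10s of unhealthy` followed by `warn: $this > 0` works out to for a handful of samples. It assumes the `unhealthy` dimension is reported as 0 or 1 and sampled once per second; `lookup_average` and `warn_fires` are hypothetical helper names for this illustration, not Netdata commands.

```bash
#!/usr/bin/env bash
# Sketch only: emulates the alarm arithmetic locally; it does not talk to Netdata.

# lookup_average SAMPLE...
# Prints the arithmetic mean of the given samples (each assumed to be 0 or 1).
lookup_average() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%g\n", sum / NR; else print 0 }'
}

# warn_fires SAMPLE...
# Succeeds (exit 0) when the averaged value is above zero, mirroring `warn: $this > 0`.
warn_fires() {
  awk -v v="$(lookup_average "$@")" 'BEGIN { exit !(v + 0 > 0) }'
}

# Ten samples with a single unhealthy reading in the window.
if warn_fires 0 0 0 1 0 0 0 0 0 0; then
  echo "WARNING"
else
  echo "CLEAR"
fi
```

Because the threshold is zero, one unhealthy sample anywhere in the window is enough to push the average above it.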

## The Fix

Create a custom override at `/etc/netdata/health.d/docker.conf` (this maps to the Netdata config volume if Netdata runs in Docker). This file takes precedence over the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`.

```ini
# Custom override: reduces flapping during nightly container updates.

template: docker_container_unhealthy
on: docker.container_health_status
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -5m of unhealthy
warn: $this > 0
delay: down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} health
info: ${label:container_name} docker container health status is unhealthy
to: sysadmin
```

| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Evaluates the alarm less frequently |
| `lookup` | average -10s | average -5m | Averages the `unhealthy` dimension over a 5-minute window instead of 10 seconds |
| `delay` | none | down 5m (max 30m) | Holds the recovery notification for 5 minutes (growing by 1.5x, up to 30 minutes) so rapid raise/clear cycles are not each reported |

A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) fits well inside the 5-minute window and the 5-minute recovery delay, so the restarts no longer produce a separate raise/clear notification for every container state change. A genuinely broken container stays unhealthy and is still caught. Note that with `warn: $this > 0`, any unhealthy sample inside the window keeps the average above zero, so a brief unhealthy reading can still raise one warning; suppressing that as well would require a higher threshold.
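
The `delay: down 5m multiplier 1.5 max 30m` line can be read as a growing hold on recovery notifications. The sketch below prints how that hold would grow if the alarm kept changing state while a notification was pending. It assumes the multiplier is applied once per state change and capped at `max`; `delay_schedule` is a hypothetical helper written for this illustration, not part of Netdata.

```bash
#!/usr/bin/env bash
# Sketch only: arithmetic illustration of `delay: down 5m multiplier 1.5 max 30m`.

# delay_schedule BASE_SECONDS MULTIPLIER MAX_SECONDS STEPS
# Prints one delay (in seconds) per line, multiplying at each step and capping at MAX_SECONDS.
delay_schedule() {
  awk -v d="$1" -v m="$2" -v max="$3" -v n="$4" 'BEGIN {
    d += 0; m += 0; max += 0          # force numeric interpretation
    for (i = 0; i < n; i++) {
      if (d > max) d = max            # never exceed the configured maximum
      printf "%g\n", d
      d *= m                          # next delay grows by the multiplier
    }
  }'
}

delay_schedule 300 1.5 1800 6
# prints 300, 450, 675, 1012.5, 1518.75, 1800 (one value per line)
```

Under these assumptions the hold reaches the 30-minute cap on the sixth consecutive state change.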

## Applying the Config

```bash
# If Netdata runs in Docker, write to the config volume
sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
# paste config here
EOF

# Reload health alarms without restarting the container
sudo docker exec netdata netdatacli reload-health
```

No container restart is needed: `reload-health` picks up the new config immediately.
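
Before reloading, it can help to confirm that the file on the volume really contains the tuned values. The sketch below is one way to do that with `grep`; `check_override` is a hypothetical helper name, and the path in the usage comment is the volume path used above.

```bash
#!/usr/bin/env bash
# Sketch only: sanity-checks an override file for the tuned alarm settings.

# check_override FILE
# Succeeds when FILE defines docker_container_unhealthy with the 5-minute lookup
# and the 5-minute down delay; otherwise reports what is missing on stderr.
check_override() {
  local file="$1" status=0 pattern
  local -a patterns=(
    '^[[:space:]]*template:[[:space:]]*docker_container_unhealthy'
    '^[[:space:]]*lookup:[[:space:]]*average -5m of unhealthy'
    '^[[:space:]]*delay:[[:space:]]*down 5m'
  )
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
  for pattern in "${patterns[@]}"; do
    if ! grep -Eq -- "$pattern" "$file"; then
      echo "missing: $pattern" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (run as a user that can read the Docker volume):
# check_override /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf \
#   && sudo docker exec netdata netdatacli reload-health
```

Chaining the reload behind the check means a half-pasted heredoc never reaches Netdata.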

## Verify

In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.

## Notes

- This only overrides the `docker_container_unhealthy` alarm. The `docker_container_down` alarm (for exited containers) is left at its default: it already has a `delay: down 1m` and is disabled by default (`chart labels: container_name=!*`).
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`
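
As a rough illustration of the `chart labels` approach from the notes above, the sketch below prints an override whose `chart labels` line limits the alarm to containers matching a pattern. `render_scoped_override` is a hypothetical helper, the `nextcloud-aio-*` pattern is only an example, and the pattern syntax is assumed to follow the same simple-pattern form as the stock `container_name=!*` line.

```bash
#!/usr/bin/env bash
# Sketch only: generates a scoped variant of the override for a container-name pattern.

# render_scoped_override PATTERN
# Prints the alarm definition with a `chart labels` filter on container_name.
render_scoped_override() {
  local pattern="$1"
  cat <<EOF
template: docker_container_unhealthy
on: docker.container_health_status
every: 30s
lookup: average -5m of unhealthy
warn: \$this > 0
delay: down 5m multiplier 1.5 max 30m
chart labels: container_name=${pattern}
to: sysadmin
EOF
}

# Example (assumed pattern): skip Nextcloud AIO containers, match everything else.
render_scoped_override '!nextcloud-aio-* *'
```

The output can be pasted into the heredoc from the "Applying the Config" step in place of the blanket definition.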

## See Also

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md): similar tuning for web_log redirect alerts
@@ -127,3 +127,12 @@ Every time a new article is added, the following **MUST** be updated to maintain
 - `05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md`: Ollama drops off Tailscale when MajorMac sleeps

 **Updated:** `updated: 2026-03-17`
+
+## Session Update: 2026-03-18
+
+**Article count:** 48 (was 47)
+
+**New articles added:**
+- `02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md`: tuning docker_container_unhealthy alarm to prevent flapping during Nextcloud AIO updates
+
+**Updated:** `updated: 2026-03-18`
@@ -2,15 +2,15 @@

 > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
 >
-**Last updated:** 2026-03-17
-**Article count:** 47
+**Last updated:** 2026-03-18
+**Article count:** 48

 ## Domains

 | Domain | Folder | Articles |
 |---|---|---|
 | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
-| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 9 |
+| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 |
 | 🔓 Open Source Tools | `03-opensource/` | 9 |
 | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
 | 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
@@ -64,6 +64,7 @@

 ### Monitoring
 - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md): tuning web_log_1m_redirects threshold for HTTPS-forcing servers
+- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md): preventing false alerts during nightly Nextcloud AIO container update cycles

 ### Security
 - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md): non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -128,6 +129,7 @@

 | Date | Article | Domain |
 |---|---|---|
+| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
 | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
 | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
 | 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |
@@ -19,6 +19,7 @@
 * [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
 * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
 * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
+* [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
 * [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
 * [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
 * [Open Source & Alternatives](03-opensource/index.md)
index.md (8 changes)
@@ -2,15 +2,15 @@

 > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
 >
-> **Last updated:** 2026-03-17
-> **Article count:** 47
+> **Last updated:** 2026-03-18
+> **Article count:** 48

 ## Domains

 | Domain | Folder | Articles |
 |---|---|---|
 | 🐧 Linux & Sysadmin | `01-linux/` | 11 |
-| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 9 |
+| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 10 |
 | 🔓 Open Source Tools | `03-opensource/` | 9 |
 | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
 | 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
@@ -64,6 +64,7 @@

 ### Monitoring
 - [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md): tuning web_log_1m_redirects threshold for HTTPS-forcing servers
+- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md): preventing false alerts during nightly Nextcloud AIO container update cycles

 ### Security
 - [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md): non-root user, SSH key auth, sshd_config, firewall, fail2ban
@@ -128,6 +129,7 @@

 | Date | Article | Domain |
 |---|---|---|
+| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
 | 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
 | 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
 | 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |