wiki: add Tailscale SSH reauth article; update Netdata Docker alarm tuning (50 articles)

- New: Tailscale SSH unexpected re-authentication prompt: diagnosis and fix
- Updated: netdata-docker-health-alarm-tuning: add `delay: up 3m` to suppress false alerts during Nextcloud AIO's ~90s PHP-FPM startup; update settings table and notes
- Updated: 05-troubleshooting/index.md and SUMMARY.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -5,7 +5,7 @@ category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-21
 ---
 
 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
 every: 30s
 lookup: average -5m of unhealthy
 warn: $this > 0
-delay: down 5m multiplier 1.5 max 30m
+delay: up 3m down 5m multiplier 1.5 max 30m
 summary: Docker container ${label:container_name} health
 info: ${label:container_name} docker container health status is unhealthy
 to: sysadmin
@@ -49,10 +49,11 @@ component: Docker
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |
 
-A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing, absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.
 
 ## Applying the Config
 
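
The timing argument behind `delay: up 3m` in the updated article text can be sketched as a small simulation. This is a deliberately simplified model of the behavior the article describes (notify only when the unhealthy state persists for the full up delay). It does not reproduce Netdata's actual delay implementation, and it ignores the `lookup: average -5m` smoothing. The function name `would_notify` and the sample lists are hypothetical, introduced only for illustration.

```python
from typing import Iterable

CHECK_INTERVAL_S = 30   # mirrors `every: 30s` in the tuned config
UP_DELAY_S = 180        # mirrors `delay: up 3m`


def would_notify(
    samples: Iterable[bool],
    interval_s: int = CHECK_INTERVAL_S,
    up_delay_s: int = UP_DELAY_S,
) -> bool:
    """Return True if unhealthy samples (True) persist continuously
    for at least up_delay_s seconds.

    Simplified debounce model (an assumption, not Netdata's code).
    """
    streak_s = 0
    for unhealthy in samples:
        # Extend the streak on an unhealthy sample, reset it on a healthy one.
        streak_s = streak_s + interval_s if unhealthy else 0
        if streak_s >= up_delay_s:
            return True
    return False


# ~90 s PHP-FPM warm-up: three unhealthy samples, then healthy again.
startup_samples = [True, True, True] + [False] * 10

# A container that never recovers.
broken_samples = [True] * 20

print(would_notify(startup_samples))  # False: 90 s streak is below the 180 s delay
print(would_notify(broken_samples))   # True: streak reaches 180 s on the sixth sample
```

Under these assumptions, a startup window would need to roughly double before it triggers a notification, which is the "margin to spare" the article refers to.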