wiki: add Tailscale SSH reauth article; update Netdata Docker alarm tuning (50 articles)

- New: Tailscale SSH unexpected re-authentication prompt: diagnosis and fix
- Updated: netdata-docker-health-alarm-tuning (add delay: up 3m to suppress Nextcloud AIO PHP-FPM ~90s startup false alerts; update settings table and notes)
- Updated: 05-troubleshooting/index.md and SUMMARY.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -5,7 +5,7 @@ category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-21
 ---

 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
 every: 30s
 lookup: average -5m of unhealthy
 warn: $this > 0
-delay: down 5m multiplier 1.5 max 30m
+delay: up 3m down 5m multiplier 1.5 max 30m
 summary: Docker container ${label:container_name} health
 info: ${label:container_name} docker container health status is unhealthy
 to: sysadmin
@@ -49,10 +49,11 @@ component: Docker
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |

-A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing, absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.

 ## Applying the Config

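The timing argument in the hunk above comes down to simple arithmetic. The following is a minimal sketch, assuming the behaviour as the article describes it: the alert is held back until the raised state has lasted the full `up` delay. The function name `alert_fires` is hypothetical, and the model deliberately ignores the 5-minute `lookup` averaging and Netdata's actual internals.

```bash
#!/usr/bin/env bash
# Simplified model, not Netdata's implementation: an alert is reported only
# if the unhealthy period lasts at least as long as the `up` delay.
alert_fires() {
  local unhealthy_s=$1 up_delay_s=$2
  [ "$unhealthy_s" -ge "$up_delay_s" ]
}

up_delay_s=180   # delay: up 3m
startup_s=90     # approximate PHP-FPM warm-up quoted in the article

if alert_fires "$startup_s" "$up_delay_s"; then
  echo "startup window: alert fires"
else
  echo "startup window: suppressed, margin $((up_delay_s - startup_s))s"
fi
```

Under these assumptions the ~90 second warm-up leaves a 90 second margin, while an outage lasting 3 minutes or more would still be reported.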
@@ -9,6 +9,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
+- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
 - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
 - [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)

66
05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md
Normal file
@@ -0,0 +1,66 @@
# Tailscale SSH: Unexpected Re-Authentication Prompt

If a Tailscale SSH connection unexpectedly presents a browser authentication URL mid-session, the first instinct is to check the ACL policy. However, this is often a one-off Tailscale hiccup rather than a misconfiguration.

## Symptoms

- SSH connection to a fleet node displays a Tailscale auth URL:
  ```
  To authenticate, visit: https://login.tailscale.com/a/xxxxxxxx
  ```
- The prompt appears even though the node worked fine previously
- Other nodes in the fleet connect without prompting

## What Causes It

Tailscale SSH supports two ACL `action` values:

| Action | Behavior |
|---|---|
| `accept` | Trusts Tailscale identity; no additional auth required |
| `check` | Requires periodic browser-based re-authentication |

If `action: "check"` is set, Tailscale prompts for browser auth whenever the last successful check is older than the rule's check period (12 hours by default, configurable with `checkPeriod`), so not necessarily on every session. However, even with `action: "accept"`, a one-off prompt can appear due to a Tailscale daemon glitch or key refresh event.

## How to Diagnose

### 1. Verify the ACL policy

In the Tailscale admin console (or via `tailscale debug acl`), inspect the SSH rules. For a trusted homelab fleet, the rule should use `accept`:

```json
{
  "src": ["autogroup:member"],
  "dst": ["autogroup:self"],
  "users": ["autogroup:nonroot", "root"],
  "action": "accept",
}
```

If `action` is `check`, that is the root cause: change it to `accept` for trusted source/destination pairs.
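
As a quick offline version of this check, the sketch below scans a locally saved copy of the policy file for `check` rules. The helper name `uses_check_mode` and the idea of working from an exported file are assumptions for illustration, not part of Tailscale's tooling. The match is plain text, so a commented-out rule would also match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given policy file contains
# any rule with "action": "check". Plain text match, not a JSON parser.
uses_check_mode() {
  grep -Eq '"action"[[:space:]]*:[[:space:]]*"check"' "$1"
}

# Demo against a throwaway file standing in for an exported policy.
demo_policy=$(mktemp)
printf '%s\n' '{ "ssh": [ { "action": "check", "src": ["autogroup:member"] } ] }' > "$demo_policy"

if uses_check_mode "$demo_policy"; then
  echo "check mode is configured"
else
  echo "no check rules found"
fi

rm -f "$demo_policy"
```

A text match is used instead of a JSON parser because the policy file may contain comments and trailing commas that strict JSON tools reject.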

### 2. Confirm it was a one-off

If the ACL already shows `accept`, the prompt was transient. Test with:

```bash
ssh <hostname> "echo ok"
```

If the command prints `ok` without showing an auth prompt, the issue is resolved. Note that this test is only meaningful if the previous session's auth token has expired, or if you test from a different device that hasn't recently authenticated.
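
This pass/fail reading can be scripted. The following is a minimal sketch, assuming the combined output of the `ssh` command has been captured in a string. The function name `classify_ssh_output` and its three labels are made up for illustration, and the URL pattern is taken from the prompt shown under Symptoms.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify captured output from `ssh <hostname> "echo ok"`.
classify_ssh_output() {
  case "$1" in
    *login.tailscale.com/a/*) echo "reauth-prompt" ;;  # auth URL was shown
    ok)                       echo "resolved" ;;       # clean run, only "ok"
    *)                        echo "inconclusive" ;;   # anything else: inspect manually
  esac
}

# Example with canned strings instead of a live connection:
classify_ssh_output "ok"
classify_ssh_output "To authenticate, visit: https://login.tailscale.com/a/xxxxxxxx"
```

Against a live host, the string would come from something like `out=$(ssh <hostname> "echo ok" 2>&1)`. That command may block while a prompt is pending, so a timeout is advisable.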

## Fix

**If ACL shows `check`:** Change it to `accept` in the Tailscale admin console under Access Controls. The change takes effect immediately; no server changes are needed.

**If ACL already shows `accept`:** No action required. The prompt was a one-off Tailscale event (daemon restart, key refresh, etc.). Monitor for recurrence.

## Notes

- Port 2222 on **MajorRig** exists as a hard bypass for Tailscale SSH browser auth: it serves regular SSH over the Tailscale network, bypassing Tailscale SSH entirely. This is an alternative if `check` mode is required for compliance but browser auth is too disruptive.
- The `autogroup:self` destination means the rule applies when connecting from your own devices to your own devices, which is appropriate for a personal homelab fleet.

## Related

- [[Network Overview]]: Tailscale fleet inventory and SSH access model
- [[SSH-Aliases]]: Fleet SSH access shortcuts
@@ -40,6 +40,7 @@
 * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
 * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
 * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
+* [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
 * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
 * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
* [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)