From c4d3f8e9740cea1d237e4c3a7fe164a3e9daec62 Mon Sep 17 00:00:00 2001
From: MajorLinux
Date: Sat, 21 Mar 2026 00:12:52 -0400
Subject: [PATCH] wiki: add Tailscale SSH reauth article; update Netdata Docker alarm tuning (50 articles)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- New: Tailscale SSH unexpected re-authentication prompt — diagnosis and fix
- Updated: netdata-docker-health-alarm-tuning — add delay: up 3m to suppress Nextcloud AIO PHP-FPM ~90s startup false alerts; update settings table and notes
- Updated: 05-troubleshooting/index.md and SUMMARY.md

Co-Authored-By: Claude Sonnet 4.6
---
 .../netdata-docker-health-alarm-tuning.md     | 11 ++--
 05-troubleshooting/index.md                   |  1 +
 .../networking/tailscale-ssh-reauth-prompt.md | 66 +++++++++++++++++++
 SUMMARY.md                                    |  1 +
 4 files changed, 74 insertions(+), 5 deletions(-)
 create mode 100644 05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md

diff --git a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
index fe116ac..e232ed5 100644
--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@@ -5,7 +5,7 @@ category: monitoring
 tags: [netdata, docker, nextcloud, alarms, health, monitoring]
 status: published
 created: 2026-03-18
-updated: 2026-03-18
+updated: 2026-03-21
 ---
 
 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -40,7 +40,7 @@ component: Docker
  every: 30s
  lookup: average -5m of unhealthy
  warn: $this > 0
- delay: down 5m multiplier 1.5 max 30m
+ delay: up 3m down 5m multiplier 1.5 max 30m
  summary: Docker container ${label:container_name} health
  info: ${label:container_name} docker container health status is unhealthy
  to: sysadmin
@@ -49,10 +49,11 @@ component: Docker
 | Setting | Default | Tuned | Effect |
 |---|---|---|---|
 | `every` | 10s | 30s | Check less frequently |
-| `lookup` | average -10s | average -5m | Must be unhealthy for sustained 5 minutes |
-| `delay` | none | down 5m (max 30m) | Grace period after recovery before clearing |
+| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
+| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
+| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |
 
-A typical Nextcloud AIO update cycle (30–90 seconds of container restarts) won't sustain 5 minutes of unhealthy status, so no alert fires. A genuinely broken container will still be caught.
+The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing — absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert.
 
 ## Applying the Config
 
diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index 8de9d8e..d58a20a 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -9,6 +9,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
+- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
 - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
 - [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)
 
diff --git a/05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md b/05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md
new file mode 100644
index 0000000..36937a0
--- /dev/null
+++ b/05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md
@@ -0,0 +1,66 @@
+# Tailscale SSH: Unexpected Re-Authentication Prompt
+
+If a Tailscale SSH connection unexpectedly presents a browser authentication URL mid-session, the first instinct is to check the ACL policy. However, this is often a one-off Tailscale hiccup rather than a misconfiguration.
+
+## Symptoms
+
+- SSH connection to a fleet node displays a Tailscale auth URL:
+  ```
+  To authenticate, visit: https://login.tailscale.com/a/xxxxxxxx
+  ```
+- The prompt appears even though the node worked fine previously
+- Other nodes in the fleet connect without prompting
+
+## What Causes It
+
+Tailscale SSH supports two ACL `action` values:
+
+| Action | Behavior |
+|---|---|
+| `accept` | Trusts Tailscale identity — no additional auth required |
+| `check` | Requires periodic browser-based re-authentication |
+
+If `action: "check"` is set, every session (or after token expiry) will prompt for browser auth. However, even with `action: "accept"`, a one-off prompt can appear due to a Tailscale daemon glitch or key refresh event.
+
+## How to Diagnose
+
+### 1. Verify the ACL policy
+
+In the Tailscale admin console (or via `tailscale debug acl`), inspect the SSH rules. For a trusted homelab fleet, the rule should use `accept`:
+
+```json
+{
+  "src": ["autogroup:member"],
+  "dst": ["autogroup:self"],
+  "users": ["autogroup:nonroot", "root"],
+  "action": "accept",
+}
+```
+
+If `action` is `check`, that is the root cause — change it to `accept` for trusted source/destination pairs.
+
+### 2. Confirm it was a one-off
+
+If the ACL already shows `accept`, the prompt was transient. Test with:
+
+```bash
+ssh "echo ok"
+```
+
+No auth prompt + `ok` output = resolved. Note that this test is only meaningful if the previous session's auth token has expired, or you test from a different device that hasn't recently authenticated.
+
+## Fix
+
+**If ACL shows `check`:** Change to `accept` in the Tailscale admin console under Access Controls. Takes effect immediately — no server changes needed.
+
+**If ACL already shows `accept`:** No action required. The prompt was a one-off Tailscale event (daemon restart, key refresh, etc.). Monitor for recurrence.
+
+## Notes
+
+- Port 2222 on **MajorRig** exists as a hard bypass for Tailscale SSH browser auth — regular SSH over Tailscale network, bypassing Tailscale SSH entirely. This is an alternative approach if `check` mode is required for compliance but browser auth is too disruptive.
+- The `autogroup:self` destination means the rule applies when connecting from your own devices to your own devices — appropriate for a personal homelab fleet.
+
+## Related
+
+- [[Network Overview]] — Tailscale fleet inventory and SSH access model
+- [[SSH-Aliases]] — Fleet SSH access shortcuts
diff --git a/SUMMARY.md b/SUMMARY.md
index 81ea689..5ccb815 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -40,6 +40,7 @@
   * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
   * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
   * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
+  * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
   * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
   * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
   * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)