Merge cowork/majorair/wiki-updates-may02 — fail2ban digest + netdata docker health + 3 new articles

Marcus Summers 2026-05-02 16:28:48 -04:00
commit 021c7f6539
18 changed files with 73 additions and 35 deletions

@@ -10,7 +10,7 @@ tags:
 - majorrig
 status: published
 created: 2026-03-16
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # WSL2 Backup via PowerShell Scheduled Task

@@ -10,7 +10,7 @@ tags:
 - remote-access
 status: published
 created: 2026-03-08
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 # SSH Config and Key Management

@@ -7,7 +7,7 @@ tags:
 - asus
 - ssh
 created: 2026-04-19
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # Wake-on-LAN via Router SSH

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-13T10:15
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # 🏠 Self-Hosting & Homelab

@@ -1,11 +1,17 @@
 ---
-title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
+title: Tuning Netdata Docker Health Alarms to Prevent Update Flapping
 domain: selfhosting
 category: monitoring
-tags: [netdata, docker, nextcloud, alarms, health, monitoring]
+tags:
+- netdata
+- docker
+- nextcloud
+- alarms
+- health
+- monitoring
 status: published
 created: 2026-03-18
-updated: 2026-03-28
+updated: 2026-05-02T11:04
 ---
 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -61,9 +67,9 @@ chart labels: container_name=!nextcloud-aio-nextcloud *
 ### Dedicated Nextcloud AIO Alarm
-Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
+Added 2026-03-23, updated 2026-05-02. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
-The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures:
+The dedicated alarm uses a 30-minute lookup window and 10-minute delay to absorb normal startup and update cycles (~40 minutes total grace), while still catching sustained failures:
 ```ini
 # Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle
@@ -76,15 +82,23 @@ template: docker_nextcloud_unhealthy
 component: Docker
 units: status
 every: 30s
-lookup: average -10m of unhealthy
+lookup: average -30m of unhealthy
 chart labels: container_name=nextcloud-aio-nextcloud
-warn: $this > 0
+warn: $this >= 1
 delay: up 10m down 5m multiplier 1.5 max 30m
 summary: Nextcloud container health sustained
-info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
+info: nextcloud-aio-nextcloud has been continuously unhealthy for 30+ minutes — not a transient update blip
 to: sysadmin
 ```
+**Tuning history:**
+| Date | Lookup | Delay | Trigger | Notes |
+|---|---|---|---|---|
+| 2026-03-23 | 35m | 35m | Initial split from general alarm | Absorbed PHP-FPM warm-up |
+| 2026-04-29 | 15m | 5m | Backup blip (~6m) never triggered | Tightened after stability |
+| 2026-05-02 | 30m | 10m | 15m still too aggressive for update cycles | ~40m total grace; catches real outages |
 ## Watchdog Cron: Auto-Restart on Sustained Unhealthy
 If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.
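
The effect of moving from `$this > 0` to `$this >= 1` on an averaged lookup can be checked with a small simulation. This is a sketch, not Netdata code: it assumes the `unhealthy` dimension is sampled as 0 or 1 every 30 seconds (matching `every: 30s`), it evaluates the alarm only once the window is full, and the function names are made up for illustration. Both rules are applied to the same 30-minute window so that only the threshold differs.

```python
from collections import deque

SAMPLE_SECONDS = 30                           # matches `every: 30s`
WINDOW_SAMPLES = 30 * 60 // SAMPLE_SECONDS    # `lookup: average -30m` -> 60 samples


def first_warn_index(samples, rule):
    """Return the index of the first sample at which `rule` fires, or None.

    `samples` is a sequence of 0/1 values (assumed encoding of `unhealthy`).
    `rule` receives the average over the last WINDOW_SAMPLES samples.
    """
    window = deque(maxlen=WINDOW_SAMPLES)
    for index, value in enumerate(samples):
        window.append(value)
        if len(window) == WINDOW_SAMPLES and rule(sum(window) / len(window)):
            return index
    return None


def strict_rule(average):
    """New threshold: every sample in the window must be unhealthy."""
    return average >= 1


def loose_rule(average):
    """Old threshold: a single unhealthy sample in the window is enough."""
    return average > 0


# 30 minutes healthy, a 6-minute blip (12 samples), then healthy again.
blip = [0] * 60 + [1] * 12 + [0] * 60
# 30 minutes healthy, then a sustained outage.
outage = [0] * 60 + [1] * 100
```

Under these assumptions the 6-minute blip never satisfies the strict rule, while the sustained outage does once 60 consecutive unhealthy samples fill the window. The `delay: up 10m` setting then adds further hold time on top of that, which is where the article's figure of roughly 40 minutes of grace comes from.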

@@ -11,7 +11,7 @@ tags:
 - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-18T11:13
+updated: 2026-04-30T05:21
 ---
 # ClamAV Fleet Deployment with Ansible

@@ -1,11 +1,18 @@
 ---
-title: "Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts"
+title: Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
 domain: selfhosting
 category: security
-tags: [fail2ban, security, email, ansible, fleet, cron, digest]
+tags:
+- fail2ban
+- security
+- email
+- ansible
+- fleet
+- cron
+- digest
 status: published
 created: 2026-04-22
-updated: 2026-04-22
+updated: 2026-05-02T14:56
 ---
 # Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
@@ -21,11 +28,11 @@ Three tiers replace the firehose:
 | Tier | Jails | Action | Why |
 |------|-------|--------|-----|
-| **Immediate email** | `sshd`, `recidive` | `action_mwl` | Security-critical — someone is actively targeting auth or is a repeat offender |
+| **Immediate email** | `recidive` | `action_mwl` | Repeat offenders only — someone has been banned multiple times across jails |
 | **Silent ban** | Everything else | `action_` (default) | Ban happens, firewall rule applied, no email sent |
 | **Daily digest** | All jails | Cron script at 08:00 UTC | One summary email per host with ban counts across all jails |
-This reduces email volume from hundreds per day to ~10 (one digest per host + occasional sshd/recidive alerts).
+This reduces email volume from hundreds per day to ~10 (one digest per host + occasional recidive alerts).
 ## jail.local Configuration
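
The daily digest tier in the table above comes down to counting bans per jail and sending one summary. The sketch below is not the digest script this commit deploys (that script is not shown in the diff); it is a minimal illustration that assumes log lines of the shape `... NOTICE [sshd] Ban 1.2.3.4`, which may differ by fail2ban version or logging setup. The function names and the summary wording are hypothetical.

```python
import re
from collections import Counter

# Assumed log shape: "[<jail>] Ban <ip>". "Unban" and "Restore Ban" lines do not match.
BAN_LINE = re.compile(r"\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)")


def count_bans(log_lines):
    """Count ban events per jail from an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = BAN_LINE.search(line)
        if match:
            counts[match.group("jail")] += 1
    return counts


def format_digest(hostname, counts):
    """Render a plain-text summary: one header line, then one line per jail."""
    lines = [f"Fail2Ban digest for {hostname}: {sum(counts.values())} bans"]
    for jail, number in sorted(counts.items()):
        lines.append(f"  {jail}: {number}")
    return "\n".join(lines)


# Illustrative log lines only; addresses are from documentation ranges.
sample_log = [
    "2026-05-02 07:01:10,101 fail2ban.actions [812]: NOTICE [sshd] Ban 192.0.2.10",
    "2026-05-02 07:05:42,330 fail2ban.actions [812]: NOTICE [sshd] Ban 192.0.2.11",
    "2026-05-02 07:09:03,007 fail2ban.actions [812]: NOTICE [recidive] Ban 192.0.2.10",
    "2026-05-02 07:15:10,101 fail2ban.actions [812]: NOTICE [sshd] Unban 192.0.2.10",
    "2026-05-02 07:20:00,000 fail2ban.actions [812]: NOTICE [sshd] Restore Ban 192.0.2.11",
]
```

Calling `format_digest("majorlab", count_bans(sample_log))` yields a three-line summary covering `recidive` and `sshd`. A real digest would also need to restrict the count to the last 24 hours and hand the text to a mailer.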
@@ -40,18 +47,20 @@ action = %(action_)s
 This overrides the stock `action_mwl` for all jails. Bans still happen — the firewall rule is applied — but no email is sent.
-### Keep immediate alerts for critical jails
+### Keep immediate alerts for recidive only
 ```ini
 [sshd]
 enabled = true
-action = %(action_mwl)s
+action = %(action_)s
 [recidive]
 enabled = true
 action = %(action_mwl)s
 ```
+> **Updated 2026-05-02:** sshd was moved to silent (`action_`). Only recidive (repeat offenders) now triggers immediate email. sshd bans are captured in the daily digest.
 ### Clean up email subjects with fq-hostname
 By default, fail2ban uses the system FQDN in email subjects. On Tailscale hosts, this produces ugly subjects like `[Fail2Ban] sshd: banned 1.2.3.4 on MajorToot.tail7f2d9.ts.net`. Override it in `[DEFAULT]`:
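
One way to confirm which jails would still send immediate email after this change is to parse the file and list the sections whose effective `action` is `%(action_mwl)s`. The sketch below uses Python's standard `configparser` as an approximation. fail2ban has its own config reader, so this only models plain `key = value` lines and `[DEFAULT]` inheritance. The sample text and the `nginx-http-auth` section are illustrative, not taken from the fleet's real `jail.local`.

```python
import configparser

# Illustrative jail.local content mirroring the post-change layout.
SAMPLE_JAIL_LOCAL = """
[DEFAULT]
action = %(action_)s

[sshd]
enabled = true
action = %(action_)s

[recidive]
enabled = true
action = %(action_mwl)s

[nginx-http-auth]
enabled = true
"""


def jails_that_email(text):
    """Return the sorted jail names whose effective action is %(action_mwl)s."""
    # interpolation=None keeps "%(action_)s" as a literal string instead of
    # trying to expand it the way configparser normally would.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return sorted(
        name
        for name in parser.sections()
        if parser.get(name, "action", fallback="") == "%(action_mwl)s"
    )
```

With the sample above, only `recidive` is reported. `nginx-http-auth` has no `action` of its own and inherits the silent default, which is the behavior the three-tier model relies on.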
@@ -91,8 +100,9 @@ The playbook `configure_fail2ban_digest.yml` deploys the full digest model fleet
 ### What it does
 1. Deploys a Python helper script that performs **section-aware editing** of `jail.local` (see gotchas below)
-2. Sets `action = %(action_)s` in `[DEFAULT]`
-3. Sets `action = %(action_mwl)s` in `[sshd]` and `[recidive]`
+2. Sets `action = %(action_)s` in `[DEFAULT]` and `[sshd]`
+3. Sets `action = %(action_mwl)s` in `[recidive]`
+4. Removes stale `action = %(action_mwl)s` from `defaults-debian.conf` if present
 4. Sets `fq-hostname` per host using an override dict
 5. Deploys the digest script from a Jinja2 template
 6. Creates the cron job via `ansible.builtin.cron`
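
The section-aware editing named in step 1 can be sketched as follows. This is not the helper script the playbook deploys, which does not appear in this diff. It is a hypothetical line-based editor, `set_option`, that replaces an existing key inside the named section instead of appending a second one. It assumes single-line values, so multi-line action definitions with continuation lines are out of scope.

```python
def set_option(text, section, key, value):
    """Set `key = value` inside `[section]`, replacing any existing definition.

    Adds the key at the end of the section if it is missing, and appends the
    whole section if the file does not contain it. Other lines are untouched.
    """
    new_line = f"{key} = {value}"
    out = []
    in_target = False
    section_seen = False
    written = False

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having set the key: add it now.
            if in_target and not written:
                out.append(new_line)
                written = True
            in_target = stripped[1:-1] == section
            section_seen = section_seen or in_target
        elif in_target and "=" in stripped and not stripped.startswith(("#", ";")):
            if stripped.split("=", 1)[0].strip() == key:
                if written:
                    continue  # drop duplicate definitions of the same key
                line = new_line
                written = True
        out.append(line)

    if in_target and not written:
        out.append(new_line)
    if not section_seen:
        out.extend([f"[{section}]", new_line])
    return "\n".join(out) + "\n"
```

Applying `set_option(text, "DEFAULT", "action", "%(action_)s")` to a file that already sets `action` in `[DEFAULT]` leaves exactly one `action` line in that section, which avoids the duplicate-option error a blind append would trigger.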
@@ -143,6 +153,14 @@ option 'action' in section 'DEFAULT' already exists
 The Python editor script handles this by replacing existing keys rather than appending.
+### defaults-debian.conf overrides jail.local
+On Debian/Ubuntu, `/etc/fail2ban/jail.d/defaults-debian.conf` is loaded **after** `jail.local`. If it contains `action = %(action_mwl)s`, it silently overrides your silent default — every jail sends email on every ban. The Ansible playbook now removes this line automatically. If you see per-ban emails after deploying digest mode, check this file first:
+```bash
+grep action /etc/fail2ban/jail.d/defaults-debian.conf
+```
 ### fq-hostname scope
 Setting `fq-hostname` in `[DEFAULT]` affects all action templates that use the `<fq-hostname>` tag — including both immediate emails and the digest subject. This is the desired behavior, but be aware that it overrides the system hostname globally within fail2ban.
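
The `grep` check in this hunk looks at a single file. When per-ban emails persist, it can help to list every explicit `action` assignment across a whole config directory, since which value wins depends on both read order and section precedence. The sketch below is a hypothetical helper, not part of the playbook. It matches only the exact key `action` (so `banaction` is ignored) and skips comment lines.

```python
from pathlib import Path


def find_action_overrides(config_dir):
    """Return (filename, line_number, line) for each explicit `action = ...` line.

    Scans *.conf files first, then *.local files, each group in name order.
    """
    directory = Path(config_dir)
    paths = sorted(directory.glob("*.conf")) + sorted(directory.glob("*.local"))
    hits = []
    for path in paths:
        for number, line in enumerate(path.read_text().splitlines(), start=1):
            stripped = line.strip()
            if stripped.startswith(("#", ";")):
                continue  # commented-out settings have no effect
            key, separator, _ = stripped.partition("=")
            if separator and key.strip() == "action":
                hits.append((path.name, number, stripped))
    return hits
```

On a Debian or Ubuntu host this would typically be pointed at `/etc/fail2ban/jail.d`. An empty result means no file in that directory sets `action` explicitly.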

@@ -10,7 +10,7 @@ tags:
 - docker
 status: published
 created: 2026-04-02
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # Mastodon Instance Tuning

@@ -11,7 +11,7 @@ tags:
 - troubleshooting
 status: published
 created: 2026-04-18
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # Ansible Check Mode False Positives in Verify/Assert Tasks

@@ -1,6 +1,6 @@
 ---
 created: 2026-03-15T06:37
-updated: 2026-04-29T23:55
+updated: 2026-04-30T10:41
 ---
 # 🔧 General Troubleshooting

@@ -1,11 +1,17 @@
 ---
-title: "ISP SNI Filtering & Caddy Troubleshooting"
+title: ISP SNI Filtering & Caddy Troubleshooting
 domain: troubleshooting
 category: general
-tags: [isp, sni, caddy, tls, dns, cloudflare]
+tags:
+- isp
+- sni
+- caddy
+- tls
+- dns
+- cloudflare
 status: published
 created: 2026-04-02
-updated: 2026-04-30
+updated: 2026-04-30T13:07
 ---
 # ISP SNI Filtering & Caddy Troubleshooting

@@ -11,7 +11,7 @@ tags:
 - powershell
 status: published
 created: 2026-04-03
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 # Windows OpenSSH: WSL as Default Shell Breaks Remote Commands

@@ -10,7 +10,7 @@ tags:
 - majorrig
 status: published
 created: 2026-04-02
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 # Windows OpenSSH Server (sshd) Stops After Reboot

@@ -10,7 +10,7 @@ tags:
 - deno
 status: published
 created: 2026-04-02
-updated: 2026-04-22T11:33
+updated: 2026-04-30T05:21
 ---
 # yt-dlp YouTube JS Challenge Fix (Fedora)

@@ -2,7 +2,7 @@
 title: MajorWiki Deployment Status
 status: deployed
 project: MajorTwin
-updated: 2026-04-07T10:48
+updated: 2026-04-30T05:30
 created: 2026-04-02T16:10
 ---

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:46
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-04-29T23:55
+updated: 2026-04-30T11:24
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index