Merge cowork/majorair/wiki-updates-may02 — fail2ban digest + netdata docker health + 3 new articles
commit 021c7f6539
18 changed files with 73 additions and 35 deletions
@@ -10,7 +10,7 @@ tags:
 - majorrig
 status: published
 created: 2026-03-16
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # WSL2 Backup via PowerShell Scheduled Task

@@ -10,7 +10,7 @@ tags:
 - remote-access
 status: published
 created: 2026-03-08
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---

 # SSH Config and Key Management

@@ -7,7 +7,7 @@ tags:
 - asus
 - ssh
 created: 2026-04-19
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # Wake-on-LAN via Router SSH

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-13T10:15
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # 🏠 Self-Hosting & Homelab


@@ -1,11 +1,17 @@
 ---
-title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
+title: Tuning Netdata Docker Health Alarms to Prevent Update Flapping
 domain: selfhosting
 category: monitoring
-tags: [netdata, docker, nextcloud, alarms, health, monitoring]
+tags:
+- netdata
+- docker
+- nextcloud
+- alarms
+- health
+- monitoring
 status: published
 created: 2026-03-18
-updated: 2026-03-28
+updated: 2026-05-02T11:04
 ---

 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -61,9 +67,9 @@ chart labels: container_name=!nextcloud-aio-nextcloud *

 ### Dedicated Nextcloud AIO Alarm

-Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
+Added 2026-03-23, updated 2026-05-02. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.

-The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures:
+The dedicated alarm uses a 30-minute lookup window and 10-minute delay to absorb normal startup and update cycles (~40 minutes total grace), while still catching sustained failures:

 ```ini
 # Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle
@@ -76,15 +82,23 @@ template: docker_nextcloud_unhealthy
 component: Docker
 units: status
 every: 30s
-lookup: average -10m of unhealthy
+lookup: average -30m of unhealthy
 chart labels: container_name=nextcloud-aio-nextcloud
-warn: $this > 0
+warn: $this >= 1
 delay: up 10m down 5m multiplier 1.5 max 30m
 summary: Nextcloud container health sustained
-info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
+info: nextcloud-aio-nextcloud has been continuously unhealthy for 30+ minutes — not a transient update blip
 to: sysadmin
 ```
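
The effect of moving from `warn: $this > 0` to `warn: $this >= 1` can be sketched in a few lines of Python. This is an illustration only, not Netdata code: it assumes the `unhealthy` dimension is 0 or 1 per sample and uses a hypothetical 30-second sampling interval, so a 30-minute window holds 60 samples. All helper names are made up for the example.

```python
# Hypothetical sampling: one 0/1 "unhealthy" sample every 30 seconds,
# so a 30-minute lookup window holds 60 samples.
SAMPLES_PER_WINDOW = 60


def window_average(samples):
    """Mean of the 0/1 unhealthy samples in one lookup window."""
    return sum(samples) / len(samples)


def old_rule(avg):
    """Old condition: fires if any sample in the window was unhealthy."""
    return avg > 0


def new_rule(avg):
    """New condition: fires only if every sample in the window was unhealthy."""
    return avg >= 1


# A 6-minute blip (12 samples) inside an otherwise healthy window.
blip = [1] * 12 + [0] * (SAMPLES_PER_WINDOW - 12)
# A sustained outage: unhealthy for the whole window.
outage = [1] * SAMPLES_PER_WINDOW

for name, window in (("blip", blip), ("outage", outage)):
    avg = window_average(window)
    print(name, round(avg, 2), old_rule(avg), new_rule(avg))
```

Under these assumptions the short blip trips only the old rule, while the full-window outage trips both, which is the behaviour the new `info` line describes.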

+**Tuning history:**
+
+| Date | Lookup | Delay | Trigger | Notes |
+|---|---|---|---|---|
+| 2026-03-23 | 35m | 35m | Initial split from general alarm | Absorbed PHP-FPM warm-up |
+| 2026-04-29 | 15m | 5m | Backup blip (~6m) never triggered | Tightened after stability |
+| 2026-05-02 | 30m | 10m | 15m still too aggressive for update cycles | ~40m total grace; catches real outages |
+
 ## Watchdog Cron: Auto-Restart on Sustained Unhealthy

 If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.

@@ -11,7 +11,7 @@ tags:
 - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-18T11:13
+updated: 2026-04-30T05:21
 ---
 # ClamAV Fleet Deployment with Ansible


@@ -1,11 +1,18 @@
 ---
-title: "Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts"
+title: Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
 domain: selfhosting
 category: security
-tags: [fail2ban, security, email, ansible, fleet, cron, digest]
+tags:
+- fail2ban
+- security
+- email
+- ansible
+- fleet
+- cron
+- digest
 status: published
 created: 2026-04-22
-updated: 2026-04-22
+updated: 2026-05-02T14:56
 ---
 # Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts

@@ -21,11 +28,11 @@ Three tiers replace the firehose:

 | Tier | Jails | Action | Why |
 |------|-------|--------|-----|
-| **Immediate email** | `sshd`, `recidive` | `action_mwl` | Security-critical — someone is actively targeting auth or is a repeat offender |
+| **Immediate email** | `recidive` | `action_mwl` | Repeat offenders only — someone has been banned multiple times across jails |
 | **Silent ban** | Everything else | `action_` (default) | Ban happens, firewall rule applied, no email sent |
 | **Daily digest** | All jails | Cron script at 08:00 UTC | One summary email per host with ban counts across all jails |

-This reduces email volume from hundreds per day to ~10 (one digest per host + occasional sshd/recidive alerts).
+This reduces email volume from hundreds per day to ~10 (one digest per host + occasional recidive alerts).
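
A minimal sketch of how a daily digest could tally bans per jail. It is not the script deployed by the playbook: the log line shape (`[jail] Ban <ip>`), the function names, the host name, and the digest layout are all assumptions for illustration, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed shape of a ban entry in the fail2ban log:
#   "... NOTICE  [sshd] Ban 203.0.113.7"
BAN_RE = re.compile(r"\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)")


def count_bans(lines):
    """Count ban events per jail; Unban and unrelated lines are ignored."""
    counts = Counter()
    for line in lines:
        match = BAN_RE.search(line)
        if match:
            counts[match.group("jail")] += 1
    return counts


def format_digest(host, counts):
    """Build a plain-text digest body (hypothetical layout)."""
    total = sum(counts.values())
    rows = [f"{jail}: {n}" for jail, n in sorted(counts.items())]
    return "\n".join([f"Fail2Ban digest for {host}: {total} bans", *rows])


# Made-up sample lines, using documentation-reserved IP ranges.
sample = [
    "2026-05-02 03:10:01,120 fail2ban.actions [812]: NOTICE  [sshd] Ban 203.0.113.7",
    "2026-05-02 03:12:44,501 fail2ban.actions [812]: NOTICE  [sshd] Ban 198.51.100.23",
    "2026-05-02 04:00:09,002 fail2ban.actions [812]: NOTICE  [recidive] Ban 203.0.113.7",
    "2026-05-02 04:10:01,120 fail2ban.actions [812]: NOTICE  [sshd] Unban 203.0.113.7",
]
print(format_digest("examplehost", count_bans(sample)))
```

Counting from the log keeps the digest independent of which jails send immediate email, so silent `sshd` bans still show up in the summary.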

 ## jail.local Configuration

@@ -40,18 +47,20 @@ action = %(action_)s

 This overrides the stock `action_mwl` for all jails. Bans still happen — the firewall rule is applied — but no email is sent.

-### Keep immediate alerts for critical jails
+### Keep immediate alerts for recidive only

 ```ini
 [sshd]
 enabled = true
-action = %(action_mwl)s
+action = %(action_)s

 [recidive]
 enabled = true
 action = %(action_mwl)s
 ```

+> **Updated 2026-05-02:** sshd was moved to silent (`action_`). Only recidive (repeat offenders) now triggers immediate email. sshd bans are captured in the daily digest.
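
One way to double-check which jails would still send immediate email is to read the file with Python's `configparser`, which resolves `[DEFAULT]` inheritance the same way an INI reader would. This is a sketch under assumptions: the embedded text mirrors the configuration shown above, `nginx-http-auth` is only an illustrative extra jail, and `emailing_jails` is a hypothetical helper, not part of fail2ban.

```python
from configparser import ConfigParser

# Mirrors the jail.local fragments above; nginx-http-auth is illustrative.
JAIL_LOCAL = """\
[DEFAULT]
action = %(action_)s

[sshd]
enabled = true
action = %(action_)s

[recidive]
enabled = true
action = %(action_mwl)s

[nginx-http-auth]
enabled = true
"""


def emailing_jails(text):
    """Return the jails whose effective action is the emailing action_mwl."""
    # interpolation=None keeps "%(action_)s" as literal text instead of
    # trying to expand it, since those shortcuts are defined elsewhere.
    parser = ConfigParser(interpolation=None)
    parser.read_string(text)
    return sorted(
        name
        for name in parser.sections()
        if "action_mwl" in parser.get(name, "action", fallback="")
    )


print(emailing_jails(JAIL_LOCAL))
```

A jail with no `action` of its own falls back to `[DEFAULT]`, so only `recidive` is reported for this input.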
+
 ### Clean up email subjects with fq-hostname

 By default, fail2ban uses the system FQDN in email subjects. On Tailscale hosts, this produces ugly subjects like `[Fail2Ban] sshd: banned 1.2.3.4 on MajorToot.tail7f2d9.ts.net`. Override it in `[DEFAULT]`:
@@ -91,8 +100,9 @@ The playbook `configure_fail2ban_digest.yml` deploys the full digest model fleet
 ### What it does

 1. Deploys a Python helper script that performs **section-aware editing** of `jail.local` (see gotchas below)
-2. Sets `action = %(action_)s` in `[DEFAULT]`
-3. Sets `action = %(action_mwl)s` in `[sshd]` and `[recidive]`
+2. Sets `action = %(action_)s` in `[DEFAULT]` and `[sshd]`
+3. Sets `action = %(action_mwl)s` in `[recidive]`
+4. Removes stale `action = %(action_mwl)s` from `defaults-debian.conf` if present
 4. Sets `fq-hostname` per host using an override dict
 5. Deploys the digest script from a Jinja2 template
 6. Creates the cron job via `ansible.builtin.cron`
@@ -143,6 +153,14 @@ option 'action' in section 'DEFAULT' already exists

 The Python editor script handles this by replacing existing keys rather than appending.
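
The replace-rather-than-append idea can be sketched as follows. This is not the helper shipped with the playbook, whose source is not shown here; `set_option` and its behaviour for missing sections are assumptions made for illustration.

```python
import re

SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")


def set_option(text, section, key, value):
    """Set `key = value` inside [section], replacing an existing key
    instead of appending a duplicate (hypothetical helper)."""
    key_re = re.compile(rf"^\s*{re.escape(key)}\s*=")
    new_line = f"{key} = {value}"
    out = []
    current = None          # name of the section being scanned
    section_seen = False    # whether [section] exists at all
    written = False         # whether new_line has been emitted
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            # Leaving the target section without having seen the key:
            # add it before the next header starts.
            if current == section and not written:
                out.append(new_line)
                written = True
            current = header.group("name")
            section_seen = section_seen or current == section
        elif current == section and key_re.match(line):
            # Replace the first occurrence, drop any further duplicates.
            if not written:
                out.append(new_line)
                written = True
            continue
        out.append(line)
    if not written:
        if not section_seen:
            out.append(f"[{section}]")
        out.append(new_line)
    return "\n".join(out) + "\n"


before = "[DEFAULT]\naction = %(action_mwl)s\n\n[sshd]\nenabled = true\n"
print(set_option(before, "DEFAULT", "action", "%(action_)s"))
```

Because the key match is anchored to the current section, an `action` line in `[recidive]` is left alone when `[DEFAULT]` is edited, and running the function twice yields the same file.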

+### defaults-debian.conf overrides jail.local
+
+On Debian/Ubuntu, `/etc/fail2ban/jail.d/defaults-debian.conf` is loaded **after** `jail.local`. If it contains `action = %(action_mwl)s`, it silently overrides your silent default — every jail sends email on every ban. The Ansible playbook now removes this line automatically. If you see per-ban emails after deploying digest mode, check this file first:
+
+```bash
+grep action /etc/fail2ban/jail.d/defaults-debian.conf
+```
+
 ### fq-hostname scope

 Setting `fq-hostname` in `[DEFAULT]` affects all action templates that use the `<fq-hostname>` tag — including both immediate emails and the digest subject. This is the desired behavior, but be aware that it overrides the system hostname globally within fail2ban.

@@ -10,7 +10,7 @@ tags:
 - docker
 status: published
 created: 2026-04-02
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---

 # Mastodon Instance Tuning

@@ -11,7 +11,7 @@ tags:
 - troubleshooting
 status: published
 created: 2026-04-18
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # Ansible Check Mode False Positives in Verify/Assert Tasks


@@ -1,6 +1,6 @@
 ---
 created: 2026-03-15T06:37
-updated: 2026-04-29T23:55
+updated: 2026-04-30T10:41
 ---
 # 🔧 General Troubleshooting


@@ -1,11 +1,17 @@
 ---
-title: "ISP SNI Filtering & Caddy Troubleshooting"
+title: ISP SNI Filtering & Caddy Troubleshooting
 domain: troubleshooting
 category: general
-tags: [isp, sni, caddy, tls, dns, cloudflare]
+tags:
+- isp
+- sni
+- caddy
+- tls
+- dns
+- cloudflare
 status: published
 created: 2026-04-02
-updated: 2026-04-30
+updated: 2026-04-30T13:07
 ---
 # ISP SNI Filtering & Caddy Troubleshooting


@@ -11,7 +11,7 @@ tags:
 - powershell
 status: published
 created: 2026-04-03
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---

 # Windows OpenSSH: WSL as Default Shell Breaks Remote Commands

@@ -10,7 +10,7 @@ tags:
 - majorrig
 status: published
 created: 2026-04-02
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 # Windows OpenSSH Server (sshd) Stops After Reboot


@@ -10,7 +10,7 @@ tags:
 - deno
 status: published
 created: 2026-04-02
-updated: 2026-04-22T11:33
+updated: 2026-04-30T05:21
 ---
 # yt-dlp YouTube JS Challenge Fix (Fedora)


@@ -2,7 +2,7 @@
 title: MajorWiki Deployment Status
 status: deployed
 project: MajorTwin
-updated: 2026-04-07T10:48
+updated: 2026-04-30T05:30
 created: 2026-04-02T16:10
 ---


@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:46
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index


@@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-04-29T23:55
+updated: 2026-04-30T11:24
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)

2 index.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 # MajorLinux Tech Wiki — Index
