diff --git a/01-linux/distro-specific/wsl2-backup-powershell.md b/01-linux/distro-specific/wsl2-backup-powershell.md
index 1fc7e27..04a4e4c 100644
--- a/01-linux/distro-specific/wsl2-backup-powershell.md
+++ b/01-linux/distro-specific/wsl2-backup-powershell.md
@@ -10,7 +10,7 @@ tags:
  - majorrig
 status: published
 created: 2026-03-16
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # WSL2 Backup via PowerShell Scheduled Task
diff --git a/01-linux/networking/ssh-config-key-management.md b/01-linux/networking/ssh-config-key-management.md
index 2bfedbf..86919fa 100644
--- a/01-linux/networking/ssh-config-key-management.md
+++ b/01-linux/networking/ssh-config-key-management.md
@@ -10,7 +10,7 @@ tags:
  - remote-access
 status: published
 created: 2026-03-08
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 
 # SSH Config and Key Management
diff --git a/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md b/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md
index 0d3731e..f3a125e 100644
--- a/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md
+++ b/02-selfhosting/dns-networking/wake-on-lan-router-ssh.md
@@ -7,7 +7,7 @@ tags:
  - asus
  - ssh
 created: 2026-04-19
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # Wake-on-LAN via Router SSH
diff --git a/02-selfhosting/index.md b/02-selfhosting/index.md
index ca9a4c4..362544e 100644
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-13T10:15
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # 🏠 Self-Hosting & Homelab
diff --git a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
index b5fc1cf..5b6b395 100644
--- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
+++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md
@@ -1,11 +1,17 @@
 ---
-title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
+title: Tuning Netdata Docker Health Alarms to Prevent Update Flapping
 domain: selfhosting
 category: monitoring
-tags: [netdata, docker, nextcloud, alarms, health, monitoring]
+tags:
+ - netdata
+ - docker
+ - nextcloud
+ - alarms
+ - health
+ - monitoring
 status: published
 created: 2026-03-18
-updated: 2026-03-28
+updated: 2026-05-02T11:04
 ---
 
 # Tuning Netdata Docker Health Alarms to Prevent Update Flapping
@@ -61,9 +67,9 @@ chart labels: container_name=!nextcloud-aio-nextcloud *
 
 ### Dedicated Nextcloud AIO Alarm
 
-Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
+Added 2026-03-23, updated 2026-05-02. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.
 
-The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures:
+The dedicated alarm uses a 30-minute lookup window and 10-minute delay to absorb normal startup and update cycles (~40 minutes total grace), while still catching sustained failures:
 
 ```ini
 # Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle
@@ -76,15 +82,23 @@ template: docker_nextcloud_unhealthy
 component: Docker
 units: status
 every: 30s
- lookup: average -10m of unhealthy
+ lookup: average -30m of unhealthy
 chart labels: container_name=nextcloud-aio-nextcloud
- warn: $this > 0
+ warn: $this >= 1
 delay: up 10m down 5m multiplier 1.5 max 30m
 summary: Nextcloud container health sustained
- info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
+ info: nextcloud-aio-nextcloud has been continuously unhealthy for 30+ minutes — not a transient update blip
 to: sysadmin
 ```
 
+**Tuning history:**
+
+| Date | Lookup | Delay | Trigger | Notes |
+|---|---|---|---|---|
+| 2026-03-23 | 35m | 35m | Initial split from general alarm | Absorbed PHP-FPM warm-up |
+| 2026-04-29 | 15m | 5m | Backup blip (~6m) never triggered | Tightened after stability |
+| 2026-05-02 | 30m | 10m | 15m still too aggressive for update cycles | ~40m total grace; catches real outages |
+
 ## Watchdog Cron: Auto-Restart on Sustained Unhealthy
 
 If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.
diff --git a/02-selfhosting/security/clamav-fleet-deployment.md b/02-selfhosting/security/clamav-fleet-deployment.md
index 373c75f..b731795 100644
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@@ -11,7 +11,7 @@ tags:
  - cron
 status: published
 created: 2026-04-18
-updated: 2026-04-18T11:13
+updated: 2026-04-30T05:21
 ---
 
 # ClamAV Fleet Deployment with Ansible
diff --git a/02-selfhosting/security/fail2ban-digest-mode-fleet.md b/02-selfhosting/security/fail2ban-digest-mode-fleet.md
index 9004784..5499fbc 100644
--- a/02-selfhosting/security/fail2ban-digest-mode-fleet.md
+++ b/02-selfhosting/security/fail2ban-digest-mode-fleet.md
@@ -1,11 +1,18 @@
 ---
-title: "Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts"
+title: Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
 domain: selfhosting
 category: security
-tags: [fail2ban, security, email, ansible, fleet, cron, digest]
+tags:
+ - fail2ban
+ - security
+ - email
+ - ansible
+ - fleet
+ - cron
+ - digest
 status: published
 created: 2026-04-22
-updated: 2026-04-22
+updated: 2026-05-02T14:56
 ---
 
 # Fail2Ban Digest Mode — Fleet-Wide Quiet Alerts
@@ -21,11 +28,11 @@ Three tiers replace the firehose:
 
 | Tier | Jails | Action | Why |
 |------|-------|--------|-----|
-| **Immediate email** | `sshd`, `recidive` | `action_mwl` | Security-critical — someone is actively targeting auth or is a repeat offender |
+| **Immediate email** | `recidive` | `action_mwl` | Repeat offenders only — someone has been banned multiple times across jails |
 | **Silent ban** | Everything else | `action_` (default) | Ban happens, firewall rule applied, no email sent |
 | **Daily digest** | All jails | Cron script at 08:00 UTC | One summary email per host with ban counts across all jails |
 
-This reduces email volume from hundreds per day to ~10 (one digest per host + occasional sshd/recidive alerts).
+This reduces email volume from hundreds per day to ~10 (one digest per host + occasional recidive alerts).
 
 ## jail.local Configuration
 
@@ -40,18 +47,20 @@ action = %(action_)s
 
 This overrides the stock `action_mwl` for all jails. Bans still happen — the firewall rule is applied — but no email is sent.
 
-### Keep immediate alerts for critical jails
+### Keep immediate alerts for recidive only
 
 ```ini
 [sshd]
 enabled = true
-action = %(action_mwl)s
+action = %(action_)s
 
 [recidive]
 enabled = true
 action = %(action_mwl)s
 ```
 
+> **Updated 2026-05-02:** sshd was moved to silent (`action_`). Only recidive (repeat offenders) now triggers immediate email. sshd bans are captured in the daily digest.
+
 ### Clean up email subjects with fq-hostname
 
 By default, fail2ban uses the system FQDN in email subjects. On Tailscale hosts, this produces ugly subjects like `[Fail2Ban] sshd: banned 1.2.3.4 on MajorToot.tail7f2d9.ts.net`. Override it in `[DEFAULT]`:
@@ -91,8 +100,9 @@ The playbook `configure_fail2ban_digest.yml` deploys the full digest model fleet
 ### What it does
 
 1. Deploys a Python helper script that performs **section-aware editing** of `jail.local` (see gotchas below)
-2. Sets `action = %(action_)s` in `[DEFAULT]`
-3. Sets `action = %(action_mwl)s` in `[sshd]` and `[recidive]`
+2. Sets `action = %(action_)s` in `[DEFAULT]` and `[sshd]`
+3. Sets `action = %(action_mwl)s` in `[recidive]`
+4. Removes stale `action = %(action_mwl)s` from `defaults-debian.conf` if present
 4. Sets `fq-hostname` per host using an override dict
 5. Deploys the digest script from a Jinja2 template
 6. Creates the cron job via `ansible.builtin.cron`
@@ -143,6 +153,14 @@ option 'action' in section 'DEFAULT' already exists
 
 The Python editor script handles this by replacing existing keys rather than appending.
 
+### defaults-debian.conf overrides jail.local
+
+On Debian/Ubuntu, `/etc/fail2ban/jail.d/defaults-debian.conf` is loaded **after** `jail.local`. If it contains `action = %(action_mwl)s`, it silently overrides your silent default — every jail sends email on every ban. The Ansible playbook now removes this line automatically. If you see per-ban emails after deploying digest mode, check this file first:
+
+```bash
+grep action /etc/fail2ban/jail.d/defaults-debian.conf
+```
+
 ### fq-hostname scope
 
 Setting `fq-hostname` in `[DEFAULT]` affects all action templates that use the `<fq-hostname>` tag — including both immediate emails and the digest subject. This is the desired behavior, but be aware that it overrides the system hostname globally within fail2ban.
diff --git a/02-selfhosting/services/mastodon-instance-tuning.md b/02-selfhosting/services/mastodon-instance-tuning.md
index 2f8241b..f5459a3 100644
--- a/02-selfhosting/services/mastodon-instance-tuning.md
+++ b/02-selfhosting/services/mastodon-instance-tuning.md
@@ -10,7 +10,7 @@ tags:
  - docker
 status: published
 created: 2026-04-02
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # Mastodon Instance Tuning
diff --git a/05-troubleshooting/ansible-check-mode-false-positives.md b/05-troubleshooting/ansible-check-mode-false-positives.md
index f88e251..6796756 100644
--- a/05-troubleshooting/ansible-check-mode-false-positives.md
+++ b/05-troubleshooting/ansible-check-mode-false-positives.md
@@ -11,7 +11,7 @@ tags:
  - troubleshooting
 status: published
 created: 2026-04-18
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # Ansible Check Mode False Positives in Verify/Assert Tasks
diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index c6de078..6625575 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-03-15T06:37
-updated: 2026-04-29T23:55
+updated: 2026-04-30T10:41
 ---
 
 # 🔧 General Troubleshooting
diff --git a/05-troubleshooting/isp-sni-filtering-caddy.md b/05-troubleshooting/isp-sni-filtering-caddy.md
index c0a7fb0..bd435bf 100644
--- a/05-troubleshooting/isp-sni-filtering-caddy.md
+++ b/05-troubleshooting/isp-sni-filtering-caddy.md
@@ -1,11 +1,17 @@
 ---
-title: "ISP SNI Filtering & Caddy Troubleshooting"
+title: ISP SNI Filtering & Caddy Troubleshooting
 domain: troubleshooting
 category: general
-tags: [isp, sni, caddy, tls, dns, cloudflare]
+tags:
+ - isp
+ - sni
+ - caddy
+ - tls
+ - dns
+ - cloudflare
 status: published
 created: 2026-04-02
-updated: 2026-04-30
+updated: 2026-04-30T13:07
 ---
 
 # ISP SNI Filtering & Caddy Troubleshooting
diff --git a/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md b/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
index cd78463..ec8233a 100644
--- a/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
+++ b/05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md
@@ -11,7 +11,7 @@ tags:
  - powershell
 status: published
 created: 2026-04-03
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 
 # Windows OpenSSH: WSL as Default Shell Breaks Remote Commands
diff --git a/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md b/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
index 9f299fb..840c2ac 100644
--- a/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
+++ b/05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
@@ -10,7 +10,7 @@ tags:
  - majorrig
 status: published
 created: 2026-04-02
-updated: 2026-04-22T09:20
+updated: 2026-04-30T05:21
 ---
 
 # Windows OpenSSH Server (sshd) Stops After Reboot
diff --git a/05-troubleshooting/yt-dlp-fedora-js-challenge.md b/05-troubleshooting/yt-dlp-fedora-js-challenge.md
index 9fdd2e4..fd514b7 100644
--- a/05-troubleshooting/yt-dlp-fedora-js-challenge.md
+++ b/05-troubleshooting/yt-dlp-fedora-js-challenge.md
@@ -10,7 +10,7 @@ tags:
  - deno
 status: published
 created: 2026-04-02
-updated: 2026-04-22T11:33
+updated: 2026-04-30T05:21
 ---
 
 # yt-dlp YouTube JS Challenge Fix (Fedora)
diff --git a/MajorWiki-Deploy-Status.md b/MajorWiki-Deploy-Status.md
index 79fade2..cb61548 100644
--- a/MajorWiki-Deploy-Status.md
+++ b/MajorWiki-Deploy-Status.md
@@ -2,7 +2,7 @@
 title: MajorWiki Deployment Status
 status: deployed
 project: MajorTwin
-updated: 2026-04-07T10:48
+updated: 2026-04-30T05:30
 created: 2026-04-02T16:10
 ---
 
diff --git a/README.md b/README.md
index c6100b9..4c0523a 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:46
+updated: 2026-04-30T05:21
 ---
 
 # MajorLinux Tech Wiki — Index
diff --git a/SUMMARY.md b/SUMMARY.md
index a60b4e5..62551e0 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-04-29T23:55
+updated: 2026-04-30T11:24
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)
diff --git a/index.md b/index.md
index cc7c37b..7cf0be2 100644
--- a/index.md
+++ b/index.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-04-29T22:45
+updated: 2026-04-30T05:21
 ---
 
 # MajorLinux Tech Wiki — Index