Compare commits

54 Commits

Author SHA1 Message Date
d616eb2afb SUMMARY: add 4 new articles to nav (nginx/apache bad-request, SSH hardening, Watchtower relay)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:07:05 -04:00
961ce75b88 Add 4 articles: nginx/apache bad-request jails, SSH fleet hardening, Watchtower localhost relay
All sourced from 2026-04-17 work sessions:
- fail2ban-nginx-bad-request-jail: enable stock jail (just needs wiring)
- fail2ban-apache-bad-request-jail: custom filter from scratch, no stock equivalent
- ssh-hardening-ansible-fleet: drop-in approach with Fedora/Ubuntu edge cases
- watchtower-smtp-localhost-relay: credential-free localhost postfix relay pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:06:09 -04:00
9c1a8c95d5 wiki: add claude-mem troubleshooting article for Claude Code 2.1 arg mismatch
claude-mem 12.1.3 passes --setting-sources with no value, which Claude Code
2.1.x rejects. Documents the silent summaryStored=null symptom, the real
error revealed under DEBUG logging, and the claude-shim workaround.
2026-04-17 10:21:21 -04:00
4f66955d33 wiki: correct article counts in index and README
Local and remote both have 76 articles on disk, but the counter
and per-domain table were stale (74 total / self-hosting 21 /
troubleshooting 29). Trued up to 76 / 22 / 30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:17:03 -04:00
c0837b7e89 wiki: add fail2ban jail for Apache PHP webshell probes
Documents the 2026-04-09 scanner incident where 301-redirected PHP probes
bypassed the existing apache-404scan jail, leaving the scanner unbanned
and firing Netdata web_log_1m_redirects alerts. New jail catches 301/302/
403/404 PHP responses while excluding legitimate WordPress endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:17:24 -04:00
326c87421f wiki: add troubleshooting article on /var/run heartbeat reboot false alarm
Captures the majorlab incident where the backup watchdog emailed a missing
heartbeat after a kernel-update reboot wiped /var/run, even though the
backup had actually completed cleanly. Documents the tmpfs root cause and
the fix of storing heartbeats under /var/lib instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:11:24 -04:00
efc8f22f6c wiki: add curl_cffi impersonation fix for yt-dlp 429 errors
YouTube rate-limits non-browser clients. Installing curl_cffi enables
TLS fingerprint impersonation, fixing HTTP 429 on subtitle downloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:40:17 -04:00
2c51e2b043 Fix merge conflict markers in SUMMARY.md frontmatter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:25:28 -04:00
56f1014f73 Add troubleshooting article: wget/curl URLs with special characters
Covers shell quoting for URLs containing &, ?, #, and other characters
that Bash interprets as operators. Common gotcha when downloading from
CDNs with token-based URLs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:18:34 -04:00
5af934a6c6 wiki: update SSH docs with bash.exe default shell fix and Windows admin key auth
- ssh-config-key-management: add Windows OpenSSH admin user key auth section
  (administrators_authorized_keys, BOM-free writing, ACL requirements)
- windows-openssh-wsl-default-shell: add bash.exe as recommended fix (Option 1),
  demote PowerShell to Option 2, add shell-not-found diagnostic tip
- windows-sshd-stops-after-reboot: fix stale wsl.exe reference to bash.exe
- index/README: update Recently Updated table and article descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:01:36 -04:00
84a1893e80 wiki: fix article count to 73, update frontmatter timestamps
Corrected inflated article count (was 76, actual is 73).
Updated domain breakdown and frontmatter timestamps from Obsidian.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:51:23 -04:00
daa771760b wiki: add WSL OpenSSH default shell + Ansible world-writable mount articles
Two new troubleshooting articles from today's MajorRig/MajorMac Ansible setup:
- Windows OpenSSH WSL default shell breaks remote SSH commands
- Ansible silently ignores ansible.cfg on WSL2 world-writable mounts

Article count: 76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:23:02 -04:00
c66d3a6fd0 Update UFW article: add web server ports lesson from tttpod outage
Adds a section documenting how missing HTTP/HTTPS rules caused a
site outage on tttpod, and updates the fleet reference table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 03:57:27 -04:00
1a00fef199 Update wiki indexes for WordPress login jail article
Article count 73 → 74. Added to SUMMARY.md, index.md, README.md,
and 02-selfhosting/index.md (which was also missing 5 other security
articles from prior sessions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:07:08 -04:00
9a7e43e67d Add wiki article: Fail2ban WordPress login brute force jail
Access-log-based filter for wp-login.php brute force detection without
requiring the WP fail2ban plugin. Documents the backend=polling gotcha
on Ubuntu 24.04 and manual banning workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:04:13 -04:00
6592eb4fea wiki: audit fixes — broken links, wikilinks, frontmatter, stale content (66 files)
- Fixed 4 broken markdown links (bad relative paths in See Also sections)
- Corrected n8n port binding to 127.0.0.1:5678 (matches actual deployment)
- Updated SnapRAID article with actual majorhome paths (/majorRAID, disk1-3)
- Converted 67 Obsidian wikilinks to relative markdown links or plain text
- Added YAML frontmatter to 35 articles missing it entirely
- Completed frontmatter on 8 articles with missing fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 11:16:29 -04:00
6da77c2db7 wiki: remove Obsidian-style hashtag tags from 12 articles
These #hashtag tag lines render as plain text on MkDocs. All articles
already have tags in YAML frontmatter, so the inline tags were redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 11:03:28 -04:00
6f53b7c6db wiki: fix broken wikilinks in index and README related sections
Removed Obsidian [[wikilinks]] pointing to vault-only docs (01-Phases, majorlab)
that don't resolve on the MkDocs site. Kept deploy status as a proper markdown link.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:59:17 -04:00
6d81e7f020 wiki: add 4 new articles from archive, merge 8 archive notes into existing articles (73 articles)
New: mdadm RAID rebuild, Mastodon instance tuning, Ventoy, Fedora networking/kernel recovery.
Merged: Glacier Deep Archive into rsync, SpamAssassin into hardening checklist,
OBS captions/VLC capture into OBS setup, yt-dlp subtitles/temp fix into yt-dlp.
Updated index.md, README.md, SUMMARY.md with 21 previously missing articles.
Fixed merge conflict in index.md Recently Updated table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:55:53 -04:00
2045c090c0 wiki: add UFW firewall management article and pending articles (63 articles)
New articles: UFW firewall management, Fail2ban Apache 404 scanner jail,
SELinux Fail2ban execmem fix, updating n8n Docker, Ansible SSH timeout
during dnf upgrade, n8n proxy X-Forwarded-For fix, macOS mirrored
notification alert loop. Updated dca→dcaprod reference in network overview.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:49:48 -04:00
ca7ddb67f2 wiki: add SELinux fail2ban execmem fix + pending articles
New article: selinux-fail2ban-execmem-fix.md — custom policy module
for fail2ban grep execmem denial on Fedora 43.

Also includes previously uncommitted:
- n8n-proxy-trust-x-forwarded-for.md
- fail2ban-apache-404-scanner-jail.md updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:51:33 -04:00
6e131637a1 wiki: add backend=polling gotcha to apache-404scan jail article
Global backend=systemd in jail.local silently breaks file-based jails.
Added required backend=polling to config, diagnostic command, and warning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:14:36 -04:00
0df5ace1a2 wiki: add n8n reverse proxy X-Forwarded-For trust fix article
Documents the N8N_PROXY_HOPS env var needed for n8n behind Caddy/Nginx
when N8N_TRUST_PROXY alone is insufficient in newer versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 19:48:01 -04:00
6dccc43d15 Add n8n Docker update guide
Covers version checking, pinned-tag update process, SQLite password
reset, and why Arcane may not catch updates when the latest tag lags
behind npm releases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 15:08:30 -04:00
MajorLinux
ed810ebdf9 Add: macOS repeating alert tone from mirrored iPhone notification 2026-03-30 07:15:09 -04:00
1bb872ef75 Add Ansible SSH timeout troubleshooting article
Documents the SSH keepalive fix for dnf upgrade timeouts on Fedora hosts,
plus the do-agent task guard fix. Also adds Ansible & Fleet Management
section to the troubleshooting index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:22:48 -04:00
23a35e021b wiki: add fail2ban apache 404 scanner jail article
New guide for custom access-log-based fail2ban jail that catches
rapid-fire 404 vulnerability scanners missed by default error-log jails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:22:19 -04:00
9acd083577 wiki: add fail2ban UFW rule bloat and Apache dirscan jail articles (56 articles)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:54:06 -04:00
cfaee5cf43 wiki: document Nextcloud AIO 20h unhealthy incident and watchdog cron fix
Add troubleshooting article for the 2026-03-27 incident where PHP-FPM
hung after the nightly update cycle. Update the Netdata Docker alarm
tuning article with the dedicated Nextcloud alarm split and the new
watchdog cron deployed to majorlab. (54 articles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:52:49 -04:00
d37bd60a24 wiki: add systemd session scope failure troubleshooting article
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 11:22:44 -04:00
8c22ee708d merge: resolve conflicts, add SELinux AVC chart article; update indexes to 53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:36:49 -04:00
fb2e3f6168 wiki: add SELinux AVC chart, enriched alerts, new server setup, and pending articles; update indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:34:33 -04:00
0e640a3fff wiki: add ClamAV safe scheduling article; update Netdata new server setup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:36:49 -04:00
d1e9571761 wiki: update Netdata Docker alarm tuning — add docker_container_down suppression
Nextcloud AIO borgbackup and watchtower exit normally after nightly update/backup
cycles. Added docker_container_down override with chart labels to exclude them,
preventing false alerts. Documents chart labels pattern syntax.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:17:31 -04:00
9e205f60e4 wiki: add Netdata n8n enriched alert pipeline article (51 articles) 2026-03-21 04:25:56 -04:00
c4d3f8e974 wiki: add Tailscale SSH reauth article; update Netdata Docker alarm tuning (50 articles)
- New: Tailscale SSH unexpected re-authentication prompt — diagnosis and fix
- Updated: netdata-docker-health-alarm-tuning — add delay: up 3m to suppress
  Nextcloud AIO PHP-FPM ~90s startup false alerts; update settings table and notes
- Updated: 05-troubleshooting/index.md and SUMMARY.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 00:12:52 -04:00
4d59856c1e wiki: add Netdata new server deployment guide (49 articles) 2026-03-18 11:00:41 -04:00
38fe720e63 wiki: add Netdata Docker health alarm tuning article; update indexes to 48
- 02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md — new
- lookup extended to 5m average, delay: down 5m to prevent Nextcloud AIO update flapping
- SUMMARY.md, index.md, README.md, deploy status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 00:10:36 -04:00
59a5cc530e wiki: add Windows sshd and Ollama/Tailscale sleep articles; update indexes to 47
- 05-troubleshooting/networking/windows-sshd-stops-after-reboot.md
- 05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md
- SUMMARY.md, index.md, README.md: count 45 → 47, add 5 missing articles (3 from 2026-03-16 + 2 today)
- MajorWiki-Deploy-Status.md: session update 2026-03-17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 21:20:15 -04:00
e8598cfac8 wiki: add WSL2 backup, Fedora43 training env, Ansible upgrades, firewalld mail ports articles; update indexes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 16:47:02 -04:00
6a4681dc4b merge: resolve conflicts, keep firewalld article and count 42
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:15:50 -04:00
279c094afc wiki: add firewalld mail ports reset article + session updates
- New article: firewalld mail ports wiped after reload (IMAP + webmail outage)
- New article: Plex 4K codec compatibility (Apple TV)
- New article: mdadm RAID recovery after USB hub disconnect
- Updated yt-dlp article
- Updated all index files: SUMMARY.md, index.md, README.md, category indexes
- Article count: 41 → 42

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:15:02 -04:00
7fb739d3a2 wiki: add Plex 4K codec guide and mdadm USB recovery; update yt-dlp, indexes
New articles:
- 04-streaming/plex/plex-4k-codec-compatibility.md
- 05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md

Updated:
- yt-dlp.md: Plex section and config reflect new HEVC auto-convert workflow
- SUMMARY.md, index.md, README.md, section indexes: 39 → 41 articles
- MajorWiki-Deploy-Status.md: count + date

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 07:12:09 -04:00
0bcc2c822a wiki: add SELinux vmail and gitea-runner articles; update indexes
- New: SELinux Fixing Dovecot Mail Spool Context (/var/vmail)
  Corrected fix — mail_spool_t only, no dovecot_tmp_t on tmp/ dirs.
  Includes warning and recovery steps for the Postfix delivery outage.
- New: Gitea Actions Runner Boot Race Condition Fix
  network-online.target dependency, RestartSec=10, /etc/hosts workaround.
- Updated SUMMARY.md, index.md, README.md, 05-troubleshooting/index.md
- Article count: 37 → 39; MajorWiki-Deploy-Status updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:49:01 -04:00
3159bbfb48 merge: resolve conflicts, keep new IMAP self-ban article
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:03:16 -04:00
deb32ce756 wiki: expand SUMMARY.md to include all articles across all sections
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 11:01:21 -04:00
b81c8feda0 wiki: add alternatives section with SearXNG, FreshRSS, and Gitea
Add three new articles to 03-opensource/alternatives/:
- SearXNG: private metasearch, Open WebUI integration
- FreshRSS: self-hosted RSS, mobile app sync, OPML portability
- Gitea: lightweight GitHub alternative, webhook pipeline

Article count: 33 → 36. Open source section: 6 → 9.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 00:37:42 -04:00
31d0a9806d wiki: add yt-dlp article to media-creative section
Cover installation, Plex-optimized format selection, playlist
downloading, config file, and background session usage. Cross-reference
existing JS challenge troubleshooting article.

Article count: 32 → 33. Open source section: 5 → 6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 00:33:58 -04:00
6e0ceb0972 wiki: add Vaultwarden article to privacy-security section
Add 03-opensource/privacy-security/vaultwarden.md covering deployment
with Docker Compose, Caddy reverse proxy, client setup, access model
via Tailscale, and SQLite backup. Remove KeePassXC from backlog.

Article count: 31 → 32. Open source section: 4 → 5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 23:48:40 -04:00
4f3e5877ae wiki: add dev-tools section with tmux, screen, and rsync articles
Add three new articles to 03-opensource/dev-tools/:
- tmux: persistent terminal sessions, background jobs, capture-pane
- screen: lightweight alternative, comparison table
- rsync: flags reference, resumable transfers, SSH usage

Update all indexes (SUMMARY, section index, main index, README).
Article count: 28 → 31. Remove tmux from writing backlog.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 23:33:38 -04:00
2e5512ed97 wiki: document maintenance protocol 2026-03-13 22:50:59 -04:00
4bfb99efa6 wiki: update main index and readme with new articles 2026-03-13 22:49:56 -04:00
697269f574 merge: resolve conflicts in summary and troubleshooting index 2026-03-13 22:46:42 -04:00
b59f6bb6b1 WSyncing from MajorMaciki expansion (Phase 10): 3 new articles and updated indices 2026-03-13 12:02:11 -04:00
13 changed files with 920 additions and 16 deletions

@@ -0,0 +1,105 @@
---
title: "Watchtower SMTP via Localhost Postfix Relay"
domain: selfhosting
category: docker
tags: [watchtower, docker, smtp, postfix, email, notifications]
status: published
created: 2026-04-17
updated: 2026-04-17
---
# Watchtower SMTP via Localhost Postfix Relay
## The Problem
Watchtower supports email notifications via its built-in shoutrrr SMTP driver. The typical setup stores SMTP credentials in the compose file or a separate env file. This creates two failure modes:
1. **Password rotation breaks notifications silently.** When you rotate your mail server password, Watchtower keeps running but stops sending emails. You only discover it when you notice container updates happened with no notification.
2. **Credentials at rest.** `docker-compose.yml` and `.env` files are often world-readable or checked into git. SMTP passwords stored there are a credential leak waiting to happen.
The shoutrrr SMTP driver also has a quirk: it attempts AUTH over an unencrypted connection to remote SMTP servers, which most mail servers (correctly) reject with `535 5.7.8 authentication failed` or similar.
## The Solution
Route Watchtower's outbound mail through **localhost port 25** using `network_mode: host`. The local Postfix MTA — already running on the host for relay purposes — handles authentication to the upstream mail server. Watchtower never sees a credential.
```
Watchtower → localhost:25 (Postfix, trusted via mynetworks — no auth required)
→ Postfix → upstream mail server → delivery
```
## docker-compose.yml
```yaml
services:
  watchtower:
    image: containrrr/watchtower
    restart: always
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DOCKER_API_VERSION=1.44
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 0 4 * * *
      - WATCHTOWER_INCLUDE_STOPPED=false
      - WATCHTOWER_NOTIFICATIONS=email
      - WATCHTOWER_NOTIFICATION_EMAIL_FROM=watchtower@yourdomain.com
      - WATCHTOWER_NOTIFICATION_EMAIL_TO=you@yourdomain.com
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER=localhost
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER_PORT=25
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER_TLS_SKIP_VERIFY=true
      - WATCHTOWER_NOTIFICATION_EMAIL_DELAY=2
```
**Key settings:**
- `network_mode: host` — required so `localhost` resolves to the host's loopback interface (and port 25). Without this, `localhost` resolves to the container's own loopback, which has no Postfix.
- `EMAIL_SERVER=localhost`, `PORT=25` — target the local Postfix
- `TLS_SKIP_VERIFY=true` — shoutrrr still negotiates STARTTLS even on port 25; a self-signed or expired local Postfix cert is fine to skip
- No `EMAIL_SERVER_USER` or `EMAIL_SERVER_PASSWORD` — Postfix trusts `127.0.0.1` via `mynetworks`, no auth needed
## Prerequisites
The host needs a Postfix instance that:
1. Listens on `localhost:25`
2. Includes `127.0.0.0/8` in `mynetworks` so local processes can relay without authentication
3. Is configured to relay outbound to your actual mail server
This is standard for any host already running a Postfix relay. If Postfix isn't installed, a minimal relay-only config is a few lines in `main.cf`.
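
The prerequisites above can be spot-checked from a shell. The following is a minimal sketch, not part of the original setup: `trusts_loopback` is a hypothetical helper name, and it only does a literal comparison against the usual loopback entries (it does not evaluate CIDR containment, so a wider range that happens to include `127.0.0.1` would not be detected). `postconf -h` prints a parameter's value without its name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a mynetworks value lists the IPv4 loopback.
# Literal match only; no CIDR arithmetic is performed.
trusts_loopback() {
    local -a entries
    local entry
    # mynetworks entries may be separated by commas and/or whitespace.
    # read -a splits without glob expansion, so "[::1]/128" stays intact.
    read -r -a entries <<< "${1//,/ }"
    for entry in "${entries[@]}"; do
        case "$entry" in
            127.0.0.0/8|127.0.0.1|127.0.0.1/32) return 0 ;;
        esac
    done
    return 1
}

# Only query Postfix when it is actually installed on this host.
if command -v postconf >/dev/null 2>&1; then
    if trusts_loopback "$(postconf -h mynetworks)"; then
        echo "mynetworks lists the loopback network"
    else
        echo "mynetworks does not list 127.0.0.0/8" >&2
    fi
    # Print the upstream relay and listening interfaces for manual review.
    postconf relayhost inet_interfaces
fi
```

An empty `relayhost` in the output would suggest Postfix delivers directly rather than through an upstream server, which is worth confirming before relying on this pattern.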
## Why Not Just Use an Env File?
A separate env file (mode 0600) is better than inline compose, but you still have a credential that breaks on rotation. The localhost relay pattern eliminates the credential entirely.
| Approach | Credentials stored | Rotation-safe |
|---|---|---|
| Inline in compose | Yes (plaintext, often 0644) | ❌ |
| Separate env file (0600) | Yes (protected but present) | ❌ |
| Localhost Postfix relay | None | ✅ |
## Testing
After `docker compose up -d`, check the Watchtower logs for a startup notification:
```bash
docker logs <watchtower-container-name> 2>&1 | head -20
# Look for: "Sending notification..."
```
Confirm Postfix delivered it:
```bash
grep watchtower /var/log/mail.log | tail -5   # /var/log/maillog on Fedora/RHEL
# Look for: status=sent (250 2.0.0 Ok)
```
## Gotchas
- **`network_mode: host` is primarily a Linux feature.** Docker Desktop on macOS/Windows historically did not support host networking, and newer releases expose it only as an opt-in setting. This pattern targets Linux hosts.
- **`network_mode: host` drops port mappings.** Any `ports:` entries are silently ignored under `network_mode: host`. Watchtower doesn't expose ports, so this isn't an issue.
- **Postfix TLS cert warning.** shoutrrr attempts STARTTLS on port 25 regardless. If the local Postfix has a self-signed or expired cert, `TLS_SKIP_VERIFY=true` suppresses the error. For a proper fix, renew the Postfix cert.
- **`WATCHTOWER_DISABLE_CONTAINERS`.** If you run stacks that manage their own updates (Nextcloud AIO, etc.), list those containers here (space-separated) to prevent Watchtower from interfering.
## See Also
- [docker-healthchecks](docker-healthchecks.md)
- [debugging-broken-docker-containers](debugging-broken-docker-containers.md)

@@ -1,3 +1,7 @@
---
created: 2026-04-13T10:15
updated: 2026-04-13T10:15
---
# 🏠 Self-Hosting & Homelab
Guides for running your own services at home, including Docker, reverse proxies, DNS, storage, monitoring, and security.
@@ -31,6 +35,7 @@ Guides for running your own services at home, including Docker, reverse proxies,
- [Linux Server Hardening Checklist](security/linux-server-hardening-checklist.md)
- [Standardizing unattended-upgrades with Ansible](security/ansible-unattended-upgrades-fleet.md)
- [Fail2ban Custom Jail: Apache 404 Scanner Detection](security/fail2ban-apache-404-scanner-jail.md)
- [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](security/fail2ban-apache-php-probe-jail.md)
- [Fail2ban Custom Jail: WordPress Login Brute Force](security/fail2ban-wordpress-login-jail.md)
- [SELinux: Fixing Fail2ban grep execmem Denial](security/selinux-fail2ban-execmem-fix.md)
- [UFW Firewall Management](security/ufw-firewall-management.md)

@@ -0,0 +1,127 @@
---
title: "Fail2ban Custom Jail: Apache Bad Request Detection"
domain: selfhosting
category: security
tags: [fail2ban, apache, security, firewall, bad-request]
status: published
created: 2026-04-17
updated: 2026-04-17
---
# Fail2ban Custom Jail: Apache Bad Request Detection
## The Problem
fail2ban ships a stock `nginx-bad-request` filter for catching malformed HTTP requests (400s), but **there is no Apache equivalent**. Apache servers are left unprotected against the same class of attack: scanners that send garbage request lines to probe for vulnerabilities or overwhelm the access log.
Unlike the nginx version, this filter has to be written from scratch.
## The Solution
Create a custom filter targeting **400 Bad Request** responses in Apache's Combined Log Format, then wire it to a jail.
### Step 1 — Create the filter
Create `/etc/fail2ban/filter.d/apache-bad-request.conf`:
```ini
# Fail2Ban filter: catch 400 Bad Request responses in Apache access logs
# Targets malformed HTTP requests — garbage request lines, empty methods, etc.
# No stock equivalent exists; nginx-bad-request ships with fail2ban but Apache does not.
[Definition]
# Match 400 responses in Apache Combined/Common Log Format
failregex = ^<HOST> -.*".*" 400 \d+
ignoreregex =
datepattern = %%d/%%b/%%Y:%%H:%%M:%%S %%z
```
### Step 2 — Validate the filter
Always test before deploying:
```bash
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-bad-request.conf
```
Against a live server under typical traffic this matched **155 lines with zero false positives**. If you see unexpected matches, refine `ignoreregex`.
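
When no production log is at hand, the shape of the regex can be sanity-checked against hand-written lines. This is a rough sketch under stated assumptions: the sample log lines are invented (documentation-range IPs), and `<HOST>` is approximated with a crude address pattern because plain `grep` does not understand fail2ban's `<HOST>` tag. `fail2ban-regex` remains the authoritative test.

```bash
#!/usr/bin/env bash
# Invented sample lines in Apache Combined Log Format.
sample_log=$(cat <<'EOF'
203.0.113.7 - - [17/Apr/2026:20:01:12 -0400] "\x16\x03\x01" 400 483 "-" "-"
203.0.113.7 - - [17/Apr/2026:20:01:13 -0400] "GET / HTTP/1.1" 200 5120 "-" "curl/8.5.0"
198.51.100.9 - - [17/Apr/2026:20:02:40 -0400] "-" 400 226 "-" "-"
198.51.100.9 - - [17/Apr/2026:20:02:41 -0400] "GET /missing HTTP/1.1" 404 196 "-" "Mozilla/5.0"
EOF
)

# ERE approximation of the failregex: <HOST> replaced by a loose IPv4/IPv6 pattern.
approx_failregex='^[0-9A-Fa-f:.]+ -.*".*" 400 [0-9]+'

# Lines that the filter would count as failures.
bad_request_lines=$(printf '%s\n' "$sample_log" | grep -E -- "$approx_failregex")
bad_request_count=$(printf '%s\n' "$bad_request_lines" | grep -c .)

printf '%s\n' "$bad_request_lines"
echo "matched: $bad_request_count"
```

With these four lines, only the two 400 responses should match; the 200 and 404 lines should be left alone.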
### Step 3 — Create the jail drop-in
Create `/etc/fail2ban/jail.d/apache-bad-request.conf`:
```ini
[apache-bad-request]
enabled = true
port = http,https
filter = apache-bad-request
logpath = /var/log/apache2/access.log
maxretry = 10
findtime = 60
bantime = 1h
```
> **Note:** On Fedora/RHEL, the log path may be `/var/log/httpd/access_log`. If your `[DEFAULT]` sets `backend = systemd`, add `backend = polling` to the jail — otherwise it silently ignores `logpath` and reads journald instead.
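
To find out which of the two log paths applies on a given host, a small check like the following can help. It is only an illustrative sketch: `first_readable` is a hypothetical helper name, and the two candidate paths are the ones mentioned in this article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first readable path among the arguments.
first_readable() {
    local candidate
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Debian/Ubuntu path first, then the Fedora/RHEL path.
access_log=$(first_readable /var/log/apache2/access.log /var/log/httpd/access_log) \
    || echo "no Apache access log found at the usual paths" >&2
echo "logpath candidate: ${access_log:-none}"
```

Whatever path this reports is the value to put in the jail's `logpath`.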
### Step 4 — Reload fail2ban
```bash
systemctl reload fail2ban
fail2ban-client status apache-bad-request
```
## Deploy Fleet-Wide with Ansible
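If you run multiple Apache hosts, use Ansible to deploy the filter and the jail in the same play. The `Reload fail2ban` handler referenced below is assumed to be defined elsewhere in the role: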
```yaml
- name: Deploy apache-bad-request fail2ban filter
  ansible.builtin.template:
    src: templates/fail2ban_apache_bad_request_filter.conf.j2
    dest: /etc/fail2ban/filter.d/apache-bad-request.conf
  notify: Reload fail2ban
- name: Deploy apache-bad-request fail2ban jail
  ansible.builtin.template:
    src: templates/fail2ban_apache_bad_request_jail.conf.j2
    dest: /etc/fail2ban/jail.d/apache-bad-request.conf
  notify: Reload fail2ban
```
## Why Not Use nginx-bad-request on Apache?
The `nginx-bad-request` filter parses nginx's log format, which differs from Apache's Combined Log Format. The timestamp format, field ordering, and quoting differ enough that the regex won't match. You need a separate filter.
| | nginx-bad-request | apache-bad-request |
|---|---|---|
| Ships with fail2ban | ✅ Yes | ❌ No — must write custom |
| Log source | nginx access log | Apache access log |
| What it catches | 400 responses (malformed requests) | 400 responses (malformed requests) |
| Regex target | nginx Combined Log Format | Apache Combined Log Format |
## Diagnostic Commands
```bash
# Validate filter against live log
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-bad-request.conf
# Check jail status
fail2ban-client status apache-bad-request
# Confirm the jail is monitoring the correct log file
fail2ban-client get apache-bad-request logpath
# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-bad-request
# Count 400s in the current access log (everything since the last rotation)
grep -c '" 400 ' /var/log/apache2/access.log
```
## See Also
- [fail2ban-nginx-bad-request-jail](fail2ban-nginx-bad-request-jail.md) — the nginx equivalent (stock filter, just needs wiring)
- [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md) — catches 404 probe scanners
- [fail2ban-apache-php-probe-jail](fail2ban-apache-php-probe-jail.md)

@@ -0,0 +1,146 @@
---
title: "Fail2ban Custom Jail: Apache PHP Webshell Probe Detection"
domain: selfhosting
category: security
tags:
- fail2ban
- apache
- security
- php
- webshell
- scanner
status: published
created: 2026-04-09
updated: 2026-04-13T10:15
---
# Fail2ban Custom Jail: Apache PHP Webshell Probe Detection
## The Problem
Automated scanners flood web servers with rapid-fire requests for non-existent `.php` files — `bless.php`, `alfa.php`, `lock360.php`, `about.php`, `cgi-bin/bypass.php`, and hundreds of others. These are classic **webshell/backdoor probes** looking for compromised PHP files left behind by prior attackers.
On servers that force HTTPS (or have HTTP→HTTPS redirects in place), these probes often return **301 Moved Permanently** instead of 404. That causes three problems:
1. **The `apache-404scan` jail misses them** — it only matches 404 responses
2. **Netdata fires false `web_log_1m_redirects` alerts** — the redirect ratio spikes to 96%+ during scans
3. **The scanner is never banned**, and will return repeatedly
This was the exact trigger for the 2026-04-09 `[MajorLinux] Web Log Alert` incident where `45.86.202.224` sent 202 PHP probe requests in a few minutes, all returning 301.
## The Solution
Create a custom Fail2ban filter that matches **any `.php` request returning a redirect, forbidden, or not-found response** — while excluding legitimate WordPress PHP endpoints.
### Step 1 — Create the filter
Create `/etc/fail2ban/filter.d/apache-php-probe.conf`:
```ini
# Fail2Ban filter to catch PHP file probing (webshell/backdoor scanners)
# These requests hit non-existent .php files and get 301/302/403/404 responses
[Definition]
failregex = ^<HOST> -.*"(GET|POST|HEAD) /[^ ]*\.php[^ ]* HTTP/[0-9.]+" (301|302|403|404) \d+
ignoreregex = ^<HOST> -.*(wp-cron\.php|xmlrpc\.php|wp-login\.php|wp-admin|index\.php|wp-comments-post\.php)
datepattern = %%d/%%b/%%Y:%%H:%%M:%%S %%z
```
**Why the ignoreregex matters:** Legitimate WordPress traffic hits `wp-cron.php`, `xmlrpc.php` (often 403-blocked on hardened sites), `wp-login.php`, and `index.php` constantly. Without exclusions the jail would ban your own WordPress admins. Note that `wp-login.php` brute force is caught separately by the `wordpress` jail.
### Step 2 — Add the jail
Add to `/etc/fail2ban/jail.local`:
```ini
[apache-php-probe]
enabled = true
port = http,https
filter = apache-php-probe
logpath = /var/log/apache2/access.log
maxretry = 5
findtime = 1m
bantime = 48h
```
**5 hits in 1 minute** is tight: scanners fire 20 to 200 PHP probes in seconds, while a real user hitting one broken PHP link won't trip the threshold. The 48-hour bantime is longer than `apache-404scan`'s 24h because PHP webshell scanning is a stronger signal of malicious intent.
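
To get a feel for which clients in an existing log would plausibly cross this threshold, probe hits can be bucketed per IP and per minute. This is a rough sketch with invented sample lines. Note that fail2ban evaluates a sliding `findtime` window, so fixed calendar-minute buckets are only an approximation, and `maxretry` here is just a shell variable mirroring the jail setting.

```bash
#!/usr/bin/env bash
maxretry=5

# Invented sample: one client probing six .php paths within a minute,
# plus one unrelated visitor hitting a single broken PHP link.
sample_log=$(cat <<'EOF'
203.0.113.50 - - [09/Apr/2026:18:04:01 -0400] "GET /bless.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:01 -0400] "GET /alfa.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:02 -0400] "GET /lock360.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:02 -0400] "GET /about.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:03 -0400] "GET /cgi-bin/bypass.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:03 -0400] "GET /bless.php HTTP/1.1" 301 240 "-" "-"
198.51.100.20 - - [09/Apr/2026:18:04:30 -0400] "GET /old-page.php HTTP/1.1" 404 196 "-" "Mozilla/5.0"
EOF
)

# Field 4 is "[dd/Mon/yyyy:HH:MM:SS", field 7 is the path, field 9 is the status.
flagged=$(printf '%s\n' "$sample_log" | awk -v max="$maxretry" '
    $7 ~ /\.php/ && $9 ~ /^(301|302|403|404)$/ {
        minute = substr($4, 2, 17)      # dd/Mon/yyyy:HH:MM
        hits[$1 " " minute]++
    }
    END {
        for (key in hits)
            if (hits[key] >= max) print key, hits[key]
    }')

echo "$flagged"
```

In this sample only `203.0.113.50` reaches the threshold; the single broken-link visitor stays well below it. To run the same tally on real data, pipe the access log into the `awk` command in place of `$sample_log`.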
### Step 3 — Test the regex
```bash
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-php-probe.conf
```
Verify it matches the scanner requests and does **not** match legitimate WordPress traffic.
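
The interaction between `failregex` and `ignoreregex` can also be illustrated on a few hand-written lines. This is a sketch only: the log lines are invented, `<HOST>` is approximated with a loose address pattern, and `\d` is written as `[0-9]` because `grep -E` does not support `\d`. It mimics the filter by keeping lines that match the fail pattern and then dropping those that match the ignore pattern.

```bash
#!/usr/bin/env bash
# Invented sample: two scanner probes, two legitimate WordPress hits, one 200 response.
sample_log=$(cat <<'EOF'
203.0.113.50 - - [09/Apr/2026:18:04:01 -0400] "GET /alfa.php HTTP/1.1" 301 240 "-" "-"
203.0.113.50 - - [09/Apr/2026:18:04:02 -0400] "POST /cgi-bin/bypass.php HTTP/1.1" 404 196 "-" "-"
198.51.100.20 - - [09/Apr/2026:18:05:10 -0400] "POST /wp-cron.php?doing_wp_cron=1 HTTP/1.1" 301 240 "-" "WordPress/6.5"
198.51.100.20 - - [09/Apr/2026:18:05:11 -0400] "GET /wp-login.php HTTP/1.1" 302 0 "-" "Mozilla/5.0"
198.51.100.21 - - [09/Apr/2026:18:06:00 -0400] "GET /about.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF
)

# ERE approximations of the two patterns from the filter file.
approx_failregex='^[0-9A-Fa-f:.]+ -.*"(GET|POST|HEAD) /[^ ]*\.php[^ ]* HTTP/[0-9.]+" (301|302|403|404) [0-9]+'
approx_ignoreregex='^[0-9A-Fa-f:.]+ -.*(wp-cron\.php|xmlrpc\.php|wp-login\.php|wp-admin|index\.php|wp-comments-post\.php)'

# Keep fail matches, then discard anything the ignore pattern covers.
probe_lines=$(printf '%s\n' "$sample_log" \
    | grep -E -- "$approx_failregex" \
    | grep -Ev -- "$approx_ignoreregex")
probe_count=$(printf '%s\n' "$probe_lines" | grep -c .)

printf '%s\n' "$probe_lines"
echo "counted as probes: $probe_count"
```

Only the two `203.0.113.50` lines should survive: the WordPress endpoints are excluded by the ignore pattern, and the `200` response never matches the fail pattern in the first place.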
### Step 4 — Reload Fail2ban
```bash
systemctl reload fail2ban
fail2ban-client status apache-php-probe
```
## Why This Complements `apache-404scan`
| Jail | Catches | Misses |
|---|---|---|
| `apache-404scan` | Any 404 (config file probes, `.env`, random paths) | PHP probes redirected to HTTPS (301) |
| **`apache-php-probe`** | **PHP webshell probes (301/302/403/404)** | Non-`.php` probes |
Running both jails together covers:
- **HTTP→HTTPS redirected PHP probes** (301 responses)
- **Directly-served PHP probes** (404 responses)
- **Blocked PHP paths** like `xmlrpc.php` in non-WP contexts (403 responses)
## Pair With Recidive
The `recidive` jail catches repeat offenders across all jails:
```ini
[recidive]
enabled = true
bantime = -1
findtime = 86400
maxretry = 3
```
An IP that is banned three times within 24 hours, by any combination of jails, gets a **permanent** firewall-level ban.
## Manual IP Blocking via UFW
For known scanners you want to block immediately without waiting for the jail to trip, use UFW:
```bash
# Insert at top of rule list (priority over Apache ALLOW rules)
ufw insert 1 deny from <IP> to any comment "PHP webshell scanner YYYY-MM-DD"
```
This bypasses fail2ban entirely and is useful for:
- Scanners you spot in logs after the fact
- Known-malicious subnets from threat intel
- Entire CIDR blocks (`ufw insert 1 deny from 45.86.202.0/24`)
## Quick Diagnostic Commands
```bash
# Count recent PHP probes returning 301/403/404
awk '/09\/Apr\/2026:18:/ && /\.php/ && ($9==301 || $9==403 || $9==404)' /var/log/apache2/access.log | wc -l
# Top probed PHP filenames (useful for writing additional ignoreregex)
grep '\.php' /var/log/apache2/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20
# Top scanner IPs by PHP probe count
grep '\.php' /var/log/apache2/access.log | awk '$9 ~ /^(301|403|404)$/ {print $1}' | sort | uniq -c | sort -rn | head -10
# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-php-probe
```
## Key Notes
- **This jail only makes sense on servers that redirect HTTP→HTTPS.** On plain-HTTPS-only servers, PHP probes return 404 and `apache-404scan` already catches them.
- **Add your own WordPress plugin paths to `ignoreregex`** if you use non-standard endpoints (e.g., custom admin URLs, REST API `.php` handlers).
- **This filter pairs naturally with Netdata `web_log_1m_redirects` alerts** — during a scan, Netdata fires first (threshold crossed), then fail2ban bans the IP within seconds.
- Also see: [Fail2ban Custom Jail: Apache 404 Scanner Detection](fail2ban-apache-404-scanner-jail.md) for the sibling 404-based filter.


@@ -0,0 +1,89 @@
---
title: "Fail2ban: Enable the nginx-bad-request Jail"
domain: selfhosting
category: security
tags: [fail2ban, nginx, security, firewall, bad-request]
status: published
created: 2026-04-17
updated: 2026-04-17
---
# Fail2ban: Enable the nginx-bad-request Jail
## The Problem
Automated scanners sometimes send **malformed HTTP requests** — empty request lines, truncated headers, or garbage data — that nginx rejects with a `400 Bad Request`. These aren't caught by the default fail2ban jails (`nginx-botsearch`, `nginx-http-auth`) because those target URL-probe patterns and auth failures, not raw protocol abuse.
In a real incident: a single IP (`185.177.72.70`) sent **2,778 malformed requests in ~4 minutes**, driving Netdata's `web_log_1m_bad_requests` to 93.7% and triggering a CRITICAL alert. The neighboring IP (`185.177.72.61`) was already banned — the `/24` was known-bad and operating in shifts.
## The Solution
fail2ban ships a `nginx-bad-request` filter out of the box; it just isn't wired to a jail by default. Enabling it takes a single drop-in file.
### Step 1 — Create the jail drop-in
Create `/etc/fail2ban/jail.d/nginx-bad-request.conf`:
```ini
[nginx-bad-request]
enabled = true
port = http,https
filter = nginx-bad-request
logpath = /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime = 1h
```
**Settings rationale:**
- `maxretry = 10` — a legitimate browser never sends 10 malformed requests; this threshold catches burst scanners immediately
- `findtime = 60` — 60-second window; the attack pattern fires dozens of requests per minute
- `bantime = 1h` — reasonable starting point; pair with `recidive` for repeat offenders
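To check how these thresholds compare with your own log, the sketch below counts `400` responses per client IP. Malformed requests often have an empty or garbage request line, so whitespace-based field numbers shift; this version splits each line on double quotes instead. It assumes nginx's default `combined` format and that nginx escapes any quote characters inside the logged request; `count_400_by_ip` is an illustrative name.

```bash
# Hypothetical helper: count 400 responses per client IP, busiest first.
# Splits on double quotes so an empty or garbage request line does not shift
# the status field. Reads the nginx access log on stdin.
count_400_by_ip() {
  awk -F'"' '
    {
      split($1, head, " ")   # head[1] is the client IP
      split($3, tail, " ")   # tail[1] is the status code after the request
    }
    tail[1] == "400" { hits[head[1]]++ }
    END { for (ip in hits) print ip, hits[ip] }
  ' | sort -k2,2nr
}
```

Example use: `count_400_by_ip < /var/log/nginx/access.log | head`.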
### Step 2 — Verify the filter matches your log format
Before reloading, confirm the stock filter matches your nginx logs:
```bash
fail2ban-regex /var/log/nginx/access.log nginx-bad-request
```
In a real-world test against an active server this matched **2,829 lines with zero false positives**.
### Step 3 — Reload fail2ban
```bash
systemctl reload fail2ban
fail2ban-client status nginx-bad-request
```
You can also ban an IP manually while the jail is loading:
```bash
fail2ban-client set nginx-bad-request banip 185.177.72.70
```
## Verify It's Working
```bash
# Check jail status and active bans
fail2ban-client status nginx-bad-request
# Watch bans in real time
tail -f /var/log/fail2ban.log | grep nginx-bad-request
# Confirm the jail is monitoring the right file
fail2ban-client get nginx-bad-request logpath
```
## Key Notes
- The stock filter is at `/etc/fail2ban/filter.d/nginx-bad-request.conf` — no need to create it.
- If your `[DEFAULT]` section sets `backend = systemd` (common on Fedora/RHEL), add `backend = polling` to the jail or it will silently ignore `logpath` and monitor journald instead — where nginx doesn't write.
- Make sure your Tailscale subnet (`100.64.0.0/10`) is in `ignoreip` under `[DEFAULT]` to avoid banning your own monitoring.
- This jail targets **400 Bad Request** responses. For 404 scanner detection, see [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md).
## See Also
- [fail2ban-apache-bad-request-jail](fail2ban-apache-bad-request-jail.md) — Apache equivalent (no stock filter; custom filter required)
- [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md)
- [fail2ban-apache-php-probe-jail](fail2ban-apache-php-probe-jail.md)


@@ -0,0 +1,138 @@
---
title: "SSH Hardening Fleet-Wide with Ansible"
domain: selfhosting
category: security
tags: [ssh, ansible, security, hardening, fleet]
status: published
created: 2026-04-17
updated: 2026-04-17
---
# SSH Hardening Fleet-Wide with Ansible
## Overview
Default SSH daemon settings on both Ubuntu and Fedora/RHEL are permissive. A drop-in configuration file (`/etc/ssh/sshd_config.d/99-hardening.conf`) lets you tighten settings without touching the distro-managed base config — and Ansible can deploy it atomically across every fleet host with a single playbook run.
## Settings to Change
| Setting | Default | Hardened | Reason |
|---|---|---|---|
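| `PermitRootLogin` | `yes` on some images (upstream OpenSSH 7.0+ defaults to `prohibit-password`) | `without-password` | Prevent password-based root login; key auth still works for Ansible. `without-password` is the legacy alias of `prohibit-password` |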
| `X11Forwarding` | `yes` | `no` | Nothing in a typical homelab fleet uses X11 tunneling |
| `AllowTcpForwarding` | `yes` | `no` | Eliminates a tunneling vector if a service account is compromised |
| `MaxAuthTries` | `6` | `3` | Cuts per-connection brute-force attempts in half |
| `LoginGraceTime` | `120` | `30` | Reduces the window for slow-connect attacks |
## The Drop-in Approach
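Rather than editing `/etc/ssh/sshd_config` directly (which may be managed by the distro or overwritten on upgrades), place overrides in `/etc/ssh/sshd_config.d/99-hardening.conf`. The `Include /etc/ssh/sshd_config.d/*.conf` directive in the base config loads these files in alphabetical order, and for each keyword **first match wins**. With the `Include` at the top of the base config, any drop-in beats the settings that follow it in `sshd_config`, but a `99-` file is read last among the drop-ins, so it loses to any lower-numbered drop-in that sets the same keyword.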
> **Fedora/RHEL gotcha:** Fedora ships `/etc/ssh/sshd_config.d/50-redhat.conf` which sets `X11Forwarding yes`. Because first-match-wins applies, `50-redhat.conf` loads before `99-hardening.conf` and wins. You must patch `50-redhat.conf` in-place before deploying your drop-in, or the X11Forwarding setting will be silently ignored.
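
To find out which drop-in actually wins for a given keyword, a sketch along these lines walks the directory in glob order and reports the first file that sets it. It assumes the shell's glob order matches the order sshd reads the files in, and it ignores `Match` blocks and the base `sshd_config`; `first_setter` is an illustrative name.

```bash
# Hypothetical helper: print the first drop-in (in glob order) that sets
# KEYWORD. Commented-out lines are ignored; keywords are case-insensitive.
# Returns 1 if no drop-in sets the keyword.
first_setter() {
  keyword=$1
  dir=${2:-/etc/ssh/sshd_config.d}
  for f in "$dir"/*.conf; do
    [ -f "$f" ] || continue
    if grep -qiE "^[[:space:]]*${keyword}([[:space:]]|=)" "$f"; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}
```

Example use: `first_setter X11Forwarding`. If it prints anything other than your own drop-in, that file needs patching, or your file needs a lower number.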
## Ansible Playbook
```yaml
- name: Harden SSH daemon fleet-wide
  hosts: all:!raspbian
  become: true
  gather_facts: true

  tasks:
    - name: Ensure sshd_config.d directory exists
      ansible.builtin.file:
        path: /etc/ssh/sshd_config.d
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Ensure Include directive is present in sshd_config
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        line: "Include /etc/ssh/sshd_config.d/*.conf"
        insertbefore: BOF
        state: present

    # Fedora only: neutralize 50-redhat.conf's X11Forwarding yes
    # (first-match-wins means it would override our 99- drop-in)
    - name: Comment out X11Forwarding in 50-redhat.conf (Fedora)
      ansible.builtin.replace:
        path: /etc/ssh/sshd_config.d/50-redhat.conf
        regexp: '^(X11Forwarding yes)'
        replace: '# \1 # disabled by ansible hardening'
      when: ansible_os_family == "RedHat"
      ignore_errors: true

    - name: Deploy SSH hardening drop-in
      ansible.builtin.copy:
        dest: /etc/ssh/sshd_config.d/99-hardening.conf
        content: |
          # Managed by Ansible — do not edit manually
          PermitRootLogin without-password
          X11Forwarding no
          AllowTcpForwarding no
          MaxAuthTries 3
          LoginGraceTime 30
        owner: root
        group: root
        mode: '0644'
      notify: Reload sshd

    - name: Verify effective SSH settings
      ansible.builtin.command:
        cmd: sshd -T
      register: sshd_effective
      changed_when: false

    - name: Assert hardened settings are active
      ansible.builtin.assert:
        that:
          - "'permitrootlogin without-password' in sshd_effective.stdout"
          - "'x11forwarding no' in sshd_effective.stdout"
          - "'allowtcpforwarding no' in sshd_effective.stdout"
          - "'maxauthtries 3' in sshd_effective.stdout"
          - "'logingracetime 30' in sshd_effective.stdout"
        fail_msg: "One or more SSH hardening settings not effective — check for conflicting config"
      when: not ansible_check_mode

  handlers:
    - name: Reload sshd
      ansible.builtin.service:
        # Ubuntu/Debian: 'ssh' | Fedora/RHEL: 'sshd'
        name: "{{ 'ssh' if ansible_os_family == 'Debian' else 'sshd' }}"
        state: reloaded
```
## Edge Cases
**Ubuntu vs Fedora service name:** The SSH daemon is `ssh` on Debian/Ubuntu and `sshd` on Fedora/RHEL. The handler uses `ansible_os_family` to pick the right name automatically.
**Missing Include directive:** Some minimal installs don't have `Include /etc/ssh/sshd_config.d/*.conf` in their base config. The `lineinfile` task adds it if absent. Without this, the drop-in directory exists but is never loaded.
**Fedora's 50-redhat.conf:** Sets `X11Forwarding yes` with first-match priority. The playbook patches it before deploying the drop-in.
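**`sshd -T` in check mode:** `sshd -T` parses the configuration files on disk and prints the effective settings; it does not query the running daemon. In check mode the drop-in is never written, so the on-disk config still lacks the hardened values. The assert task is guarded with `when: not ansible_check_mode` to prevent false failures during dry runs.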
**PermitRootLogin on hosts that already had it set:** Some hosts (e.g., those managed by another tool) may already have `PermitRootLogin without-password` set elsewhere. The drop-in still applies cleanly — it just becomes a no-op for that setting.
## Verify Manually
```bash
# Check effective settings on any host
ssh root@<host> "sshd -T | grep -E 'permitrootlogin|x11forwarding|allowtcpforwarding|maxauthtries|logingracetime'"
# Expected:
# permitrootlogin without-password
# x11forwarding no
# allowtcpforwarding no
# maxauthtries 3
# logingracetime 30
```
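For checking more than a couple of hosts by eye, a small sketch like this compares `sshd -T` output against the expected lines and reports only what is missing. It matches whole lines, so `maxauthtries 30` does not pass for `maxauthtries 3`. The function name `check_hardening` is illustrative, and the expected values are the ones from the drop-in above.

```bash
# Hypothetical helper: read `sshd -T` output on stdin and print every expected
# hardened setting that is not present as an exact line.
# Returns 0 when all settings are found, 1 otherwise.
check_hardening() {
  effective=$(cat)
  rc=0
  for want in \
    'permitrootlogin without-password' \
    'x11forwarding no' \
    'allowtcpforwarding no' \
    'maxauthtries 3' \
    'logingracetime 30'
  do
    if ! printf '%s\n' "$effective" | grep -qxF "$want"; then
      printf 'MISSING: %s\n' "$want"
      rc=1
    fi
  done
  return "$rc"
}
```

Example use: `ssh root@<host> sshd -T | check_hardening && echo OK`.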
## See Also
- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
- [ansible-unattended-upgrades-fleet](ansible-unattended-upgrades-fleet.md)
- [ufw-firewall-management](ufw-firewall-management.md)


@@ -0,0 +1,178 @@
---
title: "claude-mem Silently Fails with Claude Code 2.1+ (Empty --setting-sources)"
domain: troubleshooting
category: claude-code
tags: [claude-code, claude-mem, cli, subprocess, version-mismatch, shim]
status: published
created: 2026-04-17
updated: 2026-04-17
---
# claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)
## Symptom
After installing the `claude-mem` plugin (v12.1.3) in Claude Code (v2.1.112), every Claude Code session starts with:
```
No previous sessions found for this project yet.
```
…even for directories where you've worked repeatedly. Session records *do* appear in `~/.claude-mem/claude-mem.db` (table `sdk_sessions`), but:
- `session_summaries` count stays at **0**
- `observations` count stays at **0**
- Chroma vector DB stays empty
Tailing `~/.claude-mem/logs/claude-mem-YYYY-MM-DD.log` shows the Stop hook firing on every assistant turn, but always:
```
[HOOK ] → Stop: Requesting summary {hasLastAssistantMessage=true}
[HOOK ] Summary processing complete {waitedMs=503, summaryStored=null}
```
No errors, no stack traces — just a silent `null`. Raising `CLAUDE_MEM_LOG_LEVEL` to `DEBUG` reveals the true error:
```
[WARN ] [SDK_SPAWN] Claude process exited {code=1, signal=null, pid=…}
[ERROR] [SESSION] Generator failed {provider=claude, error=Claude Code process exited with code 1}
```
## Root cause
`claude-mem` 12.1.3 spawns the `claude` CLI as a subprocess to generate per-turn observations and session summaries. The argv it passes includes:
```
claude --output-format stream-json --verbose --input-format stream-json \
--model claude-sonnet-4-6 \
--disallowedTools Bash,Read,Write,… \
--setting-sources \ ← no value!
--permission-mode default
```
`claude-mem` intends to pass `--setting-sources ""` (empty string, meaning "no sources"). Claude Code **v2.1.x** now validates this flag and rejects empty values — it requires one of `user`, `project`, or `local`. With no value present, the CLI's argument parser consumes the next flag (`--permission-mode`) as the value and produces:
```
Error processing --setting-sources: Invalid setting source: --permission-mode.
Valid options are: user, project, local
```
The child process exits immediately with code 1 (within ~130 ms). `claude-mem` only logs `exited with code 1` and discards stderr by default, which is why the failure looks silent.
This is a **version-mismatch bug** between `claude-mem` 12.1.3 (latest as of 2026-04-17) and `claude-code` 2.1.x. Earlier Claude Code releases accepted empty values.
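The flag-swallowing behaviour is easy to reproduce with a toy parser. The sketch below is an illustration only, not Claude Code's actual argument handling: an option that expects a value simply takes the next argv entry, whatever it is, and then validates it.

```bash
# Toy option parser (illustration only, not Claude Code's real parser):
# --setting-sources takes the next argv entry as its value and validates it.
parse_setting_sources() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --setting-sources)
        value=${2-}
        case "$value" in
          user|project|local)
            echo "setting-sources=$value"
            shift 2
            ;;
          *)
            echo "Invalid setting source: $value"
            return 1
            ;;
        esac
        ;;
      *)
        shift
        ;;
    esac
  done
}
```

Calling `parse_setting_sources --setting-sources --permission-mode default` reports `--permission-mode` as the invalid source, which mirrors the error text quoted above.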
## Investigation path
1. Confirm worker processes are alive:
```bash
pgrep -fl "worker-service|mcp-server.cjs|chroma-mcp"
cat ~/.claude-mem/supervisor.json
```
2. Confirm sessions are being *recorded* but not *summarised*:
```bash
sqlite3 ~/.claude-mem/claude-mem.db \
"SELECT COUNT(*) FROM sdk_sessions; -- nonzero
SELECT COUNT(*) FROM session_summaries; -- 0 = pipeline broken
SELECT COUNT(*) FROM observations; -- 0 = pipeline broken"
```
3. Grep the log for `summaryStored=null` — if every Stop hook ends in `null`, summarisation is failing.
4. Raise log level to expose the real error:
```bash
# In ~/.claude-mem/settings.json
"CLAUDE_MEM_LOG_LEVEL": "DEBUG"
```
Kill and respawn workers (`pkill -f worker-service.cjs`). New logs should show `SDK_SPAWN Claude process exited {code=1}`.
5. Capture the exact argv by replacing `CLAUDE_CODE_PATH` with a debug shim that logs `$@` before exec'ing the real binary (see the fix below for the production shim — the debug version just tees argv to a log file).
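A minimal sketch of the debug shim from step 5 might look like the following. The log path and the `log_argv` helper name are assumptions; each argument is written on its own bracketed line so that an empty or missing value is visible.

```bash
# Hypothetical debug helper: append the argument count and each argument
# (one per line, in brackets) to LOGFILE.
log_argv() {
  log_file=$1
  shift
  {
    printf 'argc=%d\n' "$#"
    for arg in "$@"; do
      printf '  [%s]\n' "$arg"
    done
  } >> "$log_file"
}

# In the shim script itself (both paths are placeholders):
#   log_argv /tmp/claude-argv.log "$@"
#   exec /Users/you/.local/bin/claude "$@"
```

With this in place, a bare `--setting-sources` shows up in the log immediately followed by `[--permission-mode]`, with no value line in between.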
## The fix
Apply in this order.
### 1. Fix the settings `claude-mem` ships with empty
Edit `~/.claude-mem/settings.json`:
```json
{
"CLAUDE_CODE_PATH": "/Users/you/.local/bin/claude-shim",
"CLAUDE_MEM_TIER_SUMMARY_MODEL": "claude-sonnet-4-6"
}
```
Both ship empty in a fresh install. `CLAUDE_CODE_PATH` points at the shim (below), not the real binary. `CLAUDE_MEM_TIER_SUMMARY_MODEL` is required when `CLAUDE_MEM_TIER_ROUTING_ENABLED=true`.
### 2. Install the shim
`/Users/you/.local/bin/claude-shim`:
```bash
#!/bin/bash
# Workaround shim for claude-mem 12.1.3 <-> Claude Code 2.1.x incompat.
# claude-mem passes `--setting-sources` with no value; Claude CLI 2.1+ rejects
# empty and consumes the next flag as the value. Fix: inject "user" when missing.
REAL=/Users/you/.local/bin/claude
new_args=()
i=0
args=("$@")
while [ $i -lt ${#args[@]} ]; do
  cur="${args[$i]}"
  new_args+=("$cur")
  if [ "$cur" = "--setting-sources" ]; then
    next="${args[$((i+1))]}"
    case "$next" in
      user|project|local) : ;;   # already valid
      *) new_args+=("user") ;;   # inject missing value
    esac
  fi
  i=$((i+1))
done
exec "$REAL" "${new_args[@]}"
```
Make it executable: `chmod +x ~/.local/bin/claude-shim`.
### 3. Restart workers
```bash
pkill -f "worker-service.cjs --daemon"
```
They respawn automatically on the next Claude Code hook fire. Verify:
```bash
# Within ~15 s:
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations;"
# Should be growing as you continue the session.
```
### 4. Sanity-check the shim is being used
```bash
ps -eww | grep -F 'setting-sources user'
```
Every live `claude` child should have `--setting-sources user` in its argv, not a bare `--setting-sources`.
## Why a shim instead of patching `claude-mem`
The offending code is inside the minified `worker-service.cjs` bundle shipped by the `@anthropic-ai/claude-code` SDK, which `claude-mem` vendors. Patching the bundle is possible but fragile: any `claude-mem` update overwrites it. The shim is a one-file wrapper at a stable path, survives plugin updates, and becomes a no-op the moment upstream ships a fix.
## When to remove the shim
Check for a newer `claude-mem` release or an Anthropic SDK update that stops passing `--setting-sources` with an empty value. Test by:
1. Point `CLAUDE_CODE_PATH` back at the real `/Users/you/.local/bin/claude`.
2. Restart workers.
3. Confirm `observations` count keeps growing.
If it does, remove the shim. If not, restore the shim path and wait for a later release.
## Related
- Install notes: `20-Projects/Personal-Tasks.md` — "Install claude-mem plugin on MajorMac — 2026-04-15"
- Config file: `~/.claude-mem/settings.json`
- Logs: `~/.claude-mem/logs/claude-mem-YYYY-MM-DD.log`
- DB: `~/.claude-mem/claude-mem.db` (SQLite, FTS5 enabled)


@@ -0,0 +1,84 @@
---
title: "Cron Heartbeat False Alarm: /var/run Cleared by Reboot"
domain: troubleshooting
category: general
tags:
- cron
- systemd
- tmpfs
- monitoring
- backups
- heartbeat
status: published
created: 2026-04-13
updated: 2026-04-13T10:10
---
# Cron Heartbeat False Alarm: /var/run Cleared by Reboot
If a cron-driven watchdog emails you that a job "may never have run" — but the job's log clearly shows it completed successfully — check whether the heartbeat file lives under `/var/run` (or `/run`). On most modern Linux distros, `/run` is a **tmpfs** and is wiped on every reboot. Any file there survives only until the next boot.
## Symptoms
- A heartbeat-based watchdog fires a missing-heartbeat or stale-heartbeat alert
- The job the watchdog is monitoring actually ran successfully — its log file shows a clean completion long before the alert fired
- The host was rebooted between when the job wrote its heartbeat and when the watchdog checked it
- `stat /var/run/<your-heartbeat>` returns `No such file or directory`
- `readlink -f /var/run` returns `/run`, and `mount | grep ' /run '` shows `tmpfs`
## Why It Happens
Systemd distros mount `/run` as a tmpfs for runtime state. `/var/run` is kept only as a compatibility symlink to `/run`. The whole filesystem is memory-backed: when the host reboots, every file under `/run` vanishes unless a `tmpfiles.d` rule explicitly recreates it. The convention is that only things like PID files and sockets — state that is meaningful **only for the current boot** — should live there.
A daily backup or maintenance job that touches a heartbeat file to prove it ran is *not* boot-scoped state. If the job runs at 03:00, the host reboots at 07:00 for a kernel update, and a watchdog checks the heartbeat at 08:00, the watchdog sees nothing — even though the job ran four hours earlier and exited 0.
The common mitigation of checking the heartbeat's mtime against a max age (e.g. "alert if older than 25h") does **not** protect against this. It catches stale heartbeats from real failures, but a deleted file has no mtime to compare.
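A watchdog can make the difference explicit by treating "missing" and "stale" as separate states. The sketch below is one way to do that, assuming a `find` that supports `-mmin` (GNU and BSD both do, though it is not POSIX); the function name `heartbeat_status` and the minute-based threshold are illustrative.

```bash
# Hypothetical watchdog check: classify a heartbeat file as missing, stale, or
# fresh. MAX_AGE_MINUTES is the oldest acceptable mtime (1500 = 25 hours).
heartbeat_status() {
  hb_file=$1
  max_min=$2
  if [ ! -e "$hb_file" ]; then
    echo missing          # file is gone: possibly wiped by a reboot
  elif [ -n "$(find "$hb_file" -mmin +"$max_min")" ]; then
    echo stale            # file exists but is older than the limit
  else
    echo fresh
  fi
}
```

Example use: `heartbeat_status /var/run/myservice-heartbeat 1500`. A `missing` result shortly after boot points at the tmpfs problem rather than a failed job.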
## Fix
Move the heartbeat out of tmpfs and into a persistent directory. Good options:
- `/var/lib/<service>/heartbeat` — canonical home for persistent service state
- `/var/log/<service>-heartbeat` — acceptable if you want it alongside existing logs
- Any path on a real disk-backed filesystem
Both the writer (the monitored job) and the reader (the watchdog) need to agree on the new path. Make sure the parent directory exists before the first write:
```bash
HEARTBEAT="/var/lib/myservice/heartbeat"
mkdir -p "$(dirname "$HEARTBEAT")"
# ... later, on success:
touch "$HEARTBEAT"
```
The `mkdir -p` is cheap to run unconditionally and avoids a first-run-after-deploy edge case where the directory hasn't been created yet.
## Verification
After deploying the fix:
```bash
# 1. Run the monitored job manually (or wait for its next scheduled run)
sudo bash /path/to/monitored-job.sh
# 2. Confirm the heartbeat was created on persistent storage
ls -la /var/lib/myservice/heartbeat
# 3. Reboot and re-check — the file should survive
sudo reboot
# ... after reboot ...
ls -la /var/lib/myservice/heartbeat # still there, mtime unchanged
# 4. Run the watchdog manually to confirm it passes
sudo bash /path/to/watchdog.sh
```
## Why Not Use `tmpfiles.d` Instead
systemd-tmpfiles can recreate files in `/run` at boot via a `f /run/<name> 0644 root root - -` entry. That works, but it's the wrong tool for this problem: a boot-created empty file has the boot time as its mtime, which defeats the watchdog's age check. The watchdog would see a fresh heartbeat after every reboot even if the monitored job hasn't actually run in days.
Keep `/run` for true runtime state (PIDs, sockets, locks). Put success markers on persistent storage.
## Related
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md) — another class of post-reboot surprise
- [rsync Backup Patterns](../02-selfhosting/storage-backup/rsync-backup-patterns.md) — reusable backup script patterns


@@ -1,6 +1,6 @@
---
created: 2026-03-15T06:37
updated: 2026-04-08
updated: 2026-04-17T09:57
---
# 🔧 General Troubleshooting
@@ -29,6 +29,7 @@ Practical fixes for common Linux, networking, and application problems.
- [Gitea Actions Runner: Boot Race Condition Fix](gitea-runner-boot-race-network-target.md)
- [Systemd Session Scope Fails at Login (`session-cN.scope`)](systemd/session-scope-failure-at-login.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)
- [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](cron-heartbeat-tmpfs-reboot-false-alarm.md)
## 🔒 SELinux
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](selinux-dovecot-vmail-context.md)
@@ -43,3 +44,4 @@ Practical fixes for common Linux, networking, and application problems.
## 🤖 AI / Local LLM
- [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)
- [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
- [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md)


@@ -121,6 +121,26 @@ yt-dlp --list-formats --remote-components ejs:github \
https://www.youtube.com/watch?v=VIDEO_ID
```
### HTTP 429 Too Many Requests + Impersonation Warning
Downloads or subtitle fetches fail with:
```
WARNING: The extractor specified to use impersonation for this download,
but no impersonate target is available.
ERROR: Unable to download video subtitles for 'en-en-US': HTTP Error 429: Too Many Requests
```
**Cause:** yt-dlp needs `curl_cffi` to impersonate a real browser's TLS fingerprint. Without it, YouTube detects the non-browser client and rate-limits with 429s. Subtitle downloads are usually the first to fail.
**Fix:**
```bash
pip3 install --upgrade yt-dlp curl_cffi
```
Once `curl_cffi` is installed, yt-dlp automatically uses browser impersonation and the 429s stop. No config changes needed.
### SABR-Only Streaming Warning
Some videos may show:


@@ -1,23 +1,23 @@
---
created: 2026-04-06T09:52
updated: 2026-04-07T21:59
updated: 2026-04-13T10:16
---
# MajorLinux Tech Wiki — Index
> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
>
**Last updated:** 2026-04-08
**Article count:** 74
**Last updated:** 2026-04-14
**Article count:** 76
## Domains
| Domain | Folder | Articles |
|---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 12 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 21 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 22 |
| 🔓 Open Source Tools | `03-opensource/` | 10 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 29 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 30 |
---
@@ -80,6 +80,7 @@ updated: 2026-04-07T21:59
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban, SpamAssassin
- [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) — fleet-wide automatic security updates across Ubuntu servers
- [Fail2ban Custom Jail: Apache 404 Scanner Detection](02-selfhosting/security/fail2ban-apache-404-scanner-jail.md) — custom filter and jail for blocking 404 scanners
- [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md) — catching PHP webshell/backdoor probes that return 301 on HTTPS-redirecting servers
- [Fail2ban Custom Jail: WordPress Login Brute Force](02-selfhosting/security/fail2ban-wordpress-login-jail.md) — access-log-based wp-login.php brute force detection without plugins
- [SELinux: Fixing Fail2ban grep execmem Denial](02-selfhosting/security/selinux-fail2ban-execmem-fix.md) — resolving execmem AVC denials from Fail2ban's grep on Fedora
- [UFW Firewall Management](02-selfhosting/security/ufw-firewall-management.md) — managing UFW rules, common patterns, troubleshooting
@@ -142,6 +143,7 @@ updated: 2026-04-07T21:59
- [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md) — how to manually update the Gemini CLI when automatic updates fail
- [MajorWiki Setup & Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md) — setting up MajorWiki and the Obsidian → Gitea → MkDocs publishing pipeline
- [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) — fixing act_runner crash loop on boot caused by DNS not ready at startup
- [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md) — why `/run` is tmpfs and how a reboot wipes cron heartbeat files, and where to put them instead
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) — fixing thousands of AVC denials when /var/vmail has wrong SELinux context
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
@@ -160,6 +162,8 @@ updated: 2026-04-07T21:59
| Date | Article | Domain |
|---|---|---|
| 2026-04-13 | [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md) | Troubleshooting |
| 2026-04-09 | [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md) | Self-Hosting |
| 2026-04-08 | [wget/curl: URLs with Special Characters Fail in Bash](05-troubleshooting/wget-url-special-characters.md) | Troubleshooting |
| 2026-04-07 | [SSH Config & Key Management](01-linux/networking/ssh-config-key-management.md) | Linux |
| 2026-04-07 | [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) | Troubleshooting |


@@ -1,10 +1,6 @@
---
created: 2026-04-02T16:03
<<<<<<< HEAD
updated: 2026-04-07T10:48
=======
updated: 2026-04-08
>>>>>>> 4dc77d4 (Add troubleshooting article: wget/curl URLs with special characters)
updated: 2026-04-13T10:16
---
* [Home](index.md)
* [Linux & Sysadmin](01-linux/index.md)
@@ -25,6 +21,7 @@ updated: 2026-04-08
* [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
* [Debugging Broken Docker Containers](02-selfhosting/docker/debugging-broken-docker-containers.md)
* [Docker Healthchecks](02-selfhosting/docker/docker-healthchecks.md)
* [Watchtower SMTP via Localhost Postfix Relay](02-selfhosting/docker/watchtower-smtp-localhost-relay.md)
* [Setting Up Caddy as a Reverse Proxy](02-selfhosting/reverse-proxy/setting-up-caddy-reverse-proxy.md)
* [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
* [Network Overview](02-selfhosting/dns-networking/network-overview.md)
@@ -39,9 +36,13 @@ updated: 2026-04-08
* [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
* [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
* [Fail2ban Custom Jail: Apache 404 Scanner Detection](02-selfhosting/security/fail2ban-apache-404-scanner-jail.md)
* [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md)
* [Fail2ban Custom Jail: WordPress Login Brute Force](02-selfhosting/security/fail2ban-wordpress-login-jail.md)
* [SELinux: Fixing Fail2ban grep execmem Denial](02-selfhosting/security/selinux-fail2ban-execmem-fix.md)
* [UFW Firewall Management](02-selfhosting/security/ufw-firewall-management.md)
* [Fail2ban: Enable the nginx-bad-request Jail](02-selfhosting/security/fail2ban-nginx-bad-request-jail.md)
* [Fail2ban Custom Jail: Apache Bad Request Detection](02-selfhosting/security/fail2ban-apache-bad-request-jail.md)
* [SSH Hardening Fleet-Wide with Ansible](02-selfhosting/security/ssh-hardening-ansible-fleet.md)
* [Open Source & Alternatives](03-opensource/index.md)
* [SearXNG: Private Self-Hosted Search](03-opensource/alternatives/searxng.md)
* [FreshRSS: Self-Hosted RSS Reader](03-opensource/alternatives/freshrss.md)
@@ -73,6 +74,7 @@ updated: 2026-04-08
* [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
* [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
* [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md)
* [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md)
* [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md)
* [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
* [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)


@@ -1,23 +1,23 @@
---
created: 2026-04-06T09:52
updated: 2026-04-07T21:59
updated: 2026-04-13T10:16
---
# MajorLinux Tech Wiki — Index
> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
>
> **Last updated:** 2026-04-08
> **Article count:** 74
> **Last updated:** 2026-04-14
> **Article count:** 76
## Domains
| Domain | Folder | Articles |
|---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 12 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 21 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 22 |
| 🔓 Open Source Tools | `03-opensource/` | 10 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 29 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 30 |
---
@@ -81,6 +81,7 @@ updated: 2026-04-07T21:59
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban, SpamAssassin
- [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) — fleet-wide automatic security updates across Ubuntu servers
- [Fail2ban Custom Jail: Apache 404 Scanner Detection](02-selfhosting/security/fail2ban-apache-404-scanner-jail.md) — custom filter and jail for blocking 404 scanners
- [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md) — catching PHP webshell/backdoor probes that return 301 on HTTPS-redirecting servers
- [Fail2ban Custom Jail: WordPress Login Brute Force](02-selfhosting/security/fail2ban-wordpress-login-jail.md) — access-log-based wp-login.php brute force detection without plugins
- [SELinux: Fixing Fail2ban grep execmem Denial](02-selfhosting/security/selinux-fail2ban-execmem-fix.md) — resolving execmem AVC denials from Fail2ban's grep on Fedora
- [UFW Firewall Management](02-selfhosting/security/ufw-firewall-management.md) — managing UFW rules, common patterns, troubleshooting
@@ -143,6 +144,7 @@ updated: 2026-04-07T21:59
- [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md) — how to manually update the Gemini CLI when automatic updates fail
- [MajorWiki Setup & Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md) — setting up MajorWiki and the Obsidian → Gitea → MkDocs publishing pipeline
- [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) — fixing act_runner crash loop on boot caused by DNS not ready at startup
- [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md) — why `/run` is tmpfs and how a reboot wipes cron heartbeat files, and where to put them instead
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) — fixing thousands of AVC denials when /var/vmail has wrong SELinux context
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
@@ -164,6 +166,8 @@ updated: 2026-04-07T21:59
| Date | Article | Domain |
|---|---|---|
| 2026-04-13 | [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](05-troubleshooting/cron-heartbeat-tmpfs-reboot-false-alarm.md) | Troubleshooting |
| 2026-04-09 | [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](02-selfhosting/security/fail2ban-apache-php-probe-jail.md) | Self-Hosting |
| 2026-04-08 | [wget/curl: URLs with Special Characters Fail in Bash](05-troubleshooting/wget-url-special-characters.md) | Troubleshooting |
| 2026-04-07 | [SSH Config & Key Management](01-linux/networking/ssh-config-key-management.md) | Linux |
| 2026-04-07 | [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) | Troubleshooting |