Compare commits: 64df4b8cfb...main (57 commits)
Commits in range:

b40e484aae, d616eb2afb, 961ce75b88, 9c1a8c95d5, 4f66955d33, c0837b7e89, 326c87421f, efc8f22f6c, 2c51e2b043, 56f1014f73, 5af934a6c6, 84a1893e80, daa771760b, c66d3a6fd0, 1a00fef199, 9a7e43e67d, 6592eb4fea, 6da77c2db7, 6f53b7c6db, 6d81e7f020, 2045c090c0, ca7ddb67f2, 6e131637a1, 0df5ace1a2, 6dccc43d15, ed810ebdf9, 1bb872ef75, 23a35e021b, 9acd083577, cfaee5cf43, d37bd60a24, 8c22ee708d, fb2e3f6168, 0e640a3fff, d1e9571761, 9e205f60e4, c4d3f8e974, 4d59856c1e, 38fe720e63, 59a5cc530e, e8598cfac8, 6a4681dc4b, 279c094afc, 7fb739d3a2, 0bcc2c822a, 3159bbfb48, 1d8be8669e, deb32ce756, b81c8feda0, 31d0a9806d, 6e0ceb0972, 4f3e5877ae, 2e5512ed97, 4bfb99efa6, 697269f574, 2861cade55, b59f6bb6b1
.gitattributes (vendored): deleted
@@ -1,18 +0,0 @@
-# Normalize line endings to LF for all text files
-* text=auto eol=lf
-
-# Explicitly handle markdown
-*.md text eol=lf
-
-# Explicitly handle config files
-*.yml text eol=lf
-*.yaml text eol=lf
-*.json text eol=lf
-*.toml text eol=lf
-
-# Binary files — don't touch
-*.png binary
-*.jpg binary
-*.jpeg binary
-*.gif binary
-*.pdf binary
.gitignore (vendored): deleted
@@ -1,8 +0,0 @@
-# Obsidian specific
-.obsidian/workspace.json
-.obsidian/workspace-mobile.json
-.obsidian/cache/
-
-# Windows/WSL specific
-Thumbs.db
-.DS_Store
@@ -86,5 +86,5 @@ Be specific when asking for help. Include your distro and version, what you trie
 
 ## See Also
 
-- [[wsl2-instance-migration-fedora43]]
-- [[managing-linux-services-systemd-ansible]]
+- [wsl2-instance-migration-fedora43](wsl2-instance-migration-fedora43.md)
+- [managing-linux-services-systemd-ansible](../process-management/managing-linux-services-systemd-ansible.md)
01-linux/distro-specific/wsl2-backup-powershell.md (new file)
@@ -0,0 +1,86 @@
---
title: WSL2 Backup via PowerShell Scheduled Task
domain: linux
category: distro-specific
tags:
  - wsl2
  - windows
  - backup
  - powershell
  - majorrig
status: published
created: '2026-03-16'
updated: '2026-03-16'
---

# WSL2 Backup via PowerShell Scheduled Task

WSL2 distributions are stored as a VHDX file on disk. Unlike traditional VMs, there's no built-in snapshot or backup mechanism. This article covers a simple weekly backup strategy using `wsl --export` and a PowerShell scheduled task.

## The Short Answer

Save this as `C:\Users\majli\Scripts\backup-wsl.ps1` and register it as a weekly scheduled task.

## Backup Script

```powershell
$BackupDir = "D:\WSL\Backups"
$Date = Get-Date -Format "yyyy-MM-dd"
$BackupFile = "$BackupDir\FedoraLinux-43-$Date.tar"
$MaxBackups = 3

New-Item -ItemType Directory -Force -Path $BackupDir | Out-Null

# Must shut down WSL first — export fails if VHDX is locked
Write-Host "Shutting down WSL2..."
wsl --shutdown
Start-Sleep -Seconds 5

Write-Host "Backing up FedoraLinux-43 to $BackupFile..."
wsl --export FedoraLinux-43 $BackupFile

if ($LASTEXITCODE -eq 0) {
    Write-Host "Backup complete: $BackupFile"
    Get-ChildItem "$BackupDir\FedoraLinux-43-*.tar" |
        Sort-Object LastWriteTime -Descending |
        Select-Object -Skip $MaxBackups |
        Remove-Item -Force
    Write-Host "Cleanup done. Keeping last $MaxBackups backups."
} else {
    Write-Host "ERROR: Backup failed!"
}
```

## Register the Scheduled Task

Run in PowerShell as Administrator:

```powershell
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" `
    -Argument "-NonInteractive -File C:\Users\majli\Scripts\backup-wsl.ps1"
$Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2am
$Settings = New-ScheduledTaskSettingsSet -StartWhenAvailable -RunOnlyIfNetworkAvailable:$false
Register-ScheduledTask -TaskName "WSL2 Backup - FedoraLinux43" `
    -Action $Action -Trigger $Trigger -Settings $Settings `
    -RunLevel Highest -Force
```

## Restore from Backup

```powershell
wsl --unregister FedoraLinux-43
wsl --import FedoraLinux-43 D:\WSL\Fedora43 D:\WSL\Backups\FedoraLinux-43-YYYY-MM-DD.tar
```

Then fix the default user — after import WSL resets to root. See [WSL2 Instance Migration](wsl2-instance-migration-fedora43.md) for the `/etc/wsl.conf` fix.
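After a restore-worthy event is the wrong time to find out a tar is corrupt. From inside WSL (or any Linux shell that can see the backup drive), a quick integrity check of the newest archive is cheap insurance. This is a hedged sketch, and the `/mnt/d/WSL/Backups` path is an assumption based on the layout described above:

```bash
# Sketch: verify the newest backup tar in a directory is readable end to end.
check_newest_tar() {
  local dir="$1" newest
  # ls -1t sorts newest first; take the top entry
  newest=$(ls -1t "$dir"/*.tar 2>/dev/null | head -n1)
  [ -n "$newest" ] || { echo "no tars in $dir"; return 1; }
  # tar -tf walks the whole archive index; corruption makes it fail
  tar -tf "$newest" >/dev/null && echo "tar OK: $newest"
}
# Example: check_newest_tar /mnt/d/WSL/Backups
```

This only proves the archive is structurally readable, not that its contents are complete; a periodic test `wsl --import` into a throwaway distro name is the stronger check.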
## Gotchas

- **`wsl --export` fails with `ERROR_SHARING_VIOLATION` if WSL is running.** The script includes `wsl --shutdown` before export to handle this. Any active WSL sessions will be terminated — schedule the task for a time when WSL is idle (2am works well).
- **Backblaze picks up D:\WSL\Backups\ automatically** if the D: drive is in scope — this provides offsite backup without extra config.
- **Each backup tar is ~500MB–1GB** depending on what's installed. Keep `$MaxBackups` at 3 to balance retention against disk usage.

## See Also

- [WSL2 Instance Migration](wsl2-instance-migration-fedora43.md)
- [WSL2 Training Environment Rebuild](wsl2-rebuild-fedora43-training-env.md)
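The `$MaxBackups` retention logic in the PowerShell script has an equivalent shape in bash, which may be useful if the same backups are ever pruned from the Linux side. A hedged sketch, with the filename glob assumed from the naming scheme above:

```bash
# Sketch: keep only the newest $2 backup tars in directory $1, delete the rest.
prune_backups() {
  local dir="$1" keep="$2"
  # ls -1t sorts newest first; everything past line $keep is stale
  ls -1t "$dir"/FedoraLinux-43-*.tar 2>/dev/null |
    tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
# Example: prune_backups /mnt/d/WSL/Backups 3
```

Like the PowerShell version, it selects by modification time, so touching an old tar will keep it alive for another cycle.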
@@ -97,5 +97,5 @@ alias clean='sudo dnf clean all'
 
 ## See Also
 
-- [[Managing disk space on MajorRig]]
-- [[Unsloth QLoRA fine-tuning setup]]
+- Managing disk space on MajorRig
+- Unsloth QLoRA fine-tuning setup
01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md (new file)
@@ -0,0 +1,203 @@
---
title: WSL2 Fedora 43 Training Environment Rebuild
domain: linux
category: distro-specific
tags:
  - wsl2
  - fedora
  - unsloth
  - pytorch
  - cuda
  - majorrig
  - majortwin
status: published
created: '2026-03-16'
updated: '2026-03-16'
---

# WSL2 Fedora 43 Training Environment Rebuild

How to rebuild the MajorTwin training environment from scratch on MajorRig after a WSL2 loss. Covers Fedora 43 install, Python 3.11 via pyenv, PyTorch with CUDA, Unsloth, and llama.cpp for GGUF conversion.

## The Short Answer

```bash
# 1. Install Fedora 43 and move to D:
wsl --install -d FedoraLinux-43 --no-launch
wsl --export FedoraLinux-43 D:\WSL\fedora43.tar
wsl --unregister FedoraLinux-43
wsl --import FedoraLinux-43 D:\WSL\Fedora43 D:\WSL\fedora43.tar

# 2. Set default user
echo -e "[boot]\nsystemd=true\n[user]\ndefault=majorlinux" | sudo tee /etc/wsl.conf
sudo useradd -m -G wheel majorlinux && sudo passwd majorlinux
echo "%wheel ALL=(ALL) ALL" | sudo tee /etc/sudoers.d/wheel

# 3. Install Python 3.11 via pyenv, PyTorch, Unsloth
# See full steps below
```

## Step 1 — System Packages

```bash
sudo dnf update -y
sudo dnf install -y git curl wget tmux screen htop rsync unzip \
    python3 python3-pip python3-devel gcc gcc-c++ make cmake \
    ninja-build pkg-config openssl-devel libffi-devel \
    gawk patch readline-devel sqlite-devel
```

## Step 2 — Python 3.11 via pyenv

Fedora 43 ships Python 3.13. Unsloth requires 3.11. Use pyenv:

```bash
curl https://pyenv.run | bash

# Add to ~/.bashrc
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"

source ~/.bashrc
pyenv install 3.11.9
pyenv global 3.11.9
```

The tkinter warning during install is harmless — it's not needed for training.
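Since the whole stack assumes 3.11, it can be worth guarding later steps against the wrong interpreter sneaking onto PATH. A hedged helper (the function name and shape are mine, not part of the article's scripts):

```bash
# Sketch: fail fast if the given interpreter is not the pinned major.minor.
require_python() {
  local want="$1" bin="${2:-python}" have
  have=$("$bin" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
  if [ "$have" != "$want" ]; then
    echo "need python $want on PATH, found $have" >&2
    return 1
  fi
}
# Example: require_python 3.11   # run after `pyenv global 3.11.9`
```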
## Step 3 — Training Virtualenv + PyTorch

```bash
mkdir -p ~/majortwin/{staging,datasets,outputs,scripts}
python -m venv ~/majortwin/venv
source ~/majortwin/venv/bin/activate

pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Verify GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```

Expected output: `True NVIDIA GeForce RTX 3080 Ti`

## Step 4 — Unsloth + Training Stack

```bash
source ~/majortwin/venv/bin/activate

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers datasets accelerate peft trl bitsandbytes \
    sentencepiece protobuf scipy einops

# Pin transformers for unsloth-zoo compatibility
pip install "transformers<=5.2.0"

# Verify
python -c "import unsloth; print('Unsloth OK')"
```

> [!warning] Never run `pip install -r requirements.txt` from inside llama.cpp while the training venv is active. It installs CPU-only PyTorch and downgrades transformers, breaking the CUDA setup.

## Step 5 — llama.cpp (CPU-only for GGUF conversion)

CUDA 12.8 is incompatible with Fedora 43's glibc for compiling llama.cpp (math function conflicts in `/usr/include/bits/mathcalls.h`). Build CPU-only — it's sufficient for GGUF conversion, which doesn't need GPU:

```bash
# Install GCC 14 (CUDA 12.8 doesn't support GCC 15 which Fedora 43 ships)
sudo dnf install -y gcc14 gcc14-c++

cd ~/majortwin
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

cmake -B build \
    -DGGML_CUDA=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=/usr/bin/gcc-14 \
    -DCMAKE_CXX_COMPILER=/usr/bin/g++-14

cmake --build build --config Release -j$(nproc) 2>&1 | tee /tmp/llama_build.log &
tail -f /tmp/llama_build.log
```

Verify:

```bash
ls ~/majortwin/llama.cpp/build/bin/llama-quantize && echo "OK"
ls ~/majortwin/llama.cpp/build/bin/llama-cli && echo "OK"
```

## Step 6 — Shell Environment

```bash
cat >> ~/.bashrc << 'EOF'
# MajorInfrastructure Paths
export VAULT="/mnt/c/Users/majli/Documents/MajorVault"
export MAJORANSIBLE="/mnt/d/MajorAnsible"
export MAJORTWIN_D="/mnt/d/MajorTwin"
export MAJORTWIN_WSL="$HOME/majortwin"
export LLAMA_CPP="$HOME/majortwin/llama.cpp"

# Venv
alias mtwin='source $MAJORTWIN_WSL/venv/bin/activate && cd $MAJORTWIN_WSL'
alias vault='cd $VAULT'
alias ll='ls -lah --color=auto'

# SSH Fleet Aliases
alias majorhome='ssh majorlinux@100.120.209.106'
alias dca='ssh root@100.104.11.146'
alias majortoot='ssh root@100.110.197.17'
alias majorlinuxvm='ssh root@100.87.200.5'
alias majordiscord='ssh root@100.122.240.83'
alias majorlab='ssh root@100.86.14.126'
alias majormail='ssh root@100.84.165.52'
alias teelia='ssh root@100.120.32.69'
alias tttpod='ssh root@100.84.42.102'
alias majorrig='ssh majorlinux@100.98.47.29' # port 2222 retired 2026-03-25, fleet uses port 22

# DNF5
alias update='sudo dnf upgrade --refresh'
alias install='sudo dnf install'
alias clean='sudo dnf clean all'

# MajorTwin helpers
stage_dataset() {
    cp "$VAULT/20-Projects/MajorTwin/03-Datasets/$1" "$MAJORTWIN_WSL/datasets/"
    echo "Staged: $1"
}
export_gguf() {
    cp "$MAJORTWIN_WSL/outputs/$1" "$MAJORTWIN_D/models/"
    echo "Exported: $1 → $MAJORTWIN_D/models/"
}
EOF

source ~/.bashrc
```

## Key Rules

- **Always activate venv before pip installs:** `source ~/majortwin/venv/bin/activate`
- **Never train from /mnt/c or /mnt/d** — stage files in `~/majortwin/staging/` first
- **Never put ML artifacts inside MajorVault** — models, venvs, artifacts go on D: drive
- **Max viable training model:** 7B at QLoRA 4-bit (RTX 3080 Ti, 12GB VRAM)
- **Current base model:** Qwen2.5-7B-Instruct (ChatML format — stop token: `<|im_end|>` only)
- **Transformers must be pinned:** `pip install "transformers<=5.2.0"` for unsloth-zoo compatibility

## D: Drive Layout

```
D:\MajorTwin\
    models\         ← finished GGUFs
    datasets\       ← dataset archives
    artifacts\      ← training run artifacts
    training-runs\  ← logs, checkpoints
D:\WSL\
    Fedora43\       ← WSL2 VHDX
    Backups\        ← weekly WSL2 backup tars
```

## See Also

- [WSL2 Instance Migration](wsl2-instance-migration-fedora43.md)
- [WSL2 Backup via PowerShell](wsl2-backup-powershell.md)
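A rebuild like the one above touches a lot of tools across Steps 1 through 5. A hedged preflight sketch can report missing commands up front instead of failing mid-step (the example command list is my reading of the steps, not an official checklist):

```bash
# Sketch: report any missing commands before starting the rebuild.
preflight() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
# Example: preflight git curl cmake gcc make python3 pip3
```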
@@ -152,6 +152,6 @@ find /var/www/html -type d -exec chmod 755 {} \;
 
 ## See Also
 
-- [[linux-server-hardening-checklist]]
-- [[ssh-config-key-management]]
-- [[bash-scripting-patterns]]
+- [linux-server-hardening-checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
+- [ssh-config-key-management](../networking/ssh-config-key-management.md)
+- [bash-scripting-patterns](../shell-scripting/bash-scripting-patterns.md)
@@ -1,11 +1,16 @@
 ---
-title: "SSH Config and Key Management"
+title: SSH Config and Key Management
 domain: linux
 category: networking
-tags: [ssh, keys, security, linux, remote-access]
+tags:
+  - ssh
+  - keys
+  - security
+  - linux
+  - remote-access
 status: published
 created: 2026-03-08
-updated: 2026-03-08
+updated: 2026-04-14T14:27
 ---
 
 # SSH Config and Key Management
@@ -129,7 +134,51 @@ If key auth isn't working and the config looks right, permissions are the first
 - **`ServerAliveInterval` in your config** keeps connections from timing out on idle sessions. Saves you from the annoyance of reconnecting after stepping away.
 - **Never put private keys in cloud storage, Git repos, or Docker images.** It happens more than you'd think.
+
+## Windows OpenSSH: Admin User Key Auth
+
+Windows OpenSSH has a separate key file for users in the `Administrators` group. Regular `~/.ssh/authorized_keys` is **ignored** for admin users unless the `Match Group administrators` block in `sshd_config` is disabled.
+
+### Where keys go
+
+| User type | Key file |
+|---|---|
+| Regular user | `C:\Users\<user>\.ssh\authorized_keys` |
+| Admin user | `C:\ProgramData\ssh\administrators_authorized_keys` |
+
+### Setup (elevated PowerShell)
+
+1. **Enable the Match block** in `C:\ProgramData\ssh\sshd_config` — both lines must be uncommented:
+
+   ```
+   Match Group administrators
+       AuthorizedKeysFile __PROGRAMDATA__/ssh/administrators_authorized_keys
+   ```
+
+2. **Write the key file without BOM** — PowerShell 5 defaults to UTF-16LE or UTF-8 with BOM, both of which OpenSSH silently rejects:
+
+   ```powershell
+   [System.IO.File]::WriteAllText(
+       "C:\ProgramData\ssh\administrators_authorized_keys",
+       "ssh-ed25519 AAAA... user@hostname`n",
+       [System.Text.UTF8Encoding]::new($false)
+   )
+   ```
+
+3. **Lock down permissions** — OpenSSH requires strict ACLs:
+
+   ```powershell
+   icacls "C:\ProgramData\ssh\administrators_authorized_keys" /inheritance:r /grant "SYSTEM:(F)" /grant "Administrators:(F)"
+   ```
+
+4. **Restart sshd:**
+
+   ```powershell
+   Restart-Service sshd
+   ```
+
+### Troubleshooting
+
+- If key auth silently fails, check `Get-WinEvent -LogName OpenSSH/Operational -MaxEvents 10`
+- Common cause: BOM in the key file or `sshd_config` — PowerShell file-writing commands are the usual culprit
+- If the log says `User not allowed because shell does not exist`, the `DefaultShell` registry path is wrong — see [WSL default shell troubleshooting](../../05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
+
 ## See Also
 
-- [[linux-server-hardening-checklist]]
-- [[managing-linux-services-systemd-ansible]]
+- [linux-server-hardening-checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
+- [managing-linux-services-systemd-ansible](../process-management/managing-linux-services-systemd-ansible.md)
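Since the BOM failure mode in that hunk is silent, a byte-level check from any Linux or WSL shell can confirm whether a key file copied off the Windows host is clean. A hedged sketch; the hex prefixes are the standard UTF-8 and UTF-16 byte-order marks:

```bash
# Sketch: succeed (exit 0) if the file starts with a UTF-8 BOM (ef bb bf)
# or a UTF-16 BOM (ff fe / fe ff), the encodings OpenSSH silently rejects.
has_bom() {
  # od -An -tx1 prints the first bytes as bare hex pairs
  head -c 3 "$1" | od -An -tx1 | tr -d ' \n' | grep -Eq '^(efbbbf|fffe|feff)'
}
# Example: has_bom administrators_authorized_keys && echo "BOM found, rewrite without it"
```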
@@ -168,5 +168,5 @@ Flatpak is what I prefer — better sandboxing story, Flathub has most things yo
 
 ## See Also
 
-- [[linux-distro-guide-beginners]]
-- [[linux-server-hardening-checklist]]
+- [linux-distro-guide-beginners](../distro-specific/linux-distro-guide-beginners.md)
+- [linux-server-hardening-checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
@@ -146,5 +146,5 @@ ansible-playbook -i inventory.ini manage-services.yml
 
 ## See Also
 
-- [[wsl2-instance-migration-fedora43]]
-- [[tuning-netdata-web-log-alerts]]
+- [wsl2-instance-migration-fedora43](../distro-specific/wsl2-instance-migration-fedora43.md)
+- [tuning-netdata-web-log-alerts](../../02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
@@ -204,5 +204,5 @@ Roles keep things organized and reusable across projects.
 
 ## See Also
 
-- [[managing-linux-services-systemd-ansible]]
-- [[linux-server-hardening-checklist]]
+- [managing-linux-services-systemd-ansible](../process-management/managing-linux-services-systemd-ansible.md)
+- [linux-server-hardening-checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
@@ -211,5 +211,5 @@ retry 3 10 curl -f https://example.com/health
 
 ## See Also
 
-- [[ansible-getting-started]]
-- [[managing-linux-services-systemd-ansible]]
+- [ansible-getting-started](ansible-getting-started.md)
+- [managing-linux-services-systemd-ansible](../process-management/managing-linux-services-systemd-ansible.md)
01-linux/storage/mdadm-raid-rebuild.md (new file)
@@ -0,0 +1,113 @@
---
title: "mdadm — Rebuilding a RAID Array After Reinstall"
domain: linux
category: storage
tags: [mdadm, raid, linux, storage, recovery, homelab]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# mdadm — Rebuilding a RAID Array After Reinstall

If you reinstall the OS on a machine that has an existing mdadm RAID array, the array metadata is still on the disks — you just need to reassemble it. The data isn't gone unless you've overwritten the member disks.

## The Short Answer

```bash
# Scan for existing arrays
sudo mdadm --assemble --scan

# Check what was found
cat /proc/mdstat
```

If that works, your array is back. If not, you'll need to manually identify the member disks and reassemble.

## Step-by-Step Recovery

### 1. Identify the RAID member disks

```bash
# Show mdadm superblock info on each disk/partition
sudo mdadm --examine /dev/sda1
sudo mdadm --examine /dev/sdb1

# Or scan all devices at once
sudo mdadm --examine --scan
```

Look for matching `UUID` fields — disks with the same array UUID belong to the same array.
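To make that UUID matching mechanical rather than eyeball work, a small helper can group devices by array UUID. A hedged sketch: the `awk` field position for the `Array UUID` line matches typical `mdadm --examine` output, but verify against your mdadm version:

```bash
# Sketch: read "device:uuid" pairs on stdin and print one line per array
# UUID with its member devices, so same-array disks are grouped visibly.
group_by_uuid() {
  sort -t: -k2 | awk -F: '
    { members[$2] = members[$2] " " $1 }
    END { for (u in members) printf "UUID %s:%s\n", u, members[u] }'
}
# Example feed (requires root for --examine):
#   for d in /dev/sd[ab]1; do
#     echo "$d:$(mdadm --examine "$d" | awk "/Array UUID/ {print \$4}")"
#   done | group_by_uuid
```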
### 2. Reassemble the array

```bash
# Assemble from specific devices
sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1

# Or let mdadm figure it out from superblocks
sudo mdadm --assemble --scan
```

### 3. Verify the array state

```bash
cat /proc/mdstat
sudo mdadm --detail /dev/md0
```

You want to see `State : active` (or `active, degraded` if a disk is missing). If degraded, the array is still usable but should be rebuilt.

### 4. Update mdadm.conf so it persists across reboots

```bash
# Generate the config
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf

# Fedora/RHEL — rebuild initramfs so the array is found at boot
sudo dracut --force

# Debian/Ubuntu — update initramfs
sudo update-initramfs -u
```

### 5. Mount the filesystem

```bash
# Check the filesystem
sudo fsck /dev/md0

# Mount
sudo mount /dev/md0 /mnt/raid

# Add to fstab for auto-mount
echo '/dev/md0 /mnt/raid ext4 defaults 0 2' | sudo tee -a /etc/fstab
```

## Rebuilding a Degraded Array

If a disk failed or was replaced:

```bash
# Add the new disk to the existing array
sudo mdadm --manage /dev/md0 --add /dev/sdc1

# Watch the rebuild progress
watch cat /proc/mdstat
```

Rebuild time depends on array size and disk speed. The array is usable during rebuild but with degraded performance.
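For scripting around the rebuild (say, alerting when it finishes) the progress figure can be scraped from the same `/proc/mdstat` text instead of watched by eye. A hedged sketch matching the usual `recovery = NN.N%` line format:

```bash
# Sketch: extract the rebuild/recovery percentage from mdstat-style input.
rebuild_pct() {
  grep -o 'recovery = *[0-9.]*%' | grep -o '[0-9.]*%' | head -n1
}
# Example: rebuild_pct < /proc/mdstat
```

Note that an initial sync shows `resync` rather than `recovery`; widen the first pattern if you want to catch both.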
## Gotchas & Notes

- **Don't `--create` when you mean `--assemble`.** `--create` initializes a new array and will overwrite existing superblocks. `--assemble` brings an existing array back online.
- **Superblock versions matter.** Modern mdadm uses 1.2 superblocks by default. If the array was created with an older version, specify `--metadata=0.90` during assembly.
- **RAID is not a backup.** mdadm protects against disk failure, not against accidental deletion, ransomware, or filesystem corruption. Pair it with rsync or Restic for actual backups.
- **Check SMART status on all member disks** after a reinstall. If you're reassembling because a disk failed, make sure the remaining disks are healthy.

Reference: [mdadm — How to rebuild RAID array after fresh install (Unix & Linux Stack Exchange)](https://unix.stackexchange.com/questions/593836/mdadm-how-to-rebuild-raid-array-after-fresh-install)

## See Also

- [snapraid-mergerfs-setup](snapraid-mergerfs-setup.md)
- [rsync-backup-patterns](../../02-selfhosting/storage-backup/rsync-backup-patterns.md)
@@ -1,3 +1,12 @@
+---
+title: "SnapRAID & MergerFS Storage Setup"
+domain: linux
+category: storage
+tags: [snapraid, mergerfs, storage, parity, raid, majorraid]
+status: published
+created: 2026-04-02
+updated: 2026-04-02
+---
 # SnapRAID & MergerFS Storage Setup
 
 ## Problem

@@ -16,19 +25,24 @@ A combination of **MergerFS** for pooling and **SnapRAID** for parity. This is i
 ### 2. Implementation Strategy
 
 1. **Clean the Pool:** Use `rmlint` to clear duplicates and reclaim space.
-2. **Identify the Parity Drive:** Choose your largest drive (or one equal to the largest data drive) to hold the parity information. In my setup, `/mnt/usb` (sdc) was cleared of 4TB of duplicates to be repurposed for this.
-3. **Configure MergerFS:** Pool the data drives (e.g., `/mnt/disk1`, `/mnt/disk2`) into `/storage`.
+2. **Identify the Parity Drive:** Choose your largest drive (or one equal to the largest data drive) to hold the parity information.
+3. **Configure MergerFS:** Pool the data drives into a single mount point.
 4. **Configure SnapRAID:** Point SnapRAID to the data drives and the parity drive.
 
 ### 3. MergerFS Config (/etc/fstab)
 
+On majorhome, the pool mounts three ext4 drives to `/majorRAID`:
+
 ```fstab
-# Example MergerFS pool
-/mnt/disk*:/mnt/usb-data /storage fuse.mergerfs defaults,allow_other,cache.files=off,use_ino,category.create=mfs,minfreespace=20G,fsname=mergerfsPool 0 0
+/mnt/disk1:/mnt/disk2:/mnt/disk3 /majorRAID fuse.mergerfs defaults,allow_other,cache.files=off,use_ino,category.create=mfs,minfreespace=20G,fsname=mergerfsPool 0 0
 ```
 
+Adjust the source paths and mount point to match your setup. Each `/mnt/diskN` is an individual ext4 drive mounted separately — MergerFS unions them into the single `/majorRAID` path.
+
 ### 4. SnapRAID Config (/etc/snapraid.conf)
 
+> **Note:** SnapRAID is not yet active on majorhome — a 12TB parity drive purchase is deferred. The config below is the planned setup.
+
 ```conf
 # Parity file location
 parity /mnt/parity/snapraid.parity

@@ -37,9 +51,11 @@ parity /mnt/parity/snapraid.parity
 content /var/snapraid/snapraid.content
 content /mnt/disk1/.snapraid.content
 content /mnt/disk2/.snapraid.content
+content /mnt/disk3/.snapraid.content
 
 data d1 /mnt/disk1/
 data d2 /mnt/disk2/
+data d3 /mnt/disk3/
 
 # Exclusions
 exclude /lost+found/

@@ -68,7 +84,3 @@ snapraid scrub
 ```
 
 ---
-
-## Tags
-
-#snapraid #mergerfs #linux #storage #homelab #raid
02-selfhosting/dns-networking/network-overview.md (new file)
@@ -0,0 +1,38 @@
---
title: "Network Overview"
domain: selfhosting
category: dns-networking
tags: [tailscale, networking, infrastructure, dns, vpn]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# 🌐 Network Overview

The **MajorsHouse** infrastructure is connected via a private **Tailscale** mesh network. This allows secure, peer-to-peer communication between devices across different geographic locations (US and UK) without exposing services to the public internet.

## 🏛️ Infrastructure Summary

- **Address Space:** 100.x.x.x (Tailscale CGNAT)
- **Management:** Centralized via **Ansible** (`MajorAnsible` repo)
- **Host Groupings:** Functional (web, mail, homelab, bots), OS (Fedora, Ubuntu), and Location (US, UK).

## 🌍 Geographic Nodes

| Host | Location | IP | OS |
|---|---|---|---|
| `dcaprod` | 🇺🇸 US | 100.104.11.146 | Ubuntu 24.04 |
| `majortoot` | 🇺🇸 US | 100.110.197.17 | Ubuntu 24.04 |
| `majorhome` | 🇺🇸 US | 100.120.209.106 | Fedora 43 |
| `teelia` | 🇬🇧 UK | 100.120.32.69 | Ubuntu 24.04 |

## 🔗 Tailscale Setup

Tailscale is configured as a persistent service on all nodes. Key features used include:

- **Tailscale SSH:** Enabled for secure management via Ansible.
- **MagicDNS:** Used for internal hostname resolution (e.g., `majorlab.tailscale.net`).
- **ACLs:** Managed via the Tailscale admin console to restrict cross-group communication where necessary.

---
*Last updated: 2026-03-04*
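As a quick liveness pass over the node table above, something like this works from any machine on the tailnet. A hedged sketch: the probe command is pluggable, since ICMP may be filtered and `tailscale ping` or `nc` can stand in:

```bash
# Sketch: run a probe command against each host and report up/down.
sweep() {
  local probe="$1"; shift
  local host
  for host in "$@"; do
    if $probe "$host" >/dev/null 2>&1; then
      echo "up: $host"
    else
      echo "down: $host"
    fi
  done
}
# Example: sweep "ping -c1 -W1" 100.104.11.146 100.110.197.17 100.120.209.106 100.120.32.69
```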
@@ -140,6 +140,6 @@ Now any device on your home LAN is reachable from anywhere on the tailnet, even
 
 ## See Also
 
-- [[self-hosting-starter-guide]]
-- [[linux-server-hardening-checklist]]
-- [[setting-up-caddy-reverse-proxy]]
+- [self-hosting-starter-guide](../docker/self-hosting-starter-guide.md)
+- [linux-server-hardening-checklist](../security/linux-server-hardening-checklist.md)
+- [setting-up-caddy-reverse-proxy](../reverse-proxy/setting-up-caddy-reverse-proxy.md)
@@ -164,5 +164,5 @@ Don't jump straight to the nuclear option. Only use `-v` if you want a completel

## See Also

- [docker-vs-vms-homelab](docker-vs-vms-homelab.md)
- [tuning-netdata-web-log-alerts](../monitoring/tuning-netdata-web-log-alerts.md)

02-selfhosting/docker/docker-healthchecks.md (new file, 157 lines)

---
title: "Docker Healthchecks"
domain: selfhosting
category: docker
tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
status: published
created: 2026-03-23
updated: 2026-03-23
---

# Docker Healthchecks

A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working — not just running. Without one, a container shows as `Up` even if the app inside is crashed, deadlocked, or waiting on a dependency.

## Why It Matters

Tools like Uptime Kuma report containers without healthchecks as:

> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.

A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.

## Basic Syntax (docker-compose)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

| Field | Description |
|---|---|
| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
| `interval` | How often to run the check. |
| `timeout` | How long to wait before marking as failed. |
| `retries` | Failures before marking `unhealthy`. |
| `start_period` | Grace period on startup before failures count. |

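The exit-code rule in the table can be sanity-checked with plain shell, outside Docker entirely. A healthcheck is just a command whose exit status is inspected:

```shell
# Mimic Docker's rule: exit code 0 means healthy, anything else unhealthy
status() { "$@" >/dev/null 2>&1 && echo healthy || echo unhealthy; }

status true     # a check that succeeds -> healthy
status false    # a check that fails    -> unhealthy
```

Docker applies exactly this rule to the `test` command, every `interval`, up to `retries` times.
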
## Common Patterns

### HTTP service (wget — available in Alpine)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### HTTP service (curl)

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### MySQL / MariaDB

```yaml
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s
```

### PostgreSQL

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5
```

### Redis

```yaml
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 10s
  timeout: 5s
  retries: 3
```

### TCP port check (no curl/wget available)

```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```

## Using Healthchecks with `depends_on`

Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:

```yaml
services:
  app:
    depends_on:
      db:
        condition: service_healthy

  db:
    image: mysql:8.0
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s
```

This prevents the classic race condition where the app starts before the database is ready to accept connections.

## Checking Health Status

```bash
# See health status in container list
docker ps

# Get detailed health info including last check output
docker inspect --format='{{json .State.Health}}' <container> | jq
```

## Ghost Example

Ghost (Alpine-based) uses `wget` rather than `curl`:

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

## Gotchas & Notes

- **Alpine images** don't have `curl` by default — use `wget` or install curl in the image.
- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
- **`CMD` vs `CMD-SHELL`** — use `CMD` for direct exec (no shell needed), `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
- **Uptime Kuma** will pick up Docker healthcheck status automatically when monitoring via the Docker socket — no extra config needed.

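The `CMD` vs `CMD-SHELL` distinction exists because `CMD-SHELL` wraps the test string in `/bin/sh -c`, which is what makes pipes and `&&` available:

```shell
# CMD-SHELL equivalent: the whole test string runs under /bin/sh -c,
# so shell features work
sh -c 'echo ok | grep -q ok && echo healthy'
# prints "healthy"
```

With the plain `CMD` form, the array is exec'd directly and a `|` or `&&` would be passed to the command as a literal argument.
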
## See Also

- [debugging-broken-docker-containers](debugging-broken-docker-containers.md)
- [netdata-docker-health-alarm-tuning](../monitoring/netdata-docker-health-alarm-tuning.md)

@@ -91,5 +91,5 @@ The two coexist fine on the same host. Docker handles the service layer, KVM han

## See Also

- [managing-linux-services-systemd-ansible](../../01-linux/process-management/managing-linux-services-systemd-ansible.md)
- [tuning-netdata-web-log-alerts](../monitoring/tuning-netdata-web-log-alerts.md)

@@ -110,6 +110,6 @@ Tailscale is the easiest and safest starting point for personal use.

## See Also

- [docker-vs-vms-homelab](docker-vs-vms-homelab.md)
- [debugging-broken-docker-containers](debugging-broken-docker-containers.md)
- [linux-server-hardening-checklist](../security/linux-server-hardening-checklist.md)

02-selfhosting/docker/watchtower-smtp-localhost-relay.md (new file, 105 lines)

---
title: "Watchtower SMTP via Localhost Postfix Relay"
domain: selfhosting
category: docker
tags: [watchtower, docker, smtp, postfix, email, notifications]
status: published
created: 2026-04-17
updated: 2026-04-17
---

# Watchtower SMTP via Localhost Postfix Relay

## The Problem

Watchtower supports email notifications via its built-in shoutrrr SMTP driver. The typical setup stores SMTP credentials in the compose file or a separate env file. This creates two failure modes:

1. **Password rotation breaks notifications silently.** When you rotate your mail server password, Watchtower keeps running but stops sending emails. You only discover it when you notice container updates happened with no notification.
2. **Credentials at rest.** `docker-compose.yml` and `.env` files are often world-readable or checked into git. SMTP passwords stored there are a credential leak waiting to happen.

The shoutrrr SMTP driver also has a quirk: it attempts AUTH over an unencrypted connection to remote SMTP servers, which most mail servers (correctly) reject with `535 5.7.8 authentication failed` or similar.

## The Solution

Route Watchtower's outbound mail through **localhost port 25** using `network_mode: host`. The local Postfix MTA — already running on the host for relay purposes — handles authentication to the upstream mail server. Watchtower never sees a credential.

```
Watchtower → localhost:25 (Postfix, trusted via mynetworks — no auth required)
           → Postfix → upstream mail server → delivery
```

## docker-compose.yml

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    restart: always
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DOCKER_API_VERSION=1.44
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 0 4 * * *
      - WATCHTOWER_INCLUDE_STOPPED=false
      - WATCHTOWER_NOTIFICATIONS=email
      - WATCHTOWER_NOTIFICATION_EMAIL_FROM=watchtower@yourdomain.com
      - WATCHTOWER_NOTIFICATION_EMAIL_TO=you@yourdomain.com
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER=localhost
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER_PORT=25
      - WATCHTOWER_NOTIFICATION_EMAIL_SERVER_TLS_SKIP_VERIFY=true
      - WATCHTOWER_NOTIFICATION_EMAIL_DELAY=2
```

**Key settings:**

- `network_mode: host` — required so `localhost` resolves to the host's loopback interface (and port 25). Without this, `localhost` resolves to the container's own loopback, which has no Postfix.
- `EMAIL_SERVER=localhost`, `PORT=25` — target the local Postfix.
- `TLS_SKIP_VERIFY=true` — shoutrrr still negotiates STARTTLS even on port 25; a self-signed or expired local Postfix cert is fine to skip.
- No `EMAIL_SERVER_USER` or `EMAIL_SERVER_PASSWORD` — Postfix trusts `127.0.0.1` via `mynetworks`, no auth needed.

## Prerequisites

The host needs a Postfix instance that:

1. Listens on `localhost:25`
2. Includes `127.0.0.0/8` in `mynetworks` so local processes can relay without authentication
3. Is configured to relay outbound to your actual mail server

This is standard for any host already running a Postfix relay. If Postfix isn't installed, a minimal relay-only config is a few lines in `main.cf`.

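If you do need to create one from scratch, a relay-only `main.cf` might look roughly like this (a sketch: the relayhost and submission port are placeholders, and the SASL lines only apply if your upstream requires authenticated submission):

```ini
# /etc/postfix/main.cf (minimal relay-only sketch; values are placeholders)
inet_interfaces = loopback-only
mynetworks = 127.0.0.0/8 [::1]/128
relayhost = [mail.example.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
```
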
## Why Not Just Use an Env File?

A separate env file (mode 0600) is better than inline compose, but you still have a credential that breaks on rotation. The localhost relay pattern eliminates the credential entirely.

| Approach | Credentials stored | Rotation-safe |
|---|---|---|
| Inline in compose | Yes (plaintext, often 0644) | ❌ |
| Separate env file (0600) | Yes (protected but present) | ❌ |
| Localhost Postfix relay | None | ✅ |

## Testing

After `docker compose up -d`, check the Watchtower logs for a startup notification:

```bash
docker logs <watchtower-container-name> 2>&1 | head -20
# Look for: "Sending notification..."
```

Confirm Postfix delivered it:

```bash
grep watchtower /var/log/mail.log | tail -5
# Look for: status=sent (250 2.0.0 Ok)
```

## Gotchas

- **`network_mode: host` is Linux-only.** Docker Desktop on macOS/Windows doesn't support host networking. This pattern only works on Linux hosts.
- **`network_mode: host` drops port mappings.** Any `ports:` entries are silently ignored under `network_mode: host`. Watchtower doesn't expose ports, so this isn't an issue.
- **Postfix TLS cert warning.** shoutrrr attempts STARTTLS on port 25 regardless. If the local Postfix has a self-signed or expired cert, `TLS_SKIP_VERIFY=true` suppresses the error. For a proper fix, renew the Postfix cert.
- **`WATCHTOWER_DISABLE_CONTAINERS`.** If you run stacks that manage their own updates (Nextcloud AIO, etc.), list those containers here (space-separated) to prevent Watchtower from interfering.

## See Also

- [docker-healthchecks](docker-healthchecks.md)
- [debugging-broken-docker-containers](debugging-broken-docker-containers.md)

@@ -1,3 +1,7 @@

---
created: 2026-04-13T10:15
updated: 2026-04-13T10:15
---

# 🏠 Self-Hosting & Homelab

Guides for running your own services at home, including Docker, reverse proxies, DNS, storage, monitoring, and security.

@@ -23,7 +27,15 @@ Guides for running your own services at home, including Docker, reverse proxies,

## Monitoring

- [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)

## Security

- [Linux Server Hardening Checklist](security/linux-server-hardening-checklist.md)
- [Standardizing unattended-upgrades with Ansible](security/ansible-unattended-upgrades-fleet.md)
- [Fail2ban Custom Jail: Apache 404 Scanner Detection](security/fail2ban-apache-404-scanner-jail.md)
- [Fail2ban Custom Jail: Apache PHP Webshell Probe Detection](security/fail2ban-apache-php-probe-jail.md)
- [Fail2ban Custom Jail: WordPress Login Brute Force](security/fail2ban-wordpress-login-jail.md)
- [SELinux: Fixing Fail2ban grep execmem Denial](security/selinux-fail2ban-execmem-fix.md)
- [UFW Firewall Management](security/ufw-firewall-management.md)

02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md (new file, 157 lines)

---
title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
domain: selfhosting
category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
updated: 2026-03-28
---

# Tuning Netdata Docker Health Alarms to Prevent Update Flapping

Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with a watchtower/auto-update setup) does its nightly update cycle, containers restart in sequence and briefly show as unhealthy — generating a flood of false alerts.

## The Default Alarm

```ini
template: docker_container_unhealthy
on: docker.container_health_status
every: 10s
lookup: average -10s of unhealthy
warn: $this > 0
```

A single container being unhealthy for 10 seconds triggers it. No grace period, no delay.

## The Fix

Create a custom override at `/etc/netdata/health.d/docker.conf` (maps to the Netdata config volume if running in Docker). This file takes precedence over the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`.

### General Container Alarm

This alarm covers all containers **except** `nextcloud-aio-nextcloud`, which gets its own dedicated alarm (see below).

```ini
# Custom override — reduces flapping during nightly container updates.
# General container unhealthy alarm — all containers except nextcloud-aio-nextcloud

template: docker_container_unhealthy
on: docker.container_health_status
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -5m of unhealthy
chart labels: container_name=!nextcloud-aio-nextcloud *
warn: $this > 0
delay: up 3m down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} health
info: ${label:container_name} docker container health status is unhealthy
to: sysadmin
```

| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Check less frequently |
| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
| `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes |
| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |

### Dedicated Nextcloud AIO Alarm

Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.

The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures:

```ini
# Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle
# PHP-FPM can take 5+ minutes to warm up; only alert on sustained failure

template: docker_nextcloud_unhealthy
on: docker.container_health_status
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -10m of unhealthy
chart labels: container_name=nextcloud-aio-nextcloud
warn: $this > 0
delay: up 10m down 5m multiplier 1.5 max 30m
summary: Nextcloud container health sustained
info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
to: sysadmin
```

## Watchdog Cron: Auto-Restart on Sustained Unhealthy

If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.

**File:** `/etc/cron.d/nextcloud-health-watchdog`

```bash
# Restart nextcloud-aio-nextcloud if unhealthy for >1 hour
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```

- Runs every 15 minutes as root
- Only restarts if the container has been running for >1 hour (avoids interfering with normal startup)
- Logs to syslog as `nextcloud-watchdog` — check with `journalctl -t nextcloud-watchdog`
- Netdata will still fire the `docker_nextcloud_unhealthy` alert during the unhealthy window, but the outage is capped at ~1 hour instead of persisting until the next nightly cycle

## Also: Suppress `docker_container_down` for Normally-Exiting Containers

Nextcloud AIO runs `borgbackup` (scheduled backups) and `watchtower` (auto-updates) as containers that exit with code 0 after completing their work. The stock `docker_container_down` alarm fires on any exited container, generating false alerts after every nightly cycle.

Add a second override to the same file using `chart labels` to exclude them:

```ini
# Suppress docker_container_down for Nextcloud AIO containers that exit normally
# (borgbackup runs on schedule then exits; watchtower does updates then exits)
template: docker_container_down
on: docker.container_running_state
class: Errors
type: Containers
component: Docker
units: status
every: 30s
lookup: average -5m of down
chart labels: container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *
warn: $this > 0
delay: up 3m down 5m multiplier 1.5 max 30m
summary: Docker container ${label:container_name} down
info: ${label:container_name} docker container is down
to: sysadmin
```

The `chart labels` line uses Netdata's simple pattern syntax — `!` prefix excludes a container, `*` matches everything else. All other exited containers still alert normally.

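The pattern list is evaluated first-match-wins; as a loose shell analogy (Netdata's own matcher does the real work, not this code):

```shell
# Analogy for "container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *"
matches() {
  case "$1" in
    nextcloud-aio-borgbackup|nextcloud-aio-watchtower) echo no ;;  # the ! entries
    *) echo yes ;;                                                 # the trailing *
  esac
}

matches nextcloud-aio-borgbackup   # no  (excluded)
matches ghost                      # yes (caught by *)
```
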
## Applying the Config

```bash
# If Netdata runs in Docker, write to the config volume
sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
# paste config here
EOF

# Reload health alarms without restarting the container
sudo docker exec netdata netdatacli reload-health
```

No container restart needed — `reload-health` picks up the new config immediately.

## Verify

In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.

## Notes

- Both `docker_container_unhealthy` and `docker_container_down` are overridden in this config. Any container not explicitly excluded in the `chart labels` filter will still alert normally.
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

## See Also

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md) — similar tuning for web_log redirect alerts

02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md (new file, 162 lines)

---
title: "Netdata n8n Enriched Alert Emails"
domain: selfhosting
category: monitoring
tags: [netdata, n8n, alerts, email, monitoring, automation]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Netdata → n8n Enriched Alert Emails

**Status:** Live across all MajorsHouse fleet servers as of 2026-03-21

Replaces Netdata's plain-text alert emails with rich HTML emails that include a plain-English explanation, a suggested remediation command, and a direct link to the relevant MajorWiki article.

---

## How It Works

```
Netdata alarm fires
  → custom_sender() in health_alarm_notify.conf
  → POST JSON payload to n8n webhook
  → Code node enriches with suggestion + wiki link
  → Send Email node sends HTML email via SMTP
  → Respond node returns 200 OK
```

---

## n8n Workflow

**Name:** Netdata Enriched Alerts
**URL:** https://n8n.majorshouse.com
**Webhook endpoint:** `POST https://n8n.majorshouse.com/webhook/netdata-alert`
**Workflow ID:** `a1b2c3d4-aaaa-bbbb-cccc-000000000001`

### Nodes

1. **Netdata Webhook** — receives POST from Netdata's `custom_sender()`
2. **Enrich Alert** — Code node; matches alarm/chart/family to enrichment table, builds HTML email body in `$json.emailBody`
3. **Send Enriched Email** — sends via SMTP port 465 (SMTP account 2), from `netdata@majorshouse.com` to `marcus@majorshouse.com`
4. **Respond OK** — returns `ok` with HTTP 200 to Netdata

### Enrichment Keys

The Code node matches on the `alarm`, `chart`, or `family` field (case-insensitive substring):

| Key | Title | Wiki Article |
|-----|-------|-------------|
| `disk_space` | Disk Space Alert | snapraid-mergerfs-setup |
| `ram` | Memory Alert | managing-linux-services-systemd-ansible |
| `cpu` | CPU Alert | managing-linux-services-systemd-ansible |
| `load` | Load Average Alert | managing-linux-services-systemd-ansible |
| `net` | Network Alert | tailscale-homelab-remote-access |
| `docker` | Docker Container Alert | debugging-broken-docker-containers |
| `web_log` | Web Log Alert | tuning-netdata-web-log-alerts |
| `health` | Docker Health Alarm | netdata-docker-health-alarm-tuning |
| `mdstat` | RAID Array Alert | mdadm-usb-hub-disconnect-recovery |
| `systemd` | Systemd Service Alert | docker-caddy-selinux-post-reboot-recovery |
| _(no match)_ | Server Alert | netdata-new-server-setup |

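The node's actual code is not reproduced in this article; the case-insensitive substring matching it describes behaves roughly like this shell sketch (hypothetical, with only three of the table's keys shown):

```shell
# Hypothetical sketch of the Code node's matching logic, not the real implementation
enrich() {
  hay=$(printf '%s %s %s' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]')
  case "$hay" in
    *disk_space*) echo "Disk Space Alert" ;;
    *docker*)     echo "Docker Container Alert" ;;
    *)            echo "Server Alert" ;;   # fallback when nothing matches
  esac
}

enrich "disk_space._" "" ""        # Disk Space Alert
enrich "" "DOCKER.container" ""    # Docker Container Alert
```
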
---

## Netdata Configuration

### Config File Locations

| Server | Path |
|--------|------|
| majorhome, majormail, majordiscord, tttpod, teelia | `/etc/netdata/health_alarm_notify.conf` |
| majorlinux, majortoot, dca | `/usr/lib/netdata/conf.d/health_alarm_notify.conf` |

### Required Settings

```bash
DEFAULT_RECIPIENT_CUSTOM="n8n"
role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"
```

### custom_sender() Function

```bash
custom_sender() {
    local to="${1}"
    local payload
    payload=$(jq -n \
        --arg hostname "${host}" \
        --arg alarm "${name}" \
        --arg chart "${chart}" \
        --arg family "${family}" \
        --arg status "${status}" \
        --arg old_status "${old_status}" \
        --arg value "${value_string}" \
        --arg units "${units}" \
        --arg info "${info}" \
        --arg alert_url "${goto_url}" \
        --arg severity "${severity}" \
        --arg raised_for "${raised_for}" \
        --arg total_warnings "${total_warnings}" \
        --arg total_critical "${total_critical}" \
        '{hostname:$hostname,alarm:$alarm,chart:$chart,family:$family,status:$status,old_status:$old_status,value:$value,units:$units,info:$info,alert_url:$alert_url,severity:$severity,raised_for:$raised_for,total_warnings:$total_warnings,total_critical:$total_critical}')

    local httpcode
    httpcode=$(docurl -s -o /dev/null -w "%{http_code}" \
        -X POST \
        -H "Content-Type: application/json" \
        -d "${payload}" \
        "https://n8n.majorshouse.com/webhook/netdata-alert")

    if [ "${httpcode}" = "200" ]; then
        info "sent enriched notification to n8n for ${status} of ${host}.${name}"
        sent=$((sent + 1))
    else
        error "failed to send notification to n8n, HTTP code: ${httpcode}"
    fi
}
```

!!! note "jq required"
    The `custom_sender()` function requires `jq` to be installed. Verify with `which jq` on each server.

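The `jq -n --arg` idiom used above builds JSON from shell variables with correct quoting and no heredoc. A minimal standalone demonstration:

```shell
# -n: no input; -c: compact output; --arg binds shell values as jq variables
jq -cn --arg hostname majorhome --arg alarm disk_space._ \
  '{hostname: $hostname, alarm: $alarm}'
# prints {"hostname":"majorhome","alarm":"disk_space._"}
```

Values passed via `--arg` are always treated as strings and escaped properly, so hostnames or alarm names containing quotes cannot break the payload.
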
---

## Deploying to a New Server

```bash
# 1. Find the config file
find /etc/netdata /usr/lib/netdata -name health_alarm_notify.conf 2>/dev/null

# 2. Edit it — add the two lines and the custom_sender() function above

# 3. Test connectivity from the server
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://n8n.majorshouse.com/webhook/netdata-alert \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
# Expected: 200

# 4. Restart Netdata
systemctl restart netdata

# 5. Send a test alarm
/usr/libexec/netdata/plugins.d/alarm-notify.sh test custom
```

---

## Troubleshooting

**Emails not arriving — check n8n execution log:**
Go to https://n8n.majorshouse.com → open "Netdata Enriched Alerts" → Executions tab. Look for `error` status entries.

**Email body empty:**
The Send Email node's HTML field must be `={{ $json.emailBody }}`. Shell variable expansion can silently strip `$json` if the workflow is patched via inline SSH commands — always use a Python script file.

**`000` curl response from a server:**
Usually a timeout, not a DNS or connection failure. Re-test with `--max-time 30`.

**`custom_sender()` syntax error in Netdata logs:**
Bash heredocs don't work inside sourced config files. Use `jq -n --arg ...` as shown above — no heredocs.

**n8n `N8N_TRUST_PROXY` must be set:**
Without `N8N_TRUST_PROXY=true` in the Docker environment, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to abort requests before parsing the body. Set in `/opt/n8n/compose.yml`.

161
02-selfhosting/monitoring/netdata-new-server-setup.md
Normal file
@@ -0,0 +1,161 @@
|
|||||||
|
---
|
||||||
|
title: "Deploying Netdata to a New Server"
|
||||||
|
domain: selfhosting
|
||||||
|
category: monitoring
|
||||||
|
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
|
||||||
|
status: published
|
||||||
|
created: 2026-03-18
|
||||||
|
updated: 2026-03-22
|
||||||
|
---
|
||||||
|
|
||||||
|
# Deploying Netdata to a New Server
|
||||||
|
|
||||||
|
This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.
|
||||||
|
|
||||||
|
## 1. Install Prerequisites
|
||||||
|
|
||||||
|
Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
apt install -y jq
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
jq --version
|
||||||
|
```
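For context, the JSON-building step inside `custom_sender()` follows this `jq -n --arg` pattern (the field names and values here are illustrative placeholders, not the exact ones from the deployed config, which also carries value/chart context):

```shell
hostname="testhost"; alarm="disk_space._"; status="WARNING"
# jq -n builds a document from scratch; --arg passes shell values in safely escaped.
payload=$(jq -n \
  --arg hostname "$hostname" \
  --arg alarm    "$alarm" \
  --arg status   "$status" \
  '{hostname: $hostname, alarm: $alarm, status: $status}')
echo "$payload"
```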
|
||||||
|
|
||||||
|
## 2. Install Netdata
|
||||||
|
|
||||||
|
Use the official kickstart script:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
wget -O /tmp/netdata-install.sh https://get.netdata.cloud/kickstart.sh
|
||||||
|
sh /tmp/netdata-install.sh --non-interactive --stable-channel --disable-telemetry
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify it's running:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl is-active netdata
|
||||||
|
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
|
||||||
|
```
|
||||||
|
|
||||||
|
## 3. Configure Email Notifications
|
||||||
|
|
||||||
|
Copy the default config and set the three required values:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp /usr/lib/netdata/conf.d/health_alarm_notify.conf /etc/netdata/health_alarm_notify.conf
|
||||||
|
```
|
||||||
|
|
||||||
|
Edit `/etc/netdata/health_alarm_notify.conf`:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
EMAIL_SENDER="netdata@majorshouse.com"
|
||||||
|
SEND_EMAIL="YES"
|
||||||
|
DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"
|
||||||
|
```
|
||||||
|
|
||||||
|
Or apply with `sed` in one shot:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sed -i 's/^#\?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
|
||||||
|
sed -i 's/^#\?SEND_EMAIL=.*/SEND_EMAIL="YES"/' /etc/netdata/health_alarm_notify.conf
|
||||||
|
sed -i 's/^#\?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
|
||||||
|
```
|
||||||
|
|
||||||
|
Restart and test:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart netdata
|
||||||
|
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(OK|FAILED|email)'
|
||||||
|
```
|
||||||
|
|
||||||
|
You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) and confirmation that email was sent to `marcus@majorshouse.com`.
|
||||||
|
|
||||||
|
> [!note] Delivery via local Postfix
|
||||||
|
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.
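A quick preflight for that assumption (this only checks that the binary resolves, not that relaying actually works):

```shell
# command -v succeeds only if the path exists and is executable.
if command -v /usr/sbin/sendmail >/dev/null 2>&1; then
  echo "sendmail: ok"
else
  echo "sendmail: MISSING - install postfix first"
fi
```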
|
||||||
|
|
||||||
|
## 4. Configure n8n Webhook Notifications
|
||||||
|
|
||||||
|
Copy the `health_alarm_notify.conf` from an existing server (e.g. majormail) which contains the `custom_sender()` function. This sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.
|
||||||
|
|
||||||
|
> [!warning] jq required
|
||||||
|
> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).
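The failure mode is mechanical: when the `jq` binary is absent, command substitution swallows the error and returns an empty string, and everything downstream sends an empty body. The path below is deliberately nonexistent to simulate a missing `jq`:

```shell
# Simulate "jq not installed": the command fails, stderr is discarded, payload is empty.
payload=$(/usr/bin/jq-not-installed -n '{}' 2>/dev/null)
echo "payload length: ${#payload}"   # prints: payload length: 0
# curl -d "$payload" would now send Content-Length: 0
```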
|
||||||
|
|
||||||
|
After deploying the config, run a test to confirm the webhook fires correctly:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart netdata
|
||||||
|
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.
|
||||||
|
|
||||||
|
## 5. Claim to Netdata Cloud
|
||||||
|
|
||||||
|
Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
|
||||||
|
sh /tmp/netdata-kickstart.sh --stable-channel \
|
||||||
|
--claim-token <token> \
|
||||||
|
--claim-rooms <room-id> \
|
||||||
|
--claim-url https://app.netdata.cloud
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify the claim was accepted:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat /var/lib/netdata/cloud.d/claimed_id
|
||||||
|
```
|
||||||
|
|
||||||
|
A UUID will be present if claimed successfully. The node should appear in Netdata Cloud within ~60 seconds.
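The check can be scripted for automation; a sketch (the UUID regex is an assumption about the claim-id format):

```shell
check_claim() {
  # Prints the claim id and returns 0 if the file holds a UUID-shaped value.
  [ -r "$1" ] && grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' "$1" && cat "$1"
}
check_claim /var/lib/netdata/cloud.d/claimed_id || echo "not claimed"
```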
|
||||||
|
|
||||||
|
## 6. Verify Alerts
|
||||||
|
|
||||||
|
Check that no unexpected alerts are active after setup:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c "
|
||||||
|
import sys, json
|
||||||
|
d = json.load(sys.stdin)
|
||||||
|
active = [v for v in d.get('alarms', {}).values() if v.get('status') not in ('CLEAR', 'UNINITIALIZED', 'UNDEFINED')]
|
||||||
|
print(f'{len(active)} active alert(s)')
|
||||||
|
for v in active:
|
||||||
|
print(f' [{v[\"status\"]}] {v[\"name\"]} on {v[\"chart\"]}')
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Fleet-wide Alert Check
|
||||||
|
|
||||||
|
To audit all servers at once (requires Tailscale SSH access):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
|
||||||
|
echo "=== $host ==="
|
||||||
|
ssh root@$host "curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c \
|
||||||
|
\"import sys,json; d=json.load(sys.stdin); active=[v for v in d.get('alarms',{}).values() if v.get('status') not in ('CLEAR','UNINITIALIZED','UNDEFINED')]; print(str(len(active))+' active')\""
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
## Fleet-wide jq Audit
|
||||||
|
|
||||||
|
To check that all servers with `custom_sender` have `jq` installed:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
|
||||||
|
echo -n "=== $host: "
|
||||||
|
ssh -o ConnectTimeout=5 root@$host \
|
||||||
|
'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(which jq 2>/dev/null && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.
|
||||||
|
|
||||||
|
## Related
|
||||||
|
|
||||||
|
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
|
||||||
|
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
|
||||||
137
02-selfhosting/monitoring/netdata-selinux-avc-chart.md
Normal file
137
02-selfhosting/monitoring/netdata-selinux-avc-chart.md
Normal file
@@ -0,0 +1,137 @@
|
|||||||
|
---
|
||||||
|
title: "Netdata SELinux AVC Denial Monitoring"
|
||||||
|
domain: selfhosting
|
||||||
|
category: monitoring
|
||||||
|
tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
|
||||||
|
status: published
|
||||||
|
created: 2026-03-27
|
||||||
|
updated: 2026-03-27
|
||||||
|
---
|
||||||
|
|
||||||
|
# Netdata SELinux AVC Denial Monitoring
|
||||||
|
|
||||||
|
A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.
|
||||||
|
|
||||||
|
## What It Does
|
||||||
|
|
||||||
|
The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes. This gives a real-time chart in Netdata Cloud showing SELinux denial spikes — useful for catching misconfigurations after service changes or package updates.
|
||||||
|
|
||||||
|
## Where It's Deployed
|
||||||
|
|
||||||
|
| Host | OS | SELinux | Chart Installed |
|
||||||
|
|------|----|---------|-----------------|
|
||||||
|
| majorhome | Fedora 43 | Enforcing | Yes |
|
||||||
|
| majorlab | Fedora 43 | Enforcing | Yes |
|
||||||
|
| majormail | Fedora 43 | Enforcing | Yes |
|
||||||
|
| majordiscord | Fedora 43 | Enforcing | Yes |
|
||||||
|
|
||||||
|
Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
### 1. Create the Chart Plugin
|
||||||
|
|
||||||
|
Create `/etc/netdata/charts.d/selinux.chart.sh`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
|
||||||
|
# SELinux AVC denial counter for Netdata charts.d
|
||||||
|
selinux_update_every=60
|
||||||
|
selinux_priority=90000
|
||||||
|
|
||||||
|
selinux_check() {
|
||||||
|
which ausearch >/dev/null 2>&1 || return 1
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
selinux_create() {
|
||||||
|
cat <<CHART
|
||||||
|
CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
|
||||||
|
DIMENSION denials '' absolute 1 1
|
||||||
|
CHART
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
selinux_update() {
|
||||||
|
local count
|
||||||
|
count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
|
||||||
|
echo "BEGIN selinux.avc_denials $1"
|
||||||
|
echo "SET denials = ${count}"
|
||||||
|
echo "END"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Grant Netdata Sudo Access to ausearch
|
||||||
|
|
||||||
|
`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
|
||||||
|
chmod 440 /etc/sudoers.d/netdata-selinux
|
||||||
|
visudo -c
|
||||||
|
```
|
||||||
|
|
||||||
|
The `visudo -c` validates syntax. If it reports errors, fix the file before proceeding — a broken sudoers file can lock out sudo entirely.
|
||||||
|
|
||||||
|
### 3. Restart Netdata
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart netdata
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Verify
|
||||||
|
|
||||||
|
Check that the chart is collecting data:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
|
||||||
|
import sys, json
|
||||||
|
d = json.load(sys.stdin)
|
||||||
|
print(f'Chart: {d[\"id\"]}')
|
||||||
|
print(f'Update every: {d[\"update_every\"]}s')
|
||||||
|
print(f'Type: {d[\"chart_type\"]}')
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.
|
||||||
|
|
||||||
|
## Known Side Effect: pam_systemd Log Noise
|
||||||
|
|
||||||
|
Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:
|
||||||
|
|
||||||
|
```
|
||||||
|
pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
|
||||||
|
```
|
||||||
|
|
||||||
|
This is cosmetic. The `sudo` command succeeds — `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).
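The volume figure is simple arithmetic, worth keeping in mind if the interval is ever changed:

```shell
interval=60                                       # seconds, from selinux_update_every
echo "$(( 24 * 3600 / interval )) messages/day"   # prints: 1440 messages/day
```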
|
||||||
|
|
||||||
|
**No PAM changes are needed.** On Fedora the `system-auth` PAM config already marks `pam_systemd.so` as `-session optional` (the `-` prefix means "don't fail if the module errors"), so the messages are informational log noise, not actual failures.
|
||||||
|
|
||||||
|
If the log volume is a concern for log analysis or monitoring, filter it with an rsyslog rule:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
# /etc/rsyslog.d/suppress-pam-systemd.conf
|
||||||
|
:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
|
||||||
|
```
|
||||||
|
|
||||||
|
Or in Netdata's log alert config, exclude the pattern from any log-based alerts.
|
||||||
|
|
||||||
|
## Fleet Audit
|
||||||
|
|
||||||
|
To verify the chart is deployed and functioning on all Fedora hosts:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for host in majorhome majorlab majormail majordiscord; do
|
||||||
|
echo -n "=== $host: "
|
||||||
|
ssh root@$host "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
## Related
|
||||||
|
|
||||||
|
- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
|
||||||
|
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
|
||||||
|
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
|
||||||
|
- [SELinux: Fixing Dovecot Mail Spool Context](../../05-troubleshooting/selinux-dovecot-vmail-context.md)
|
||||||
@@ -85,4 +85,4 @@ curl -s http://localhost:19999/api/v1/alarms?all | grep -A 15 "web_log_1m_redire
|
|||||||
|
|
||||||
## See Also
|
|
||||||
|
|
||||||
- [[Netdata service monitoring]]
|
- Netdata service monitoring
|
||||||
|
|||||||
@@ -135,6 +135,6 @@ yourdomain.com {
|
|||||||
|
|
||||||
## See Also
|
|
||||||
|
|
||||||
- [[self-hosting-starter-guide]]
|
- [self-hosting-starter-guide](../docker/self-hosting-starter-guide.md)
|
||||||
- [[linux-server-hardening-checklist]]
|
- [linux-server-hardening-checklist](../security/linux-server-hardening-checklist.md)
|
||||||
- [[debugging-broken-docker-containers]]
|
- [debugging-broken-docker-containers](../docker/debugging-broken-docker-containers.md)
|
||||||
|
|||||||
94
02-selfhosting/security/ansible-unattended-upgrades-fleet.md
Normal file
@@ -0,0 +1,94 @@
|
|||||||
|
---
|
||||||
|
title: Standardizing unattended-upgrades Across Ubuntu Fleet with Ansible
|
||||||
|
domain: selfhosting
|
||||||
|
category: security
|
||||||
|
tags:
|
||||||
|
- ansible
|
||||||
|
- ubuntu
|
||||||
|
- apt
|
||||||
|
- unattended-upgrades
|
||||||
|
- fleet-management
|
||||||
|
status: published
|
||||||
|
created: '2026-03-16'
|
||||||
|
updated: '2026-03-16'
|
||||||
|
---
|
||||||
|
|
||||||
|
# Standardizing unattended-upgrades Across Ubuntu Fleet with Ansible
|
||||||
|
|
||||||
|
When some Ubuntu hosts in a fleet self-update via `unattended-upgrades` and others don't, they drift apart over time — different kernel versions, different reboot states, inconsistent behavior. This article covers how to diagnose the drift and enforce uniform auto-update config across all Ubuntu hosts using Ansible.
|
||||||
|
|
||||||
|
## Diagnosing the Problem
|
||||||
|
|
||||||
|
If only some Ubuntu hosts are flagging for reboot, check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# What triggered the reboot flag?
|
||||||
|
cat /var/run/reboot-required.pkgs
|
||||||
|
|
||||||
|
# Is unattended-upgrades installed and active?
|
||||||
|
systemctl status unattended-upgrades
|
||||||
|
cat /etc/apt/apt.conf.d/20auto-upgrades
|
||||||
|
|
||||||
|
# When did apt last run?
|
||||||
|
ls -lt /var/log/apt/history.log*
|
||||||
|
```
|
||||||
|
|
||||||
|
The reboot flag is written to `/var/run/reboot-required` by `update-notifier-common` when packages like the kernel, glibc, or systemd are updated. If some hosts have `unattended-upgrades` running and others don't, the ones that self-updated will flag for reboot while the others lag behind.
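The flag itself is just a file, with a `.pkgs` sibling listing which packages triggered it; a small helper usable in scripts (a sketch, not part of the playbook):

```shell
reboot_status() {
  # $1: path to the reboot-required flag file
  if [ -f "$1" ]; then
    echo "reboot required, triggered by:"
    cat "$1.pkgs" 2>/dev/null
  else
    echo "no reboot pending"
  fi
}
reboot_status /var/run/reboot-required
```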
|
||||||
|
|
||||||
|
## The Fix — Ansible Playbook
|
||||||
|
|
||||||
|
Add these tasks to your update playbook **before** the apt cache update step:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
- name: Ensure unattended-upgrades is installed on Ubuntu servers
|
||||||
|
ansible.builtin.apt:
|
||||||
|
name:
|
||||||
|
- unattended-upgrades
|
||||||
|
- update-notifier-common
|
||||||
|
state: present
|
||||||
|
update_cache: true
|
||||||
|
when: ansible_facts['os_family'] == "Debian"
|
||||||
|
|
||||||
|
- name: Enforce uniform auto-update config on Ubuntu servers
|
||||||
|
ansible.builtin.copy:
|
||||||
|
dest: /etc/apt/apt.conf.d/20auto-upgrades
|
||||||
|
content: |
|
||||||
|
APT::Periodic::Update-Package-Lists "1";
|
||||||
|
APT::Periodic::Unattended-Upgrade "1";
|
||||||
|
owner: root
|
||||||
|
group: root
|
||||||
|
mode: '0644'
|
||||||
|
when: ansible_facts['os_family'] == "Debian"
|
||||||
|
|
||||||
|
- name: Ensure unattended-upgrades service is enabled and running
|
||||||
|
ansible.builtin.systemd:
|
||||||
|
name: unattended-upgrades
|
||||||
|
enabled: true
|
||||||
|
state: started
|
||||||
|
when: ansible_facts['os_family'] == "Debian"
|
||||||
|
```
|
||||||
|
|
||||||
|
Running this across the `ubuntu` group ensures every host has the same config on every Ansible run — idempotent and safe.
|
||||||
|
|
||||||
|
## Rebooting Flagged Hosts
|
||||||
|
|
||||||
|
Once identified, reboot specific hosts without touching the rest:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Reboot just the flagging hosts
|
||||||
|
ansible-playbook reboot.yml -l teelia,tttpod
|
||||||
|
|
||||||
|
# Run full update on remaining hosts to bring them up to the same kernel
|
||||||
|
ansible-playbook update.yml -l dca,majorlinux,majortoot
|
||||||
|
```
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- `unattended-upgrades` runs daily on its own schedule — hosts that haven't checked yet will lag behind but catch up within 24 hours
|
||||||
|
- Hosts reporting `ok` (not `changed`) on the config tasks were already correctly configured
|
||||||
|
- After a kernel update is pulled, only an actual reboot clears the `/var/run/reboot-required` flag — Ansible reporting the flag is informational only
|
||||||
|
|
||||||
|
## See Also
|
||||||
|
|
||||||
|
- [Ansible Getting Started](../../01-linux/shell-scripting/ansible-getting-started.md)
|
||||||
|
- [Linux Server Hardening Checklist](linux-server-hardening-checklist.md)
|
||||||
154
02-selfhosting/security/clamav-fleet-deployment.md
Normal file
@@ -0,0 +1,154 @@
|
|||||||
|
---
|
||||||
|
title: ClamAV Fleet Deployment with Ansible
|
||||||
|
domain: selfhosting
|
||||||
|
category: security
|
||||||
|
tags:
|
||||||
|
- clamav
|
||||||
|
- antivirus
|
||||||
|
- security
|
||||||
|
- ansible
|
||||||
|
- fleet
|
||||||
|
- cron
|
||||||
|
status: published
|
||||||
|
created: 2026-04-18
|
||||||
|
updated: 2026-04-18T11:13
|
||||||
|
---
|
||||||
|
# ClamAV Fleet Deployment with Ansible
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
ClamAV is the standard open-source antivirus for Linux servers. For internet-facing hosts, a weekly scan with fresh definitions catches known malware, web shells, and suspicious files before they cause damage. The key operational concern is CPU impact — an unthrottled `clamscan` will saturate a core for hours on a busy host. The solution is `nice` and `ionice` wrappers.
|
||||||
|
|
||||||
|
> This guide covers deployment to internet-facing hosts. Internal-only hosts (storage, inference, gaming) are lower priority and can be skipped.
|
||||||
|
|
||||||
|
## What Gets Deployed
|
||||||
|
|
||||||
|
- `clamav` + `clamav-update` packages (provides `clamscan` + `freshclam`)
|
||||||
|
- `freshclam` service enabled for automatic definition updates
|
||||||
|
- A quarantine directory at `/var/lib/clamav/quarantine/`
|
||||||
|
- A weekly `clamscan` cron job, niced to background priority
|
||||||
|
- SELinux context set on the quarantine directory (Fedora hosts)
|
||||||
|
|
||||||
|
## Ansible Playbook
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
- name: Deploy ClamAV to internet-facing hosts
|
||||||
|
hosts: internet_facing # dca, majorlinux, teelia, tttpod, majortoot, majormail
|
||||||
|
become: true
|
||||||
|
|
||||||
|
tasks:
|
||||||
|
|
||||||
|
- name: Install ClamAV packages
|
||||||
|
ansible.builtin.package:
|
||||||
|
name:
|
||||||
|
- clamav
|
||||||
|
- clamav-update
|
||||||
|
state: present
|
||||||
|
|
||||||
|
- name: Enable and start freshclam
|
||||||
|
ansible.builtin.service:
|
||||||
|
name: clamav-freshclam
|
||||||
|
enabled: true
|
||||||
|
state: started
|
||||||
|
|
||||||
|
- name: Create quarantine directory
|
||||||
|
ansible.builtin.file:
|
||||||
|
path: /var/lib/clamav/quarantine
|
||||||
|
state: directory
|
||||||
|
owner: root
|
||||||
|
group: root
|
||||||
|
mode: '0700'
|
||||||
|
|
||||||
|
- name: Set SELinux context on quarantine dir (Fedora/RHEL)
|
||||||
|
ansible.builtin.command:
|
||||||
|
cmd: chcon -t var_t /var/lib/clamav/quarantine
|
||||||
|
when: ansible_os_family == "RedHat"
|
||||||
|
changed_when: false
|
||||||
|
|
||||||
|
- name: Deploy weekly clamscan cron job
|
||||||
|
ansible.builtin.cron:
|
||||||
|
name: "Weekly ClamAV scan"
|
||||||
|
user: root
|
||||||
|
weekday: "0" # Sunday
|
||||||
|
hour: "3"
|
||||||
|
minute: "0"
|
||||||
|
job: >-
|
||||||
|
nice -n 19 ionice -c 3
|
||||||
|
clamscan -r /
|
||||||
|
--exclude-dir=^/proc
|
||||||
|
--exclude-dir=^/sys
|
||||||
|
--exclude-dir=^/dev
|
||||||
|
--exclude-dir=^/run
|
||||||
|
--move=/var/lib/clamav/quarantine
|
||||||
|
--log=/var/log/clamav/scan.log
|
||||||
|
--quiet
|
||||||
|
2>&1 | logger -t clamscan
|
||||||
|
```
|
||||||
|
|
||||||
|
## The nice/ionice Flags
|
||||||
|
|
||||||
|
Without throttling, `clamscan -r /` will peg a CPU core for 30–90 minutes depending on disk size and file count. On production hosts this causes Netdata alerts and visible service degradation.
|
||||||
|
|
||||||
|
| Flag | Value | Meaning |
|
||||||
|
|------|-------|---------|
|
||||||
|
| `nice -n 19` | Lowest CPU priority | Kernel will preempt this process for anything else |
|
||||||
|
| `ionice -c 3` | Idle I/O class | Disk I/O only runs when no other process needs the disk |
|
||||||
|
|
||||||
|
With both flags set, `clamscan` becomes essentially invisible under normal load. The scan takes longer (possibly 2–4× on busy disks), but this is acceptable for a weekly background job.
|
||||||
|
|
||||||
|
> **SELinux on Fedora/RHEL:** `ionice` may trigger AVC denials under SELinux Enforcing. If scans silently fail on Fedora hosts, check `ausearch -m avc -ts recent` for `clamscan` denials. See [selinux-fail2ban-execmem-fix](../../05-troubleshooting/selinux-fail2ban-execmem-fix.md) for the pattern.
|
||||||
|
|
||||||
|
## Excluded Paths
|
||||||
|
|
||||||
|
Always exclude virtual/pseudo filesystems — scanning them wastes time and can trigger false positives or kernel errors:
|
||||||
|
|
||||||
|
```
|
||||||
|
--exclude-dir=^/proc # Process info (not real files)
|
||||||
|
--exclude-dir=^/sys # Kernel interfaces
|
||||||
|
--exclude-dir=^/dev # Device nodes
|
||||||
|
--exclude-dir=^/run # Runtime tmpfs
|
||||||
|
```
|
||||||
|
|
||||||
|
You may also want to exclude large data directories (`/var/lib/docker`, backup volumes, media stores) if scan time is a concern. These are lower-risk targets anyway.
|
||||||
|
|
||||||
|
## Quarantine vs Delete
|
||||||
|
|
||||||
|
`--move=/var/lib/clamav/quarantine` moves detected files rather than deleting them. This is safer than `--remove` — you can inspect and restore false positives. Review the quarantine directory periodically:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ls -la /var/lib/clamav/quarantine/
|
||||||
|
```
|
||||||
|
|
||||||
|
If a file is a confirmed false positive, restore it and add it to `/etc/clamav/whitelist.ign2`.
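The `.ign2` format is one exact signature name per line. As a purely illustrative example, ignoring the EICAR test signature would look like this in `/etc/clamav/whitelist.ign2`:

```
Eicar-Signature
```

Rescan the restored file afterward to confirm it is no longer flagged.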
|
||||||
|
|
||||||
|
## Checking Scan Results
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# View last scan log
|
||||||
|
cat /var/log/clamav/scan.log
|
||||||
|
|
||||||
|
# Summary line from the log
|
||||||
|
grep -E "^Infected|^Scanned" /var/log/clamav/scan.log | tail -5
|
||||||
|
|
||||||
|
# Check freshclam is keeping definitions current
|
||||||
|
systemctl status clamav-freshclam
|
||||||
|
freshclam --version
|
||||||
|
```
|
||||||
|
|
||||||
|
## Verifying Deployment
|
||||||
|
|
||||||
|
Test that ClamAV can detect malware using the EICAR test file (a harmless string that all AV tools recognize as test malware):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
|
||||||
|
> /tmp/eicar-test.txt
|
||||||
|
clamscan /tmp/eicar-test.txt
|
||||||
|
# Expected: /tmp/eicar-test.txt: Eicar-Signature FOUND
|
||||||
|
rm /tmp/eicar-test.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
## See Also
|
||||||
|
|
||||||
|
- [clamscan-cpu-spike-nice-ionice](../../05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md) — troubleshooting CPU spikes from unthrottled scans
|
||||||
|
- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
|
||||||
|
- [ssh-hardening-ansible-fleet](ssh-hardening-ansible-fleet.md)
|
||||||
127
02-selfhosting/security/fail2ban-apache-404-scanner-jail.md
Normal file
@@ -0,0 +1,127 @@
|
|||||||
|
---
|
||||||
|
title: "Fail2ban Custom Jail: Apache 404 Scanner Detection"
|
||||||
|
domain: selfhosting
|
||||||
|
category: security
|
||||||
|
tags: [fail2ban, apache, security, scanner, firewall]
|
||||||
|
status: published
|
||||||
|
created: 2026-04-02
|
||||||
|
updated: 2026-04-02
|
||||||
|
---
|
||||||
|
# Fail2ban Custom Jail: Apache 404 Scanner Detection
|
||||||
|
|
||||||
|
## The Problem
|
||||||
|
|
||||||
|
Automated vulnerability scanners probe web servers by requesting dozens of common config file paths — `.env`, `env.php`, `next.config.js`, `nuxt.config.ts`, etc. — in rapid succession. These all return **404 Not Found**, which is correct behavior from Apache.
|
||||||
|
|
||||||
|
However, the built-in Fail2ban jails (`apache-noscript`, `apache-botsearch`) don't catch these because they parse the **error log**, not the **access log**. If Apache doesn't write a corresponding "File does not exist" entry to the error log for every 404, the scanner slips through undetected.
|
||||||
|
|
||||||
|
This also triggers false alerts in monitoring tools like **Netdata**, which sees the success ratio drop (e.g., `web_log_1m_successful` goes CRITICAL at 2.83%) because 404s aren't counted as successful responses.
|
||||||
|
|
||||||
|
## The Solution
|
||||||
|
|
||||||
|
Create a custom Fail2ban filter that reads the **access log** and matches 404 responses directly.
|
||||||
|
|
||||||
|
### Step 1 — Create the filter
|
||||||
|
|
||||||
|
Create `/etc/fail2ban/filter.d/apache-404scan.conf`:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
# Fail2Ban filter to catch rapid 404 scanning in Apache access logs
|
||||||
|
# Targets vulnerability scanners probing for .env, config files, etc.
|
||||||
|
|
||||||
|
[Definition]
|
||||||
|
|
||||||
|
# Match 404 responses in combined/common access log format
|
||||||
|
failregex = ^<HOST> -.*"(GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) .+" 404 \d+
|
||||||
|
|
||||||
|
ignoreregex = ^<HOST> -.*(robots\.txt|favicon\.ico|apple-touch-icon)
|
||||||
|
|
||||||
|
datepattern = %%d/%%b/%%Y:%%H:%%M:%%S %%z
|
||||||
|
```
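The core pattern can be sanity-checked locally before involving fail2ban at all; `grep -E` approximates the filter with `<HOST>` replaced by an address pattern (the log line is synthetic):

```shell
# A synthetic combined-log 404 line, like a scanner probing for .env:
line='203.0.113.7 - - [02/Apr/2026:10:00:00 +0000] "GET /.env HTTP/1.1" 404 196'
echo "$line" | grep -Eq \
  '^[0-9.]+ -.*"(GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) .+" 404 [0-9]+' \
  && echo "match"   # prints: match
```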
|
||||||
|
|
||||||
|
### Step 2 — Add the jail
|
||||||
|
|
||||||
|
Add to `/etc/fail2ban/jail.local`:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
[apache-404scan]
|
||||||
|
enabled = true
|
||||||
|
port = http,https
|
||||||
|
filter = apache-404scan
|
||||||
|
logpath = /var/log/apache2/access.log
|
||||||
|
maxretry = 10
|
||||||
|
findtime = 1m
|
||||||
|
bantime = 24h
|
||||||
|
backend = polling
|
||||||
|
```
|
||||||
|
|
||||||
|
**10 hits in 1 minute** is aggressive enough to catch scanners (which fire 30–50+ requests in seconds) while avoiding false positives from a legitimate user hitting a few broken links.
|
||||||
|
|
||||||
|
> **Critical: `backend = polling` is required** if your `jail.local` or `jail.d/` sets `backend = systemd` in `[DEFAULT]` (common on Fedora/RHEL). Without it, fail2ban ignores the `logpath` and reads from journald instead — which Apache doesn't write to. The jail will appear active (`fail2ban-client status` shows it running) but `fail2ban-client get apache-404scan logpath` will return "No file is currently monitored" and zero IPs will ever be banned. This fails silently.
|
||||||
|
|
||||||
|
### Step 3 — Test the regex
|
||||||
|
|
||||||
|
```bash
|
||||||
|
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-404scan.conf
|
||||||
|
```
|
||||||
|
|
||||||
|
You should see matches. In a real-world test against a server under active scanning, this matched **2831 out of 8901** access log lines.
|
||||||
|
|
||||||
|
### Step 4 — Reload Fail2ban
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl restart fail2ban
|
||||||
|
fail2ban-client status apache-404scan
|
||||||
|
```
|
||||||
|
|
||||||
|
## Why Default Jails Miss This
|
||||||
|
|
||||||
|
| Jail | Log Source | What It Matches | Why It Misses |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `apache-noscript` | error log | "script not found or unable to stat" | Only matches script-type files (.php, .asp, .exe, .pl) |
|
||||||
|
| `apache-botsearch` | error log | "File does not exist" for specific paths | Requires Apache to write error log entries for 404s |
|
||||||
|
| **`apache-404scan`** | **access log** | **Any 404 response** | **Catches everything** |
|
||||||
|
|
||||||
|
The key insight: URL-encoded probes like `/%2f%2eenv%2econfig` that return 404 in the access log may not generate error log entries at all, making them invisible to the default filters.
|
||||||
|
|
||||||
|
## Pair With Recidive

If you have the `recidive` jail enabled, repeat offenders get permanently banned:

```ini
[recidive]
enabled = true
bantime = -1
findtime = 86400
maxretry = 3
```

Three 24-hour bans within a day = permanent firewall block.

## Quick Diagnostic Commands

```bash
# Test filter against current access log
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-404scan.conf

# Check jail status and banned IPs
fail2ban-client status apache-404scan

# IMPORTANT: verify the jail is actually monitoring the file
fail2ban-client get apache-404scan logpath
# Should show: /var/log/apache2/access.log
# If it shows "No file is currently monitored" — add backend = polling to the jail

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-404scan

# Count 404s in today's log
grep '" 404 ' /var/log/apache2/access.log | wc -l
```

## Key Notes

- The `ignoreregex` excludes `robots.txt`, `favicon.ico`, and `apple-touch-icon` — these are commonly requested and produce harmless 404s.
- Make sure your Tailscale subnet (`100.64.0.0/10`) is in the `ignoreip` list under `[DEFAULT]` so you don't ban your own monitoring or uptime checks.
- This filter works with both Apache **combined** and **common** log formats.
- Complements the existing `apache-dirscan` jail (which catches error-log-based directory enumeration). Use both for full coverage.
127
02-selfhosting/security/fail2ban-apache-bad-request-jail.md
Normal file
@@ -0,0 +1,127 @@
---
title: "Fail2ban Custom Jail: Apache Bad Request Detection"
domain: selfhosting
category: security
tags: [fail2ban, apache, security, firewall, bad-request]
status: published
created: 2026-04-17
updated: 2026-04-17
---

# Fail2ban Custom Jail: Apache Bad Request Detection

## The Problem

fail2ban ships a stock `nginx-bad-request` filter for catching malformed HTTP requests (400s), but **there is no Apache equivalent**. Apache servers are left unprotected against the same class of attack: scanners that send garbage request lines to probe for vulnerabilities or overwhelm the access log.

Unlike the nginx version, this filter has to be written from scratch.

## The Solution

Create a custom filter targeting **400 Bad Request** responses in Apache's Combined Log Format, then wire it to a jail.

### Step 1 — Create the filter

Create `/etc/fail2ban/filter.d/apache-bad-request.conf`:

```ini
# Fail2Ban filter: catch 400 Bad Request responses in Apache access logs
# Targets malformed HTTP requests — garbage request lines, empty methods, etc.
# No stock equivalent exists; nginx-bad-request ships with fail2ban but Apache does not.

[Definition]

# Match 400 responses in Apache Combined/Common Log Format
failregex = ^<HOST> -.*".*" 400 \d+

ignoreregex =

datepattern = %%d/%%b/%%Y:%%H:%%M:%%S %%z
```

### Step 2 — Validate the filter

Always test before deploying:

```bash
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-bad-request.conf
```

Against a live server under typical traffic this matched **155 lines with zero false positives**. If you see unexpected matches, refine `ignoreregex`.
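If you'd rather not wait for live traffic, the regex shape can also be sanity-checked against fabricated log lines (hypothetical IPs; this exercises the same pattern as the failregex, minus fail2ban's `<HOST>` expansion):

```shell
# One malformed request (logged as "-"), one normal 200: only the 400 should match
printf '%s\n' \
  '198.51.100.9 - - [17/Apr/2026:03:14:09 +0000] "-" 400 0 "-" "-"' \
  '198.51.100.10 - - [17/Apr/2026:03:14:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"' \
  | grep -cE '^[0-9.]+ -.*".*" 400 [0-9]+'
# prints: 1
```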
### Step 3 — Create the jail drop-in

Create `/etc/fail2ban/jail.d/apache-bad-request.conf`:

```ini
[apache-bad-request]
enabled = true
port = http,https
filter = apache-bad-request
logpath = /var/log/apache2/access.log
maxretry = 10
findtime = 60
bantime = 1h
```

> **Note:** On Fedora/RHEL, the log path may be `/var/log/httpd/access_log`. If your `[DEFAULT]` sets `backend = systemd`, add `backend = polling` to the jail — otherwise it silently ignores `logpath` and reads journald instead.

### Step 4 — Reload fail2ban

```bash
systemctl reload fail2ban
fail2ban-client status apache-bad-request
```

## Deploy Fleet-Wide with Ansible

If you run multiple Apache hosts, use Ansible to deploy both the filter and jail atomically:

```yaml
- name: Deploy apache-bad-request fail2ban filter
  ansible.builtin.template:
    src: templates/fail2ban_apache_bad_request_filter.conf.j2
    dest: /etc/fail2ban/filter.d/apache-bad-request.conf
  notify: Reload fail2ban

- name: Deploy apache-bad-request fail2ban jail
  ansible.builtin.template:
    src: templates/fail2ban_apache_bad_request_jail.conf.j2
    dest: /etc/fail2ban/jail.d/apache-bad-request.conf
  notify: Reload fail2ban
```
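Both tasks notify a `Reload fail2ban` handler that isn't shown above. A minimal version (a sketch, assuming fail2ban runs under systemd) would be:

```yaml
handlers:
  - name: Reload fail2ban
    ansible.builtin.service:
      name: fail2ban
      state: reloaded
```

`state: reloaded` maps to `systemctl reload fail2ban`, matching Step 4.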
## Why Not Use nginx-bad-request on Apache?

The `nginx-bad-request` filter parses nginx's log format, which differs from Apache's Combined Log Format. The timestamp format, field ordering, and quoting differ enough that the regex won't match. You need a separate filter.

| | nginx-bad-request | apache-bad-request |
|---|---|---|
| Ships with fail2ban | ✅ Yes | ❌ No — must write custom |
| Log source | nginx access log | Apache access log |
| What it catches | 400 responses (malformed requests) | 400 responses (malformed requests) |
| Regex target | nginx Combined Log Format | Apache Combined Log Format |

## Diagnostic Commands

```bash
# Validate filter against live log
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-bad-request.conf

# Check jail status
fail2ban-client status apache-bad-request

# Confirm the jail is monitoring the correct log file
fail2ban-client get apache-bad-request logpath

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-bad-request

# Count 400s in today's access log
grep '" 400 ' /var/log/apache2/access.log | wc -l
```

## See Also

- [fail2ban-nginx-bad-request-jail](fail2ban-nginx-bad-request-jail.md) — the nginx equivalent (stock filter, just needs wiring)
- [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md) — catches 404 probe scanners
- [fail2ban-apache-php-probe-jail](fail2ban-apache-php-probe-jail.md)
146
02-selfhosting/security/fail2ban-apache-php-probe-jail.md
Normal file
@@ -0,0 +1,146 @@
---
title: "Fail2ban Custom Jail: Apache PHP Webshell Probe Detection"
domain: selfhosting
category: security
tags:
  - fail2ban
  - apache
  - security
  - php
  - webshell
  - scanner
status: published
created: 2026-04-09
updated: 2026-04-13T10:15
---

# Fail2ban Custom Jail: Apache PHP Webshell Probe Detection

## The Problem

Automated scanners flood web servers with rapid-fire requests for non-existent `.php` files — `bless.php`, `alfa.php`, `lock360.php`, `about.php`, `cgi-bin/bypass.php`, and hundreds of others. These are classic **webshell/backdoor probes** looking for compromised PHP files left behind by prior attackers.

On servers that force HTTPS (or have HTTP→HTTPS redirects in place), these probes often return **301 Moved Permanently** instead of 404. That causes three problems:

1. **The `apache-404scan` jail misses them** — it only matches 404 responses
2. **Netdata fires false `web_log_1m_redirects` alerts** — the redirect ratio spikes to 96%+ during scans
3. **The scanner is never banned**, and will return repeatedly

This was the exact trigger for the 2026-04-09 `[MajorLinux] Web Log Alert` incident where `45.86.202.224` sent 202 PHP probe requests in a few minutes, all returning 301.

## The Solution

Create a custom Fail2ban filter that matches **any `.php` request returning a redirect, forbidden, or not-found response** — while excluding legitimate WordPress PHP endpoints.

### Step 1 — Create the filter

Create `/etc/fail2ban/filter.d/apache-php-probe.conf`:

```ini
# Fail2Ban filter to catch PHP file probing (webshell/backdoor scanners)
# These requests hit non-existent .php files and get 301/302/403/404 responses

[Definition]

failregex = ^<HOST> -.*"(GET|POST|HEAD) /[^ ]*\.php[^ ]* HTTP/[0-9.]+" (301|302|403|404) \d+

ignoreregex = ^<HOST> -.*(wp-cron\.php|xmlrpc\.php|wp-login\.php|wp-admin|index\.php|wp-comments-post\.php)

datepattern = %%d/%%b/%%Y:%%H:%%M:%%S %%z
```

**Why the ignoreregex matters:** Legitimate WordPress traffic hits `wp-cron.php`, `xmlrpc.php` (often 403-blocked on hardened sites), `wp-login.php`, and `index.php` constantly. Without exclusions the jail would ban your own WordPress admins. Note that `wp-login.php` brute force is caught separately by the `wordpress` jail.
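The match/exclude split can be sanity-checked outside fail2ban with two fabricated log lines (hypothetical IPs): a probe for `alfa.php` and a legitimate `wp-cron.php` hit. The patterns below mirror the failregex/ignoreregex shapes, with `<HOST>` replaced by a plain IP match:

```shell
probe='203.0.113.50 - - [09/Apr/2026:18:02:11 +0000] "GET /alfa.php HTTP/1.1" 301 230 "-" "-"'
cron='198.51.100.20 - - [09/Apr/2026:18:02:12 +0000] "POST /wp-cron.php?doing_wp_cron HTTP/1.1" 200 0 "-" "WordPress"'

fail='^[0-9.]+ -.*"(GET|POST|HEAD) /[^ ]*\.php[^ ]* HTTP/[0-9.]+" (301|302|403|404) [0-9]+'
ignore='wp-cron\.php|xmlrpc\.php|wp-login\.php|wp-admin|index\.php|wp-comments-post\.php'

# A line is ban-worthy only if it matches the fail pattern AND not the ignore pattern
for line in "$probe" "$cron"; do
  if echo "$line" | grep -qE "$fail" && ! echo "$line" | grep -qE "$ignore"; then
    echo "would ban: ${line%% *}"
  fi
done
# prints: would ban: 203.0.113.50
```

The cron line is doubly safe here: its 200 status never matches the fail pattern, and the ignore pattern excludes it anyway.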
### Step 2 — Add the jail

Add to `/etc/fail2ban/jail.local`:

```ini
[apache-php-probe]
enabled = true
port = http,https
filter = apache-php-probe
logpath = /var/log/apache2/access.log
maxretry = 5
findtime = 1m
bantime = 48h
```

**5 hits in 1 minute** is tight — scanners fire 20–200 PHP probes in seconds, while a real user hitting one broken PHP link won't trip the threshold. The 48-hour bantime is longer than `apache-404scan`'s 24h because PHP webshell scanning is a stronger signal of malicious intent.

### Step 3 — Test the regex

```bash
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-php-probe.conf
```

Verify it matches the scanner requests and does **not** match legitimate WordPress traffic.

### Step 4 — Reload Fail2ban

```bash
systemctl restart fail2ban
fail2ban-client status apache-php-probe
```

## Why This Complements `apache-404scan`

| Jail | Catches | Misses |
|---|---|---|
| `apache-404scan` | Any 404 (config file probes, `.env`, random paths) | PHP probes redirected to HTTPS (301) |
| **`apache-php-probe`** | **PHP webshell probes (301/302/403/404)** | Non-`.php` probes |

Running both jails together covers:

- **HTTP→HTTPS redirected PHP probes** (301 responses)
- **Directly-served PHP probes** (404 responses)
- **Blocked PHP paths** like `xmlrpc.php` in non-WP contexts (403 responses)

## Pair With Recidive

The `recidive` jail catches repeat offenders across all jails:

```ini
[recidive]
enabled = true
bantime = -1
findtime = 86400
maxretry = 3
```

A scanner that trips `apache-php-probe` three times in 24 hours gets a **permanent** firewall-level ban.

## Manual IP Blocking via UFW

For known scanners you want to block immediately without waiting for the jail to trip, use UFW:

```bash
# Insert at top of rule list (priority over Apache ALLOW rules)
ufw insert 1 deny from <IP> to any comment "PHP webshell scanner YYYY-MM-DD"
```

This bypasses fail2ban entirely and is useful for:

- Scanners you spot in logs after the fact
- Known-malicious subnets from threat intel
- Entire CIDR blocks (`ufw insert 1 deny from 45.86.202.0/24`)

## Quick Diagnostic Commands

```bash
# Count recent PHP probes returning 301/403/404
awk '/09\/Apr\/2026:18:/ && /\.php/ && ($9==301 || $9==403 || $9==404)' /var/log/apache2/access.log | wc -l

# Top probed PHP filenames (useful for writing additional ignoreregex)
grep '\.php' /var/log/apache2/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

# Top scanner IPs by PHP probe count
grep '\.php' /var/log/apache2/access.log | awk '$9 ~ /^(301|403|404)$/ {print $1}' | sort | uniq -c | sort -rn | head -10

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-php-probe
```

## Key Notes

- **This jail only makes sense on servers that redirect HTTP→HTTPS.** On plain-HTTPS-only servers, PHP probes return 404 and `apache-404scan` already catches them.
- **Add your own WordPress plugin paths to `ignoreregex`** if you use non-standard endpoints (e.g., custom admin URLs, REST API `.php` handlers).
- **This filter pairs naturally with Netdata `web_log_1m_redirects` alerts** — during a scan, Netdata fires first (threshold crossed), then fail2ban bans the IP within seconds.
- Also see: [Fail2ban Custom Jail: Apache 404 Scanner Detection](fail2ban-apache-404-scanner-jail.md) for the sibling 404-based filter.
89
02-selfhosting/security/fail2ban-nginx-bad-request-jail.md
Normal file
@@ -0,0 +1,89 @@
---
title: "Fail2ban: Enable the nginx-bad-request Jail"
domain: selfhosting
category: security
tags: [fail2ban, nginx, security, firewall, bad-request]
status: published
created: 2026-04-17
updated: 2026-04-17
---

# Fail2ban: Enable the nginx-bad-request Jail

## The Problem

Automated scanners sometimes send **malformed HTTP requests** — empty request lines, truncated headers, or garbage data — that nginx rejects with a `400 Bad Request`. These aren't caught by the default fail2ban jails (`nginx-botsearch`, `nginx-http-auth`) because those target URL-probe patterns and auth failures, not raw protocol abuse.

In a real incident: a single IP (`185.177.72.70`) sent **2,778 malformed requests in ~4 minutes**, driving Netdata's `web_log_1m_bad_requests` to 93.7% and triggering a CRITICAL alert. The neighboring IP (`185.177.72.61`) was already banned — the `/24` was known-bad and operating in shifts.
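To quantify a burst like that, count 400 responses per client IP straight from the access log. A minimal sketch (fabricated sample lines with hypothetical IPs; in practice, pipe in `/var/log/nginx/access.log`):

```shell
# Malformed requests log the request field as "-", so match on the status position
printf '%s\n' \
  '185.177.72.70 - - [17/Apr/2026:09:01:02 +0000] "-" 400 0 "-" "-"' \
  '185.177.72.70 - - [17/Apr/2026:09:01:02 +0000] "-" 400 0 "-" "-"' \
  '203.0.113.9 - - [17/Apr/2026:09:01:03 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"' \
  | awk '/" 400 / { bad[$1]++ } END { for (ip in bad) print bad[ip], ip }' \
  | sort -rn
# prints: 2 185.177.72.70
```

Matching on `" 400 ` rather than a fixed field number matters here: with an empty `"-"` request the status lands in a different whitespace-split column than in a normal request line.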
## The Solution

fail2ban ships a `nginx-bad-request` filter out of the box. It's just not wired to a jail by default. Enabling it is a one-step drop-in.

### Step 1 — Create the jail drop-in

Create `/etc/fail2ban/jail.d/nginx-bad-request.conf`:

```ini
[nginx-bad-request]
enabled = true
port = http,https
filter = nginx-bad-request
logpath = /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime = 1h
```

**Settings rationale:**

- `maxretry = 10` — a legitimate browser never sends 10 malformed requests; this threshold catches burst scanners immediately
- `findtime = 60` — 60-second window; the attack pattern fires dozens of requests per minute
- `bantime = 1h` — reasonable starting point; pair with `recidive` for repeat offenders

### Step 2 — Verify the filter matches your log format

Before reloading, confirm the stock filter matches your nginx logs:

```bash
fail2ban-regex /var/log/nginx/access.log nginx-bad-request
```

In a real-world test against an active server this matched **2,829 lines with zero false positives**.

### Step 3 — Reload fail2ban

```bash
systemctl reload fail2ban
fail2ban-client status nginx-bad-request
```

You can also ban an IP manually once the jail is running:

```bash
fail2ban-client set nginx-bad-request banip 185.177.72.70
```

## Verify It's Working

```bash
# Check jail status and active bans
fail2ban-client status nginx-bad-request

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep nginx-bad-request

# Confirm the jail is monitoring the right file
fail2ban-client get nginx-bad-request logpath
```

## Key Notes

- The stock filter is at `/etc/fail2ban/filter.d/nginx-bad-request.conf` — no need to create it.
- If your `[DEFAULT]` section sets `backend = systemd` (common on Fedora/RHEL), add `backend = polling` to the jail or it will silently ignore `logpath` and monitor journald instead — where nginx doesn't write.
- Make sure your Tailscale subnet (`100.64.0.0/10`) is in `ignoreip` under `[DEFAULT]` to avoid banning your own monitoring.
- This jail targets **400 Bad Request** responses. For 404 scanner detection, see [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md).

## See Also

- [fail2ban-apache-bad-request-jail](fail2ban-apache-bad-request-jail.md) — Apache equivalent (no stock filter; custom filter required)
- [fail2ban-apache-404-scanner-jail](fail2ban-apache-404-scanner-jail.md)
- [fail2ban-apache-php-probe-jail](fail2ban-apache-php-probe-jail.md)
131
02-selfhosting/security/fail2ban-wordpress-login-jail.md
Normal file
@@ -0,0 +1,131 @@
---
title: "Fail2ban Custom Jail: WordPress Login Brute Force"
domain: selfhosting
category: security
tags: [fail2ban, wordpress, apache, security, brute-force]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Fail2ban Custom Jail: WordPress Login Brute Force

## The Problem

WordPress login brute force attacks are extremely common. Bots hammer `/wp-login.php` with POST requests, cycling through common credentials. The default Fail2ban `apache-auth` jail doesn't catch these because WordPress returns **HTTP 200** on failed logins — not 401 — so nothing appears as an authentication failure in the Apache error log.

There are pre-packaged filters (`wordpress-hard.conf`, `wordpress-soft.conf`) that ship with some Fail2ban installations, but these require the **[WP fail2ban](https://wordpress.org/plugins/wp-fail2ban/)** WordPress plugin to be installed. That plugin writes login failures to syslog, which the filters then match. Without the plugin, those filters do nothing.

## The Solution

Create a lightweight filter that reads the **Apache access log** and matches repeated POST requests to `wp-login.php` directly. No WordPress plugin needed.

### Step 1 — Create the filter

Create `/etc/fail2ban/filter.d/wordpress-login.conf`:

```ini
# Fail2Ban filter for WordPress login brute force
# Matches POST requests to wp-login.php in Apache access log

[Definition]

failregex = ^<HOST> .* "POST /wp-login\.php

ignoreregex =
```

### Step 2 — Add the jail

Add to `/etc/fail2ban/jail.local`:

```ini
[wordpress-login]
enabled = true
port = http,https
filter = wordpress-login
logpath = /var/log/apache2/access.log
maxretry = 5
findtime = 60
bantime = 30d
backend = polling
```

**5 attempts in 60 seconds** is tight enough to catch bots (which fire hundreds of requests per minute) while giving a real human a reasonable margin for typos.

> **Critical: `backend = polling` is required** on Ubuntu 24.04 and other systemd-based distros where `backend = auto` defaults to `systemd`. Without it, Fail2ban ignores `logpath` and reads from journald, which Apache doesn't write to. The jail silently monitors nothing. See [[fail2ban-apache-404-scanner-jail]] for more detail on this gotcha.

### Step 3 — Test the regex

```bash
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/wordpress-login.conf
```

In a real-world test against an active brute force (3 IPs, ~1,700 hits each), this matched **5,178 lines**.

### Step 4 — Reload and verify

```bash
systemctl restart fail2ban
fail2ban-client status wordpress-login
```

### Manually banning known attackers

If you've already identified brute-force IPs from the logs, ban them immediately rather than waiting for new hits:

```bash
# Find top offenders
grep "POST /wp-login.php" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

# Ban them
fail2ban-client set wordpress-login banip <IP>
```

## Why Default Jails Miss This

| Jail | Log Source | What It Matches | Why It Misses |
|---|---|---|---|
| `apache-auth` | error log | 401 authentication failures | WordPress returns 200, not 401 |
| `wordpress-hard` | syslog | WP fail2ban plugin messages | Requires plugin installation |
| `wordpress-soft` | syslog | WP fail2ban plugin messages | Requires plugin installation |
| **`wordpress-login`** | **access log** | **POST to wp-login.php** | **No plugin needed** |

## Optional: Extend to XML-RPC

WordPress's `xmlrpc.php` is another common brute-force target. To cover both, update the filter:

```ini
failregex = ^<HOST> .* "POST /wp-login\.php
            ^<HOST> .* "POST /xmlrpc\.php
```
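As a quick check that the two patterns catch both endpoints while leaving GETs alone, here is the same logic collapsed into a single alternation against fabricated log lines (hypothetical IPs; the real filter keeps the two failregex lines separate):

```shell
printf '%s\n' \
  '203.0.113.80 - - [02/Apr/2026:10:00:01 +0000] "POST /wp-login.php HTTP/1.1" 200 4312 "-" "-"' \
  '203.0.113.80 - - [02/Apr/2026:10:00:02 +0000] "POST /xmlrpc.php HTTP/1.1" 403 199 "-" "-"' \
  '198.51.100.5 - - [02/Apr/2026:10:00:03 +0000] "GET /wp-login.php HTTP/1.1" 200 2211 "-" "Mozilla/5.0"' \
  | grep -cE '^[0-9.]+ .* "POST /(wp-login|xmlrpc)\.php'
# prints: 2
```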
## Quick Diagnostic Commands

```bash
# Test filter against current access log
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/wordpress-login.conf

# Check jail status and banned IPs
fail2ban-client status wordpress-login

# Verify the jail is reading the correct file
fail2ban-client get wordpress-login logpath

# Count wp-login POSTs in today's log
grep "POST /wp-login.php" /var/log/apache2/access.log | wc -l

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep wordpress-login
```

## Key Notes

- This filter works with both Apache **combined** and **common** log formats.
- Make sure your Tailscale subnet (`100.64.0.0/10`) is in the `ignoreip` list under `[DEFAULT]` so legitimate admin access isn't banned.
- The `recidive` jail (if enabled) will escalate repeat offenders — three 30-day bans within a day triggers a 90-day block.
- Complements the [[fail2ban-apache-404-scanner-jail|Apache 404 Scanner Jail]] for full access-log coverage.

## See Also

- [[fail2ban-apache-404-scanner-jail]] — catches vulnerability scanners via 404 floods
- [[tuning-netdata-web-log-alerts]] — suppress false Netdata alerts from normal HTTP traffic
173
02-selfhosting/security/firewalld-fleet-hardening.md
Normal file
@@ -0,0 +1,173 @@
---
title: Firewall Hardening with firewalld on Fedora Fleet
domain: selfhosting
category: security
tags:
  - firewall
  - firewalld
  - iptables
  - fedora
  - ansible
  - security
  - hardening
status: published
created: 2026-04-18
updated: 2026-04-18T11:13
---

# Firewall Hardening with firewalld on Fedora Fleet

## Overview

Fedora and RHEL-based hosts use `firewalld` as the default firewall manager, backed by `nftables`. Over time, firewall rules accumulate stale entries — decommissioned services, old IP allowances, leftover port forwards — that widen the attack surface silently. This article covers the audit-and-harden pattern for Fedora fleet hosts using Ansible.

> For Ubuntu/Debian hosts, see [ufw-firewall-management](ufw-firewall-management.md).

## The Problem with Accumulated Rules

Rules added manually or by service installers (`firewall-cmd --add-port=...`) don't get cleaned up when services are removed. Common sources of stale rules:

- Monitoring agents (Zabbix, old Netdata exporters)
- Media servers moved to another host (Jellyfin, Plex)
- Development ports left open during testing
- IP-specific allowances for home IPs that have since changed

These stale rules are invisible in day-to-day operation but show up during audits as unnecessary exposure.

## Auditing Current Rules

```bash
# Show all active rules (nftables, what firewalld actually uses)
nft list ruleset

# Show firewalld zones and services
firewall-cmd --list-all-zones

# Show permanent config (what survives reboot)
firewall-cmd --permanent --list-all
```

Cross-reference open ports against running services:

```bash
# What's actually listening?
ss -tlnp

# Match against firewall rules — anything open that has no listener is stale
```
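That cross-reference can be mechanized with `comm`. A sketch with fabricated port lists (in practice, derive them from `firewall-cmd --list-ports` and the local ports in `ss -tlnp` output):

```shell
# Fabricated data so the comparison itself is reproducible
printf '10050\n443\n80\n8096\n' | sort > /tmp/fw_ports.txt   # ports the firewall allows
printf '443\n80\n' | sort > /tmp/listen_ports.txt            # ports with a live listener

# Lines only in the firewall list, i.e. open with no listener behind them: stale-rule candidates
comm -23 /tmp/fw_ports.txt /tmp/listen_ports.txt
# prints: 10050 and 8096
```

Both inputs must be sorted the same way for `comm` to work; here that is plain lexicographic `sort` on both files.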
## Ansible Hardening Approach

Rather than patching rules incrementally, the cleanest approach is to **flush and rebuild**: remove all non-essential rules and explicitly whitelist only what the host legitimately serves. This avoids drift and makes the resulting ruleset self-documenting.

The Ansible playbook uses `ansible.posix.firewalld` to manage rules declaratively and a flush task to clear the slate before applying the desired state.

### Pattern: Flush → Rebuild

```yaml
- name: Remove stale firewalld rules
  ansible.posix.firewalld:
    port: "{{ item }}"
    permanent: true
    state: disabled
  loop:
    - 8096/tcp   # Jellyfin — decommissioned
    - 10050/tcp  # Zabbix agent — removed
    - 10051/tcp  # Zabbix server — removed
  ignore_errors: true  # OK if rule doesn't exist

- name: Apply minimal whitelist
  ansible.posix.firewalld:
    port: "{{ item }}"
    permanent: true
    state: enabled
  loop: "{{ allowed_ports }}"
  notify: Reload firewalld
```

Define `allowed_ports` per host in `host_vars/`:

```yaml
# host_vars/majorlab/firewall.yml
allowed_ports:
  - 80/tcp    # Caddy HTTP
  - 443/tcp   # Caddy HTTPS
  - 22/tcp    # SSH (public)
  - 2222/tcp  # SSH (alt)
  - 3478/tcp  # Nextcloud Talk TURN
```

### Tailscale SSH: Restrict to ts-input Zone

For hosts where SSH should only be accessible via Tailscale, move the SSH rule from the public zone to the `ts-input` interface:

```yaml
- name: Remove SSH from public zone
  ansible.posix.firewalld:
    zone: public
    service: ssh
    permanent: true
    state: disabled

- name: Allow SSH on Tailscale interface only
  ansible.posix.firewalld:
    zone: trusted
    interface: tailscale0
    permanent: true
    state: enabled
  notify: Reload firewalld
```

> **Note:** The Tailscale interface is `tailscale0` unless customized. Confirm with `ip link show` before applying.

## Per-Host Hardening Reference

Different host roles need different rule sets. These are the minimal whitelists for common MajorsHouse host types:

| Host Role | Open Ports | Notes |
|-----------|-----------|-------|
| Reverse proxy (Caddy) | 80, 443, 22/2222 | No app ports exposed — Caddy proxies internally |
| Storage/media (Plex) | 32400 (public), 22 (Tailscale-only) | Plex needs public; SSH Tailscale-only |
| Bot/Discord host | 25 (Postfix), 25000 (webUI), 6514 (syslog-TLS) | No inbound SSH needed if Tailscale-only |
| Mail server | 25, 587, 993, 443, 22 | Standard mail ports |

## Default Policy

Set the default zone policy to `DROP` (not `REJECT`) to make the host non-discoverable:

```bash
firewall-cmd --set-default-zone=drop --permanent
firewall-cmd --reload
```

`DROP` silently discards packets; `REJECT` sends an ICMP unreachable back, confirming the host exists.

## Verifying After Apply

```bash
# Confirm active rules match intent
firewall-cmd --list-all

# Spot-check a port that should be closed
nmap -p 10050 <host-ip>
# Expected: filtered (not open, not closed)

# Confirm a port that should be open
nmap -p 443 <host-ip>
# Expected: open
```

## Ansible Handler

```yaml
handlers:
  - name: Reload firewalld
    ansible.builtin.service:
      name: firewalld
      state: reloaded
```

## See Also

- [ufw-firewall-management](ufw-firewall-management.md) — Ubuntu/Debian equivalent
- [ssh-hardening-ansible-fleet](ssh-hardening-ansible-fleet.md)
- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
@@ -194,6 +194,38 @@ sudo systemctl disable --now servicename

Common ones to disable on a dedicated server: `avahi-daemon`, `cups`, `bluetooth`.

## 8. Mail Server: SpamAssassin

If you're running Postfix (like on majormail), SpamAssassin filters incoming spam before it hits your mailbox.

**Install (Fedora/RHEL):**

```bash
sudo dnf install spamassassin
sudo systemctl enable --now spamassassin
```

**Integrate with Postfix** by adding a content filter in `/etc/postfix/master.cf`. See the [full setup guide](https://www.davekb.com/browse_computer_tips:spamassassin_with_postfix:txt) for Postfix integration on RedHat-based systems.

**Train the filter with sa-learn:**

SpamAssassin gets better when you feed it examples of spam and ham (legitimate mail):

```bash
# Train on known spam
sa-learn --spam /path/to/spam-folder/

# Train on known good mail
sa-learn --ham /path/to/ham-folder/

# Check what sa-learn knows
sa-learn --dump magic
```

Run `sa-learn` periodically against your Maildir to keep the Bayesian filter accurate. The more examples it sees, the fewer false positives and missed spam you'll get.
Reference: [sa-learn documentation](https://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html)
|
||||||
|
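Training can be automated with cron. A sketch, assuming a Maildir layout where `.Junk` holds spam; both paths and the schedule are examples to adapt:

```bash
# /etc/cron.d/sa-learn (example paths: adjust to your Maildir)
30 2 * * * root sa-learn --spam /home/user/Maildir/.Junk/cur/ >/dev/null
40 2 * * * root sa-learn --ham /home/user/Maildir/cur/ >/dev/null
```
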
## Gotchas & Notes

- **Don't lock yourself out.** Test SSH key auth in a second terminal before disabling passwords. Keep the original session open.

## See Also

- [managing-linux-services-systemd-ansible](../../01-linux/process-management/managing-linux-services-systemd-ansible.md)
- [debugging-broken-docker-containers](../docker/debugging-broken-docker-containers.md)

`02-selfhosting/security/selinux-fail2ban-execmem-fix.md`

---
title: "SELinux: Fixing Fail2ban grep execmem Denial on Fedora"
domain: selfhosting
category: security
tags: [selinux, fail2ban, fedora, execmem, security]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# SELinux: Fixing Fail2ban grep execmem Denial on Fedora

## The Problem

After a reboot on Fedora 43, Netdata fires a `selinux_avc_denials` WARNING alert. The audit log shows:

```
avc: denied { execmem } for comm="grep"
scontext=system_u:system_r:fail2ban_t:s0
tcontext=system_u:system_r:fail2ban_t:s0
tclass=process permissive=0
```

Fail2ban spawns `grep` to scan log files when its jails start. SELinux denies `execmem` (executable memory) for processes running in the `fail2ban_t` domain. The `fail2ban-selinux` package does not include this permission.

## Impact

- Fail2ban still functions — the denial affects grep's memory allocation strategy, not its ability to run
- Netdata will keep alerting on every reboot (fail2ban restarts and triggers the denial)
- No security risk — this is fail2ban's own grep subprocess, not an external exploit

## The Fix

Create a targeted SELinux policy module that allows `execmem` for `fail2ban_t`:

```bash
cd /tmp

cat > my-fail2ban-grep.te << "EOF"
module my-fail2ban-grep 1.0;

require {
    type fail2ban_t;
    class process execmem;
}

allow fail2ban_t self:process execmem;
EOF

# Compile the module
checkmodule -M -m -o my-fail2ban-grep.mod my-fail2ban-grep.te

# Package it
semodule_package -o my-fail2ban-grep.pp -m my-fail2ban-grep.mod

# Install at priority 300 (above default policy)
semodule -X 300 -i my-fail2ban-grep.pp
```

## Verifying

Confirm the module is loaded:

```bash
semodule -l | grep fail2ban-grep
# Expected: my-fail2ban-grep
```

Check that no new AVC denials appear after restarting fail2ban:

```bash
systemctl restart fail2ban
ausearch -m avc --start recent | grep fail2ban
# Expected: no output (no new denials)
```

## Why Not `audit2allow` Directly?

The common shortcut `ausearch -c grep --raw | audit2allow -M my-policy` can fail if:

- The AVC events have already rotated out of the audit log
- `ausearch` returns no matching records (outputs "Nothing to do")

Writing the `.te` file manually is more reliable and self-documenting.

## Environment

- **OS:** Fedora 43
- **SELinux:** Enforcing, targeted policy
- **Fail2ban:** 1.1.0 (`fail2ban-selinux-1.1.0-15.fc43.noarch`)
- **Kernel:** 6.19.x

## See Also

- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](../../05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md) — another SELinux fix for post-reboot service issues
- [SELinux: Fixing Dovecot Mail Spool Context](../../05-troubleshooting/selinux-dovecot-vmail-context.md) — custom SELinux context for mail spool

`02-selfhosting/security/ssh-hardening-ansible-fleet.md`

---
title: "SSH Hardening Fleet-Wide with Ansible"
domain: selfhosting
category: security
tags: [ssh, ansible, security, hardening, fleet]
status: published
created: 2026-04-17
updated: 2026-04-17
---

# SSH Hardening Fleet-Wide with Ansible

## Overview

Default SSH daemon settings on both Ubuntu and Fedora/RHEL are permissive. A drop-in configuration file (`/etc/ssh/sshd_config.d/99-hardening.conf`) lets you tighten settings without touching the distro-managed base config — and Ansible can deploy it atomically across every fleet host with a single playbook run.

## Settings to Change

| Setting | Default | Hardened | Reason |
|---|---|---|---|
| `PermitRootLogin` | `yes` | `without-password` | Prevent password-based root login; key auth still works for Ansible |
| `X11Forwarding` | `yes` | `no` | Nothing in a typical homelab fleet uses X11 tunneling |
| `AllowTcpForwarding` | `yes` | `no` | Eliminates a tunneling vector if a service account is compromised |
| `MaxAuthTries` | `6` | `3` | Cuts per-connection brute-force attempts in half |
| `LoginGraceTime` | `120` | `30` | Reduces the window for slow-connect attacks |

## The Drop-in Approach

Rather than editing `/etc/ssh/sshd_config` directly (which may be managed by the distro or overwritten on upgrades), place overrides in `/etc/ssh/sshd_config.d/99-hardening.conf`. The `Include /etc/ssh/sshd_config.d/*.conf` directive sits at the top of the base config and loads drop-ins in alphabetical order before the rest of the file, and **first match wins**: any drop-in setting therefore overrides the base config, but an earlier (lower-numbered) drop-in beats a later one.

> **Fedora/RHEL gotcha:** Fedora ships `/etc/ssh/sshd_config.d/50-redhat.conf` which sets `X11Forwarding yes`. Because first-match-wins applies, `50-redhat.conf` loads before `99-hardening.conf` and wins. You must patch `50-redhat.conf` in-place before deploying your drop-in, or the X11Forwarding setting will be silently ignored.

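The precedence rule is easy to misread, so here is a toy simulation (hypothetical file names and contents): concatenate the drop-ins in glob order and keep only the first value seen for each keyword, which is what first-match-wins means:

```bash
# Two fake drop-ins in a temp dir, named so 50- globs before 99-
dir=$(mktemp -d)
printf 'X11Forwarding yes\n' > "$dir/50-redhat.conf"
printf 'X11Forwarding no\n'  > "$dir/99-hardening.conf"

# Keep the first occurrence of each keyword, as sshd does
effective=$(cat "$dir"/*.conf | awk '!seen[$1]++')
echo "$effective"
# The 50- file wins: "X11Forwarding yes"
```

This is exactly why the playbook patches `50-redhat.conf` instead of relying on the `99-` file alone.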
## Ansible Playbook

```yaml
- name: Harden SSH daemon fleet-wide
  hosts: all:!raspbian
  become: true
  gather_facts: true

  tasks:

    - name: Ensure sshd_config.d directory exists
      ansible.builtin.file:
        path: /etc/ssh/sshd_config.d
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Ensure Include directive is present in sshd_config
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        line: "Include /etc/ssh/sshd_config.d/*.conf"
        insertbefore: BOF
        state: present

    # Fedora only: neutralize 50-redhat.conf's X11Forwarding yes
    # (first-match-wins means it would override our 99- drop-in)
    - name: Comment out X11Forwarding in 50-redhat.conf (Fedora)
      ansible.builtin.replace:
        path: /etc/ssh/sshd_config.d/50-redhat.conf
        regexp: '^(X11Forwarding yes)'
        replace: '# \1 # disabled by ansible hardening'
      when: ansible_os_family == "RedHat"
      ignore_errors: true

    - name: Deploy SSH hardening drop-in
      ansible.builtin.copy:
        dest: /etc/ssh/sshd_config.d/99-hardening.conf
        content: |
          # Managed by Ansible — do not edit manually
          PermitRootLogin without-password
          X11Forwarding no
          AllowTcpForwarding no
          MaxAuthTries 3
          LoginGraceTime 30
        owner: root
        group: root
        mode: '0644'
      notify: Reload sshd

    - name: Verify effective SSH settings
      ansible.builtin.command:
        cmd: sshd -T
      register: sshd_effective
      changed_when: false

    - name: Assert hardened settings are active
      ansible.builtin.assert:
        that:
          - "'permitrootlogin without-password' in sshd_effective.stdout"
          - "'x11forwarding no' in sshd_effective.stdout"
          - "'allowtcpforwarding no' in sshd_effective.stdout"
          - "'maxauthtries 3' in sshd_effective.stdout"
          - "'logingracetime 30' in sshd_effective.stdout"
        fail_msg: "One or more SSH hardening settings not effective — check for conflicting config"
      when: not ansible_check_mode

  handlers:

    - name: Reload sshd
      ansible.builtin.service:
        # Ubuntu/Debian: 'ssh' | Fedora/RHEL: 'sshd'
        name: "{{ 'ssh' if ansible_os_family == 'Debian' else 'sshd' }}"
        state: reloaded
```

## Edge Cases

**Ubuntu vs Fedora service name:** The SSH daemon is `ssh` on Debian/Ubuntu and `sshd` on Fedora/RHEL. The handler uses `ansible_os_family` to pick the right name automatically.

**Missing Include directive:** Some minimal installs don't have `Include /etc/ssh/sshd_config.d/*.conf` in their base config. The `lineinfile` task adds it if absent. Without this, the drop-in directory exists but is never loaded.

**Fedora's 50-redhat.conf:** Sets `X11Forwarding yes` with first-match priority. The playbook patches it before deploying the drop-in.

**`sshd -T` in check mode:** `sshd -T` prints the effective settings from the config files as they exist on disk, not the changes a dry run would make. The assert task is guarded with `when: not ansible_check_mode` to prevent false failures during dry runs.

**PermitRootLogin on hosts that already had it set:** Some hosts (e.g., those managed by another tool) may already have `PermitRootLogin without-password` set elsewhere. The drop-in still applies cleanly — it just becomes a no-op for that setting.

## Verify Manually

```bash
# Check effective settings on any host
ssh root@<host> "sshd -T | grep -E 'permitrootlogin|x11forwarding|allowtcpforwarding|maxauthtries|logingracetime'"

# Expected:
# permitrootlogin without-password
# x11forwarding no
# allowtcpforwarding no
# maxauthtries 3
# logingracetime 30
```

## See Also

- [linux-server-hardening-checklist](linux-server-hardening-checklist.md)
- [ansible-unattended-upgrades-fleet](ansible-unattended-upgrades-fleet.md)
- [ufw-firewall-management](ufw-firewall-management.md)

`02-selfhosting/security/ufw-firewall-management.md`

---
title: "UFW Firewall Management"
domain: selfhosting
category: security
tags: [security, firewall, ufw, ubuntu, networking]
status: published
created: 2026-04-02
updated: 2026-04-03
---

# UFW Firewall Management

UFW (Uncomplicated Firewall) is the standard firewall tool on Ubuntu. It wraps iptables/nftables into something you can actually manage without losing your mind. This covers the syntax and patterns I use across the MajorsHouse fleet.

## The Short Answer

```bash
# Enable UFW
sudo ufw enable

# Allow a port
sudo ufw allow 80

# Block a specific IP
sudo ufw insert 1 deny from 203.0.113.50

# Check status
sudo ufw status numbered
```

## Basic Rules

### Allow by Port

```bash
# Allow HTTP and HTTPS
sudo ufw allow 80
sudo ufw allow 443

# Allow a port range
sudo ufw allow 6000:6010/tcp

# Allow a named application profile
sudo ufw allow 'Apache Full'
```

### Allow by Interface

Useful when you only want traffic on a specific network interface — this is how SSH is restricted to Tailscale across the fleet:

```bash
# Allow SSH only on the Tailscale interface
sudo ufw allow in on tailscale0 to any port 22

# Then deny SSH globally (evaluated after the allow above)
sudo ufw deny 22
```

Rule order matters. UFW evaluates rules top to bottom and stops at the first match.

### Allow by Source IP

```bash
# Allow a specific IP to access SSH
sudo ufw allow from 100.86.14.126 to any port 22

# Allow a subnet
sudo ufw allow from 192.168.50.0/24 to any port 22
```

## Blocking IPs

### Insert Rules at the Top

When blocking IPs, use `insert 1` to place the deny rule at the top of the chain. Otherwise it may never be evaluated because an earlier ALLOW rule matches first.

```bash
# Block a single IP
sudo ufw insert 1 deny from 203.0.113.50

# Block a subnet
sudo ufw insert 1 deny from 203.0.113.0/24

# Block an IP from a specific port only
sudo ufw insert 1 deny from 203.0.113.50 to any port 443
```

### Don't Accumulate Manual Blocks

Manual `ufw deny` rules pile up fast. On one of my servers, I found **30,142 manual DENY rules** — a 3 MB rules file that every packet had to traverse. Use Fail2ban for automated blocking instead. It manages bans with expiry and doesn't pollute your UFW rules.

If you inherit a server with thousands of manual blocks:

```bash
# Nuclear option — reset and re-add only the rules you need
sudo ufw --force reset
sudo ufw allow 'Apache Full'
sudo ufw allow in on tailscale0 to any port 22
sudo ufw deny 22
sudo ufw enable
```

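A less drastic alternative is to delete only the DENY rules by number, highest first so the remaining indices don't shift. The pipeline below parses `ufw status numbered` output; the sample text is illustrative, not captured from a real host:

```bash
# Illustrative `ufw status numbered` lines (not real output)
sample='[ 1] 22/tcp                     ALLOW IN    Anywhere
[ 2] Anywhere                   DENY IN     203.0.113.50
[ 3] Anywhere                   DENY IN     203.0.113.51'

# Pull the rule numbers of DENY entries, highest first
deny_nums=$(printf '%s\n' "$sample" \
  | grep 'DENY' \
  | sed -E 's/^\[ *([0-9]+)\].*/\1/' \
  | sort -rn)
echo "$deny_nums"

# On a real host you would then run:
#   for n in $deny_nums; do sudo ufw --force delete "$n"; done
```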
## Managing Rules

### View Rules

```bash
# Simple view
sudo ufw status

# Numbered (needed for deletion and insert position)
sudo ufw status numbered

# Verbose (shows default policies and logging)
sudo ufw status verbose
```

### Delete Rules

```bash
# Delete by rule number
sudo ufw delete 3

# Delete by rule specification
sudo ufw delete allow 8080
```

### Default Policies

```bash
# Deny all incoming, allow all outgoing (recommended baseline)
sudo ufw default deny incoming
sudo ufw default allow outgoing
```

## Don't Forget Web Server Ports

If you're running a web server behind UFW, make sure ports 80 and 443 are explicitly allowed. This sounds obvious, but it's easy to miss — especially on servers where UFW was enabled after the web server was already running, or where a firewall reset dropped rules that were never persisted.

```bash
# Allow HTTP and HTTPS
sudo ufw allow 80
sudo ufw allow 443

# Or use an application profile
sudo ufw allow 'Apache Full'
```

If your site suddenly stops responding after enabling UFW or resetting rules, check `sudo ufw status numbered` first. Missing web ports is the most common cause.

## UFW with Fail2ban

On Ubuntu servers, Fail2ban and UFW operate at different layers. Fail2ban typically creates its own nftables table (`inet f2b-table`) at a higher priority than UFW's chains. This means:

- Fail2ban bans take effect **before** UFW rules are evaluated
- A banned IP is rejected even if UFW has an ALLOW rule for that port
- Add trusted IPs (your own, monitoring, etc.) to `ignoreip` in `/etc/fail2ban/jail.local` to prevent self-lockout

```ini
# /etc/fail2ban/jail.local
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 100.0.0.0/8
```

The `100.0.0.0/8` range is a superset of Tailscale's CGNAT block (`100.64.0.0/10`), so it covers every Tailscale IP and prevents banning fleet traffic.

## UFW Logging

```bash
# Enable logging (low/medium/high/full)
sudo ufw logging medium
```

Logs go to `/var/log/ufw.log`. Useful for seeing what's getting blocked, but `medium` or `low` is usually enough — `high` and `full` can be noisy.

## Fleet Reference

UFW is used on these MajorsHouse servers:

| Host | Key UFW Rules |
|---|---|
| majortoot | SSH on tailscale0, deny 22 globally |
| majorlinux | SSH on tailscale0, deny 22 globally |
| tttpod | SSH on tailscale0, deny 22 globally, Apache Full (added 2026-04-03) |
| teelia | SSH on tailscale0, deny 22 globally, Apache Full |

The Fedora servers (majorlab, majorhome, majormail, majordiscord) use iptables or firewalld instead.

## See Also

- [Linux Server Hardening Checklist](linux-server-hardening-checklist.md) — initial firewall setup as part of server provisioning
- [Fail2ban & UFW Rule Bloat Cleanup](../../05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md) — what happens when manual blocks get out of hand

`02-selfhosting/services/ghost-smtp-mailgun-setup.md`

---
title: Ghost Email Configuration with Mailgun
domain: selfhosting
category: services
tags:
  - ghost
  - mailgun
  - smtp
  - email
  - docker
  - newsletter
status: published
created: 2026-04-18
updated: 2026-04-18T11:13
---

# Ghost Email Configuration with Mailgun

## Overview

Ghost uses **two separate mail systems** that must be configured independently. This is the most common source of confusion in Ghost email setup — configuring one does not configure the other.

| System | Purpose | Where configured |
|--------|---------|-----------------|
| **Newsletter / Member email** | Sending posts to subscribers | Ghost Admin UI → Settings → Email (stored in DB) |
| **Transactional / Staff email** | Magic links, password resets, admin notifications | `docker-compose.yml` environment variables |

Both should route through Mailgun for consistent deliverability and tracking.

## Prerequisites

- A Mailgun account with a verified sending domain
- DNS access for your sending domain
- Ghost running in Docker (this guide assumes Docker Compose)

## Step 1 — DNS Records

Add these records to your sending domain before configuring Ghost. Mailgun will verify them before allowing sends.

| Type | Name | Value |
|------|------|-------|
| TXT | `@` | `v=spf1 include:mailgun.org ~all` |
| TXT | `pdk1._domainkey` | *(provided by Mailgun — long DKIM key)* |
| CNAME | `email` | `mailgun.org` |

The tracking CNAME (`email.yourdomain.com`) enables Mailgun's open/click tracking. Ghost's EmailAnalytics feature requires it.

After adding records, verify in Mailgun → Sending → Domains → your domain → DNS Records. All records should show green.

## Step 2 — Newsletter Email (Mailgun API)

Configure in **Ghost Admin → Settings → Email newsletter**. Ghost stores these settings in its database `settings` table — not in the compose file.

| Setting | Value |
|---------|-------|
| Mailgun region | US (api.mailgun.net) or EU (api.eu.mailgun.net) |
| Mailgun domain | `yourdomain.com` |
| Mailgun API key | Private API key from Mailgun dashboard |

Ghost uses the Mailgun API (not SMTP) for newsletter delivery. This enables open tracking, click tracking, and the EmailAnalytics dashboard.

> **Verify via DB:** If Ghost is MySQL-backed, you can confirm the settings landed:
> ```bash
> docker exec <db-container> mysql -u root -p<password> ghost \
>   -e "SELECT key_name, value FROM settings WHERE key_name LIKE 'mailgun%';"
> ```

## Step 3 — Transactional Email (SMTP via Mailgun)

Configure in `docker-compose.yml` as environment variables. Ghost's default transport (`Direct`) attempts raw SMTP delivery, which is blocked by most hosting providers and treated as spam. Mailgun SMTP is the reliable path.

```yaml
services:
  ghost:
    image: ghost:6-alpine
    environment:
      # ... other Ghost config ...
      mail__transport: SMTP
      mail__from: noreply@yourdomain.com
      mail__options__host: smtp.mailgun.org
      mail__options__port: 587
      mail__options__auth__user: postmaster@yourdomain.com
      mail__options__auth__pass: <mailgun-smtp-password>
```

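Ghost maps each double underscore in an environment variable name to one level of nesting in its JSON config, so `mail__options__host` ends up as `mail.options.host` in `config.production.json`. The mapping is plain string substitution:

```bash
# Show how Ghost's env var names map to nested config keys
for var in mail__transport mail__options__host mail__options__auth__user; do
  printf '%s -> %s\n' "$var" "$(printf '%s' "$var" | sed 's/__/./g')"
done
# mail__transport -> mail.transport
# mail__options__host -> mail.options.host
# mail__options__auth__user -> mail.options.auth.user
```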
The SMTP password is separate from the API key. Find it in Mailgun → Sending → Domains → your domain → SMTP credentials → `postmaster@yourdomain.com`.

After updating the compose file, restart Ghost:

```bash
cd /root/<stack-dir> && docker compose up -d
```

Check logs for a clean boot with no mail-related warnings:

```bash
docker logs <ghost-container> 2>&1 | grep -i mail
```

## Verifying the Full Stack

**Newsletter:** Send a test post to members (even with 1 subscriber). Check Ghost Admin → Posts → sent post → Email analytics. Delivered count should increment within minutes.

**Transactional:** Trigger a staff magic link (Ghost Admin → sign out → request magic link). The email should arrive within seconds.

**Mailgun logs:** Mailgun → Logs → Events shows all API and SMTP activity. Filter by domain to isolate Ghost sends.

## Common Issues

**Newsletter sends but staff emails don't arrive (or vice versa):** The two systems are independent. Check both configurations separately.

**`transport: Direct` in config:** Ghost writes a `config.production.json` inside the container. If `mail.transport` shows `Direct`, the environment variables didn't apply — verify the compose key names (double underscores for nested config).

**Mailgun API key vs SMTP password:** These are different credentials. The API key (starts with `key-`) is for the newsletter system. The SMTP password is for the transactional system. Don't mix them.

**Domain state: `unverified` in Mailgun:** DNS records haven't propagated or are wrong. Use `dig TXT yourdomain.com` and `dig TXT pdk1._domainkey.yourdomain.com` to verify from outside your network.

## See Also

- [ghost-emailanalytics-lag-warning](../../05-troubleshooting/ghost-emailanalytics-lag-warning.md)
- [docker-healthchecks](../docker/docker-healthchecks.md)
- [watchtower-smtp-localhost-relay](../docker/watchtower-smtp-localhost-relay.md)

`02-selfhosting/services/mastodon-instance-tuning.md`

---
title: "Mastodon Instance Tuning"
domain: selfhosting
category: services
tags: [mastodon, fediverse, self-hosting, majortoot, docker]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Mastodon Instance Tuning

Running your own Mastodon instance means you control the rules — including limits the upstream project imposes by default. These are the tweaks applied to **majortoot** (MajorsHouse's Mastodon instance).

## Increase Character Limit

Mastodon's default 500-character post limit is low for longer-form thoughts. You can raise it, but it requires modifying the source — there's no config toggle.

The process depends on your deployment method (Docker vs bare metal) and Mastodon version. The community-maintained guide covers the approaches:

- [How to increase the max number of characters of a post](https://qa.mastoadmin.social/questions/10010000000000011/how-do-i-increase-the-max-number-of-characters-of-a-post)

**Key points:**

- The limit is enforced in both the backend (Ruby) and frontend (React). Both must be changed or the UI will reject posts the API would accept.
- After changing, you need to rebuild assets and restart services.
- Other instances will still display the full post — the character limit is per-instance, not a federation constraint.
- Some Mastodon forks (Glitch, Hometown) expose this as a config option without source patches.

## Media Cache Management

Federated content (avatars, headers, media from remote posts) gets cached locally. On a small instance this grows slowly, but over months it adds up — especially if you follow active accounts on large instances.

Reference: [Fedicache — Understanding Mastodon's media cache](https://notes.neatnik.net/2024/08/fedicache)

**Clean up cached remote media:**

```bash
# Preview what would be removed (older than 7 days)
tootctl media remove --days 7 --dry-run

# Actually remove it
tootctl media remove --days 7

# For Docker deployments
docker exec mastodon-web tootctl media remove --days 7
```

**Automate with cron or systemd timer:**

```bash
# Weekly cache cleanup — crontab
0 3 * * 0 docker exec mastodon-web tootctl media remove --days 7
```

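For the systemd-timer route, a sketch of a oneshot service plus weekly timer; the unit names are my own choice, and the `ExecStart` mirrors the Docker command from the cleanup section:

```ini
# /etc/systemd/system/mastodon-media-cleanup.service (hypothetical name)
[Unit]
Description=Prune Mastodon remote media cache

[Service]
Type=oneshot
ExecStart=/usr/bin/docker exec mastodon-web tootctl media remove --days 7

# /etc/systemd/system/mastodon-media-cleanup.timer
[Unit]
Description=Weekly Mastodon media cache cleanup

[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `sudo systemctl enable --now mastodon-media-cleanup.timer`; `systemctl list-timers` confirms the next run.
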
**What gets removed:** Only cached copies of remote media. Local uploads (your posts, your users' posts) are never touched. Remote media will be re-fetched on demand if someone views the post again.

**Storage impact:** On a single-user instance, remote media cache can still reach several GB over a few months of active federation. Regular cleanup keeps disk usage predictable.

## Gotchas & Notes

- **Character limit changes break on upgrades.** Any source patch gets overwritten when you pull a new Mastodon release. Track your changes and reapply after updates.
- **`tootctl` is your admin CLI.** It handles media cleanup, user management, federation diagnostics, and more. Run `tootctl --help` for the full list.
- **Monitor disk usage.** Even with cache cleanup, the PostgreSQL database and local media uploads grow over time. Keep an eye on it.

## See Also

- [self-hosting-starter-guide](../docker/self-hosting-starter-guide.md)
- [docker-healthchecks](../docker/docker-healthchecks.md)

`02-selfhosting/services/updating-n8n-docker.md`

---
title: "Updating n8n Running in Docker"
domain: selfhosting
category: services
tags: [n8n, docker, update, self-hosting, automation]
status: published
created: 2026-03-30
updated: 2026-03-30
---

# Updating n8n Running in Docker

n8n's in-app update notification checks against their npm release version, which often gets published before the `latest` Docker Hub tag is updated. This means you may see an update prompt in the UI even though `docker pull` reports the image as current. Pull a pinned version tag instead.

## Check Current vs Latest Version

```bash
# Check what's running
docker exec n8n-n8n-1 n8n --version

# Check what npm (n8n's upstream) says is latest
docker exec n8n-n8n-1 npm show n8n version
```

If the versions differ, the Docker Hub `latest` tag hasn't caught up yet. Use the pinned version tag.

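To script that comparison, `sort -V` orders version strings correctly where plain string comparison would not (the version numbers below are examples, not live output):

```bash
running="2.14.1"   # e.g. from: docker exec n8n-n8n-1 n8n --version
latest="2.14.2"    # e.g. from: docker exec n8n-n8n-1 npm show n8n version

# sort -V understands dotted version numbers, so the last line is the newest
newest=$(printf '%s\n%s\n' "$running" "$latest" | sort -V | tail -n1)
if [ "$newest" != "$running" ]; then
  echo "update available: $latest"
fi
```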
## Get the Running Container's Config
|
||||||
|
|
||||||
|
Before stopping anything, capture the full environment so you can recreate the container identically:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker inspect n8n-n8n-1 --format '{{json .Config.Env}}'
|
||||||
|
docker inspect n8n-n8n-1 --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}'
|
||||||
|
```
|
||||||
|
|
||||||
|
For MajorsHouse, the relevant env vars are:
|
||||||
|
|
||||||
|
```
|
||||||
|
N8N_EDITOR_BASE_URL=https://n8n.majorshouse.com/
|
||||||
|
N8N_PORT=5678
|
||||||
|
TZ=America/New_York
|
||||||
|
N8N_TRUST_PROXY=true
|
||||||
|
GENERIC_TIMEZONE=America/New_York
|
||||||
|
N8N_HOST=n8n.majorshouse.com
|
||||||
|
N8N_PROTOCOL=https
|
||||||
|
WEBHOOK_URL=https://n8n.majorshouse.com/
|
||||||
|
```
|
||||||
|
|
||||||
|
Data volume: `n8n_n8n_data:/home/node/.n8n`
|
||||||
|
|
||||||
|
## Perform the Update
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Pull the specific version (replace 2.14.2 with target version)
|
||||||
|
docker pull docker.n8n.io/n8nio/n8n:2.14.2
|
||||||
|
|
||||||
|
# 2. Stop and remove the old container
|
||||||
|
docker stop n8n-n8n-1 && docker rm n8n-n8n-1
|
||||||
|
|
||||||
|
# 3. Start fresh with the new image and same settings
|
||||||
|
docker run -d \
|
||||||
|
--name n8n-n8n-1 \
|
||||||
|
--restart unless-stopped \
|
||||||
|
-p 127.0.0.1:5678:5678 \
|
||||||
|
-v n8n_n8n_data:/home/node/.n8n \
|
||||||
|
-e N8N_EDITOR_BASE_URL=https://n8n.majorshouse.com/ \
|
||||||
|
-e N8N_PORT=5678 \
|
||||||
|
-e TZ=America/New_York \
|
||||||
|
-e N8N_TRUST_PROXY=true \
|
||||||
|
-e GENERIC_TIMEZONE=America/New_York \
|
||||||
|
-e N8N_HOST=n8n.majorshouse.com \
|
||||||
|
-e N8N_PROTOCOL=https \
|
||||||
|
-e WEBHOOK_URL=https://n8n.majorshouse.com/ \
|
||||||
|
docker.n8n.io/n8nio/n8n:2.14.2
|
||||||
|
|
||||||
|
# 4. Verify
|
||||||
|
docker exec n8n-n8n-1 n8n --version
|
||||||
|
docker ps --filter name=n8n-n8n-1 --format '{{.Status}}'
|
||||||
|
```
|
||||||
|
|
||||||
|
No restart of Caddy or other services required. Workflows, credentials, and execution history are preserved in the data volume.

## Reset a Forgotten Admin Password

n8n uses SQLite at `/home/node/.n8n/database.sqlite` (mapped to `n8n_n8n_data` on the host). Use Python to generate a valid bcrypt hash and update it directly — do **not** use shell variable interpolation, as `$` characters in bcrypt hashes will be eaten.

```bash
python3 -c "
import bcrypt, sqlite3
pw = b'your-new-password'
h = bcrypt.hashpw(pw, bcrypt.gensalt(rounds=10)).decode()
db = sqlite3.connect('/var/lib/docker/volumes/n8n_n8n_data/_data/database.sqlite')
db.execute(\"UPDATE user SET password=? WHERE email='marcus@majorshouse.com'\", (h,))
db.commit()
db.close()
db2 = sqlite3.connect('/var/lib/docker/volumes/n8n_n8n_data/_data/database.sqlite')
row = db2.execute(\"SELECT password FROM user WHERE email='marcus@majorshouse.com'\").fetchone()
print('Valid:', bcrypt.checkpw(pw, row[0].encode()))
"
```

`Valid: True` confirms the hash is correct. No container restart needed.

## Why Arcane Doesn't Always Catch It

[Arcane](https://getarcaneapp.com) watches Docker Hub for image digest changes. When n8n publishes a new release, there's often a delay before the `latest` tag on Docker Hub is updated to match. During that window:

- n8n's in-app updater (checks npm) reports an update available
- `docker pull latest` and Arcane both report the image as current

Once Docker Hub catches up, Arcane will notify normally. For immediate updates, use pinned version tags as shown above.

## Troubleshooting

**Password still rejected after update:** Shell variable interpolation (`$2b`, `$10`, etc.) silently truncates bcrypt hashes when passed as inline SQL strings. Always use the Python script approach above.
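
To see the truncation in action, here is a minimal demo with a fake hash-like string (nothing here touches the database):

```bash
# Single quotes preserve every character of a bcrypt-style string.
# Double quotes let the shell expand $2, $1, and $abc (all unset here),
# collapsing most of the "hash" to nothing.
good='$2b$10$abc'
bad="$2b$10$abc"
echo "$good"   # → $2b$10$abc
echo "$bad"    # → b0
```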

**Container exits immediately after recreate:** Check `docker logs n8n-n8n-1`. Most commonly a missing env var or a volume permission issue.

**Webhooks not firing after update:** Verify `N8N_TRUST_PROXY=true` is set. Without it, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to drop webhook requests before parsing the body.

**`npm show n8n version` returns old version:** npm registry cache inside the container. Run `docker exec n8n-n8n-1 npm show n8n version --no-cache` to force a fresh check.
@@ -148,6 +148,29 @@ WantedBy=timers.target
sudo systemctl enable --now rsync-backup.timer
```

## Cold Storage — AWS Glacier Deep Archive

rsync handles local and remote backups, but for true offsite cold storage — disaster recovery, archival copies you rarely need to retrieve — AWS Glacier Deep Archive is the cheapest option at ~$1/TB/month.

Upload files directly to an S3 bucket with the `DEEP_ARCHIVE` storage class:

```bash
# Single file
aws s3 cp backup.tar.gz s3://your-bucket/ --storage-class DEEP_ARCHIVE

# Entire directory
aws s3 sync /backup/offsite/ s3://your-bucket/offsite/ --storage-class DEEP_ARCHIVE
```

**When to use it:** Long-term backups you'd only need in a disaster scenario — media archives, yearly snapshots, irreplaceable data. Not for anything you'd need to restore quickly.

**Retrieval tradeoffs:**

- **Standard retrieval:** 12 hours, cheapest restore cost
- **Bulk retrieval:** Up to 48 hours, even cheaper
- **Expedited:** Not available for Deep Archive — if you need faster access, use regular Glacier or S3 Infrequent Access
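
Getting data back out is a two-step process: request a temporary restore, then copy the object once it's ready. A sketch with the AWS CLI (bucket and key are placeholders for your own):

```bash
# Request a Standard-tier restore; the thawed copy stays available for 7 days
aws s3api restore-object \
  --bucket your-bucket \
  --key offsite/backup.tar.gz \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'

# Poll until the Restore field shows ongoing-request="false", then download
aws s3api head-object --bucket your-bucket --key offsite/backup.tar.gz
aws s3 cp s3://your-bucket/offsite/backup.tar.gz .
```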

**In the MajorsHouse backup strategy**, rsync handles the daily local and cross-host backups. Glacier Deep Archive is the final tier — offsite, durable, cheap, and slow to retrieve by design. A good backup plan has both.

## Gotchas & Notes

- **Test with `--dry-run` first.** Especially when using `--delete`. See what would be removed before actually removing it.

@@ -158,5 +181,5 @@ sudo systemctl enable --now rsync-backup.timer

## See Also

- [self-hosting-starter-guide](../docker/self-hosting-starter-guide.md)
- [bash-scripting-patterns](../../01-linux/shell-scripting/bash-scripting-patterns.md)
94
03-opensource/alternatives/freshrss.md
Normal file
@@ -0,0 +1,94 @@
---
title: "FreshRSS — Self-Hosted RSS Reader"
domain: opensource
category: alternatives
tags: [freshrss, rss, self-hosting, docker, privacy]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# FreshRSS — Self-Hosted RSS Reader

## Problem

RSS is the best way to follow websites, blogs, and podcasts without algorithmic feeds, engagement bait, or data harvesting. But hosted RSS services like Feedly gate features behind subscriptions and still have access to your reading habits. Google killed Google Reader in 2013 and has been trying to kill RSS ever since.

## Solution

[FreshRSS](https://freshrss.org) is a self-hosted RSS aggregator. It fetches and stores your feeds on your own server, presents a clean reading interface, and syncs with mobile apps via standard APIs (Fever, Google Reader, Nextcloud News). No subscription, no tracking, no feed limits.

---

## Deployment (Docker)

```yaml
services:
  freshrss:
    image: freshrss/freshrss:latest
    container_name: freshrss
    restart: unless-stopped
    ports:
      - "8086:80"
    volumes:
      - ./freshrss/data:/var/www/FreshRSS/data
      - ./freshrss/extensions:/var/www/FreshRSS/extensions
    environment:
      - TZ=America/New_York
      - CRON_MIN=*/15  # fetch feeds every 15 minutes
```

### Caddy reverse proxy

```
rss.yourdomain.com {
    reverse_proxy localhost:8086
}
```

---

## Initial Setup

1. Browse to your FreshRSS URL and run through the setup wizard
2. Create an admin account
3. Go to **Settings → Authentication** — enable API access if you want mobile app sync
4. Start adding feeds under **Subscriptions → Add a feed**

---

## Mobile App Sync

FreshRSS exposes a Google Reader-compatible API that most RSS apps support:

| App | Platform | Protocol |
|---|---|---|
| NetNewsWire | iOS / macOS | Fever or GReader |
| Reeder | iOS / macOS | GReader |
| ReadYou | Android | GReader |
| FeedMe | Android | GReader / Fever |

**API URL format:** `https://rss.yourdomain.com/api/greader.php`

Enable the API in FreshRSS: **Settings → Authentication → Allow API access**

---

## Feed Auto-Refresh

The `CRON_MIN=*/15` environment variable runs feed fetching every 15 minutes inside the container. For more control, add a host-level cron job:

```bash
# Fetch all feeds every 10 minutes
*/10 * * * * docker exec freshrss php /var/www/FreshRSS/app/actualize_script.php
```

---

## Why RSS Over Social Media

- **You control the feed** — no algorithm decides what you see or in what order
- **No engagement optimization** — content ranked by publish date, not outrage potential
- **Portable** — OPML export lets you move your subscriptions to any reader
- **Works forever** — RSS has been around since 1999 and isn't going anywhere
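
The portability claim is easy to verify: an OPML export is plain XML, so a migration script needs only the standard library. A sketch using an inline example document (the feed URL is made up):

```python
import xml.etree.ElementTree as ET

# A minimal OPML document shaped like a FreshRSS export
opml = """<opml version="2.0">
  <body>
    <outline text="Tech">
      <outline text="Example Blog" type="rss" xmlUrl="https://example.com/feed.xml"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(opml)
# Collect every feed URL, ignoring folder-only outlines
feeds = [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]
print(feeds)  # → ['https://example.com/feed.xml']
```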

---

100
03-opensource/alternatives/gitea.md
Normal file
@@ -0,0 +1,100 @@
---
title: "Gitea — Self-Hosted Git"
domain: opensource
category: alternatives
tags: [gitea, git, self-hosting, docker, ci-cd]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Gitea — Self-Hosted Git

## Problem

GitHub is the default home for code, but it's a Microsoft-owned centralized service. Your repositories, commit history, issues, and CI/CD pipelines are all under someone else's control. For personal projects and private infrastructure, there's no reason to depend on it.

## Solution

[Gitea](https://gitea.com) is a lightweight, self-hosted Git service. It provides the full GitHub-style workflow — repositories, branches, pull requests, webhooks, and a web UI — in a single binary or Docker container that runs comfortably on low-spec hardware.

---

## Deployment (Docker)

```yaml
services:
  gitea:
    image: docker.gitea.com/gitea:latest
    container_name: gitea
    restart: unless-stopped
    ports:
      - "3002:3000"
      - "222:22"  # SSH git access
    volumes:
      - ./gitea:/data
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=sqlite3
```

SQLite is fine for personal use. For team use, swap in PostgreSQL or MySQL.

### Caddy reverse proxy

```
git.yourdomain.com {
    reverse_proxy localhost:3002
}
```

---

## Initial Setup

1. Browse to your Gitea URL — the first-run wizard handles configuration
2. Set the server URL to your public domain
3. Create an admin account
4. Configure SSH access if you want `git@git.yourdomain.com` cloning

---

## Webhooks

Gitea's webhook system is how automated pipelines get triggered on push. Example use case — auto-deploy a MkDocs wiki on every push:

1. Go to repo → **Settings → Webhooks → Add Webhook**
2. Set the payload URL to your webhook endpoint (e.g. `https://notes.yourdomain.com/webhook`)
3. Set content type to `application/json`
4. Select **Push events**

The webhook fires on every `git push`, allowing the receiving server to pull and rebuild automatically. See [MajorWiki Setup & Pipeline](../../05-troubleshooting/majwiki-setup-and-pipeline.md) for a complete example.
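
On the receiving side, the payload should be checked against the webhook secret before triggering a rebuild. Gitea computes an HMAC-SHA256 over the raw request body and sends the hex digest in the `X-Gitea-Signature` header — a minimal verification sketch (the secret and payload below are illustrative):

```python
import hashlib
import hmac

def verify_gitea_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex matches HMAC-SHA256(secret, body)."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match position via timing
    return hmac.compare_digest(expected, signature_hex)

# Simulate what Gitea would send for a push payload
body = b'{"ref": "refs/heads/main"}'
sig = hmac.new(b"webhook-secret", body, hashlib.sha256).hexdigest()
print(verify_gitea_signature("webhook-secret", body, sig))       # → True
print(verify_gitea_signature("webhook-secret", b"tampered", sig))  # → False
```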

---

## Migrating from GitHub

Gitea can mirror GitHub repos and import them directly:

```bash
# Clone from GitHub, push to Gitea
git clone --mirror https://github.com/user/repo.git
cd repo.git
git remote set-url origin https://git.yourdomain.com/user/repo.git
git push --mirror
```

Or use the Gitea web UI: **+ → New Migration → GitHub**

---

## Why Not Just Use GitHub?

For public open source — GitHub is fine, the network effects are real. For private infrastructure code, personal projects, and anything you'd rather not hand to Microsoft:

- Full control over your data and access
- No rate limits, no storage quotas on your own hardware
- Webhooks and integrations without paying for GitHub Actions minutes
- Works entirely over Tailscale — no public exposure required

---

93
03-opensource/alternatives/searxng.md
Normal file
@@ -0,0 +1,93 @@
---
title: "SearXNG — Private Self-Hosted Search"
domain: opensource
category: alternatives
tags: [searxng, search, privacy, self-hosting, docker]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# SearXNG — Private Self-Hosted Search

## Problem

Every search query sent to Google, Bing, or DuckDuckGo is logged, profiled, and used to build an advertising model of you. Even "private" search engines are still third-party services with their own data retention policies.

## Solution

[SearXNG](https://github.com/searxng/searxng) is a self-hosted metasearch engine. It queries multiple search engines simultaneously on your behalf — without sending any identifying information — and aggregates the results. The search engines see a request from your server, not from you.

Your queries stay on your infrastructure.

---

## Deployment (Docker)

```yaml
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: unless-stopped
    ports:
      - "8090:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=https://search.yourdomain.com/
```

SearXNG requires a `settings.yml` in the mounted config directory. Generate one from the default:

```bash
docker run --rm searxng/searxng cat /etc/searxng/settings.yml > ./searxng/settings.yml
```

Key settings to configure in `settings.yml`:

```yaml
server:
  secret_key: "generate-a-random-string-here"
  bind_address: "0.0.0.0"

search:
  safe_search: 0
  default_lang: "en"

engines:
  # Enable/disable specific engines here
```
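
The `secret_key` should be a long random string unique to your instance. One quick way to generate one (any random-string generator works):

```bash
# 32 random bytes rendered as 64 hex characters — paste into settings.yml
python3 -c "import secrets; print(secrets.token_hex(32))"
```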

### Caddy reverse proxy

```
search.yourdomain.com {
    reverse_proxy localhost:8090
}
```

---

## Using SearXNG as an AI Search Backend

SearXNG integrates directly with Open WebUI as a web search provider, giving your local AI access to current web results without any third-party API keys:

**Open WebUI → Settings → Web Search:**

- Enable web search
- Set provider to `searxng`
- Set URL to `http://searxng:8080` (internal Docker network) or your Tailscale/local address

This is how MajorTwin gets current web context — queries go through SearXNG, not Google.

---

## Why Not DuckDuckGo?

DDG is better than Google for privacy, but it's still a centralized third-party service. SearXNG:

- Runs on your own hardware
- Has no account, no cookies, no session tracking
- Lets you choose which upstream engines to use and weight
- Can be kept entirely off the public internet (Tailscale-only)

---

107
03-opensource/dev-tools/rsync.md
Normal file
@@ -0,0 +1,107 @@
---
title: "rsync — Fast, Resumable File Transfers"
domain: opensource
category: dev-tools
tags: [rsync, backup, file-transfer, linux, cli]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# rsync — Fast, Resumable File Transfers

## Problem

Copying large files or directory trees between drives or servers is slow, fragile, and unresumable with `cp`. A dropped connection or a single error means starting over. You also want to skip files that already exist at the destination without re-copying them.

## Solution

`rsync` is a file synchronization tool that only transfers what has changed, preserves metadata, and can resume interrupted transfers. It works locally and over SSH.

### Installation (Fedora)

```bash
sudo dnf install rsync
```

### Basic Local Copy

```bash
rsync -av /source/ /destination/
```

- `-a` — archive mode: preserves permissions, timestamps, symlinks, ownership
- `-v` — verbose: shows what's being transferred

**Trailing slash on source matters:**

- `/source/` — copy the *contents* of source into destination
- `/source` — copy the source *directory itself* into destination

### Resume an Interrupted Transfer

```bash
rsync -av --partial --progress /source/ /destination/
```

- `--partial` — keeps partially transferred files so they can be resumed
- `--progress` — shows per-file progress and speed

### Skip Already-Transferred Files

```bash
rsync -av --ignore-existing /source/ /destination/
```

Useful when restarting a migration — skips anything already at the destination regardless of timestamp comparison.

### Dry Run First

Always preview what rsync will do before committing:

```bash
rsync -av --dry-run /source/ /destination/
```

No files are moved. Output shows exactly what would happen.

### Transfer Over SSH

```bash
rsync -av -e ssh /source/ user@remotehost:/destination/
```

Or with a non-standard port:

```bash
rsync -av -e "ssh -p 2222" /source/ user@remotehost:/destination/
```

### Exclude Patterns

```bash
rsync -av --exclude='*.tmp' --exclude='.Trash*' /source/ /destination/
```

### Real-World Use

Migrating ~286 files from `/majorRAID` to `/majorstorage` during a RAID dissolution project:

```bash
rsync -av --partial --progress --ignore-existing \
  /majorRAID/ /majorstorage/ \
  2>&1 | tee /root/raid_migrate.log
```

Run inside a `tmux` or `screen` session so it survives SSH disconnects:

```bash
tmux new-session -d -s rsync-migrate \
  "rsync -av --partial --progress /majorRAID/ /majorstorage/ | tee /root/raid_migrate.log"
```

### Check Progress on a Running Transfer

```bash
tail -f /root/raid_migrate.log
```

---

81
03-opensource/dev-tools/screen.md
Normal file
@@ -0,0 +1,81 @@
---
title: "screen — Simple Persistent Terminal Sessions"
domain: opensource
category: dev-tools
tags: [screen, terminal, ssh, linux, cli]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# screen — Simple Persistent Terminal Sessions

## Problem

Same problem as tmux: SSH sessions die, jobs get killed, long-running tasks need to survive disconnects. screen is the older, simpler alternative to tmux — universally available and gets the job done with minimal setup.

## Solution

`screen` creates detachable terminal sessions. It's installed by default on many systems, making it useful when tmux isn't available.

### Installation (Fedora)

```bash
sudo dnf install screen
```

### Core Workflow

```bash
# Start a named session
screen -S mysession

# Detach (keeps running)
Ctrl+a, d

# List sessions
screen -list

# Reattach
screen -r mysession

# If session shows as "Attached" (stuck)
screen -d -r mysession
```

### Start a Background Job Directly

```bash
screen -dmS mysession bash -c "long-running-command 2>&1 | tee /root/output.log"
```

- `-d` — start detached
- `-m` — create new session even if already inside screen
- `-S` — name the session

### Capture Current Output Without Attaching

```bash
screen -S mysession -X hardcopy /tmp/screen_output.txt
cat /tmp/screen_output.txt
```

### Send a Command to a Running Session

```bash
screen -S mysession -X stuff "tail -f /root/output.log\n"
```

---

## screen vs tmux

| Feature | screen | tmux |
|---|---|---|
| Availability | Installed by default on most systems | Usually needs installing |
| Split panes | Basic (Ctrl+a, S) | Better (Ctrl+b, ") |
| Scripting | Limited | More capable |
| Config complexity | Simple | More options |

Use screen when it's already there or for quick throwaway sessions. Use tmux for anything more complex. See [tmux](tmux.md).

---

98
03-opensource/dev-tools/tmux.md
Normal file
@@ -0,0 +1,98 @@
---
title: "tmux — Persistent Terminal Sessions"
domain: opensource
category: dev-tools
tags: [tmux, terminal, ssh, multiplexer, linux]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# tmux — Persistent Terminal Sessions

## Problem

SSH sessions die when your connection drops, your laptop closes, or you walk away. Long-running jobs — storage migrations, file scans, downloads — get killed mid-run. You need a way to detach from a session, come back later, and pick up exactly where you left off.

## Solution

`tmux` is a terminal multiplexer. It runs sessions that persist independently of your SSH connection. You can detach, disconnect, reconnect from a different machine, and reattach to find everything still running.

### Installation (Fedora)

```bash
sudo dnf install tmux
```

### Core Workflow

```bash
# Start a named session
tmux new-session -s mysession

# Detach from a session (keeps it running)
Ctrl+b, d

# List running sessions
tmux ls

# Reattach to a session
tmux attach -t mysession

# Kill a session when done
tmux kill-session -t mysession
```

### Start a Background Job Directly

Skip the interactive session entirely — start a job in a new detached session in one command:

```bash
tmux new-session -d -s rmlint2 "rmlint /majorstorage// /mnt/usb// /majorRAID 2>&1 | tee /majorRAID/rmlint_scan2.log"
```

The job runs immediately in the background. Attach later to check progress:

```bash
tmux attach -t rmlint2
```

### Capture Output Without Attaching

Read the current state of a session without interrupting it:

```bash
tmux capture-pane -t rmlint2 -p
```

### Split Panes

Monitor multiple things in one terminal window:

```bash
# Horizontal split (top/bottom)
Ctrl+b, "

# Vertical split (left/right)
Ctrl+b, %

# Switch between panes
Ctrl+b, arrow keys
```

### Real-World Use

On **majorhome**, all long-running storage operations run inside named tmux sessions so they survive SSH disconnects:

```bash
tmux new-session -d -s rmlint2 "rmlint ..."        # dedup scan
tmux new-session -d -s rsync-migrate "rsync ..."   # file migration
tmux ls                                            # check what's running
```

---

## tmux vs screen

Both work. tmux has better split-pane support and scripting. screen is simpler and more universally installed. I use both — tmux for new jobs, screen for legacy ones. See the [screen](screen.md) article for reference.

---

81
03-opensource/dev-tools/ventoy.md
Normal file
@@ -0,0 +1,81 @@
---
title: "Ventoy — Multi-Boot USB Tool"
domain: opensource
category: dev-tools
tags: [ventoy, usb, boot, iso, linux, tools]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Ventoy — Multi-Boot USB Tool

Ventoy turns a USB drive into a multi-boot device. Drop ISO files onto the drive and boot directly from them — no need to flash a new image every time you want to try a different distro or run a recovery tool.

## What It Is

[Ventoy](https://www.ventoy.net/) creates a special partition layout on a USB drive. After the one-time install, you just copy ISO (or WIM, VHD, IMG) files to the drive. On boot, Ventoy presents a menu of every image on the drive and boots whichever one you pick.

No re-formatting. No Rufus. No balenaEtcher. Just drag and drop.

## Installation

### Linux

```bash
# Download the latest release
wget https://github.com/ventoy/Ventoy/releases/download/v1.1.05/ventoy-1.1.05-linux.tar.gz

# Extract
tar -xzf ventoy-1.1.05-linux.tar.gz
cd ventoy-1.1.05

# Install to USB drive (WARNING: this formats the drive)
sudo ./Ventoy2Disk.sh -i /dev/sdX
```

Replace `/dev/sdX` with your USB drive. Use `lsblk` to identify it — triple-check before running, this wipes the drive.

### Windows

Download the Windows package from the Ventoy releases page, run `Ventoy2Disk.exe`, select your USB drive, and click Install.

## Usage

After installation, the USB drive shows up as a regular FAT32/exFAT partition. Copy ISOs onto it:

```bash
# Copy ISOs to the drive
cp ~/Downloads/Fedora-43-x86_64.iso /mnt/ventoy/
cp ~/Downloads/ubuntu-24.04-desktop.iso /mnt/ventoy/
cp ~/Downloads/memtest86.iso /mnt/ventoy/
```

Boot from the USB. Ventoy's menu lists every ISO it finds. Select one and it boots directly.

## Updating Ventoy

When a new version comes out, update without losing your ISOs:

```bash
# Update mode (-u) preserves existing files
sudo ./Ventoy2Disk.sh -u /dev/sdX
```

## Why It's Useful

- **Distro testing:** Keep 5-10 distro ISOs on one stick. Boot into any of them without reflashing.
- **Recovery toolkit:** Carry GParted, Clonezilla, memtest86, and a live Linux on a single drive.
- **OS installation:** One USB for every machine you need to set up.
- **Persistence:** Ventoy supports persistent storage for some distros, so live sessions can save data across reboots.
|
||||||
|
|
||||||
|
## Gotchas & Notes
|
||||||
|
|
||||||
|
- **Secure Boot:** Ventoy supports Secure Boot but it requires enrolling a key on first boot. Follow the on-screen prompts.
|
||||||
|
- **exFAT for large ISOs:** The default FAT32 partition has a 4GB file size limit. Use exFAT if any of your ISOs exceed that (Windows ISOs often do). Ventoy supports both.
|
||||||
|
- **UEFI vs Legacy:** Ventoy handles both automatically. It detects the boot mode and presents the appropriate menu.
|
||||||
|
- **Some ISOs don't work.** Heavily customized or non-standard ISOs may fail to boot. Standard distro ISOs and common tools work reliably.
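To spot ISOs that would hit the FAT32 4GB limit before copying, a small helper can scan the mounted drive. A sketch (the `/mnt/ventoy` mount point is an example):

```bash
# List ISOs in the drive root that exceed FAT32's 4 GiB file-size limit.
# These require the drive to be formatted exFAT.
oversized_isos() {
  local dir="${1:-/mnt/ventoy}"
  [ -d "$dir" ] || return 0          # nothing mounted: nothing to report
  find "$dir" -maxdepth 1 -iname '*.iso' -size +4G
}
oversized_isos
```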
## See Also

- [linux-distro-guide-beginners](../../01-linux/distro-specific/linux-distro-guide-beginners.md)

@@ -2,14 +2,21 @@

A curated collection of my favorite open-source tools and privacy-respecting alternatives to mainstream software.

## 🔄 Alternatives

- [SearXNG: Private Self-Hosted Search](alternatives/searxng.md)
- [FreshRSS: Self-Hosted RSS Reader](alternatives/freshrss.md)
- [Gitea: Self-Hosted Git](alternatives/gitea.md)

## 🚀 Productivity

- [rmlint: Duplicate File Scanning](productivity/rmlint-duplicate-scanning.md)

## 🛠️ Development Tools

- [tmux: Persistent Terminal Sessions](dev-tools/tmux.md)
- [screen: Simple Persistent Sessions](dev-tools/screen.md)
- [rsync: Fast, Resumable File Transfers](dev-tools/rsync.md)

## 🎨 Media & Creative

- [yt-dlp: Video Downloading](media-creative/yt-dlp.md)

## 🔐 Privacy & Security

- [Vaultwarden: Self-Hosted Password Manager](privacy-security/vaultwarden.md)

157 03-opensource/media-creative/yt-dlp.md Normal file
@@ -0,0 +1,157 @@

---
title: "yt-dlp — Video Downloading"
domain: opensource
category: media-creative
tags: [yt-dlp, video, youtube, downloads, cli]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# yt-dlp — Video Downloading

## What It Is

`yt-dlp` is a feature-rich command-line video downloader, forked from youtube-dl with active maintenance and significantly better performance. It supports YouTube, Twitch, and hundreds of other sites.

---

## Installation

### Fedora

```bash
sudo dnf install yt-dlp
# or latest via pip:
sudo pip install yt-dlp --break-system-packages
```

### Update

```bash
sudo pip install -U yt-dlp --break-system-packages
# or if installed as standalone binary:
yt-dlp -U
```

Keep it current — YouTube pushes extractor changes frequently and old versions break.
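For a standalone-binary install, a cron entry can keep it fresh automatically. A sketch (schedule, binary path, and log path are examples; `yt-dlp -U` only applies to the standalone binary, not pip or dnf installs):

```
# m h dom mon dow  command   (weekly self-update, Mondays at 06:00)
0 6 * * 1  /usr/local/bin/yt-dlp -U >> /var/log/yt-dlp-update.log 2>&1
```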
---

## Basic Usage

```bash
# Download a single video (best quality)
yt-dlp https://www.youtube.com/watch?v=VIDEO_ID

# Download to a specific directory with title as filename
yt-dlp -o "/path/to/output/%(title)s.%(ext)s" URL
```

---

## Plex-Optimized Download

Download best quality and auto-convert to HEVC for Apple TV direct play:

```bash
yt-dlp URL
```

That's it — if your config is set up correctly (see Config File section below). The config handles format selection, output path, subtitles, and automatic AV1/VP9 → HEVC conversion.

> [!note] `bestvideo[ext=mp4]` caps at 1080p because YouTube only serves H.264 up to 1080p. Use `bestvideo+bestaudio` to get true 4K, then let the post-download hook convert AV1/VP9 to HEVC. See [Plex 4K Codec Compatibility](../../04-streaming/plex/plex-4k-codec-compatibility.md) for the full setup.

---

## Playlists and Channels

```bash
# Download a full playlist
yt-dlp -o "%(playlist_index)s - %(title)s.%(ext)s" PLAYLIST_URL

# Download only videos not already present
yt-dlp --download-archive archive.txt PLAYLIST_URL
```

`--download-archive` maintains a file of completed video IDs — re-running the command skips already-downloaded videos automatically.
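Combined with cron, this makes unattended playlist mirroring cheap, since the archive file keeps each re-run incremental. A sketch (schedule, paths, and `PLAYLIST_URL` are placeholders):

```
# m h dom mon dow  command   (nightly sync at 02:15; skips anything already archived)
15 2 * * *  yt-dlp --download-archive /plex/archive.txt -o '/plex/plex/%(title)s.%(ext)s' PLAYLIST_URL
```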
---

## Format Selection

```bash
# List all available formats for a video
yt-dlp --list-formats URL

# Download best video + best audio, merge to mp4
yt-dlp -f 'bestvideo+bestaudio' --merge-output-format mp4 URL

# Download audio only (MP3)
yt-dlp -x --audio-format mp3 URL
```

---

## Config File

Persist your preferred flags so you don't repeat them every command:

```bash
mkdir -p ~/.config/yt-dlp
cat > ~/.config/yt-dlp/config << 'EOF'
--remote-components ejs:github
--format bestvideo+bestaudio
--merge-output-format mp4
--output /plex/plex/%(title)s.%(ext)s
--write-auto-subs
--embed-subs
--exec /usr/local/bin/yt-dlp-hevc-convert.sh {}
EOF
```

After this, a bare `yt-dlp URL` downloads best quality, saves to `/plex/plex/`, embeds subtitles, and auto-converts AV1/VP9 to HEVC. See [Plex 4K Codec Compatibility](../../04-streaming/plex/plex-4k-codec-compatibility.md) for the conversion hook setup.
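When you occasionally want a download that ignores all of this (no Plex path, no HEVC hook), yt-dlp's `--ignore-config` flag skips every config file for that one invocation. For example, a one-off audio rip:

```bash
# One-off run with none of the config applied
yt-dlp --ignore-config -x --audio-format mp3 URL
```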
---

## Running Long Downloads in the Background

For large downloads or playlists, run inside `screen` or `tmux` so they survive SSH disconnects:

```bash
screen -dmS yt-download bash -c \
  "yt-dlp -o '/plex/plex/%(title)s.%(ext)s' PLAYLIST_URL 2>&1 | tee ~/yt-download.log"

# Check progress
screen -r yt-download
# or
tail -f ~/yt-download.log
```

---

## Subtitle Downloads

The config above handles subtitles automatically via `--write-auto-subs` and `--embed-subs`. For one-off downloads where you want explicit control over subtitle embedding alongside specific format selection:

```bash
yt-dlp -f 'bestvideo[vcodec^=avc]+bestaudio[ext=m4a]/bestvideo+bestaudio' \
  --merge-output-format mp4 \
  -o "/plex/plex/%(title)s.%(ext)s" \
  --write-auto-subs --embed-subs URL
```

This forces H.264 video + M4A audio when available — useful when you want guaranteed Apple TV / Plex compatibility without running the HEVC conversion hook.

---

## Troubleshooting

For YouTube JS challenge errors, missing formats, and n-challenge failures on Fedora — see [yt-dlp YouTube JS Challenge Fix](../../05-troubleshooting/yt-dlp-fedora-js-challenge.md).

**YouTube player client errors:** If downloads fail with extractor errors, YouTube may have broken the default player client. Override it:

```bash
yt-dlp --extractor-args "youtube:player-client=default,-tv_simply" URL
```

This can also be added to your config file as a persistent workaround until yt-dlp pushes a fix upstream. Keep yt-dlp updated — these breakages get patched regularly.

---

100 03-opensource/privacy-security/vaultwarden.md Normal file
@@ -0,0 +1,100 @@

---
title: "Vaultwarden — Self-Hosted Password Manager"
domain: opensource
category: privacy-security
tags: [vaultwarden, bitwarden, passwords, self-hosting, docker]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Vaultwarden — Self-Hosted Password Manager

## Problem

Password managers are a necessity, but handing your credentials to a third-party cloud service is a trust problem. Bitwarden is open source and privacy-respecting, but if you're already running a homelab, there's no reason to depend on their servers.

## Solution

[Vaultwarden](https://github.com/dani-garcia/vaultwarden) is an unofficial, lightweight Bitwarden-compatible server written in Rust. It exposes the same API that all official Bitwarden clients speak — desktop apps, browser extensions, mobile apps — so you get the full Bitwarden UX pointed at your own hardware.

Your passwords never leave your network.

---

## Deployment (Docker + Caddy)

### docker-compose.yml

```yaml
services:
  vaultwarden:
    image: vaultwarden/server:latest
    container_name: vaultwarden
    restart: unless-stopped
    environment:
      - DOMAIN=https://vault.yourdomain.com
      - SIGNUPS_ALLOWED=false  # disable after creating your account
    volumes:
      - ./vw-data:/data
    ports:
      - "8080:80"
```

Start it:

```bash
sudo docker compose up -d
```

### Caddy reverse proxy

```
vault.yourdomain.com {
    reverse_proxy localhost:8080
}
```

Caddy handles TLS automatically. No extra cert config needed.

---

## Initial Setup

1. Browse to `https://vault.yourdomain.com` and create your account
2. Set `SIGNUPS_ALLOWED=false` in the compose file and restart the container
3. Install any official Bitwarden client (browser extension, desktop, mobile)
4. In the client, set the **Server URL** to `https://vault.yourdomain.com` before logging in

That's it. The client has no idea it's not talking to Bitwarden's servers.
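To confirm the server is actually reachable before pointing clients at it, Vaultwarden exposes a simple health endpoint at `/alive` (an assumption worth verifying against your Vaultwarden version; the domain is a placeholder):

```bash
# Exits non-zero if the server is unreachable or returns an error status
curl -fsS https://vault.yourdomain.com/alive
```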
---

## Access Model

On MajorInfrastructure, Vaultwarden runs on **majorlab** and is accessible:

- **Internally** — via Caddy on the local network
- **Remotely** — via Tailscale; the vault is reachable from any device on the tailnet without exposing it to the public internet

This means the Caddy vhost does not need to be publicly routable. You can choose to expose it publicly (Let's Encrypt works fine) or keep it Tailscale-only.

---

## Backup

Vaultwarden stores everything in a single SQLite database at `./vw-data/db.sqlite3`. Back it up like any file:

```bash
# Simple copy (stop container first for consistency, or use sqlite backup mode)
sqlite3 /path/to/vw-data/db.sqlite3 ".backup '/path/to/backup/vw-backup-$(date +%F).sqlite3'"
```

Or include the `vw-data/` directory in your regular rsync backup run.
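The sqlite backup can be automated with a cron entry. A sketch (paths and schedule are examples; note the `\%` escaping that crontab requires, since a bare `%` starts a new line in cron command fields):

```
# m h dom mon dow  command   (nightly online backup at 03:30)
30 3 * * *  sqlite3 /path/to/vw-data/db.sqlite3 ".backup '/path/to/backup/vw-backup-$(date +\%F).sqlite3'"
```

The `.backup` command takes a consistent snapshot even while the container is running, which is why it's preferred over a plain `cp` here.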
---

## Why Not Bitwarden (Official)?

The official Bitwarden server is also open source but requires significantly more resources (multiple services, SQL Server). Vaultwarden runs in a single container on minimal RAM and handles everything a personal or family vault needs.

---

@@ -1,3 +1,12 @@

---
title: "rmlint — Extreme Duplicate File Scanning"
domain: opensource
category: productivity
tags: [rmlint, duplicates, storage, cleanup, linux]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# rmlint — Extreme Duplicate File Scanning

## Problem

@@ -52,7 +61,3 @@ After scanning and clearing duplicates, you can reclaim significant space. In my

Run a scan monthly or before any major storage consolidation project.

---

@@ -5,3 +5,7 @@ Guides for live streaming and podcast production, with a focus on OBS Studio.

## OBS Studio

- [OBS Studio Setup & Encoding](obs/obs-studio-setup-encoding.md)

## Plex

- [Plex 4K Codec Compatibility (Apple TV)](plex/plex-4k-codec-compatibility.md)

@@ -118,6 +118,31 @@ echo "v4l2loopback" | sudo tee /etc/modules-load.d/v4l2loopback.conf

echo "options v4l2loopback devices=1 video_nr=10 card_label=OBS Virtual Camera exclusive_caps=1" | sudo tee /etc/modprobe.d/v4l2loopback.conf
```

## Plugins & Capture Sources

### Captions Plugin (Accessibility)

[OBS Captions Plugin](https://github.com/ratwithacompiler/OBS-captions-plugin) adds real-time closed captions to streams using speech-to-text. Viewers can toggle captions on/off in their player — important for accessibility and for viewers watching without sound.

Install from the plugin's GitHub releases page, then configure in Tools → Captions.

### VLC Video Source (Capture Card)

For capturing from an Elgato 4K60 Pro MK.2 (or similar DirectShow capture card) via VLC as an OBS source, use this device string:

```
:dshow-vdev=Game Capture 4K60 Pro MK.2
:dshow-adev=Game Capture 4K60 Pro MK.2 Audio (Game Capture 4K60 Pro MK.2)
:dshow-aspect-ratio=16:9
:dshow-chroma=YUY2
:dshow-fps=0
:no-dshow-config
:no-dshow-tuner
:live-caching=0
```

Set `live-caching=0` to minimize capture latency. This is useful when OBS's native Game Capture isn't an option (e.g., capturing a separate machine's output through the card).

## Gotchas & Notes

- **Test your stream before going live.** Record a short clip and watch it back. Artifacts in the recording will be worse in the stream.

@@ -128,5 +153,5 @@

## See Also

- [linux-file-permissions](../../01-linux/files-permissions/linux-file-permissions.md)
- [bash-scripting-patterns](../../01-linux/shell-scripting/bash-scripting-patterns.md)

157 04-streaming/plex/plex-4k-codec-compatibility.md Normal file
@@ -0,0 +1,157 @@

---
title: "Plex 4K Codec Compatibility (Apple TV)"
domain: streaming
category: plex
tags: [plex, 4k, hevc, apple-tv, transcoding, codec]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Plex 4K Codec Compatibility (Apple TV)

4K content on YouTube is delivered in AV1 or VP9 — neither of which the Plex app on Apple TV can direct play. This forces Plex to transcode, and most home server CPUs can't transcode 4K in real time. The fix is converting to HEVC before Plex ever sees the file.

## Codec Compatibility Matrix

| Codec | Apple TV (Plex direct play) | YouTube 4K | Notes |
|---|---|---|---|
| H.264 (AVC) | ✅ | ❌ (max 1080p) | Most compatible, but no 4K |
| HEVC (H.265) | ✅ | ❌ | Best choice: 4K compatible, widely supported |
| VP9 | ❌ | ✅ | Google's royalty-free codec, forces transcode |
| AV1 | ❌ | ✅ | Best compression, requires modern hardware to decode |

**Target format: HEVC.** Direct plays on Apple TV, supports 4K/HDR, and modern hardware can encode it quickly.

## Why AV1 and VP9 Cause Problems

When Plex can't direct play a file, it transcodes it on the server. AV1 and VP9 decoding is CPU-intensive — most home server CPUs can't keep up with 4K60 in real time. Intel Quick Sync (HD 630 era) supports VP9 hardware decode but not AV1. AV1 hardware support requires 11th-gen Intel or RTX 30-series+.

## Batch Converting Existing Files

For files already in your Plex library, use this script to find all AV1/VP9 files and convert them to HEVC via VAAPI (Intel Quick Sync):

```bash
#!/bin/bash
VAAPI_DEV=/dev/dri/renderD128
PLEX_DIR="/plex/plex"
LOG="/root/av1_to_hevc.log"
TMPDIR="/tmp/av1_convert"

mkdir -p "$TMPDIR"
echo "=== AV1→HEVC batch started $(date) ===" | tee -a "$LOG"

# Group the -iname tests so the implicit -print applies to both extensions
find "$PLEX_DIR" \( -iname "*.mp4" -o -iname "*.mkv" \) | while IFS= read -r f; do
    codec=$(mediainfo --Inform='Video;%Format%' "$f" 2>/dev/null)
    [ "$codec" != "AV1" ] && [ "$codec" != "VP9" ] && continue

    echo "[$(date +%H:%M:%S)] Converting: $(basename "$f")" | tee -a "$LOG"
    tmp="${TMPDIR}/$(basename "${f%.*}").mp4"

    ffmpeg -hide_banner -loglevel error \
        -vaapi_device "$VAAPI_DEV" \
        -i "$f" \
        -vf 'format=nv12,hwupload' \
        -c:v hevc_vaapi \
        -qp 22 \
        -c:a copy \
        -movflags +faststart \
        "$tmp"

    if [ $? -eq 0 ] && [ -s "$tmp" ]; then
        mv "$tmp" "${f%.*}_hevc.mp4"
        rm -f "$f"
    else
        rm -f "$tmp"
        echo "  FAILED — original kept." | tee -a "$LOG"
    fi
done
```
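Before kicking off a batch run, it's worth confirming the VAAPI render node the script expects actually exists. A minimal preflight sketch (the device path is the common Intel default assumed by the script above):

```bash
# Check that the render node ffmpeg's -vaapi_device will open is present.
have_vaapi_node() {
  [ -e "${1:-/dev/dri/renderD128}" ]
}

if have_vaapi_node; then
  echo "VAAPI render node present"
else
  echo "no render node found; hevc_vaapi encodes will fail"
fi
```

A missing node usually means the Intel GPU driver isn't loaded or the container lacks access to `/dev/dri`.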
Run in a tmux session so it survives SSH disconnect:

```bash
tmux new-session -d -s av1-convert '/root/av1_to_hevc.sh'
tail -f /root/av1_to_hevc.log
```

After completion, trigger a Plex library scan to pick up the renamed files.

## Automating Future Downloads (yt-dlp)

Prevent the problem at the source with a post-download conversion hook.

### 1. Create the conversion script

Save to `/usr/local/bin/yt-dlp-hevc-convert.sh`:

```bash
#!/bin/bash
INPUT="$1"
VAAPI_DEV=/dev/dri/renderD128
LOG=/var/log/yt-dlp-convert.log

[ -z "$INPUT" ] && exit 0
[ ! -f "$INPUT" ] && exit 0

CODEC=$(mediainfo --Inform='Video;%Format%' "$INPUT" 2>/dev/null)
if [ "$CODEC" != "AV1" ] && [ "$CODEC" != "VP9" ]; then
    exit 0
fi

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Converting ($CODEC): $(basename "$INPUT")" >> "$LOG"
TMPOUT="${INPUT%.*}_hevc_tmp.mp4"

ffmpeg -hide_banner -loglevel error \
    -vaapi_device "$VAAPI_DEV" \
    -i "$INPUT" \
    -vf 'format=nv12,hwupload' \
    -c:v hevc_vaapi \
    -qp 22 \
    -c:a copy \
    -movflags +faststart \
    "$TMPOUT"

if [ $? -eq 0 ] && [ -s "$TMPOUT" ]; then
    mv "$TMPOUT" "${INPUT%.*}.mp4"
    [ "${INPUT%.*}.mp4" != "$INPUT" ] && rm -f "$INPUT"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] OK: $(basename "${INPUT%.*}.mp4")" >> "$LOG"
else
    rm -f "$TMPOUT"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILED — original kept: $(basename "$INPUT")" >> "$LOG"
fi
```

```bash
chmod +x /usr/local/bin/yt-dlp-hevc-convert.sh
```

### 2. Configure yt-dlp

`~/.config/yt-dlp/config`:

```
--remote-components ejs:github
--format bestvideo+bestaudio
--merge-output-format mp4
--output /plex/plex/%(title)s.%(ext)s
--write-auto-subs
--embed-subs
--exec /usr/local/bin/yt-dlp-hevc-convert.sh {}
```

With this config, `yt-dlp <URL>` downloads the best available quality (including 4K AV1/VP9), then immediately converts any AV1 or VP9 output to HEVC before Plex indexes it.

> [!note] The `--format bestvideo+bestaudio` selector gets true 4K from YouTube (served as AV1 or VP9). The hook converts it to HEVC. Without the hook, using `bestvideo[ext=mp4]` would cap downloads at 1080p since YouTube only serves H.264 up to 1080p.

## Enabling Hardware Transcoding in Plex

Even with automatic conversion in place, enable hardware acceleration in Plex as a fallback for any files that slip through:

**Plex Web → Settings → Transcoder → "Use hardware acceleration when available"**

This requires Plex Pass. On Intel systems with Quick Sync, VP9 will hardware transcode even without pre-conversion. AV1 will still fall back to CPU on pre-Alder Lake hardware.

## Related

- [yt-dlp: Video Downloading](../../03-opensource/media-creative/yt-dlp.md)
- [OBS Studio Setup & Encoding](../obs/obs-studio-setup-encoding.md)

135 05-troubleshooting/ansible-check-mode-false-positives.md Normal file
@@ -0,0 +1,135 @@

---
title: Ansible Check Mode False Positives in Verify/Assert Tasks
domain: selfhosting
category: troubleshooting
tags:
  - ansible
  - check-mode
  - dry-run
  - assert
  - handlers
  - troubleshooting
status: published
created: 2026-04-18
updated: 2026-04-18T11:13
---

# Ansible Check Mode False Positives in Verify/Assert Tasks

## The Problem

`ansible-playbook --check` (dry-run mode) reports failures on verify and assert tasks that depend on handler-triggered side effects — even when the playbook is correct and would succeed on a real run.

**Symptom:** Running `--check` produces errors like:

```
TASK [Assert hardened settings are active] ***
fatal: [host]: FAILED! => {
    "assertion": "'permitrootlogin without-password' in sshd_effective.stdout",
    "msg": "One or more SSH hardening settings not effective"
}
```

But a real run (`ansible-playbook` without `--check`) succeeds cleanly.

## Why It Happens

In check mode, Ansible simulates tasks but **does not execute handlers**. This means:

1. A config file task reports `changed` (it would deploy the file)
2. The handler (`reload sshd`, `reload firewalld`, etc.) is **never fired**
3. A subsequent verify task runs `sshd -T` or `ufw status verbose` against the **current live state** (pre-change)
4. The assert compares the current state against the expected post-change state and fails

The verify task is reading reality accurately — the change hasn't happened yet — but the failure is misleading. It suggests the playbook is broken when it's actually correct.

## The Fix

Guard verify and assert tasks that depend on handler side effects with `when: not ansible_check_mode`:

```yaml
- name: Verify effective SSH settings post-reload
  ansible.builtin.command:
    cmd: sshd -T
  register: sshd_effective
  changed_when: false
  when: not ansible_check_mode  # sshd hasn't reloaded in check mode

- name: Assert hardened settings are active
  ansible.builtin.assert:
    that:
      - "'permitrootlogin without-password' in sshd_effective.stdout"
      - "'x11forwarding no' in sshd_effective.stdout"
    fail_msg: "SSH hardening settings not effective — check for conflicting config"
  when: not ansible_check_mode  # result would be pre-change state
```

This skips the verify/assert during check mode (where they'd produce false failures) while keeping them active on real runs (where they catch actual misconfigurations).

## When to Apply This Guard

Apply `when: not ansible_check_mode` to any task that:

- Reads the **active/effective state** of a service after a config change (`sshd -T`, `ufw status verbose`, `firewall-cmd --list-all`, `nginx -T`)
- **Asserts** that the post-change state matches expectations
- Depends on a **handler** having fired first (service reload, daemon restart)

Don't apply it to tasks that check pre-existing state (e.g., verifying a file exists before modifying it) — those are valid in check mode.

## Common Patterns

### SSH daemon verify

```yaml
- name: Verify effective sshd settings
  ansible.builtin.command: sshd -T
  register: sshd_out
  changed_when: false
  when: not ansible_check_mode

- name: Assert sshd hardening active
  ansible.builtin.assert:
    that:
      - "'maxauthtries 3' in sshd_out.stdout"
  when: not ansible_check_mode
```

### UFW status verify

```yaml
- name: Show UFW status
  ansible.builtin.command: ufw status verbose
  register: ufw_status
  changed_when: false
  when: not ansible_check_mode

- name: Confirm default deny incoming
  ansible.builtin.assert:
    that:
      - "'Default: deny (incoming)' in ufw_status.stdout"
  when: not ansible_check_mode
```

### nginx config verify

```yaml
- name: Test nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  when: not ansible_check_mode
```

## Trade-off

Guarding with `when: not ansible_check_mode` means check mode won't validate these assertions. The benefit — no false failures — outweighs the gap because:

- Check mode is showing you what *would* change, not whether the result is valid
- Real runs still assert and will catch actual misconfigurations
- The alternative (failing check runs) erodes trust in `--check` output

If you need to verify the effective post-change state in check mode, consider splitting the playbook into a deploy pass and a separate verify-only playbook run without `--check`.
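The split pattern boils down to three invocations. A sketch (`site.yml` and `verify.yml` are placeholder playbook names):

```bash
ansible-playbook site.yml --check    # preview changes; guarded verify tasks skip
ansible-playbook site.yml            # real run; handlers fire
ansible-playbook verify.yml          # assertions against the live post-change state
```

Because `verify.yml` contains only read-and-assert tasks, it never needs `--check` itself and can run on a schedule as a drift detector.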
## See Also

- [ssh-hardening-ansible-fleet](../02-selfhosting/security/ssh-hardening-ansible-fleet.md)
- [ufw-firewall-management](../02-selfhosting/security/ufw-firewall-management.md)
- [ansible-getting-started](../01-linux/shell-scripting/ansible-getting-started.md)

72 05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md Normal file
@@ -0,0 +1,72 @@

---
title: Ansible SSH Timeout During dnf upgrade on Fedora Hosts
domain: troubleshooting
category: ansible
tags:
  - ansible
  - ssh
  - fedora
  - dnf
  - timeout
  - fleet-management
status: published
created: '2026-03-28'
updated: '2026-03-28'
---

# Ansible SSH Timeout During dnf upgrade on Fedora Hosts

## Symptom

Running `ansible-playbook update.yml` against Fedora/CentOS hosts fails with:

```
fatal: [hostname]: UNREACHABLE! => {"changed": false,
"msg": "Failed to connect to the host via ssh: Shared connection to <IP> closed."}
```

The failure occurs specifically during `ansible.builtin.dnf` tasks that upgrade all packages (`name: '*'`, `state: latest`), because the operation takes long enough for the SSH connection to drop.

## Root Cause

Without explicit SSH keepalive settings in `ansible.cfg`, OpenSSH defaults apply. Long-running tasks like a full `dnf upgrade` across a fleet can exceed idle timeouts, causing the control connection to close mid-task.

## Fix

Add a `[ssh_connection]` section to `ansible.cfg`:

```ini
[ssh_connection]
ssh_args = -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -o ControlMaster=auto -o ControlPersist=60s
```

| Setting | Purpose |
|---------|---------|
| `ServerAliveInterval=30` | Send a keepalive every 30 seconds |
| `ServerAliveCountMax=10` | Allow 10 missed keepalives before disconnect (~5 min tolerance) |
| `ControlMaster=auto` | Reuse SSH connections across tasks |
| `ControlPersist=60s` | Keep the master connection open 60s after last use |
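To confirm which `ansible.cfg` is active and that the new section is actually being picked up, `ansible-config` can dump the non-default settings (a quick check; run from the playbook directory so the right config file wins):

```bash
ansible-config dump --only-changed | grep -i ssh_args
```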
|
||||||
|
|
||||||
|
## Related Fix: do-agent Task Guard

In the same playbook run, a second failure surfaced on hosts where the `ansible.builtin.uri` task to fetch the latest `do-agent` release was **skipped** (non-RedHat hosts or hosts without do-agent installed). The registered variable existed but contained a skipped result with no `.json` attribute, causing:

```
object of type 'dict' has no attribute 'json'
```

Fix: add guards to downstream tasks that reference the URI result:

```yaml
when:
  - do_agent_release is defined
  - do_agent_release is not skipped
  - do_agent_release.json is defined
```

## Environment

- **Controller:** macOS (MajorAir)
- **Targets:** Fedora 43 (majorlab, majormail, majorhome, majordiscord)
- **Ansible:** community edition via Homebrew
- **Committed:** `d9c6bdb` in MajorAnsible repo
68
05-troubleshooting/ansible-vault-password-file-missing.md
Normal file
@@ -0,0 +1,68 @@
---
title: "Ansible: Vault Password File Not Found"
domain: troubleshooting
category: general
tags: [ansible, vault, credentials, configuration]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Ansible: Vault Password File Not Found

## Error

```
[WARNING]: Error getting vault password file (default): The vault password file /Users/majorlinux/.ansible/vault_pass was not found
[ERROR]: The vault password file /Users/majorlinux/.ansible/vault_pass was not found
```

## Cause

Ansible is configured to look for a vault password file at `~/.ansible/vault_pass`, but the file does not exist. This is typically set in `ansible.cfg` via the `vault_password_file` directive.

## Solutions

### Option 1: Remove the vault config (if you're not using Vault)

Check your `ansible.cfg` for this line and remove it if Vault is not needed:

```ini
[defaults]
vault_password_file = ~/.ansible/vault_pass
```

### Option 2: Create the vault password file

```bash
echo 'your_vault_password' > ~/.ansible/vault_pass
chmod 600 ~/.ansible/vault_pass
```

> **Security note:** Keep permissions tight (`600`) so only your user can read the file. The actual vault password is stored in Bitwarden under the "Ansible Vault Password" entry.

### Option 3: Pass the password at runtime (no file needed)

```bash
ansible-playbook test.yml --ask-vault-pass
```

## Diagnosing the Source of the Config

To find which config file is setting `vault_password_file`, run:

```bash
ansible-config dump --only-changed
```

This shows all non-default config values and their source files. Config is loaded in this order of precedence:

1. `ANSIBLE_CONFIG` environment variable
2. `./ansible.cfg` (current directory)
3. `~/.ansible.cfg`
4. `/etc/ansible/ansible.cfg`
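As a quick illustration of that precedence order, here is a plain-bash mock of the lookup (this is for illustration only, not Ansible's own code):

```bash
#!/usr/bin/env bash
# Mimic Ansible's config-file search order, for illustration only.
find_ansible_cfg() {
  if [ -n "${ANSIBLE_CONFIG:-}" ]; then
    echo "$ANSIBLE_CONFIG"        # 1. env var always wins, even if others exist
    return
  fi
  for f in ./ansible.cfg "$HOME/.ansible.cfg" /etc/ansible/ansible.cfg; do
    if [ -f "$f" ]; then          # 2-4. first existing file wins
      echo "$f"
      return
    fi
  done
  echo "(no config file)"
}
find_ansible_cfg
```

Real Ansible reports the winner on the `config file` line of `ansible --version`.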
## Related

- [Ansible Getting Started](../01-linux/shell-scripting/ansible-getting-started.md)
- Vault password is stored in Bitwarden under **"Ansible Vault Password"**
- Ansible playbooks live at `~/MajorAnsible` on MajorAir/MajorMac
@@ -0,0 +1,89 @@
---
title: "Ansible Ignores ansible.cfg on WSL2 Windows Mounts"
domain: troubleshooting
category: ansible
tags: [ansible, wsl, wsl2, windows, vault, configuration]
status: published
created: 2026-04-03
updated: 2026-04-03
---

# Ansible Ignores ansible.cfg on WSL2 Windows Mounts

## Problem

Running Ansible from a repo on a Windows drive (`/mnt/c/`, `/mnt/d/`, etc.) in WSL2 silently ignores the local `ansible.cfg`. You'll see:

```
[WARNING]: Ansible is being run in a world writable directory
(/mnt/d/MajorAnsible), ignoring it as an ansible.cfg source.
```

This causes vault decryption to fail (`Attempting to decrypt but no vault secrets found`), inventory to fall back to `/etc/ansible/hosts`, and `remote_user` to reset to defaults — even though `ansible.cfg` is right there in the project directory.

## Cause

WSL2 mounts Windows NTFS drives with broad permissions (typically `0777`). Ansible refuses to load `ansible.cfg` from any world-writable directory as a security measure — a malicious user on a shared system could inject a rogue config.

This is hardcoded behavior in Ansible and cannot be overridden with a flag.

## Solutions

### Option 1: Environment Variables (Recommended)

Export the settings that `ansible.cfg` would normally provide. Add to `~/.bashrc`:

```bash
export ANSIBLE_VAULT_PASSWORD_FILE=~/.ansible/vault_pass
```

Other common settings you may need:

```bash
export ANSIBLE_REMOTE_USER=root
export ANSIBLE_INVENTORY=/mnt/d/MajorAnsible/inventory/inventory.yml
```

### Option 2: Pass Flags Explicitly

```bash
ansible-playbook -i inventory/ playbook.yml --vault-password-file ~/.ansible/vault_pass
```

This works but is tedious for daily use.

### Option 3: Clone to a Native Linux Path

Clone the repo inside the WSL2 filesystem instead of on the Windows mount:

```bash
git clone https://git.example.com/repo.git ~/MajorAnsible
```

Native WSL2 paths (`/home/user/...`) have proper Linux permissions, so `ansible.cfg` loads normally. The tradeoff is that Windows tools can't easily access the repo.

### Option 4: Fix Mount Permissions (Not Recommended)

You can change WSL2 mount permissions via `/etc/wsl.conf`:

```ini
[automount]
options = "metadata,umask=022"
```

This requires a `wsl --shutdown` and remount. It may break other Windows-Linux interop workflows and affects all mounted drives.

## Diagnosis

To confirm whether Ansible is loading your config:

```bash
ansible --version
```

Look for the `config file` line. If it shows `None` instead of your project's `ansible.cfg`, the config is being ignored.
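You can also check the mount permissions directly: Ansible's refusal is triggered by the other-write bit on the directory holding `ansible.cfg`. A small sketch (the `/mnt/d/MajorAnsible` path is an example; `stat -c` is the GNU form, which is what WSL2 provides):

```bash
#!/usr/bin/env bash
# Report whether a directory is world-writable (the condition that makes
# Ansible ignore an ansible.cfg inside it).
is_world_writable() {
  local perms
  perms=$(stat -c '%A' "$1") || return 2   # e.g. "drwxrwxrwx"
  [ "${perms:8:1}" = "w" ]                 # 9th char is the other-write bit
}

if is_world_writable /mnt/d/MajorAnsible; then
  echo "world-writable: an ansible.cfg here will be ignored"
else
  echo "permissions OK"
fi
```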
## Related

- [Ansible: Vault Password File Not Found](ansible-vault-password-file-missing.md) — general vault password troubleshooting
- [Ansible Docs: Avoiding Security Risks with ansible.cfg](https://docs.ansible.com/ansible/latest/reference_appendices/config.html#cfg-in-world-writable-dir)

178
05-troubleshooting/claude-mem-setting-sources-empty-arg.md
Normal file
@@ -0,0 +1,178 @@
---
title: "claude-mem Silently Fails with Claude Code 2.1+ (Empty --setting-sources)"
domain: troubleshooting
category: claude-code
tags: [claude-code, claude-mem, cli, subprocess, version-mismatch, shim]
status: published
created: 2026-04-17
updated: 2026-04-17
---

# claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)

## Symptom

After installing the `claude-mem` plugin (v12.1.3) in Claude Code (v2.1.112), every Claude Code session starts with:

```
No previous sessions found for this project yet.
```

…even for directories where you've worked repeatedly. Session records *do* appear in `~/.claude-mem/claude-mem.db` (table `sdk_sessions`), but:

- `session_summaries` count stays at **0**
- `observations` count stays at **0**
- Chroma vector DB stays empty

Tailing `~/.claude-mem/logs/claude-mem-YYYY-MM-DD.log` shows the Stop hook firing on every assistant turn, but always:

```
[HOOK ] → Stop: Requesting summary {hasLastAssistantMessage=true}
[HOOK ] Summary processing complete {waitedMs=503, summaryStored=null}
```

No errors, no stack traces — just a silent `null`. Raising `CLAUDE_MEM_LOG_LEVEL` to `DEBUG` reveals the true error:

```
[WARN ] [SDK_SPAWN] Claude process exited {code=1, signal=null, pid=…}
[ERROR] [SESSION] Generator failed {provider=claude, error=Claude Code process exited with code 1}
```

## Root cause

`claude-mem` 12.1.3 spawns the `claude` CLI as a subprocess to generate per-turn observations and session summaries. The argv it passes includes:

```
claude --output-format stream-json --verbose --input-format stream-json \
  --model claude-sonnet-4-6 \
  --disallowedTools Bash,Read,Write,… \
  --setting-sources \          ← no value!
  --permission-mode default
```

`claude-mem` intends to pass `--setting-sources ""` (empty string, meaning "no sources"). Claude Code **v2.1.x** now validates this flag and rejects empty values — it requires one of `user`, `project`, or `local`. With no value present, the CLI's argument parser consumes the next flag (`--permission-mode`) as the value and produces:

```
Error processing --setting-sources: Invalid setting source: --permission-mode.
Valid options are: user, project, local
```

The child process exits immediately with code 1 (within ~130 ms). `claude-mem` only logs `exited with code 1` and discards stderr by default, which is why the failure looks silent.

This is a **version-mismatch bug** between `claude-mem` 12.1.3 (latest as of 2026-04-17) and `claude-code` 2.1.x. Earlier Claude Code releases accepted empty values.

## Investigation path

1. Confirm worker processes are alive:

   ```bash
   pgrep -fl "worker-service|mcp-server.cjs|chroma-mcp"
   cat ~/.claude-mem/supervisor.json
   ```

2. Confirm sessions are being *recorded* but not *summarised*:

   ```bash
   sqlite3 ~/.claude-mem/claude-mem.db \
     "SELECT COUNT(*) FROM sdk_sessions;        -- nonzero
      SELECT COUNT(*) FROM session_summaries;   -- 0 = pipeline broken
      SELECT COUNT(*) FROM observations;        -- 0 = pipeline broken"
   ```

3. Grep the log for `summaryStored=null` — if every Stop hook ends in `null`, summarisation is failing.

4. Raise the log level to expose the real error:

   ```bash
   # In ~/.claude-mem/settings.json
   "CLAUDE_MEM_LOG_LEVEL": "DEBUG"
   ```

   Kill and respawn workers (`pkill -f worker-service.cjs`). New logs should show `SDK_SPAWN Claude process exited {code=1}`.

5. Capture the exact argv by replacing `CLAUDE_CODE_PATH` with a debug shim that logs `$@` before exec'ing the real binary (see the fix below for the production shim — the debug version just tees argv to a log file).
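A minimal debug shim for step 5 might look like the following. The paths are placeholders for your install, and `CLAUDE_REAL` is made overridable purely so the shim can be exercised against a harmless stand-in binary:

```bash
# Create a debug shim that records argv to a log, then execs the real CLI.
# Paths are placeholders; CLAUDE_REAL should point at your actual claude binary.
cat > /tmp/claude-debug-shim <<'EOF'
#!/bin/bash
REAL="${CLAUDE_REAL:-/Users/you/.local/bin/claude}"
printf 'argv: %s\n' "$*" >> /tmp/claude-argv.log
exec "$REAL" "$@"
EOF
chmod +x /tmp/claude-debug-shim

# Exercise it against /bin/echo as a stand-in for the real binary:
CLAUDE_REAL=/bin/echo /tmp/claude-debug-shim --setting-sources --permission-mode default
# prints: --setting-sources --permission-mode default
```

Pointing `CLAUDE_CODE_PATH` at a shim like this is how the bare `--setting-sources` in the argv above was confirmed.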
## The fix

Apply the changes in this order.

### 1. Fix the settings `claude-mem` ships empty

Edit `~/.claude-mem/settings.json`:

```json
{
  "CLAUDE_CODE_PATH": "/Users/you/.local/bin/claude-shim",
  "CLAUDE_MEM_TIER_SUMMARY_MODEL": "claude-sonnet-4-6"
}
```

Both ship empty in a fresh install. `CLAUDE_CODE_PATH` points at the shim (below), not the real binary. `CLAUDE_MEM_TIER_SUMMARY_MODEL` is required when `CLAUDE_MEM_TIER_ROUTING_ENABLED=true`.

### 2. Install the shim

`/Users/you/.local/bin/claude-shim`:

```bash
#!/bin/bash
# Workaround shim for claude-mem 12.1.3 <-> Claude Code 2.1.x incompat.
# claude-mem passes `--setting-sources` with no value; Claude CLI 2.1+ rejects
# empty and consumes the next flag as the value. Fix: inject "user" when missing.

REAL=/Users/you/.local/bin/claude

new_args=()
i=0
args=("$@")
while [ $i -lt ${#args[@]} ]; do
  cur="${args[$i]}"
  new_args+=("$cur")
  if [ "$cur" = "--setting-sources" ]; then
    next="${args[$((i+1))]}"
    case "$next" in
      user|project|local) : ;;   # already valid
      *) new_args+=("user") ;;   # inject missing value
    esac
  fi
  i=$((i+1))
done

exec "$REAL" "${new_args[@]}"
```

Make it executable: `chmod +x ~/.local/bin/claude-shim`.

### 3. Restart workers

```bash
pkill -f "worker-service.cjs --daemon"
```

They respawn automatically on the next Claude Code hook fire. Verify:

```bash
# Within ~15 s:
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations;"
# Should be growing as you continue the session.
```

### 4. Sanity-check the shim is being used

```bash
ps -eww | grep -F 'setting-sources user'
```

Every live `claude` child should have `--setting-sources user` in its argv, not a bare `--setting-sources`.

## Why a shim instead of patching `claude-mem`

The offending code is inside the minified `worker-service.cjs` bundle shipped by the `@anthropic-ai/claude-code` SDK, which `claude-mem` vendors. Patching the bundle is possible but fragile: any `claude-mem` update overwrites it. The shim is a one-file wrapper at a stable path, survives plugin updates, and becomes a no-op the moment upstream ships a fix.

## When to remove the shim

Check for a newer `claude-mem` release or an Anthropic SDK update that stops passing `--setting-sources` with an empty value. To test:

1. Point `CLAUDE_CODE_PATH` back at the real `/Users/you/.local/bin/claude`.
2. Restart workers.
3. Confirm the `observations` count keeps growing.

If it does, remove the shim. If not, restore the shim path and wait for a later release.

## Related

- Install notes: `20-Projects/Personal-Tasks.md` — "Install claude-mem plugin on MajorMac — 2026-04-15"
- Config file: `~/.claude-mem/settings.json`
- Logs: `~/.claude-mem/logs/claude-mem-YYYY-MM-DD.log`
- DB: `~/.claude-mem/claude-mem.db` (SQLite, FTS5 enabled)
@@ -0,0 +1,84 @@
---
title: "Cron Heartbeat False Alarm: /var/run Cleared by Reboot"
domain: troubleshooting
category: general
tags:
  - cron
  - systemd
  - tmpfs
  - monitoring
  - backups
  - heartbeat
status: published
created: 2026-04-13
updated: 2026-04-13T10:10
---

# Cron Heartbeat False Alarm: /var/run Cleared by Reboot

If a cron-driven watchdog emails you that a job "may never have run" — but the job's log clearly shows it completed successfully — check whether the heartbeat file lives under `/var/run` (or `/run`). On most modern Linux distros, `/run` is a **tmpfs** and is wiped on every reboot. Any file there survives only until the next boot.

## Symptoms

- A heartbeat-based watchdog fires a missing-heartbeat or stale-heartbeat alert
- The job the watchdog is monitoring actually ran successfully — its log file shows a clean completion long before the alert fired
- The host was rebooted between when the job wrote its heartbeat and when the watchdog checked it
- `stat /var/run/<your-heartbeat>` returns `No such file or directory`
- `readlink -f /var/run` returns `/run`, and `mount | grep ' /run '` shows `tmpfs`

## Why It Happens

Systemd distros mount `/run` as a tmpfs for runtime state. `/var/run` is kept only as a compatibility symlink to `/run`. The whole filesystem is memory-backed: when the host reboots, every file under `/run` vanishes unless a `tmpfiles.d` rule explicitly recreates it. The convention is that only things like PID files and sockets — state that is meaningful **only for the current boot** — should live there.

A daily backup or maintenance job that touches a heartbeat file to prove it ran is *not* boot-scoped state. If the job runs at 03:00, the host reboots at 07:00 for a kernel update, and a watchdog checks the heartbeat at 08:00, the watchdog sees nothing — even though the job ran four hours earlier and exited 0.

The common mitigation of checking the heartbeat's mtime against a max age (e.g. "alert if older than 25h") does **not** protect against this. It catches stale heartbeats from real failures, but a deleted file has no mtime to compare.

## Fix

Move the heartbeat out of tmpfs and into a persistent directory. Good options:

- `/var/lib/<service>/heartbeat` — canonical home for persistent service state
- `/var/log/<service>-heartbeat` — acceptable if you want it alongside existing logs
- Any path on a real disk-backed filesystem

Both the writer (the monitored job) and the reader (the watchdog) need to agree on the new path. Make sure the parent directory exists before the first write:

```bash
HEARTBEAT="/var/lib/myservice/heartbeat"
mkdir -p "$(dirname "$HEARTBEAT")"
# ... later, on success:
touch "$HEARTBEAT"
```

The `mkdir -p` is cheap to run unconditionally and avoids a first-run-after-deploy edge case where the directory hasn't been created yet.
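On the watchdog side, the age check needs an explicit missing-file branch, since a deleted heartbeat is exactly the case the mtime comparison misses. A sketch (the path and the 25-hour threshold are examples; `stat -c %Y` and `date +%s` are the GNU forms):

```bash
#!/usr/bin/env bash
# Watchdog check: alert if the heartbeat is missing OR older than max_age seconds.
check_heartbeat() {
  local hb="$1" max_age="${2:-90000}"   # 90000 s = 25 h default
  if [ ! -f "$hb" ]; then
    echo "ALERT: heartbeat file missing: $hb"
    return 1
  fi
  local age=$(( $(date +%s) - $(stat -c %Y "$hb") ))
  if [ "$age" -gt "$max_age" ]; then
    echo "ALERT: heartbeat stale (${age}s old)"
    return 1
  fi
  echo "OK: heartbeat ${age}s old"
}
```

Typical use from cron: `check_heartbeat /var/lib/myservice/heartbeat || send-alert`.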
## Verification

After deploying the fix:

```bash
# 1. Run the monitored job manually (or wait for its next scheduled run)
sudo bash /path/to/monitored-job.sh

# 2. Confirm the heartbeat was created on persistent storage
ls -la /var/lib/myservice/heartbeat

# 3. Reboot and re-check — the file should survive
sudo reboot
# ... after reboot ...
ls -la /var/lib/myservice/heartbeat   # still there, mtime unchanged

# 4. Run the watchdog manually to confirm it passes
sudo bash /path/to/watchdog.sh
```

## Why Not Use `tmpfiles.d` Instead

systemd-tmpfiles can recreate files in `/run` at boot via an `f /run/<name> 0644 root root - -` entry. That works, but it's the wrong tool for this problem: a boot-created empty file has the boot time as its mtime, which defeats the watchdog's age check. The watchdog would see a fresh heartbeat after every reboot even if the monitored job hasn't actually run in days.

Keep `/run` for true runtime state (PIDs, sockets, locks). Put success markers on persistent storage.

## Related

- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md) — another class of post-reboot surprise
- [rsync Backup Patterns](../02-selfhosting/storage-backup/rsync-backup-patterns.md) — reusable backup script patterns
@@ -1,3 +1,12 @@

---
title: "Docker & Caddy Recovery After Reboot (Fedora + SELinux)"
domain: troubleshooting
category: general
tags: [docker, caddy, selinux, fedora, reboot, majorlab]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Docker & Caddy Recovery After Reboot (Fedora + SELinux)

## 🛑 Problem

84
05-troubleshooting/docker/n8n-proxy-trust-x-forwarded-for.md
Normal file
@@ -0,0 +1,84 @@
---
title: "n8n Behind Reverse Proxy: X-Forwarded-For Trust Fix"
domain: troubleshooting
category: docker
tags: [n8n, caddy, reverse-proxy, docker, express]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# n8n Behind Reverse Proxy: X-Forwarded-For Trust Fix

## The Problem

When running n8n behind a reverse proxy (Caddy, Nginx, Traefik), the logs fill with:

```
ValidationError: The 'X-Forwarded-For' header is set but the Express 'trust proxy' setting is false (default).
This could indicate a misconfiguration which would prevent express-rate-limit from accurately identifying users.
```

This means n8n's Express rate limiter sees every request as coming from the proxy's internal IP, not the real client. Rate limiting and audit logging both break.

## Why `N8N_TRUST_PROXY=true` Isn't Enough

Older n8n versions accepted `N8N_TRUST_PROXY=true` to trust proxy headers. Newer versions (1.x+) use Express's `trust proxy` setting, which requires knowing *how many* proxy hops to trust. Without `N8N_PROXY_HOPS`, Express ignores the `X-Forwarded-For` header entirely even if `N8N_TRUST_PROXY=true` is set.

## The Fix

Add `N8N_PROXY_HOPS=1` to your n8n environment:

### Docker Compose

```yaml
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    environment:
      - N8N_HOST=n8n.example.com
      - N8N_PROTOCOL=https
      - N8N_TRUST_PROXY=true
      - N8N_PROXY_HOPS=1   # <-- Add this
```

Set `N8N_PROXY_HOPS` to the number of reverse proxies between the client and n8n:

- **1** — single proxy (Caddy/Nginx directly in front of n8n)
- **2** — two proxies (e.g., Cloudflare → Caddy → n8n)
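For context, the single-hop Caddy side needs no special header configuration: a stock `reverse_proxy` block already sets `X-Forwarded-For` on upstream requests. A minimal sketch (the hostname is an example; `localhost:5678` assumes Caddy runs on the host next to the published n8n port, so use the container name instead if both share a Docker network):

```
n8n.example.com {
    reverse_proxy localhost:5678
}
```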
### Recreate the Container

```bash
cd /opt/n8n   # or wherever your compose file lives
docker compose down
docker compose up -d
```

If you get a container name conflict:

```bash
docker rm -f n8n-n8n-1
docker compose up -d
```

## Verifying the Fix

Check the logs after restart:

```bash
docker logs --since 5m n8n-n8n-1 2>&1 | grep -i "forwarded\|proxy\|ValidationError"
```

If the fix worked, there should be zero `ValidationError` lines. A clean startup looks like:

```
n8n ready on ::, port 5678
Version: 2.14.2
Editor is now accessible via:
https://n8n.example.com
```

## Key Notes

- Keep `N8N_TRUST_PROXY=true` alongside `N8N_PROXY_HOPS` — both are needed.
- The `mount of type volume should not define bind option` warning from Docker Compose when using `:z` (SELinux) volume labels is cosmetic and can be ignored.
- If n8n reports "Last session crashed" after a `docker rm -f` recreation, this is expected — the old container was force-killed, so n8n sees it as a crash. It recovers automatically.
@@ -0,0 +1,82 @@
---
title: "Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update"
domain: troubleshooting
category: docker
tags: [nextcloud, docker, healthcheck, netdata, php-fpm, aio]
status: published
created: 2026-03-28
updated: 2026-03-28
---

# Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update

## Symptom

Netdata alert `docker_nextcloud_unhealthy` fired on majorlab and stayed in Warning for 20 hours. The `nextcloud-aio-nextcloud` container was running but its Docker healthcheck kept failing. No user-facing errors were visible in `nextcloud.log`.

## Investigation

### Timeline (2026-03-27, all UTC)

| Time | Event |
|---|---|
| 04:00 | Nightly backup script started, mastercontainer update kicked off |
| 04:03 | `nextcloud-aio-nextcloud` container recreated |
| 04:05 | Backup finished |
| 07:25 | Mastercontainer logged "Initial startup of Nextcloud All-in-One complete!" (3h20m delay) |
| 10:22 | First entry in `nextcloud.log` (deprecation warnings only — no errors) |
| 04:00 (Mar 28) | Next nightly backup replaced the container; new container came up healthy in ~25 minutes |

### Key findings

- **No image update** — the container image dated to Feb 26, so this was not caused by a version change.
- **No app-level errors** — `nextcloud.log` contained only `files_rightclick` deprecation warnings (level 3). No level 2/4 entries.
- **PHP-FPM never stabilized** — the healthcheck (`/healthcheck.sh`) tests `nc -z 127.0.0.1 9000` (PHP-FPM). The container was running but FPM wasn't responding to the port check.
- **6-hour log gap** — no `nextcloud.log` entries between container start (04:03) and first log (10:22), suggesting the AIO init scripts (occ upgrade, app updates, cron jobs) ran for hours before the app became partially responsive.
- **RestartCount: 0** — the container never restarted on its own. It sat there unhealthy for the full 20 hours.
- **Disk space fine** — 40% used on `/`.

### Healthcheck details

```bash
#!/bin/bash
# /healthcheck.sh inside nextcloud-aio-nextcloud
nc -z "$POSTGRES_HOST" "$POSTGRES_PORT" || exit 0   # postgres down = pass (graceful)
nc -z 127.0.0.1 9000 || exit 1                      # PHP-FPM down = fail
```

If PostgreSQL is unreachable, the check passes (exits 0). The only failure path is PHP-FPM not listening on port 9000.

## Root Cause

The AIO nightly update cycle recreated the container, but the startup/migration process hung or ran extremely long, preventing PHP-FPM from fully initializing. The container sat in this state for 20 hours with no self-recovery mechanism until the next nightly cycle replaced it.

The exact migration or occ command that stalled could not be confirmed — the old container's entrypoint logs were lost when the Mar 28 backup cycle replaced it.

## Fix

Two changes deployed on 2026-03-28:

### 1. Dedicated Netdata alarm with lenient window

Split `nextcloud-aio-nextcloud` into its own Netdata alarm (`docker_nextcloud_unhealthy`) with a 10-minute lookup and 10-minute delay, separate from the general container alarm. See [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md).

### 2. Watchdog cron for auto-restart

Deployed `/etc/cron.d/nextcloud-health-watchdog` on majorlab:

```bash
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```

- Checks every 15 minutes
- Only restarts if the container has been running >1 hour (avoids interfering with normal startup)
- Logs to syslog: `journalctl -t nextcloud-watchdog`

This caps future unhealthy outages at ~1 hour instead of persisting until the next nightly cycle.

## See Also

- [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
- [Debugging Broken Docker Containers](../../02-selfhosting/docker/debugging-broken-docker-containers.md)
- [Docker Healthchecks](../../02-selfhosting/docker/docker-healthchecks.md)

141
05-troubleshooting/fedora-networking-kernel-recovery.md
Normal file
@@ -0,0 +1,141 @@
---
|
||||||
|
title: "Fedora Networking & Kernel Troubleshooting"
|
||||||
|
domain: troubleshooting
|
||||||
|
category: networking
|
||||||
|
tags: [fedora, networking, kernel, grub, nmcli, troubleshooting]
|
||||||
|
status: published
|
||||||
|
created: 2026-04-02
|
||||||
|
updated: 2026-04-02
|
||||||
|
---
|
||||||
|
|
||||||
|
# Fedora Networking & Kernel Troubleshooting
|
||||||
|
|
||||||
|
Two common issues on the MajorsHouse Fedora fleet (majorlab, majorhome): network connectivity dropping after updates or reboots, and kernel upgrades that break things. These are the quick fixes and the deeper recovery paths.
|
||||||
|
|
||||||
|
## Networking Drops After Reboot or Update
|
||||||
|
|
||||||
|
### Quick Fix
|
||||||
|
|
||||||
|
If a Fedora box loses network connectivity after a reboot or `dnf upgrade`, NetworkManager may not have brought the connection back up automatically:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
nmcli connection up "Wired connection 1"
|
||||||
|
```
|
||||||
|
|
||||||
|
This re-activates the default wired connection. If the connection name differs on your system:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# List all known connections
|
||||||
|
nmcli connection show
|
||||||
|
|
||||||
|
# Bring up by name
|
||||||
|
nmcli connection up "your-connection-name"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Why This Happens
|
||||||
|
|
||||||
|
- NetworkManager may not auto-activate a connection if it was configured as manual or if the profile was reset during an upgrade.
|
||||||
|
- Kernel updates can temporarily break network drivers, especially on hardware with out-of-tree modules. The new kernel loads, the old driver doesn't match, and the NIC doesn't come up.
|
||||||
|
- On headless servers (like majorlab and majorhome), there's no desktop network applet to reconnect — it stays down until you fix it via console or IPMI.
|
||||||
|
|
||||||
|
### Make It Persistent

Ensure the connection auto-activates on boot:

```bash
# Check current autoconnect setting
nmcli connection show "Wired connection 1" | grep autoconnect

# Enable if not set
nmcli connection modify "Wired connection 1" connection.autoconnect yes
```
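To audit the whole box at once, you can filter nmcli's terse output for profiles that won't come back on their own (a sketch; the `NAME`/`AUTOCONNECT` field names match current nmcli releases):

```bash
# Print every connection profile with autoconnect disabled.
# -t = terse, colon-separated output; -f = restrict to these fields.
nmcli -t -f NAME,AUTOCONNECT connection show \
  | awk -F: '$2 == "no" { print $1 }'
```

Any name this prints is a profile you would have to bring up by hand after a reboot.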

## Kernel Issues — Booting an Older Kernel

When a new kernel causes problems (network, storage, GPU, or boot failures), boot into the previous working kernel via GRUB.

### At the GRUB Menu

1. Reboot the machine.
2. Hold **Shift** (BIOS) or press **Esc** (UEFI) to show the GRUB menu.
3. Select **Advanced options** or an older kernel entry.
4. Boot into the working kernel.
### From the Command Line (Headless)

If you have console access but no GRUB menu:

```bash
# List installed kernels
sudo grubby --info=ALL | grep -E "^(index|kernel|title)"

# Set the previous kernel as default (by index)
sudo grubby --set-default-index=1

# Or set by kernel path
sudo grubby --set-default=/boot/vmlinuz-6.19.9-200.fc43.x86_64

# Reboot into it
sudo reboot
```

For a one-off test without changing the default, `grub2-reboot <index>` selects an entry for the next boot only (this requires `GRUB_DEFAULT=saved` in `/etc/default/grub`).

### Remove a Bad Kernel

Once you've confirmed the older kernel works:

```bash
# Remove the broken kernel
sudo dnf remove kernel-core-6.19.10-200.fc43.x86_64

# Verify GRUB updated
sudo grubby --default-kernel
```

### Prevent Auto-Updates From Reinstalling It

If the same kernel version keeps coming back and keeps breaking:

```bash
# Temporarily exclude it from updates
sudo dnf upgrade --exclude=kernel*

# Or pin in dnf.conf
echo "excludepkgs=kernel*" | sudo tee -a /etc/dnf/dnf.conf
```

Remove the exclusion once a fixed kernel version is released. The `dnf versionlock` plugin (package `python3-dnf-plugin-versionlock`) is a more surgical alternative if you only want to hold back a specific version.
## Quick Diagnostic Commands

```bash
# Check current kernel
uname -r

# Check network status
nmcli general status
nmcli device status
ip addr show

# Check if NetworkManager is running
systemctl status NetworkManager

# Check recent kernel/network errors
journalctl -b -p err | grep -iE "kernel|network|eth|ens|nm"

# Check which kernels are installed
rpm -qa kernel-core | sort -V
```
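The last command combines naturally with the grubby steps above: `sort -V` puts the newest kernel last, so the second-newest package — the usual rollback target — is one `tail`/`head` away (a sketch over `rpm -qa` output):

```bash
# Second-newest installed kernel-core package = rollback candidate.
rpm -qa kernel-core | sort -V | tail -n 2 | head -n 1
```

The output is the package name; the matching boot entry for `grubby --set-default` is `/boot/vmlinuz-<version>` with the `kernel-core-` prefix dropped.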

## Gotchas & Notes

- **Always have console access** (IPMI, physical KVM, or Proxmox console) for headless servers before doing kernel updates. If the new kernel breaks networking, SSH won't save you.
- **Fedora keeps 3 kernels by default** (`installonly_limit=3` in `/etc/dnf/dnf.conf`). If you need more fallback options, increase this number before upgrading.
- **Test kernel updates on one server first.** Update majorlab, confirm it survives a reboot, then update majorhome.
- **`grubby` is Fedora's preferred tool** for managing GRUB entries. Avoid editing `grub.cfg` directly.

Reference: [Fedora — Working with the GRUB 2 Boot Loader](https://docs.fedoraproject.org/en-US/fedora/latest/system-administrators-guide/kernel-module-driver-configuration/Working_with_the_GRUB_2_Boot_Loader/)

## See Also

- [docker-caddy-selinux-post-reboot-recovery](docker-caddy-selinux-post-reboot-recovery.md)
- [managing-linux-services-systemd-ansible](../01-linux/process-management/managing-linux-services-systemd-ansible.md)

05-troubleshooting/gemini-cli-manual-update.md (new file, 56 lines)
@@ -0,0 +1,56 @@
---
title: "Gemini CLI: Manual Update Guide"
domain: troubleshooting
category: general
tags: [gemini, cli, npm, update, google]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# 🛠️ Gemini CLI: Manual Update Guide

If the automatic update fails or you need to force a specific version of the Gemini CLI, use these steps.
## 🔴 Symptom: Automatic Update Failed

You may see an error message like:

`✕ Automatic update failed. Please try updating manually`
## 🟢 Manual Update Procedure

### 1. Verify Current Version

Check the version currently installed on your system:

```bash
gemini --version
```

### 2. Check Latest Version

Query the npm registry for the latest available version:

```bash
npm show @google/gemini-cli version
```

### 3. Perform Manual Update

Use `npm` with `sudo` to update the global package:

```bash
sudo npm install -g @google/gemini-cli@latest
```

### 4. Confirm Update

Verify that the new version is active:

```bash
gemini --version
```
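Steps 1, 2, and 4 can be scripted into a quick is-an-update-available check; `sort -V` compares the two version strings numerically (a sketch, assuming both commands print a bare version like `0.3.1`):

```bash
cur="$(gemini --version)"
latest="$(npm show @google/gemini-cli version)"
# If the versions differ and the installed one sorts first, it is older.
if [ "$cur" != "$latest" ] && \
   [ "$(printf '%s\n%s\n' "$cur" "$latest" | sort -V | head -n 1)" = "$cur" ]; then
    echo "update available: $cur -> $latest"
fi
```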

## 🛠️ Troubleshooting Update Failures

### Permissions Issues

If you encounter `EACCES` errors without `sudo`, either use `sudo` as shown above or point npm's global prefix at a user-writable directory (`npm config set prefix ~/.npm-global`, then add `~/.npm-global/bin` to `PATH`) so global installs never need root.

### Registry Connectivity

If `npm` cannot reach the registry, check your internet connection and any local firewall/proxy settings.

### Cache Issues

If the version doesn't update, try clearing the npm cache and reinstalling:

```bash
npm cache clean --force
```

05-troubleshooting/ghost-emailanalytics-lag-warning.md (new file, 106 lines)
@@ -0,0 +1,106 @@
---
title: Ghost EmailAnalytics Lag Warning — What It Means and When to Worry
domain: selfhosting
category: troubleshooting
tags:
- ghost
- email
- mailgun
- emailanalytics
- docker
- troubleshooting
status: published
created: 2026-04-18
updated: 2026-04-18T11:13
---

# Ghost EmailAnalytics Lag Warning — What It Means and When to Worry
## The Warning

Ghost logs a recurring warning every 5 minutes when its EmailAnalytics job falls behind:

```
WARN [EmailAnalytics] Opened events processing is 738.0 minutes behind (threshold: 30)
```

This is followed by:

```
INFO [EmailAnalytics] Job complete - No events
INFO [EmailAnalytics] Skipping fetchMissing because end (...) is before begin (...)
```

The counter increments by 5 with every cycle. On a small newsletter it will grow indefinitely and never reset on its own — until a subscriber opens an email or a new newsletter is sent.
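To see whether the lag is merely ticking up by 5 each cycle (benign) or jumping around, you can pull the minutes figure out of the container logs (a sketch; `ghost` is an assumed container name):

```bash
# Extract the "N minutes behind" value from recent EmailAnalytics warnings.
docker logs ghost 2>&1 \
  | grep 'Opened events processing' \
  | sed -E 's/.* is ([0-9.]+) minutes behind.*/\1/' \
  | tail -n 5
```

A steady `+5.0` between consecutive values matches the harmless pattern described below.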

## Why It Happens

Ghost's EmailAnalytics polls Mailgun every 5 minutes for new "opened" events. The cursor is anchored to the timestamp of the last email delivery. If no new opened events arrive from Mailgun, the cursor doesn't advance and the lag counter grows.

This is **expected behavior** when:

- All subscribers have already opened (their open was recorded)
- One or more subscribers have not opened the email and haven't opened any subsequent emails
- There are no new emails to send

The lag counter = time since the last opened event was recorded, not time since the last email was sent.

## The `fetchMissing end == begin` Skip

```
INFO [EmailAnalytics] Skipping fetchMissing because end (Fri Apr 17 2026 15:44:57 ...) is before begin (Fri Apr 17 2026 15:44:57 ...)
```

This fires when the cursor window collapses to zero width — the start and end of the query window are identical. Ghost's guard clause skips a nonsensical zero-width Mailgun API call. This is not a bug or data loss — it's a safety check.
## What `status: submitted` Means

In Ghost's `emails` database table, all successfully sent newsletters show `status: submitted`. This is the normal terminal state after Ghost hands the email batch off to Mailgun. There is no `status: sent` — `submitted` = success.

You can verify delivery success by checking the counts:

```bash
docker exec <db-container> mysql -u root -p<password> ghost \
  -e "SELECT subject, status, email_count, delivered_count, opened_count, failed_count FROM emails ORDER BY created_at DESC LIMIT 5;"
```

A healthy result: `email_count == delivered_count`, `failed_count == 0`, regardless of `opened_count`.

## When to Actually Worry

The lag warning is **benign** in these cases:

- `delivered_count == email_count` (all emails delivered)
- `failed_count == 0`
- Mailgun domain state is active
- The warning appeared after a successful send and has been growing since

Investigate further if:

- `delivered_count < email_count` — some emails never left Mailgun
- `failed_count > 0` — delivery failures
- The warning appeared immediately after a Ghost upgrade or Mailgun credential change
- Mailgun Events API shows 0 delivered events (not just 0 opened events) for the send window
## Checking Mailgun Directly

If you suspect the lag reflects a real delivery problem, query Mailgun's Events API:

```bash
# Check for delivered events in the send window
curl -s --user "api:<your-mailgun-api-key>" \
  "https://api.mailgun.net/v3/<your-domain>/events?event=delivered&begin=<RFC2822-timestamp>&limit=10" \
  | python3 -m json.tool | grep -E "event|recipient|timestamp" | head -30
```

If delivered events appear for your subscribers, Mailgun is working and the lag warning is purely cosmetic.

## How It Resolves

The lag warning self-resolves when:

1. **A subscriber opens an email** — Mailgun returns an "opened" event, the cursor advances, lag resets
2. **A new newsletter is sent** — the send triggers a fresh analytics cycle and the cursor jumps forward
3. **The cursor is reset manually** — possible via a direct DB update, but not recommended unless you understand the implications for analytics continuity

For small newsletters (2–10 subscribers) where one subscriber consistently doesn't open emails, the warning is permanent background noise between sends. It does not indicate data loss or misconfiguration.

## See Also

- [ghost-smtp-mailgun-setup](../02-selfhosting/services/ghost-smtp-mailgun-setup.md)
- [debugging-broken-docker-containers](../02-selfhosting/docker/debugging-broken-docker-containers.md)

05-troubleshooting/gitea-runner-boot-race-network-target.md (new file, 93 lines)
@@ -0,0 +1,93 @@
---
title: "Gitea Actions Runner: Boot Race Condition Fix"
domain: troubleshooting
category: general
tags: [gitea, systemd, boot, dns, ci-cd]
status: published
created: 2026-04-02
updated: 2026-04-02
---

# Gitea Actions Runner: Boot Race Condition Fix

If your `gitea-runner` (act_runner) service fails to start on boot — crash-looping and eventually hitting systemd's restart rate limit — the service is likely starting before DNS is available.
## Symptoms

- `gitea-runner.service` enters a crash loop on boot
- `journalctl -u gitea-runner` shows connection/DNS errors on startup, such as:

  ```
  dial tcp: lookup git.example.com: no such host
  ```

  or similar resolution failures
- Service eventually stops retrying (systemd restart rate limit reached)
- `systemctl status gitea-runner` shows `(Result: start-limit-hit)` after reboot
- Service works fine if started manually after boot completes
## Why It Happens

`After=network.target` only guarantees that the network **interfaces are configured** — not that DNS resolution is functional. systemd-resolved (or your local resolver) starts slightly later. `act_runner` tries to connect to the Gitea instance by hostname on startup, the DNS lookup fails, and the process exits.

With the default `Restart=always` and no `RestartSec`, systemd restarts the service almost immediately. After five failed starts inside the default rate-limit window (`StartLimitBurst=5` within `StartLimitIntervalSec=10`), systemd gives up and stops restarting the unit.
## Fix

### 1. Update the Service File

Edit `/etc/systemd/system/gitea-runner.service`:

```ini
[Unit]
Description=Gitea Actions Runner
After=network-online.target
Wants=network-online.target

[Service]
User=deploy
WorkingDirectory=/opt/gitea-runner
ExecStart=/opt/gitea-runner/act_runner daemon
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Key changes:

- `After=network-online.target` + `Wants=network-online.target` — waits for the full network stack, including DNS
- `RestartSec=10` — adds a 10-second delay between restart attempts, preventing rapid failure bursts from hitting the rate limit
### 2. Add a Local /etc/hosts Entry (Optional but Recommended)

If your Gitea instance is on the same local network or reachable via Tailscale, add an entry to `/etc/hosts` so act_runner can resolve it without depending on external DNS:

```
127.0.0.1 git.example.com
```

Replace `git.example.com` with your Gitea hostname and the IP with the correct local address (`127.0.0.1` only applies when Gitea runs on the same host). This makes resolution instantaneous and removes the DNS dependency from startup entirely.
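A third option, if you'd rather not touch `/etc/hosts`, is an `ExecStartPre=` gate that blocks until the hostname resolves (a sketch; the script path and `git.example.com` are placeholders):

```bash
#!/bin/sh
# wait-for-dns.sh — exit 0 only once the Gitea hostname resolves.
# Wire it into the unit with:  ExecStartPre=/opt/gitea-runner/wait-for-dns.sh
host="${1:-git.example.com}"
until getent hosts "$host" >/dev/null 2>&1; do
    sleep 2
done
```

Combined with `RestartSec`, this keeps the main process from ever seeing a failed lookup; systemd's `TimeoutStartSec` still bounds how long the wait can run.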

### 3. Reload and Restart

```bash
sudo systemctl daemon-reload
sudo systemctl restart gitea-runner
sudo systemctl status gitea-runner
```

If the unit already hit the start limit, clear the failure counter first with `sudo systemctl reset-failed gitea-runner`. Verify the status shows `active (running)` and stays that way, then reboot and confirm it comes up automatically.

## Why `network-online.target` and Not `network.target`

| Target | What it guarantees |
|---|---|
| `network.target` | Network interfaces are configured (IP assigned) |
| `network-online.target` | Network is fully operational (DNS resolvers reachable) |

Services that need to make outbound network connections (especially DNS lookups) on startup should always use `network-online.target`. This includes mail servers, monitoring agents, CI runners — anything that connects to an external host by name.

> [!note] `network-online.target` can add a few seconds to boot time, since systemd waits for the network stack to fully initialize. On NetworkManager systems the target is only meaningful if `NetworkManager-wait-online.service` is enabled (it usually is by default on Fedora). For server contexts the extra wait is the right tradeoff.
## Related

- [Managing Linux Services with systemd](../01-linux/process-management/managing-linux-services-systemd-ansible.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)

@@ -1,3 +1,12 @@
+---
+title: "Qwen2.5-14B OOM on RTX 3080 Ti (12GB)"
+domain: troubleshooting
+category: gpu-display
+tags: [gpu, vram, oom, qwen, cuda, fine-tuning]
+status: published
+created: 2026-04-02
+updated: 2026-04-02
+---
 # Qwen2.5-14B OOM on RTX 3080 Ti (12GB)
 
 ## Problem
@@ -52,7 +61,3 @@ If you have no choice but to try 14B training:
 Keep your NVIDIA drivers and CUDA toolkit updated. On Windows (MajorRig), ensure WSL2 has sufficient memory allocation in `.wslconfig`.
 
 ---
-
-## Tags
-
-#gpu #cuda #oom #qwen #majortwin #llm #fine-tuning

@@ -1,3 +1,7 @@
+---
+created: 2026-03-15T06:37
+updated: 2026-04-17T10:21
+---
 # 🔧 General Troubleshooting
 
 Practical fixes for common Linux, networking, and application problems.
@@ -7,12 +11,37 @@ Practical fixes for common Linux, networking, and application problems.
 
 ## 🌐 Networking & Web
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
+- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
+- [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
+- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
+- [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
 - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
 - [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)
+- [wget/curl: URLs with Special Characters Fail in Bash](wget-url-special-characters.md)
+
+## ⚙️ Ansible & Fleet Management
+- [SSH Timeout During dnf upgrade on Fedora Hosts](ansible-ssh-timeout-dnf-upgrade.md)
+- [Vault Password File Missing](ansible-vault-password-file-missing.md)
+- [ansible.cfg Ignored on WSL2 Windows Mounts](ansible-wsl2-world-writable-mount-ignores-cfg.md)
 
 ## 📦 Docker & Systems
 - [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
+- [Gitea Actions Runner: Boot Race Condition Fix](gitea-runner-boot-race-network-target.md)
+- [Systemd Session Scope Fails at Login (`session-cN.scope`)](systemd/session-scope-failure-at-login.md)
 - [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)
+- [Cron Heartbeat False Alarm: /var/run Cleared by Reboot](cron-heartbeat-tmpfs-reboot-false-alarm.md)
+
+## 🔒 SELinux
+- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](selinux-dovecot-vmail-context.md)
+
+## 💾 Storage
+- [mdadm RAID Recovery After USB Hub Disconnect](storage/mdadm-usb-hub-disconnect-recovery.md)
 
 ## 📝 Application Specific
 - [Obsidian Vault Recovery — Loading Cache Hang](obsidian-cache-hang-recovery.md)
+- [Gemini CLI Manual Update](gemini-cli-manual-update.md)
+
+## 🤖 AI / Local LLM
+- [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)
+- [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
+- [claude-mem Silently Fails with Claude Code 2.1+ (Empty `--setting-sources`)](claude-mem-setting-sources-empty-arg.md)
Some files were not shown because too many files have changed in this diff.