Compare commits

32 commits: main...c7c7c9e5be

| SHA1 |
|---|
| c7c7c9e5be |
| 565b37a605 |
| f3ea6e98f1 |
| 1b801a9590 |
| 598e6fa26a |
| 335c4b57f2 |
| 2830338f6a |
| 2cb15887b1 |
| a834b8868a |
| 76fa4a9313 |
| b87b5a8213 |
| daba8f80dc |
| 84b526ed5a |
| 4cf2a8e0a6 |
| 016072e972 |
| 994c0c9191 |
| 21988a2fa9 |
| 58cb5e7b2a |
| 394d5200ad |
| 1790aa771a |
| 34aadae03a |
| 29333fbe0a |
| afae561e7e |
| 6e67c2b0b1 |
| 01981e0610 |
| a689d8203a |
| 2861cade55 |
| 64df4b8cfb |
| 70d9657b7f |
| c4673f70e0 |
| 9d537dec5f |
| 639b23f861 |
.gitattributes (vendored, new file, 18 lines)
@@ -0,0 +1,18 @@
```gitattributes
# Normalize line endings to LF for all text files
* text=auto eol=lf

# Explicitly handle markdown
*.md text eol=lf

# Explicitly handle config files
*.yml text eol=lf
*.yaml text eol=lf
*.json text eol=lf
*.toml text eol=lf

# Binary files — don't touch
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.pdf binary
```
01-linux/distro-specific/wsl2-backup-powershell.md (new file, 86 lines)
@@ -0,0 +1,86 @@
---
title: WSL2 Backup via PowerShell Scheduled Task
domain: linux
category: distro-specific
tags:
- wsl2
- windows
- backup
- powershell
- majorrig
status: published
created: '2026-03-16'
updated: '2026-03-16'
---

# WSL2 Backup via PowerShell Scheduled Task

Each WSL2 distribution is stored as a single VHDX file on disk. Unlike traditional VMs, there's no built-in snapshot or backup mechanism. This article covers a simple weekly backup strategy using `wsl --export` and a PowerShell scheduled task.

## The Short Answer

Save this as `C:\Users\majli\Scripts\backup-wsl.ps1` and register it as a weekly scheduled task.

## Backup Script

```powershell
$BackupDir = "D:\WSL\Backups"
$Date = Get-Date -Format "yyyy-MM-dd"
$BackupFile = "$BackupDir\FedoraLinux-43-$Date.tar"
$MaxBackups = 3

New-Item -ItemType Directory -Force -Path $BackupDir | Out-Null

# Must shut down WSL first — export fails if the VHDX is locked
Write-Host "Shutting down WSL2..."
wsl --shutdown
Start-Sleep -Seconds 5

Write-Host "Backing up FedoraLinux-43 to $BackupFile..."
wsl --export FedoraLinux-43 $BackupFile

if ($LASTEXITCODE -eq 0) {
    Write-Host "Backup complete: $BackupFile"
    # Prune old backups, keeping only the newest $MaxBackups
    Get-ChildItem "$BackupDir\FedoraLinux-43-*.tar" |
        Sort-Object LastWriteTime -Descending |
        Select-Object -Skip $MaxBackups |
        Remove-Item -Force
    Write-Host "Cleanup done. Keeping last $MaxBackups backups."
} else {
    Write-Host "ERROR: Backup failed!"
}
```

## Register the Scheduled Task

Run in PowerShell as Administrator:

```powershell
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" `
    -Argument "-NonInteractive -File C:\Users\majli\Scripts\backup-wsl.ps1"
$Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2am
$Settings = New-ScheduledTaskSettingsSet -StartWhenAvailable -RunOnlyIfNetworkAvailable:$false
Register-ScheduledTask -TaskName "WSL2 Backup - FedoraLinux43" `
    -Action $Action -Trigger $Trigger -Settings $Settings `
    -RunLevel Highest -Force
```

## Restore from Backup

```powershell
wsl --unregister FedoraLinux-43
wsl --import FedoraLinux-43 D:\WSL\Fedora43 D:\WSL\Backups\FedoraLinux-43-YYYY-MM-DD.tar
```

Then fix the default user — after import, WSL resets the default user to root. See [[wsl2-instance-migration-fedora43|WSL2 Instance Migration]] for the `/etc/wsl.conf` fix.
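That fix is small enough to inline for reference. Inside the distro, `/etc/wsl.conf` ends up looking like this (the same keys the rebuild article sets; `majorlinux` is the default user used across this wiki):

```ini
[boot]
systemd=true

[user]
default=majorlinux
```

After writing it, run `wsl --shutdown` so the next launch starts as that user.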
## Gotchas

- **`wsl --export` fails with `ERROR_SHARING_VIOLATION` if WSL is running.** The script runs `wsl --shutdown` before the export to handle this. Any active WSL sessions will be terminated, so schedule the task for a time when WSL is idle (2am works well).
- **Backblaze picks up `D:\WSL\Backups\` automatically** if the D: drive is in scope — offsite backup with no extra config.
- **Each backup tar is ~500MB–1GB** depending on what's installed. Keep `$MaxBackups` at 3 to balance retention against disk usage.

## See Also

- [[wsl2-instance-migration-fedora43|WSL2 Instance Migration]]
- [[wsl2-rebuild-fedora43-training-env|WSL2 Training Environment Rebuild]]
01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md (new file, 203 lines)
@@ -0,0 +1,203 @@
---
title: WSL2 Fedora 43 Training Environment Rebuild
domain: linux
category: distro-specific
tags:
- wsl2
- fedora
- unsloth
- pytorch
- cuda
- majorrig
- majortwin
status: published
created: '2026-03-16'
updated: '2026-03-16'
---

# WSL2 Fedora 43 Training Environment Rebuild

How to rebuild the MajorTwin training environment from scratch on MajorRig after a WSL2 loss. Covers the Fedora 43 install, Python 3.11 via pyenv, PyTorch with CUDA, Unsloth, and llama.cpp for GGUF conversion.

## The Short Answer

```bash
# 1. Install Fedora 43 and move it to D: (run in PowerShell)
wsl --install -d FedoraLinux-43 --no-launch
wsl --export FedoraLinux-43 D:\WSL\fedora43.tar
wsl --unregister FedoraLinux-43
wsl --import FedoraLinux-43 D:\WSL\Fedora43 D:\WSL\fedora43.tar

# 2. Set the default user (run inside the distro)
echo -e "[boot]\nsystemd=true\n[user]\ndefault=majorlinux" | sudo tee /etc/wsl.conf
useradd -m -G wheel majorlinux && passwd majorlinux
echo "%wheel ALL=(ALL) ALL" | sudo tee /etc/sudoers.d/wheel

# 3. Install Python 3.11 via pyenv, PyTorch, Unsloth
# See full steps below
```
## Step 1 — System Packages

```bash
sudo dnf update -y
sudo dnf install -y git curl wget tmux screen htop rsync unzip \
    python3 python3-pip python3-devel gcc gcc-c++ make cmake \
    ninja-build pkg-config openssl-devel libffi-devel \
    gawk patch readline-devel sqlite-devel
```
## Step 2 — Python 3.11 via pyenv

Fedora 43 ships Python 3.13. Unsloth requires 3.11. Use pyenv:

```bash
curl https://pyenv.run | bash

# Add to ~/.bashrc
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"

source ~/.bashrc
pyenv install 3.11.9
pyenv global 3.11.9
```

The tkinter warning during the build is harmless — tkinter isn't needed for training.
## Step 3 — Training Virtualenv + PyTorch

```bash
mkdir -p ~/majortwin/{staging,datasets,outputs,scripts}
python -m venv ~/majortwin/venv
source ~/majortwin/venv/bin/activate

pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Verify GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```

Expected output: `True NVIDIA GeForce RTX 3080 Ti`
## Step 4 — Unsloth + Training Stack

```bash
source ~/majortwin/venv/bin/activate

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers datasets accelerate peft trl bitsandbytes \
    sentencepiece protobuf scipy einops

# Pin transformers for unsloth-zoo compatibility
pip install "transformers<=5.2.0"

# Verify
python -c "import unsloth; print('Unsloth OK')"
```

> [!warning] Never run `pip install -r requirements.txt` from inside llama.cpp while the training venv is active. It installs CPU-only PyTorch and downgrades transformers, breaking the CUDA setup.
## Step 5 — llama.cpp (CPU-only for GGUF conversion)

CUDA 12.8 is incompatible with Fedora 43's glibc for compiling llama.cpp (math function conflicts in `/usr/include/bits/mathcalls.h`). Build CPU-only — it's sufficient for GGUF conversion, which doesn't need the GPU:

```bash
# Install GCC 14 (CUDA 12.8 doesn't support GCC 15, which Fedora 43 ships)
sudo dnf install -y gcc14 gcc14-c++

cd ~/majortwin
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

cmake -B build \
    -DGGML_CUDA=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=/usr/bin/gcc-14 \
    -DCMAKE_CXX_COMPILER=/usr/bin/g++-14

# Build in the background, log to a file, and watch the log
# (Ctrl-C the tail once the build finishes — it doesn't exit on its own)
cmake --build build --config Release -j$(nproc) 2>&1 | tee /tmp/llama_build.log &
tail -f /tmp/llama_build.log
```

Verify:

```bash
ls ~/majortwin/llama.cpp/build/bin/llama-quantize && echo "OK"
ls ~/majortwin/llama.cpp/build/bin/llama-cli && echo "OK"
```
## Step 6 — Shell Environment

```bash
cat >> ~/.bashrc << 'EOF'
# MajorInfrastructure Paths
export VAULT="/mnt/c/Users/majli/Documents/MajorVault"
export MAJORANSIBLE="/mnt/d/MajorAnsible"
export MAJORTWIN_D="/mnt/d/MajorTwin"
export MAJORTWIN_WSL="$HOME/majortwin"
export LLAMA_CPP="$HOME/majortwin/llama.cpp"

# Venv
alias mtwin='source $MAJORTWIN_WSL/venv/bin/activate && cd $MAJORTWIN_WSL'
alias vault='cd $VAULT'
alias ll='ls -lah --color=auto'

# SSH Fleet Aliases
alias majorhome='ssh majorlinux@100.120.209.106'
alias dca='ssh root@100.104.11.146'
alias majortoot='ssh root@100.110.197.17'
alias majorlinuxvm='ssh root@100.87.200.5'
alias majordiscord='ssh root@100.122.240.83'
alias majorlab='ssh root@100.86.14.126'
alias majormail='ssh root@100.84.165.52'
alias teelia='ssh root@100.120.32.69'
alias tttpod='ssh root@100.84.42.102'
alias majorrig='ssh majorlinux@100.98.47.29'  # port 2222 retired 2026-03-25, fleet uses port 22

# DNF5
alias update='sudo dnf upgrade --refresh'
alias install='sudo dnf install'
alias clean='sudo dnf clean all'

# MajorTwin helpers
stage_dataset() {
    cp "$VAULT/20-Projects/MajorTwin/03-Datasets/$1" "$MAJORTWIN_WSL/datasets/"
    echo "Staged: $1"
}
export_gguf() {
    cp "$MAJORTWIN_WSL/outputs/$1" "$MAJORTWIN_D/models/"
    echo "Exported: $1 → $MAJORTWIN_D/models/"
}
EOF

source ~/.bashrc
```
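The two helper functions are plain guarded copies. Here is a self-contained sketch of the `stage_dataset` pattern, using throwaway temp directories in place of the real vault paths:

```shell
# Demonstrate the stage_dataset pattern; temp dirs stand in for the real paths
VAULT=$(mktemp -d)            # stand-in for the Obsidian vault
MAJORTWIN_WSL=$(mktemp -d)    # stand-in for ~/majortwin
mkdir -p "$VAULT/20-Projects/MajorTwin/03-Datasets" "$MAJORTWIN_WSL/datasets"

stage_dataset() {
    cp "$VAULT/20-Projects/MajorTwin/03-Datasets/$1" "$MAJORTWIN_WSL/datasets/"
    echo "Staged: $1"
}

printf '{"text": "example"}\n' > "$VAULT/20-Projects/MajorTwin/03-Datasets/demo.jsonl"
stage_dataset demo.jsonl
ls "$MAJORTWIN_WSL/datasets"
```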
## Key Rules

- **Always activate the venv before pip installs:** `source ~/majortwin/venv/bin/activate`
- **Never train from /mnt/c or /mnt/d** — stage files in `~/majortwin/staging/` first
- **Never put ML artifacts inside MajorVault** — models, venvs, and artifacts go on the D: drive
- **Max viable training model:** 7B at QLoRA 4-bit (RTX 3080 Ti, 12GB VRAM)
- **Current base model:** Qwen2.5-7B-Instruct (ChatML format — stop token: `<|im_end|>` only)
- **Transformers must be pinned:** `pip install "transformers<=5.2.0"` for unsloth-zoo compatibility
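The 12GB ceiling behind the 7B/QLoRA rule can be sanity-checked with a rough back-of-envelope. The 0.5 bytes/param is just the 4-bit base weights; the flat overhead figure is an assumption for illustration, not a measurement:

```shell
# Very rough VRAM estimate for a 7B model at 4-bit QLoRA
awk 'BEGIN {
    params      = 7e9
    weights_gb  = params * 0.5 / 1e9   # 4-bit quantized base weights = 0.5 bytes/param
    overhead_gb = 4                    # assumed: LoRA params, optimizer state, activations, CUDA context
    printf "~%.1f GB weights + ~%.0f GB overhead = ~%.1f GB of 12 GB VRAM\n",
           weights_gb, overhead_gb, weights_gb + overhead_gb
}'
```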
## D: Drive Layout

```
D:\MajorTwin\
    models\         ← finished GGUFs
    datasets\       ← dataset archives
    artifacts\      ← training run artifacts
    training-runs\  ← logs, checkpoints
D:\WSL\
    Fedora43\       ← WSL2 VHDX
    Backups\        ← weekly WSL2 backup tars
```

## See Also

- [[wsl2-instance-migration-fedora43|WSL2 Instance Migration]]
- [[wsl2-backup-powershell|WSL2 Backup via PowerShell]]
01-linux/storage/snapraid-mergerfs-setup.md (new file, 74 lines)
@@ -0,0 +1,74 @@
# SnapRAID & MergerFS Storage Setup

## Problem

Managing a collection of mismatched hard drives as a single pool while maintaining data redundancy (parity), without the overhead or risk of a traditional RAID 5/6 array.

## Solution

A combination of **MergerFS** for pooling and **SnapRAID** for parity. This is ideal for "mostly static" media storage (like MajorRAID) where files aren't changing every second.

### 1. Concepts

- **MergerFS:** A FUSE-based union filesystem. It takes multiple drives/folders and presents them as a single mount point. It does NOT provide redundancy.
- **SnapRAID:** A backup/parity tool for disk arrays. It creates parity information on a dedicated drive. It is NOT real-time (you must run `snapraid sync`).

### 2. Implementation Strategy

1. **Clean the pool:** Use `rmlint` to clear duplicates and reclaim space.
2. **Identify the parity drive:** Choose your largest drive (or one equal to the largest data drive) to hold the parity information. In my setup, `/mnt/usb` (sdc) was cleared of 4TB of duplicates to be repurposed for this.
3. **Configure MergerFS:** Pool the data drives (e.g., `/mnt/disk1`, `/mnt/disk2`) into `/storage`.
4. **Configure SnapRAID:** Point SnapRAID to the data drives and the parity drive.

### 3. MergerFS Config (/etc/fstab)

```fstab
# Example MergerFS pool
/mnt/disk*:/mnt/usb-data /storage fuse.mergerfs defaults,allow_other,cache.files=off,use_ino,category.create=mfs,minfreespace=20G,fsname=mergerfsPool 0 0
```

### 4. SnapRAID Config (/etc/snapraid.conf)

```conf
# Parity file location
parity /mnt/parity/snapraid.parity

# Content files (one copy off the array, plus one per data drive)
content /var/snapraid/snapraid.content
content /mnt/disk1/.snapraid.content
content /mnt/disk2/.snapraid.content

# Data drives
data d1 /mnt/disk1/
data d2 /mnt/disk2/

# Exclusions
exclude /lost+found/
exclude /tmp/
exclude .DS_Store
```

---

## Maintenance

### SnapRAID Sync

Run this daily (via cron) or after adding large amounts of data:

```bash
snapraid sync
```

### SnapRAID Scrub

Run this weekly to check for bitrot:

```bash
snapraid scrub
```
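Both jobs can be driven from cron. A sketch (the times and the `/etc/cron.d` placement are arbitrary choices, not from this setup):

```conf
# /etc/cron.d/snapraid — example schedule
0 3 * * *   root   snapraid sync
0 5 * * 0   root   snapraid scrub
```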
---

## Tags

#snapraid #mergerfs #linux #storage #homelab #raid
02-selfhosting/docker/docker-healthchecks.md (new file, 157 lines)
@@ -0,0 +1,157 @@
---
title: "Docker Healthchecks"
domain: selfhosting
category: docker
tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
status: published
created: 2026-03-23
updated: 2026-03-23
---

# Docker Healthchecks

A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working — not just running. Without one, a container shows as `Up` even if the app inside has crashed, deadlocked, or is stuck waiting on a dependency.

## Why It Matters

Tools like Uptime Kuma report containers without healthchecks as:

> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.

A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.

## Basic Syntax (docker-compose)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

| Field | Description |
|---|---|
| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
| `interval` | How often to run the check. |
| `timeout` | How long to wait before marking a check as failed. |
| `retries` | Consecutive failures before marking the container `unhealthy`. |
| `start_period` | Grace period on startup before failures count. |
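With the values above, the worst case from first failed check to an `unhealthy` status is roughly `interval * retries`, plus per-check timeouts. A quick sanity check of that arithmetic:

```shell
# Approximate time to flag unhealthy: retries consecutive failures, one per interval
interval=30   # seconds
retries=3
echo "$(( interval * retries ))s after the first failed check"
```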
## Common Patterns

### HTTP service (wget — available in Alpine)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### HTTP service (curl)

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### MySQL / MariaDB

```yaml
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s
```

### PostgreSQL

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5
```

### Redis

```yaml
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 10s
  timeout: 5s
  retries: 3
```

### TCP port check (no curl/wget available)

```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```
## Using Healthchecks with `depends_on`

Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:

```yaml
services:
  app:
    depends_on:
      db:
        condition: service_healthy

  db:
    image: mysql:8.0
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s
```

This prevents the classic race condition where the app starts before the database is ready to accept connections.
## Checking Health Status

```bash
# See health status in the container list
docker ps

# Get detailed health info, including the last check's output
docker inspect --format='{{json .State.Health}}' <container> | jq
```
## Ghost Example

Ghost (Alpine-based) uses `wget` rather than `curl`:

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```
## Gotchas & Notes

- **Alpine images** don't ship `curl` by default — use `wget` or install curl in the image.
- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
- **`CMD` vs `CMD-SHELL`** — use `CMD` for a direct exec (no shell needed), `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
- **Uptime Kuma** picks up Docker healthcheck status automatically when monitoring via the Docker socket — no extra config needed.
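As an illustration of the `CMD` vs `CMD-SHELL` point: the moment the check needs a pipe, it must be `CMD-SHELL`. A sketch (a generic Redis replication check, not tied to any service above):

```yaml
healthcheck:
  # The pipe forces CMD-SHELL; a plain CMD exec can't run "|"
  test: ["CMD-SHELL", "redis-cli info replication | grep -q role:master"]
  interval: 30s
  timeout: 5s
  retries: 3
```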
## See Also

- [[debugging-broken-docker-containers]]
- [[netdata-docker-health-alarm-tuning]]
@@ -23,6 +23,8 @@ Guides for running your own services at home, including Docker, reverse proxies,

## Monitoring

- [Tuning Netdata Web Log Alerts](monitoring/tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)

## Security
02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md (new file, 157 lines)
@@ -0,0 +1,157 @@
---
title: "Tuning Netdata Docker Health Alarms to Prevent Update Flapping"
domain: selfhosting
category: monitoring
tags: [netdata, docker, nextcloud, alarms, health, monitoring]
status: published
created: 2026-03-18
updated: 2026-03-28
---

# Tuning Netdata Docker Health Alarms to Prevent Update Flapping

Netdata's default `docker_container_unhealthy` alarm fires on a 10-second average with no delay. When Nextcloud AIO (or any stack with a watchtower/auto-update setup) does its nightly update cycle, containers restart in sequence and briefly show as unhealthy — generating a flood of false alerts.

## The Default Alarm

```ini
template: docker_container_unhealthy
      on: docker.container_health_status
   every: 10s
  lookup: average -10s of unhealthy
    warn: $this > 0
```

A single container being unhealthy for 10 seconds triggers it. No grace period, no delay.
## The Fix

Create a custom override at `/etc/netdata/health.d/docker.conf` (maps to the Netdata config volume if running in Docker). This file takes precedence over the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`.

### General Container Alarm

This alarm covers all containers **except** `nextcloud-aio-nextcloud`, which gets its own dedicated alarm (see below).

```ini
# Custom override — reduces flapping during nightly container updates.
# General container unhealthy alarm — all containers except nextcloud-aio-nextcloud

    template: docker_container_unhealthy
          on: docker.container_health_status
       class: Errors
        type: Containers
   component: Docker
       units: status
       every: 30s
      lookup: average -5m of unhealthy
chart labels: container_name=!nextcloud-aio-nextcloud *
        warn: $this > 0
       delay: up 3m down 5m multiplier 1.5 max 30m
     summary: Docker container ${label:container_name} health
        info: ${label:container_name} docker container health status is unhealthy
          to: sysadmin
```

| Setting | Default | Tuned | Effect |
|---|---|---|---|
| `every` | 10s | 30s | Check less frequently |
| `lookup` | average -10s | average -5m | Smooths transient unhealthy samples over 5 minutes |
| `delay: up 3m` | none | 3m | Won't fire until the unhealthy condition persists for 3 continuous minutes |
| `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing |
### Dedicated Nextcloud AIO Alarm

Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than the other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart, but during nightly AIO update cycles the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it.

The dedicated alarm uses a 10-minute lookup window and a 10-minute delay to absorb normal startup while still catching sustained failures:

```ini
# Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb the nightly update cycle
# PHP-FPM can take 5+ minutes to warm up; only alert on sustained failure

    template: docker_nextcloud_unhealthy
          on: docker.container_health_status
       class: Errors
        type: Containers
   component: Docker
       units: status
       every: 30s
      lookup: average -10m of unhealthy
chart labels: container_name=nextcloud-aio-nextcloud
        warn: $this > 0
       delay: up 10m down 5m multiplier 1.5 max 30m
     summary: Nextcloud container health sustained
        info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip
          to: sysadmin
```
## Watchdog Cron: Auto-Restart on Sustained Unhealthy

If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it.

**File:** `/etc/cron.d/nextcloud-health-watchdog`

```bash
# Restart nextcloud-aio-nextcloud if unhealthy for >1 hour
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```

- Runs every 15 minutes as root
- Only restarts if the container has been running for >1 hour (avoids interfering with normal startup)
- Logs to syslog as `nextcloud-watchdog` — check with `journalctl -t nextcloud-watchdog`
- Netdata will still fire the `docker_nextcloud_unhealthy` alert during the unhealthy window, but the outage is capped at ~1 hour instead of persisting until the next nightly cycle
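The least obvious part of that one-liner is the age guard. Stripped of the `docker inspect` calls, it is a plain epoch-seconds comparison; the timestamp below is a placeholder for the `{{.State.StartedAt}}` output:

```shell
# Was the container started more than 1 hour ago?
started_at="2020-01-01T00:00:00Z"       # placeholder for {{.State.StartedAt}}
started_s=$(date -d "$started_at" +%s)  # container start, epoch seconds
cutoff_s=$(date -d "1 hour ago" +%s)    # now minus 1 hour
if [ "$started_s" -lt "$cutoff_s" ]; then
    echo "running >1h: old enough to restart"
fi
```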
## Also: Suppress `docker_container_down` for Normally-Exiting Containers

Nextcloud AIO runs `borgbackup` (scheduled backups) and `watchtower` (auto-updates) as containers that exit with code 0 after completing their work. The stock `docker_container_down` alarm fires on any exited container, generating false alerts after every nightly cycle.

Add a second override to the same file, using `chart labels` to exclude them:

```ini
# Suppress docker_container_down for Nextcloud AIO containers that exit normally
# (borgbackup runs on schedule then exits; watchtower does updates then exits)

    template: docker_container_down
          on: docker.container_running_state
       class: Errors
        type: Containers
   component: Docker
       units: status
       every: 30s
      lookup: average -5m of down
chart labels: container_name=!nextcloud-aio-borgbackup !nextcloud-aio-watchtower *
        warn: $this > 0
       delay: up 3m down 5m multiplier 1.5 max 30m
     summary: Docker container ${label:container_name} down
        info: ${label:container_name} docker container is down
          to: sysadmin
```

The `chart labels` line uses Netdata's simple pattern syntax — a `!` prefix excludes a container, `*` matches everything else. All other exited containers still alert normally.
## Applying the Config

```bash
# If Netdata runs in Docker, write to the config volume
sudo tee /var/lib/docker/volumes/netdata_netdataconfig/_data/health.d/docker.conf > /dev/null << 'EOF'
# paste config here
EOF

# Reload health alarms without restarting the container
sudo docker exec netdata netdatacli reload-health
```

No container restart needed — `reload-health` picks up the new config immediately.

## Verify

In the Netdata UI, navigate to **Alerts → Manage Alerts** and search for `docker_container_unhealthy`. The lookup and delay values should reflect the new config.

## Notes

- Both `docker_container_unhealthy` and `docker_container_down` are overridden in this config. Any container not explicitly excluded in the `chart labels` filter will still alert normally.
- If you want per-container silencing instead of a blanket delay, use the `host labels` or `chart labels` filter to scope the alarm to specific containers.
- Config volume path on majorlab: `/var/lib/docker/volumes/netdata_netdataconfig/_data/`

## See Also

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md) — similar tuning for web_log redirect alerts
02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md (new file, 153 lines)
@@ -0,0 +1,153 @@
# Netdata → n8n Enriched Alert Emails

**Status:** Live across all MajorsHouse fleet servers as of 2026-03-21

Replaces Netdata's plain-text alert emails with rich HTML emails that include a plain-English explanation, a suggested remediation command, and a direct link to the relevant MajorWiki article.

---

## How It Works

```
Netdata alarm fires
  → custom_sender() in health_alarm_notify.conf
  → POST JSON payload to n8n webhook
  → Code node enriches with suggestion + wiki link
  → Send Email node sends HTML email via SMTP
  → Respond node returns 200 OK
```

---
## n8n Workflow

- **Name:** Netdata Enriched Alerts
- **URL:** https://n8n.majorshouse.com
- **Webhook endpoint:** `POST https://n8n.majorshouse.com/webhook/netdata-alert`
- **Workflow ID:** `a1b2c3d4-aaaa-bbbb-cccc-000000000001`

### Nodes

1. **Netdata Webhook** — receives the POST from Netdata's `custom_sender()`
2. **Enrich Alert** — Code node; matches alarm/chart/family against the enrichment table and builds the HTML email body in `$json.emailBody`
3. **Send Enriched Email** — sends via SMTP port 465 (SMTP account 2), from `netdata@majorshouse.com` to `marcus@majorshouse.com`
4. **Respond OK** — returns `ok` with HTTP 200 to Netdata

### Enrichment Keys

The Code node matches on the `alarm`, `chart`, or `family` field (case-insensitive substring):

| Key | Title | Wiki Article |
|-----|-------|-------------|
| `disk_space` | Disk Space Alert | snapraid-mergerfs-setup |
| `ram` | Memory Alert | managing-linux-services-systemd-ansible |
| `cpu` | CPU Alert | managing-linux-services-systemd-ansible |
| `load` | Load Average Alert | managing-linux-services-systemd-ansible |
| `net` | Network Alert | tailscale-homelab-remote-access |
| `docker` | Docker Container Alert | debugging-broken-docker-containers |
| `web_log` | Web Log Alert | tuning-netdata-web-log-alerts |
| `health` | Docker Health Alarm | netdata-docker-health-alarm-tuning |
| `mdstat` | RAID Array Alert | mdadm-usb-hub-disconnect-recovery |
| `systemd` | Systemd Service Alert | docker-caddy-selinux-post-reboot-recovery |
| _(no match)_ | Server Alert | netdata-new-server-setup |
---
|
||||
|
||||
## Netdata Configuration
|
||||
|
||||
### Config File Locations
|
||||
|
||||
| Server | Path |
|
||||
|--------|------|
|
||||
| majorhome, majormail, majordiscord, tttpod, teelia | `/etc/netdata/health_alarm_notify.conf` |
|
||||
| majorlinux, majortoot, dca | `/usr/lib/netdata/conf.d/health_alarm_notify.conf` |
|
||||
|
||||
### Required Settings
|
||||
|
||||
```bash
|
||||
DEFAULT_RECIPIENT_CUSTOM="n8n"
|
||||
role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"
|
||||
```
|
||||
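Assuming the `/etc/netdata` path, the two lines can be applied non-interactively with the same `sed` pattern used for the email settings elsewhere in this doc (adjust the path on servers that use `/usr/lib/netdata/conf.d`):

```bash
# Uncomment-and-set both required values in place
CONF=/etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?DEFAULT_RECIPIENT_CUSTOM=.*/DEFAULT_RECIPIENT_CUSTOM="n8n"/' "$CONF"
sed -i 's/^#\?role_recipients_custom\[sysadmin\]=.*/role_recipients_custom[sysadmin]="${DEFAULT_RECIPIENT_CUSTOM}"/' "$CONF"
```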
### custom_sender() Function

```bash
custom_sender() {
    local to="${1}"
    local payload
    payload=$(jq -n \
        --arg hostname "${host}" \
        --arg alarm "${name}" \
        --arg chart "${chart}" \
        --arg family "${family}" \
        --arg status "${status}" \
        --arg old_status "${old_status}" \
        --arg value "${value_string}" \
        --arg units "${units}" \
        --arg info "${info}" \
        --arg alert_url "${goto_url}" \
        --arg severity "${severity}" \
        --arg raised_for "${raised_for}" \
        --arg total_warnings "${total_warnings}" \
        --arg total_critical "${total_critical}" \
        '{hostname:$hostname,alarm:$alarm,chart:$chart,family:$family,status:$status,old_status:$old_status,value:$value,units:$units,info:$info,alert_url:$alert_url,severity:$severity,raised_for:$raised_for,total_warnings:$total_warnings,total_critical:$total_critical}')

    local httpcode
    httpcode=$(docurl -s -o /dev/null -w "%{http_code}" \
        -X POST \
        -H "Content-Type: application/json" \
        -d "${payload}" \
        "https://n8n.majorshouse.com/webhook/netdata-alert")

    if [ "${httpcode}" = "200" ]; then
        info "sent enriched notification to n8n for ${status} of ${host}.${name}"
        sent=$((sent + 1))
    else
        error "failed to send notification to n8n, HTTP code: ${httpcode}"
    fi
}
```

!!! note "jq required"
    The `custom_sender()` function requires `jq` to be installed. Verify with `which jq` on each server.

---

## Deploying to a New Server

```bash
# 1. Find the config file
find /etc/netdata /usr/lib/netdata -name health_alarm_notify.conf 2>/dev/null

# 2. Edit it — add the two lines and the custom_sender() function above

# 3. Test connectivity from the server
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://n8n.majorshouse.com/webhook/netdata-alert \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
# Expected: 200

# 4. Restart Netdata
systemctl restart netdata

# 5. Send a test alarm
/usr/libexec/netdata/plugins.d/alarm-notify.sh test custom
```

---

## Troubleshooting

**Emails not arriving — check the n8n execution log:**
Go to https://n8n.majorshouse.com → open "Netdata Enriched Alerts" → Executions tab. Look for `error` status entries.

**Email body empty:**
The Send Email node's HTML field must be `={{ $json.emailBody }}`. Shell variable expansion can silently strip `$json` if the workflow is patched via inline SSH commands — always use a Python script file.

**`000` curl response from a server:**
Usually a timeout, not a DNS or connection failure. Re-test with `--max-time 30`.
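The re-test is the same connectivity check from the deploy steps, with an explicit timeout added:

```bash
# Give the request up to 30 seconds before curl reports 000
curl -s -o /dev/null -w "%{http_code}" --max-time 30 \
  -X POST https://n8n.majorshouse.com/webhook/netdata-alert \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test","alarm":"disk_space._","status":"WARNING"}'
```

A `200` means the webhook is reachable; `000` again points at a network path problem rather than n8n itself.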
**`custom_sender()` syntax error in Netdata logs:**
Bash heredocs don't work inside sourced config files. Use `jq -n --arg ...` as shown above — no heredocs.

**n8n `N8N_TRUST_PROXY` must be set:**
Without `N8N_TRUST_PROXY=true` in the Docker environment, Caddy's `X-Forwarded-For` header causes n8n's rate limiter to abort requests before parsing the body. Set it in `/opt/n8n/compose.yml`.
161
02-selfhosting/monitoring/netdata-new-server-setup.md
Normal file
@@ -0,0 +1,161 @@
---
title: "Deploying Netdata to a New Server"
domain: selfhosting
category: monitoring
tags: [netdata, monitoring, email, notifications, netdata-cloud, ubuntu, debian, n8n]
status: published
created: 2026-03-18
updated: 2026-03-22
---

# Deploying Netdata to a New Server

This covers the full Netdata setup for a new server in the fleet: install, email notification config, n8n webhook integration, and Netdata Cloud claim. Applies to Ubuntu/Debian servers.

## 1. Install Prerequisites

Install `jq` before anything else. It is required by the `custom_sender()` function in `health_alarm_notify.conf` to build the JSON payload sent to the n8n webhook. **If `jq` is missing, the webhook will fire with an empty body and n8n alert emails will have no information in them.**

```bash
apt install -y jq
```

Verify:

```bash
jq --version
```

## 2. Install Netdata

Use the official kickstart script:

```bash
wget -O /tmp/netdata-install.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-install.sh --non-interactive --stable-channel --disable-telemetry
```

Verify it's running:

```bash
systemctl is-active netdata
curl -s http://localhost:19999/api/v1/info | python3 -c "import sys,json; d=json.load(sys.stdin); print('Netdata', d['version'])"
```

## 3. Configure Email Notifications

Copy the default config and set the three required values:

```bash
cp /usr/lib/netdata/conf.d/health_alarm_notify.conf /etc/netdata/health_alarm_notify.conf
```

Edit `/etc/netdata/health_alarm_notify.conf`:

```ini
EMAIL_SENDER="netdata@majorshouse.com"
SEND_EMAIL="YES"
DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"
```

Or apply with `sed` in one shot:

```bash
sed -i 's/^#\?EMAIL_SENDER=.*/EMAIL_SENDER="netdata@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?SEND_EMAIL=.*/SEND_EMAIL="YES"/' /etc/netdata/health_alarm_notify.conf
sed -i 's/^#\?DEFAULT_RECIPIENT_EMAIL=.*/DEFAULT_RECIPIENT_EMAIL="marcus@majorshouse.com"/' /etc/netdata/health_alarm_notify.conf
```

Restart and test:

```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(OK|FAILED|email)'
```

You should see three `# OK` lines (WARNING → CRITICAL → CLEAR test cycle) and confirmation that email was sent to `marcus@majorshouse.com`.

> [!note] Delivery via local Postfix
> Email is relayed through the server's local Postfix instance. Ensure Postfix is installed and `/usr/sbin/sendmail` resolves.

## 4. Configure n8n Webhook Notifications

Copy the `health_alarm_notify.conf` from an existing server (e.g. majormail), which contains the `custom_sender()` function. This sends enriched JSON payloads to the n8n webhook at `https://n8n.majorshouse.com/webhook/netdata-alert`.

> [!warning] jq required
> The `custom_sender()` function uses `jq` to build the JSON payload. If `jq` is not installed, `payload` will be empty, curl will send `Content-Length: 0`, and n8n will produce alert emails with `Host: unknown`, blank alert/value fields, and `Status: UNKNOWN`. Always install `jq` first (Step 1).

After deploying the config, run a test to confirm the webhook fires correctly:

```bash
systemctl restart netdata
/usr/libexec/netdata/plugins.d/alarm-notify.sh test 2>&1 | grep -E '(custom|n8n|OK|FAILED)'
```

Verify in n8n that the latest execution shows a non-empty body with `hostname`, `alarm`, and `status` fields populated.

## 5. Claim to Netdata Cloud

Get the claim command from **Netdata Cloud → Space Settings → Nodes → Add Nodes**. It will look like:

```bash
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel \
  --claim-token <token> \
  --claim-rooms <room-id> \
  --claim-url https://app.netdata.cloud
```

Verify the claim was accepted:

```bash
cat /var/lib/netdata/cloud.d/claimed_id
```

A UUID will be present if the claim succeeded. The node should appear in Netdata Cloud within ~60 seconds.

## 6. Verify Alerts

Check that no unexpected alerts are active after setup:

```bash
curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c "
import sys, json
d = json.load(sys.stdin)
active = [v for v in d.get('alarms', {}).values() if v.get('status') not in ('CLEAR', 'UNINITIALIZED', 'UNDEFINED')]
print(f'{len(active)} active alert(s)')
for v in active:
    print(f' [{v[\"status\"]}] {v[\"name\"]} on {v[\"chart\"]}')
"
```

## Fleet-wide Alert Check

To audit all servers at once (requires Tailscale SSH access):

```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
  echo "=== $host ==="
  ssh root@$host "curl -s 'http://localhost:19999/api/v1/alarms?active' | python3 -c \
    \"import sys,json; d=json.load(sys.stdin); active=[v for v in d.get('alarms',{}).values() if v.get('status') not in ('CLEAR','UNINITIALIZED','UNDEFINED')]; print(str(len(active))+' active')\""
done
```

## Fleet-wide jq Audit

To check that all servers with `custom_sender` have `jq` installed:

```bash
for host in majorlab majorhome majormail majordiscord majortoot majorlinux tttpod dca teelia; do
  echo -n "=== $host: "
  ssh -o ConnectTimeout=5 root@$host \
    'has_cs=$(grep -l "custom_sender\|n8n.majorshouse.com" /etc/netdata/health_alarm_notify.conf 2>/dev/null | wc -l); has_jq=$(which jq >/dev/null 2>&1 && echo yes || echo NO); echo "custom_sender=$has_cs jq=$has_jq"'
done
```

Any server showing `custom_sender=1 jq=NO` needs `apt install -y jq` immediately.

## Related

- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
137
02-selfhosting/monitoring/netdata-selinux-avc-chart.md
Normal file
@@ -0,0 +1,137 @@
---
title: "Netdata SELinux AVC Denial Monitoring"
domain: selfhosting
category: monitoring
tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
status: published
created: 2026-03-27
updated: 2026-03-27
---

# Netdata SELinux AVC Denial Monitoring

A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.

## What It Does

The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes. This gives a real-time chart in Netdata Cloud showing SELinux denial spikes — useful for catching misconfigurations after service changes or package updates.
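To sanity-check what the chart will report before installing anything, the underlying query can be run by hand as root (`-ts recent` covers the last 10 minutes):

```bash
# Count AVC denial events from the last 10 minutes, same query the plugin uses
ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC"
```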
## Where It's Deployed

| Host | OS | SELinux | Chart Installed |
|------|----|---------|-----------------|
| majorhome | Fedora 43 | Enforcing | Yes |
| majorlab | Fedora 43 | Enforcing | Yes |
| majormail | Fedora 43 | Enforcing | Yes |
| majordiscord | Fedora 43 | Enforcing | Yes |

Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.

## Installation

### 1. Create the Chart Plugin

Create `/etc/netdata/charts.d/selinux.chart.sh`:

```bash
cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
# SELinux AVC denial counter for Netdata charts.d
selinux_update_every=60
selinux_priority=90000

selinux_check() {
    which ausearch >/dev/null 2>&1 || return 1
    return 0
}

selinux_create() {
    cat <<CHART
CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
DIMENSION denials '' absolute 1 1
CHART
    return 0
}

selinux_update() {
    local count
    count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
    echo "BEGIN selinux.avc_denials $1"
    echo "SET denials = ${count}"
    echo "END"
    return 0
}
EOF
```

### 2. Grant Netdata Sudo Access to ausearch

`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:

```bash
echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
chmod 440 /etc/sudoers.d/netdata-selinux
visudo -c
```

`visudo -c` validates the syntax. If it reports errors, fix the file before proceeding — a broken sudoers file can lock out sudo entirely.

### 3. Restart Netdata

```bash
systemctl restart netdata
```

### 4. Verify

Check that the chart is collecting data:

```bash
curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f'Chart: {d[\"id\"]}')
print(f'Update every: {d[\"update_every\"]}s')
print(f'Type: {d[\"chart_type\"]}')
"
```

If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.

## Known Side Effect: pam_systemd Log Noise

Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:

```
pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
```

This is cosmetic. The `sudo` command succeeds — `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).

**To suppress it:** the `system-auth` PAM config on Fedora already marks `pam_systemd.so` as `-session optional` (the `-` prefix means "don't fail if the module errors"). The messages are informational log noise, not actual failures. No PAM changes are needed.

If the log volume is a concern for log analysis or monitoring, filter it with an rsyslog rule:

```ini
# /etc/rsyslog.d/suppress-pam-systemd.conf
:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
```

Or in Netdata's log alert config, exclude the pattern from any log-based alerts.

## Fleet Audit

To verify the chart is deployed and functioning on all Fedora hosts:

```bash
for host in majorhome majorlab majormail majordiscord; do
  echo -n "=== $host: "
  ssh root@$host "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
done
```

## Related

- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
- [SELinux: Fixing Dovecot Mail Spool Context](/05-troubleshooting/selinux-dovecot-vmail-context.md)
94
02-selfhosting/security/ansible-unattended-upgrades-fleet.md
Normal file
@@ -0,0 +1,94 @@
---
title: Standardizing unattended-upgrades Across Ubuntu Fleet with Ansible
domain: selfhosting
category: security
tags:
  - ansible
  - ubuntu
  - apt
  - unattended-upgrades
  - fleet-management
status: published
created: '2026-03-16'
updated: '2026-03-16'
---

# Standardizing unattended-upgrades Across Ubuntu Fleet with Ansible

When some Ubuntu hosts in a fleet self-update via `unattended-upgrades` and others don't, they drift apart over time — different kernel versions, different reboot states, inconsistent behavior. This article covers how to diagnose the drift and enforce a uniform auto-update config across all Ubuntu hosts using Ansible.

## Diagnosing the Problem

If only some Ubuntu hosts are flagging for reboot, check:

```bash
# What triggered the reboot flag?
cat /var/run/reboot-required.pkgs

# Is unattended-upgrades installed and active?
systemctl status unattended-upgrades
cat /etc/apt/apt.conf.d/20auto-upgrades

# When did apt last run?
ls -lt /var/log/apt/history.log*
```

The reboot flag is written to `/var/run/reboot-required` by `update-notifier-common` when packages like the kernel, glibc, or systemd are updated. If some hosts have `unattended-upgrades` running and others don't, the ones that self-updated will flag for reboot while the others lag behind.

## The Fix — Ansible Playbook

Add these tasks to your update playbook **before** the apt cache update step:

```yaml
- name: Ensure unattended-upgrades is installed on Ubuntu servers
  ansible.builtin.apt:
    name:
      - unattended-upgrades
      - update-notifier-common
    state: present
    update_cache: true
  when: ansible_facts['os_family'] == "Debian"

- name: Enforce uniform auto-update config on Ubuntu servers
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
    owner: root
    group: root
    mode: '0644'
  when: ansible_facts['os_family'] == "Debian"

- name: Ensure unattended-upgrades service is enabled and running
  ansible.builtin.systemd:
    name: unattended-upgrades
    enabled: true
    state: started
  when: ansible_facts['os_family'] == "Debian"
```

Running this across the `ubuntu` group ensures every host has the same config on every Ansible run — idempotent and safe.
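Before the real run, a check-mode pass shows exactly what would change (playbook and group names follow the examples in this article; adjust to your inventory):

```bash
# Preview changes without applying them, then run for real
ansible-playbook update.yml -l ubuntu --check --diff
ansible-playbook update.yml -l ubuntu
```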
## Rebooting Flagged Hosts

Once identified, reboot specific hosts without touching the rest:

```bash
# Reboot just the flagging hosts
ansible-playbook reboot.yml -l teelia,tttpod

# Run full update on remaining hosts to bring them up to the same kernel
ansible-playbook update.yml -l dca,majorlinux,majortoot
```

## Notes

- `unattended-upgrades` runs daily on its own schedule — hosts that haven't checked yet will lag behind but catch up within 24 hours
- Hosts reporting `ok` (not `changed`) on the config tasks were already correctly configured
- After a kernel update is pulled, only an actual reboot clears the `/var/run/reboot-required` flag — Ansible reporting the flag is informational only
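To list which hosts currently carry the flag, an ad-hoc `stat` across the group works (assumes an `ubuntu` inventory group):

```bash
# Hosts where stat reports the file exists still need a reboot
ansible ubuntu -m ansible.builtin.stat -a "path=/var/run/reboot-required"
```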
## See Also

- [[ansible-getting-started|Ansible Getting Started]]
- [[linux-server-hardening-checklist|Linux Server Hardening Checklist]]
89
03-opensource/alternatives/freshrss.md
Normal file
@@ -0,0 +1,89 @@
# FreshRSS — Self-Hosted RSS Reader

## Problem

RSS is the best way to follow websites, blogs, and podcasts without algorithmic feeds, engagement bait, or data harvesting. But hosted RSS services like Feedly gate features behind subscriptions and still have access to your reading habits. Google killed Google Reader in 2013 and has been trying to kill RSS ever since.

## Solution

[FreshRSS](https://freshrss.org) is a self-hosted RSS aggregator. It fetches and stores your feeds on your own server, presents a clean reading interface, and syncs with mobile apps via standard APIs (Fever, Google Reader, Nextcloud News). No subscription, no tracking, no feed limits.

---

## Deployment (Docker)

```yaml
services:
  freshrss:
    image: freshrss/freshrss:latest
    container_name: freshrss
    restart: unless-stopped
    ports:
      - "8086:80"
    volumes:
      - ./freshrss/data:/var/www/FreshRSS/data
      - ./freshrss/extensions:/var/www/FreshRSS/extensions
    environment:
      - TZ=America/New_York
      - CRON_MIN=*/15  # fetch feeds every 15 minutes
```

### Caddy reverse proxy

```
rss.yourdomain.com {
    reverse_proxy localhost:8086
}
```

---

## Initial Setup

1. Browse to your FreshRSS URL and run through the setup wizard
2. Create an admin account
3. Go to **Settings → Authentication** — enable API access if you want mobile app sync
4. Start adding feeds under **Subscriptions → Add a feed**

---

## Mobile App Sync

FreshRSS exposes a Google Reader-compatible API that most RSS apps support:

| App | Platform | Protocol |
|---|---|---|
| NetNewsWire | iOS / macOS | Fever or GReader |
| Reeder | iOS / macOS | GReader |
| ReadYou | Android | GReader |
| FeedMe | Android | GReader / Fever |

**API URL format:** `https://rss.yourdomain.com/api/greader.php`

Enable the API in FreshRSS: **Settings → Authentication → Allow API access**
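Once enabled, a quick command-line probe confirms the endpoint responds before pointing a mobile app at it (substitute your own domain; the path is the GReader endpoint shown above):

```bash
# Any HTTP status (rather than 000) confirms the endpoint is reachable
curl -s -o /dev/null -w "%{http_code}\n" https://rss.yourdomain.com/api/greader.php
```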
---

## Feed Auto-Refresh

The `CRON_MIN=*/15` environment variable runs feed fetching every 15 minutes inside the container. For more control, add a host-level cron job:

```bash
# Fetch all feeds every 10 minutes
*/10 * * * * docker exec freshrss php /var/www/FreshRSS/app/actualize_script.php
```

---

## Why RSS Over Social Media

- **You control the feed** — no algorithm decides what you see or in what order
- **No engagement optimization** — content ranked by publish date, not outrage potential
- **Portable** — OPML export lets you move your subscriptions to any reader
- **Works forever** — RSS has been around since 1999 and isn't going anywhere

---

## Tags

#freshrss #rss #self-hosting #docker #linux #alternatives #privacy
95
03-opensource/alternatives/gitea.md
Normal file
@@ -0,0 +1,95 @@
# Gitea — Self-Hosted Git

## Problem

GitHub is the default home for code, but it's a Microsoft-owned centralized service. Your repositories, commit history, issues, and CI/CD pipelines are all under someone else's control. For personal projects and private infrastructure, there's no reason to depend on it.

## Solution

[Gitea](https://gitea.com) is a lightweight, self-hosted Git service. It provides the full GitHub-style workflow — repositories, branches, pull requests, webhooks, and a web UI — in a single binary or Docker container that runs comfortably on low-spec hardware.

---

## Deployment (Docker)

```yaml
services:
  gitea:
    image: docker.gitea.com/gitea:latest
    container_name: gitea
    restart: unless-stopped
    ports:
      - "3002:3000"
      - "222:22"  # SSH git access
    volumes:
      - ./gitea:/data
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=sqlite3
```

SQLite is fine for personal use. For team use, swap in PostgreSQL or MySQL.

### Caddy reverse proxy

```
git.yourdomain.com {
    reverse_proxy localhost:3002
}
```

---

## Initial Setup

1. Browse to your Gitea URL — the first-run wizard handles configuration
2. Set the server URL to your public domain
3. Create an admin account
4. Configure SSH access if you want `git@git.yourdomain.com` cloning

---

## Webhooks

Gitea's webhook system is how automated pipelines get triggered on push. Example use case — auto-deploy a MkDocs wiki on every push:

1. Go to repo → **Settings → Webhooks → Add Webhook**
2. Set the payload URL to your webhook endpoint (e.g. `https://notes.yourdomain.com/webhook`)
3. Set content type to `application/json`
4. Select **Push events**

The webhook fires on every `git push`, allowing the receiving server to pull and rebuild automatically. See [MajorWiki Setup & Pipeline](../../05-troubleshooting/majwiki-setup-and-pipeline.md) for a complete example.
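The receiving end can be exercised without a real push by POSTing a minimal payload by hand (URL and payload fields here are placeholders; Gitea's real push payload carries many more fields):

```bash
# Simulate a push-event delivery to the webhook receiver
curl -s -X POST https://notes.yourdomain.com/webhook \
  -H "Content-Type: application/json" \
  -d '{"ref":"refs/heads/main","repository":{"full_name":"user/repo"}}'
```

Gitea's webhook settings page also offers a test-delivery button that sends a sample payload for you.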
---

## Migrating from GitHub

Gitea can mirror GitHub repos and import them directly:

```bash
# Clone from GitHub, push to Gitea
git clone --mirror https://github.com/user/repo.git
cd repo.git
git remote set-url origin https://git.yourdomain.com/user/repo.git
git push --mirror
```

Or use the Gitea web UI: **+ → New Migration → GitHub**

---

## Why Not Just Use GitHub?

For public open source — GitHub is fine, the network effects are real. For private infrastructure code, personal projects, and anything you'd rather not hand to Microsoft:

- Full control over your data and access
- No rate limits, no storage quotas on your own hardware
- Webhooks and integrations without paying for GitHub Actions minutes
- Works entirely over Tailscale — no public exposure required

---

## Tags

#gitea #git #self-hosting #docker #linux #alternatives #vcs
88
03-opensource/alternatives/searxng.md
Normal file
@@ -0,0 +1,88 @@
# SearXNG — Private Self-Hosted Search

## Problem

Every search query sent to Google, Bing, or DuckDuckGo is logged, profiled, and used to build an advertising model of you. Even "private" search engines are still third-party services with their own data retention policies.

## Solution

[SearXNG](https://github.com/searxng/searxng) is a self-hosted metasearch engine. It queries multiple search engines simultaneously on your behalf — without sending any identifying information — and aggregates the results. The search engines see a request from your server, not from you.

Your queries stay on your infrastructure.

---

## Deployment (Docker)

```yaml
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: unless-stopped
    ports:
      - "8090:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=https://search.yourdomain.com/
```

SearXNG requires a `settings.yml` in the mounted config directory. Generate one from the default:

```bash
docker run --rm searxng/searxng cat /etc/searxng/settings.yml > ./searxng/settings.yml
```

Key settings to configure in `settings.yml`:

```yaml
server:
  secret_key: "generate-a-random-string-here"
  bind_address: "0.0.0.0"

search:
  safe_search: 0
  default_lang: "en"

engines:
  # Enable/disable specific engines here
```
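The placeholder `secret_key` should be replaced with a real random value. One way to generate and patch it in place, assuming the `./searxng/settings.yml` path from above:

```bash
# Swap the placeholder for 32 random bytes, hex-encoded (64 characters)
sed -i "s|secret_key:.*|secret_key: \"$(openssl rand -hex 32)\"|" ./searxng/settings.yml
```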
### Caddy reverse proxy

```
search.yourdomain.com {
    reverse_proxy localhost:8090
}
```

---

## Using SearXNG as an AI Search Backend

SearXNG integrates directly with Open WebUI as a web search provider, giving your local AI access to current web results without any third-party API keys:

**Open WebUI → Settings → Web Search:**

- Enable web search
- Set provider to `searxng`
- Set URL to `http://searxng:8080` (internal Docker network) or your Tailscale/local address

This is how MajorTwin gets current web context — queries go through SearXNG, not Google.

---

## Why Not DuckDuckGo?

DDG is better than Google for privacy, but it's still a centralized third-party service. SearXNG:

- Runs on your own hardware
- Has no account, no cookies, no session tracking
- Lets you choose which upstream engines to use and weight
- Can be kept entirely off the public internet (Tailscale-only)

---

## Tags

#searxng #search #privacy #self-hosting #docker #linux #alternatives
102
03-opensource/dev-tools/rsync.md
Normal file
@@ -0,0 +1,102 @@
# rsync — Fast, Resumable File Transfers

## Problem

Copying large files or directory trees between drives or servers is slow, fragile, and unresumable with `cp`. A dropped connection or a single error means starting over. You also want to skip files that already exist at the destination without re-copying them.

## Solution

`rsync` is a file synchronization tool that only transfers what has changed, preserves metadata, and can resume interrupted transfers. It works locally and over SSH.

### Installation (Fedora)

```bash
sudo dnf install rsync
```

### Basic Local Copy

```bash
rsync -av /source/ /destination/
```

- `-a` — archive mode: preserves permissions, timestamps, symlinks, ownership
- `-v` — verbose: shows what's being transferred

**Trailing slash on source matters:**

- `/source/` — copy the *contents* of source into destination
- `/source` — copy the source *directory itself* into destination
|
||||
### Resume an Interrupted Transfer
|
||||
|
||||
```bash
|
||||
rsync -av --partial --progress /source/ /destination/
|
||||
```
|
||||
|
||||
- `--partial` — keeps partially transferred files so they can be resumed
|
||||
- `--progress` — shows per-file progress and speed
|
||||
|
||||
### Skip Already-Transferred Files
|
||||
|
||||
```bash
|
||||
rsync -av --ignore-existing /source/ /destination/
|
||||
```
|
||||
|
||||
Useful when restarting a migration — skips anything already at the destination regardless of timestamp comparison.
|
||||
|
||||
### Dry Run First
|
||||
|
||||
Always preview what rsync will do before committing:
|
||||
|
||||
```bash
|
||||
rsync -av --dry-run /source/ /destination/
|
||||
```
|
||||
|
||||
No files are moved. Output shows exactly what would happen.
|
||||
|
||||
### Transfer Over SSH
|
||||
|
||||
```bash
|
||||
rsync -av -e ssh /source/ user@remotehost:/destination/
|
||||
```
|
||||
|
||||
Or with a non-standard port:
|
||||
|
||||
```bash
|
||||
rsync -av -e "ssh -p 2222" /source/ user@remotehost:/destination/
|
||||
```
|
||||
|
||||
### Exclude Patterns
|
||||
|
||||
```bash
|
||||
rsync -av --exclude='*.tmp' --exclude='.Trash*' /source/ /destination/
|
||||
```
|
||||
|
||||
### Real-World Use
|
||||
|
||||
Migrating ~286 files from `/majorRAID` to `/majorstorage` during a RAID dissolution project:
|
||||
|
||||
```bash
|
||||
rsync -av --partial --progress --ignore-existing \
|
||||
/majorRAID/ /majorstorage/ \
|
||||
2>&1 | tee /root/raid_migrate.log
|
||||
```
|
||||
|
||||
Run inside a `tmux` or `screen` session so it survives SSH disconnects:
|
||||
|
||||
```bash
|
||||
tmux new-session -d -s rsync-migrate \
|
||||
"rsync -av --partial --progress /majorRAID/ /majorstorage/ | tee /root/raid_migrate.log"
|
||||
```
|
||||
|
||||
### Check Progress on a Running Transfer
|
||||
|
||||
```bash
|
||||
tail -f /root/raid_migrate.log
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Tags
|
||||
|
||||
#rsync #linux #storage #file-transfer #sysadmin #dev-tools
76
03-opensource/dev-tools/screen.md
Normal file
@@ -0,0 +1,76 @@
# screen — Simple Persistent Terminal Sessions

## Problem

Same problem as tmux: SSH sessions die, jobs get killed, long-running tasks need to survive disconnects. screen is the older, simpler alternative to tmux — universally available and gets the job done with minimal setup.

## Solution

`screen` creates detachable terminal sessions. It's installed by default on many systems, making it useful when tmux isn't available.

### Installation (Fedora)

```bash
sudo dnf install screen
```

### Core Workflow

```bash
# Start a named session
screen -S mysession

# Detach (keeps running)
Ctrl+a, d

# List sessions
screen -list

# Reattach
screen -r mysession

# If session shows as "Attached" (stuck)
screen -d -r mysession
```

### Start a Background Job Directly

```bash
screen -dmS mysession bash -c "long-running-command 2>&1 | tee /root/output.log"
```

- `-d` — start detached
- `-m` — create new session even if already inside screen
- `-S` — name the session

### Capture Current Output Without Attaching

```bash
screen -S mysession -X hardcopy /tmp/screen_output.txt
cat /tmp/screen_output.txt
```

### Send a Command to a Running Session

```bash
screen -S mysession -X stuff "tail -f /root/output.log\n"
```

---

## screen vs tmux

| Feature | screen | tmux |
|---|---|---|
| Availability | Installed by default on most systems | Usually needs installing |
| Split panes | Basic (Ctrl+a, S) | Better (Ctrl+b, ") |
| Scripting | Limited | More capable |
| Config complexity | Simple | More options |

Use screen when it's already there or for quick throwaway sessions. Use tmux for anything more complex. See [tmux](tmux.md).

---

## Tags

#screen #terminal #linux #ssh #productivity #dev-tools
93
03-opensource/dev-tools/tmux.md
Normal file
@@ -0,0 +1,93 @@
# tmux — Persistent Terminal Sessions

## Problem

SSH sessions die when your connection drops, your laptop closes, or you walk away. Long-running jobs — storage migrations, file scans, downloads — get killed mid-run. You need a way to detach from a session, come back later, and pick up exactly where you left off.

## Solution

`tmux` is a terminal multiplexer. It runs sessions that persist independently of your SSH connection. You can detach, disconnect, reconnect from a different machine, and reattach to find everything still running.

### Installation (Fedora)

```bash
sudo dnf install tmux
```

### Core Workflow

```bash
# Start a named session
tmux new-session -s mysession

# Detach from a session (keeps it running)
Ctrl+b, d

# List running sessions
tmux ls

# Reattach to a session
tmux attach -t mysession

# Kill a session when done
tmux kill-session -t mysession
```

### Start a Background Job Directly

Skip the interactive session entirely — start a job in a new detached session in one command:

```bash
tmux new-session -d -s rmlint2 "rmlint /majorstorage// /mnt/usb// /majorRAID 2>&1 | tee /majorRAID/rmlint_scan2.log"
```

The job runs immediately in the background. Attach later to check progress:

```bash
tmux attach -t rmlint2
```

### Capture Output Without Attaching

Read the current state of a session without interrupting it:

```bash
tmux capture-pane -t rmlint2 -p
```

### Split Panes

Monitor multiple things in one terminal window:

```bash
# Horizontal split (top/bottom)
Ctrl+b, "

# Vertical split (left/right)
Ctrl+b, %

# Switch between panes
Ctrl+b, arrow keys
```

### Real-World Use

On **majorhome**, all long-running storage operations run inside named tmux sessions so they survive SSH disconnects:

```bash
tmux new-session -d -s rmlint2 "rmlint ..."        # dedup scan
tmux new-session -d -s rsync-migrate "rsync ..."   # file migration
tmux ls                                            # check what's running
```

---

## tmux vs screen

Both work. tmux has better split-pane support and scripting. screen is simpler and more universally installed. I use both — tmux for new jobs, screen for legacy ones. See the [screen](screen.md) article for reference.

---

## Tags

#tmux #terminal #linux #ssh #productivity #dev-tools
22
03-opensource/index.md
Normal file
@@ -0,0 +1,22 @@
# 📂 Open Source & Alternatives

A curated collection of my favorite open-source tools and privacy-respecting alternatives to mainstream software.

## 🔄 Alternatives
- [SearXNG: Private Self-Hosted Search](alternatives/searxng.md)
- [FreshRSS: Self-Hosted RSS Reader](alternatives/freshrss.md)
- [Gitea: Self-Hosted Git](alternatives/gitea.md)

## 🚀 Productivity
- [rmlint: Duplicate File Scanning](productivity/rmlint-duplicate-scanning.md)

## 🛠️ Development Tools
- [tmux: Persistent Terminal Sessions](dev-tools/tmux.md)
- [screen: Simple Persistent Sessions](dev-tools/screen.md)
- [rsync: Fast, Resumable File Transfers](dev-tools/rsync.md)

## 🎨 Media & Creative
- [yt-dlp: Video Downloading](media-creative/yt-dlp.md)

## 🔐 Privacy & Security
- [Vaultwarden: Self-Hosted Password Manager](privacy-security/vaultwarden.md)
129
03-opensource/media-creative/yt-dlp.md
Normal file
@@ -0,0 +1,129 @@
# yt-dlp — Video Downloading

## What It Is

`yt-dlp` is a feature-rich command-line video downloader, forked from youtube-dl with active maintenance and significantly better performance. It supports YouTube, Twitch, and hundreds of other sites.

---

## Installation

### Fedora
```bash
sudo dnf install yt-dlp
# or latest via pip:
sudo pip install yt-dlp --break-system-packages
```

### Update
```bash
sudo pip install -U yt-dlp --break-system-packages
# or if installed as standalone binary:
yt-dlp -U
```

Keep it current — YouTube pushes extractor changes frequently and old versions break.

---

## Basic Usage

```bash
# Download a single video (best quality)
yt-dlp https://www.youtube.com/watch?v=VIDEO_ID

# Download to a specific directory with title as filename
yt-dlp -o "/path/to/output/%(title)s.%(ext)s" URL
```

---

## Plex-Optimized Download

Download best quality and auto-convert to HEVC for Apple TV direct play:

```bash
yt-dlp URL
```

That's it — if your config is set up correctly (see Config File section below). The config handles format selection, output path, subtitles, and automatic AV1/VP9 → HEVC conversion.

> [!note] `bestvideo[ext=mp4]` caps at 1080p because YouTube only serves H.264 up to 1080p. Use `bestvideo+bestaudio` to get true 4K, then let the post-download hook convert AV1/VP9 to HEVC. See [Plex 4K Codec Compatibility](../../04-streaming/plex/plex-4k-codec-compatibility.md) for the full setup.

---

## Playlists and Channels

```bash
# Download a full playlist
yt-dlp -o "%(playlist_index)s - %(title)s.%(ext)s" PLAYLIST_URL

# Download only videos not already present
yt-dlp --download-archive archive.txt PLAYLIST_URL
```

`--download-archive` maintains a file of completed video IDs — re-running the command skips already-downloaded videos automatically.

---

## Format Selection

```bash
# List all available formats for a video
yt-dlp --list-formats URL

# Download best video + best audio, merge to mp4
yt-dlp -f 'bestvideo+bestaudio' --merge-output-format mp4 URL

# Download audio only (MP3)
yt-dlp -x --audio-format mp3 URL
```

---

## Config File

Persist your preferred flags so you don't repeat them every command:

```bash
mkdir -p ~/.config/yt-dlp
cat > ~/.config/yt-dlp/config << 'EOF'
--remote-components ejs:github
--format bestvideo+bestaudio
--merge-output-format mp4
--output /plex/plex/%(title)s.%(ext)s
--write-auto-subs
--embed-subs
--exec /usr/local/bin/yt-dlp-hevc-convert.sh {}
EOF
```

After this, a bare `yt-dlp URL` downloads best quality, saves to `/plex/plex/`, embeds subtitles, and auto-converts AV1/VP9 to HEVC. See [Plex 4K Codec Compatibility](../../04-streaming/plex/plex-4k-codec-compatibility.md) for the conversion hook setup.

---

## Running Long Downloads in the Background

For large downloads or playlists, run inside `screen` or `tmux` so they survive SSH disconnects:

```bash
screen -dmS yt-download bash -c \
  "yt-dlp -o '/plex/plex/%(title)s.%(ext)s' PLAYLIST_URL 2>&1 | tee ~/yt-download.log"

# Check progress
screen -r yt-download
# or
tail -f ~/yt-download.log
```

---

## Troubleshooting

For YouTube JS challenge errors, missing formats, and n-challenge failures on Fedora — see [yt-dlp YouTube JS Challenge Fix](../../05-troubleshooting/yt-dlp-fedora-js-challenge.md).

---

## Tags

#yt-dlp #youtube #video #plex #linux #media #dev-tools
95
03-opensource/privacy-security/vaultwarden.md
Normal file
@@ -0,0 +1,95 @@
# Vaultwarden — Self-Hosted Password Manager

## Problem

Password managers are a necessity, but handing your credentials to a third-party cloud service is a trust problem. Bitwarden is open source and privacy-respecting, but if you're already running a homelab, there's no reason to depend on their servers.

## Solution

[Vaultwarden](https://github.com/dani-garcia/vaultwarden) is an unofficial, lightweight Bitwarden-compatible server written in Rust. It exposes the same API that all official Bitwarden clients speak — desktop apps, browser extensions, mobile apps — so you get the full Bitwarden UX pointed at your own hardware.

Your passwords never leave your network.

---

## Deployment (Docker + Caddy)

### docker-compose.yml

```yaml
services:
  vaultwarden:
    image: vaultwarden/server:latest
    container_name: vaultwarden
    restart: unless-stopped
    environment:
      - DOMAIN=https://vault.yourdomain.com
      - SIGNUPS_ALLOWED=true  # set to false after creating your account
    volumes:
      - ./vw-data:/data
    ports:
      - "8080:80"
```

Start it:

```bash
sudo docker compose up -d
```

### Caddy reverse proxy

```
vault.yourdomain.com {
    reverse_proxy localhost:8080
}
```

Caddy handles TLS automatically. No extra cert config needed.

---

## Initial Setup

1. Browse to `https://vault.yourdomain.com` and create your account
2. Set `SIGNUPS_ALLOWED=false` in the compose file and restart the container
3. Install any official Bitwarden client (browser extension, desktop, mobile)
4. In the client, set the **Server URL** to `https://vault.yourdomain.com` before logging in

That's it. The client has no idea it's not talking to Bitwarden's servers.

---

## Access Model

On MajorInfrastructure, Vaultwarden runs on **majorlab** and is accessible:

- **Internally** — via Caddy on the local network
- **Remotely** — via Tailscale; vault is reachable from any device on the tailnet without exposing it to the public internet

This means the Caddy vhost does not need to be publicly routable. You can choose to expose it publicly (Let's Encrypt works fine) or keep it Tailscale-only.

---

## Backup

Vaultwarden stores everything in a single SQLite database at `./vw-data/db.sqlite3`. Back it up like any file:

```bash
# Simple copy (stop container first for consistency, or use sqlite backup mode)
sqlite3 /path/to/vw-data/db.sqlite3 ".backup '/path/to/backup/vw-backup-$(date +%F).sqlite3'"
```

Or include the `vw-data/` directory in your regular rsync backup run.
---

## Why Not Bitwarden (Official)?

The official Bitwarden server is also open source but requires significantly more resources (multiple services, SQL Server). Vaultwarden runs in a single container on minimal RAM and handles everything a personal or family vault needs.

---

## Tags

#vaultwarden #bitwarden #passwords #privacy #self-hosting #docker #linux
58
03-opensource/productivity/rmlint-duplicate-scanning.md
Normal file
@@ -0,0 +1,58 @@
# rmlint — Extreme Duplicate File Scanning

## Problem

Over time, backups and media collections can accumulate massive amounts of duplicate data. Traditional duplicate finders are often slow and limited in how they handle results. On MajorRAID, I identified **~4.0 TB (113,584 files)** of duplicate data across three different storage points.

## Solution

`rmlint` is an extremely fast tool for finding (and optionally removing) duplicates. It is significantly faster than `fdupes` or `rdfind` because it uses a multi-stage approach to avoid unnecessary hashing.

### 1. Installation (Fedora)

```bash
sudo dnf install rmlint
```

### 2. Scanning Multiple Directories

To scan for duplicates across multiple mount points and compare them:

```bash
rmlint /majorstorage /majorRAID /mnt/usb
```

This will generate a script named `rmlint.sh` and a summary of the findings.

### 3. Reviewing Results

**DO NOT** run the generated script without reviewing it first. You can use the summary to see which paths contain the most duplicates:

```bash
# View the summary
jq . rmlint.json
```

### 4. Advanced Usage: Widening the Scan

rmlint always matches by content hash, not filename, so duplicates with different names are found by default. To also include hidden files and hardlinked copies in the scan:

```bash
rmlint --hidden --hard-links /path/to/search
```

### 5. Repurposing Storage

After scanning and clearing duplicates, you can reclaim significant space. In my case, this was the first step in repurposing a 12TB USB drive as a **SnapRAID parity drive**.

---

## Maintenance

Run a scan monthly or before any major storage consolidation project.

---

## Tags

#rmlint #linux #storage #cleanup #duplicates
@@ -5,3 +5,7 @@ Guides for live streaming and podcast production, with a focus on OBS Studio.
## OBS Studio

- [OBS Studio Setup & Encoding](obs/obs-studio-setup-encoding.md)

## Plex

- [Plex 4K Codec Compatibility (Apple TV)](plex/plex-4k-codec-compatibility.md)
148
04-streaming/plex/plex-4k-codec-compatibility.md
Normal file
@@ -0,0 +1,148 @@
# Plex 4K Codec Compatibility (Apple TV)

4K content on YouTube is delivered in AV1 or VP9 — neither of which the Plex app on Apple TV can direct play. This forces Plex to transcode, and most home server CPUs can't transcode 4K in real time. The fix is converting to HEVC before Plex ever sees the file.

## Codec Compatibility Matrix

| Codec | Apple TV (Plex direct play) | YouTube 4K | Notes |
|---|---|---|---|
| H.264 (AVC) | ✅ | ❌ (max 1080p) | Most compatible, but no 4K |
| HEVC (H.265) | ✅ | ❌ | Best choice: 4K compatible, widely supported |
| VP9 | ❌ | ✅ | Google's royalty-free codec, forces transcode |
| AV1 | ❌ | ✅ | Best compression, requires modern hardware to decode |

**Target format: HEVC.** Direct plays on Apple TV, supports 4K/HDR, and modern hardware can encode it quickly.
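To see how much of a library is affected before converting anything, a quick audit sketch — assumes `mediainfo` is installed, and the library path is a placeholder:

```bash
# Count files per video codec across a library (hypothetical path).
find /plex/plex \( -iname '*.mp4' -o -iname '*.mkv' \) -print0 |
  while IFS= read -r -d '' f; do
    mediainfo --Inform='Video;%Format%' "$f" 2>/dev/null
  done |
  sort | uniq -c | sort -rn
```

The output is a tally like `42 AV1 / 17 VP9 / 310 HEVC`, which tells you how much batch-conversion work is ahead.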
## Why AV1 and VP9 Cause Problems

When Plex can't direct play a file it transcodes it on the server. AV1 and VP9 decoding is CPU-intensive — most home server CPUs can't keep up with 4K60 in real time. Intel Quick Sync (HD 630 era) supports VP9 hardware decode but not AV1. AV1 hardware support requires 11th-gen Intel or RTX 30-series+.

## Batch Converting Existing Files

For files already in your Plex library, use this script to find all AV1/VP9 files and convert them to HEVC via VAAPI (Intel Quick Sync):

```bash
#!/bin/bash
VAAPI_DEV=/dev/dri/renderD128
PLEX_DIR="/plex/plex"
LOG="/root/av1_to_hevc.log"
TMPDIR="/tmp/av1_convert"

mkdir -p "$TMPDIR"
echo "=== AV1→HEVC batch started $(date) ===" | tee -a "$LOG"

find "$PLEX_DIR" -iname "*.mp4" -o -iname "*.mkv" | while IFS= read -r f; do
    codec=$(mediainfo --Inform='Video;%Format%' "$f" 2>/dev/null)
    [ "$codec" != "AV1" ] && [ "$codec" != "VP9" ] && continue

    echo "[$(date +%H:%M:%S)] Converting: $(basename "$f")" | tee -a "$LOG"
    tmp="${TMPDIR}/$(basename "${f%.*}").mp4"

    ffmpeg -hide_banner -loglevel error \
        -vaapi_device "$VAAPI_DEV" \
        -i "$f" \
        -vf 'format=nv12,hwupload' \
        -c:v hevc_vaapi \
        -qp 22 \
        -c:a copy \
        -movflags +faststart \
        "$tmp"

    if [ $? -eq 0 ] && [ -s "$tmp" ]; then
        mv "$tmp" "${f%.*}_hevc.mp4"
        rm -f "$f"
    else
        rm -f "$tmp"
        echo "  FAILED — original kept." | tee -a "$LOG"
    fi
done
```

Run in a tmux session so it survives SSH disconnect:

```bash
tmux new-session -d -s av1-convert '/root/av1_to_hevc.sh'
tail -f /root/av1_to_hevc.log
```

After completion, trigger a Plex library scan to pick up the renamed files.

## Automating Future Downloads (yt-dlp)

Prevent the problem at the source with a post-download conversion hook.

### 1. Create the conversion script

Save to `/usr/local/bin/yt-dlp-hevc-convert.sh`:

```bash
#!/bin/bash
INPUT="$1"
VAAPI_DEV=/dev/dri/renderD128
LOG=/var/log/yt-dlp-convert.log

[ -z "$INPUT" ] && exit 0
[ ! -f "$INPUT" ] && exit 0

CODEC=$(mediainfo --Inform='Video;%Format%' "$INPUT" 2>/dev/null)
if [ "$CODEC" != "AV1" ] && [ "$CODEC" != "VP9" ]; then
    exit 0
fi

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Converting ($CODEC): $(basename "$INPUT")" >> "$LOG"
TMPOUT="${INPUT%.*}_hevc_tmp.mp4"

ffmpeg -hide_banner -loglevel error \
    -vaapi_device "$VAAPI_DEV" \
    -i "$INPUT" \
    -vf 'format=nv12,hwupload' \
    -c:v hevc_vaapi \
    -qp 22 \
    -c:a copy \
    -movflags +faststart \
    "$TMPOUT"

if [ $? -eq 0 ] && [ -s "$TMPOUT" ]; then
    mv "$TMPOUT" "${INPUT%.*}.mp4"
    [ "${INPUT%.*}.mp4" != "$INPUT" ] && rm -f "$INPUT"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] OK: $(basename "${INPUT%.*}.mp4")" >> "$LOG"
else
    rm -f "$TMPOUT"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] FAILED — original kept: $(basename "$INPUT")" >> "$LOG"
fi
```

```bash
chmod +x /usr/local/bin/yt-dlp-hevc-convert.sh
```

### 2. Configure yt-dlp

`~/.config/yt-dlp/config`:

```
--remote-components ejs:github
--format bestvideo+bestaudio
--merge-output-format mp4
--output /plex/plex/%(title)s.%(ext)s
--write-auto-subs
--embed-subs
--exec /usr/local/bin/yt-dlp-hevc-convert.sh {}
```

With this config, `yt-dlp <URL>` downloads the best available quality (including 4K AV1/VP9), then immediately converts any AV1 or VP9 output to HEVC before Plex indexes it.

> [!note] The `--format bestvideo+bestaudio` selector gets true 4K from YouTube (served as AV1 or VP9). The hook converts it to HEVC. Without the hook, using `bestvideo[ext=mp4]` would cap downloads at 1080p since YouTube only serves H.264 up to 1080p.

## Enabling Hardware Transcoding in Plex

Even with automatic conversion in place, enable hardware acceleration in Plex as a fallback for any files that slip through:

**Plex Web → Settings → Transcoder → "Use hardware acceleration when available"**

This requires Plex Pass. On Intel systems with Quick Sync, VP9 will hardware transcode even without pre-conversion. AV1 will still fall back to CPU on pre-Alder Lake hardware.

## Related

- [yt-dlp: Video Downloading](../../03-opensource/media-creative/yt-dlp.md)
- [OBS Studio Setup & Encoding](../obs/obs-studio-setup-encoding.md)
59
05-troubleshooting/ansible-vault-password-file-missing.md
Normal file
@@ -0,0 +1,59 @@
# Ansible: Vault Password File Not Found

## Error

```
[WARNING]: Error getting vault password file (default): The vault password file /Users/majorlinux/.ansible/vault_pass was not found
[ERROR]: The vault password file /Users/majorlinux/.ansible/vault_pass was not found
```

## Cause

Ansible is configured to look for a vault password file at `~/.ansible/vault_pass`, but the file does not exist. This is typically set in `ansible.cfg` via the `vault_password_file` directive.

## Solutions

### Option 1: Remove the vault config (if you're not using Vault)

Check your `ansible.cfg` for this line and remove it if Vault is not needed:

```ini
[defaults]
vault_password_file = ~/.ansible/vault_pass
```

### Option 2: Create the vault password file

```bash
mkdir -p ~/.ansible
echo 'your_vault_password' > ~/.ansible/vault_pass
chmod 600 ~/.ansible/vault_pass
```

> **Security note:** Keep permissions tight (`600`) so only your user can read the file. The actual vault password is stored in Bitwarden under the "Ansible Vault Password" entry.

### Option 3: Pass the password at runtime (no file needed)

```bash
ansible-playbook test.yml --ask-vault-pass
```

## Diagnosing the Source of the Config

To find which config file is setting `vault_password_file`, run:

```bash
ansible-config dump --only-changed
```

This shows all non-default config values and their source files. Config is loaded in this order of precedence:

1. `ANSIBLE_CONFIG` environment variable
2. `./ansible.cfg` (current directory)
3. `~/.ansible.cfg`
4. `/etc/ansible/ansible.cfg`

## Related

- [Ansible Getting Started](../01-linux/shell-scripting/ansible-getting-started.md)
- Vault password is stored in Bitwarden under **"Ansible Vault Password"**
- Ansible playbooks live at `~/MajorAnsible` on MajorAir/MajorMac
@@ -0,0 +1,82 @@
---
title: "Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update"
domain: troubleshooting
category: docker
tags: [nextcloud, docker, healthcheck, netdata, php-fpm, aio]
status: published
created: 2026-03-28
updated: 2026-03-28
---

# Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update

## Symptom

Netdata alert `docker_nextcloud_unhealthy` fired on majorlab and stayed in Warning for 20 hours. The `nextcloud-aio-nextcloud` container was running but its Docker healthcheck kept failing. No user-facing errors were visible in `nextcloud.log`.

## Investigation

### Timeline (2026-03-27, all UTC)

| Time | Event |
|---|---|
| 04:00 | Nightly backup script started, mastercontainer update kicked off |
| 04:03 | `nextcloud-aio-nextcloud` container recreated |
| 04:05 | Backup finished |
| 07:25 | Mastercontainer logged "Initial startup of Nextcloud All-in-One complete!" (3h20m delay) |
| 10:22 | First entry in `nextcloud.log` (deprecation warnings only — no errors) |
| 04:00 (Mar 28) | Next nightly backup replaced the container; new container came up healthy in ~25 minutes |

### Key findings

- **No image update** — the container image dated to Feb 26, so this was not caused by a version change.
- **No app-level errors** — `nextcloud.log` contained only `files_rightclick` deprecation warnings (level 3). No level 2/4 entries.
- **PHP-FPM never stabilized** — the healthcheck (`/healthcheck.sh`) tests `nc -z 127.0.0.1 9000` (PHP-FPM). The container was running but FPM wasn't responding to the port check.
- **6-hour log gap** — no `nextcloud.log` entries between container start (04:03) and first log (10:22), suggesting the AIO init scripts (occ upgrade, app updates, cron jobs) ran for hours before the app became partially responsive.
- **RestartCount: 0** — the container never restarted on its own. It sat there unhealthy for the full 20 hours.
- **Disk space fine** — 40% used on `/`.

### Healthcheck details

```bash
#!/bin/bash
# /healthcheck.sh inside nextcloud-aio-nextcloud
nc -z "$POSTGRES_HOST" "$POSTGRES_PORT" || exit 0  # postgres down = pass (graceful)
nc -z 127.0.0.1 9000 || exit 1                     # PHP-FPM down = fail
```

If PostgreSQL is unreachable, the check passes (exits 0). The only failure path is PHP-FPM not listening on port 9000.

## Root Cause

The AIO nightly update cycle recreated the container, but the startup/migration process hung or ran extremely long, preventing PHP-FPM from fully initializing. The container sat in this state for 20 hours with no self-recovery mechanism until the next nightly cycle replaced it.

The exact migration or occ command that stalled could not be confirmed — the old container's entrypoint logs were lost when the Mar 28 backup cycle replaced it.

## Fix

Two changes deployed on 2026-03-28:

### 1. Dedicated Netdata alarm with lenient window

Split `nextcloud-aio-nextcloud` into its own Netdata alarm (`docker_nextcloud_unhealthy`) with a 10-minute lookup and 10-minute delay, separate from the general container alarm. See [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md).

### 2. Watchdog cron for auto-restart

Deployed `/etc/cron.d/nextcloud-health-watchdog` on majorlab:

```bash
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```

- Checks every 15 minutes
- Only restarts if the container has been running >1 hour (avoids interfering with normal startup)
- Logs to syslog: `journalctl -t nextcloud-watchdog`

This caps future unhealthy outages at ~1 hour instead of persisting until the next nightly cycle.
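The dense cron one-liner is easier to audit unrolled. A readable sketch of the same logic (this is an illustration, not the deployed file — the container name and threshold match the cron entry above):

```bash
#!/bin/bash
# Restart the container only if it is unhealthy AND has been up for over an hour.
C=nextcloud-aio-nextcloud

# Missing container (or missing docker) means nothing to do.
status=$(docker inspect --format '{{.State.Health.Status}}' "$C" 2>/dev/null) || exit 0
[ "$status" = "unhealthy" ] || exit 0

started=$(docker inspect --format '{{.State.StartedAt}}' "$C")
started_epoch=$(date -d "$started" +%s)
cutoff=$(date -d '1 hour ago' +%s)

if [ "$started_epoch" -lt "$cutoff" ]; then
    docker restart "$C"
    logger -t nextcloud-watchdog "Restarted unhealthy $C"
fi
```

The one-hour guard is the important part: a container that is *supposed* to be mid-startup (like the slow AIO init described above) is left alone.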
## See Also

- [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
- [Debugging Broken Docker Containers](../../02-selfhosting/docker/debugging-broken-docker-containers.md)
- [Docker Healthchecks](../../02-selfhosting/docker/docker-healthchecks.md)
47
05-troubleshooting/gemini-cli-manual-update.md
Normal file
@@ -0,0 +1,47 @@
# 🛠️ Gemini CLI: Manual Update Guide
|
||||
|
||||
If the automatic update fails or you need to force a specific version of the Gemini CLI, use these steps.
|
||||
|
||||
## 🔴 Symptom: Automatic Update Failed
|
||||
You may see an error message like:
|
||||
`✕ Automatic update failed. Please try updating manually`
|
||||
|
||||
## 🟢 Manual Update Procedure
|
||||
|
||||
### 1. Verify Current Version
|
||||
Check the version currently installed on your system:
|
||||
```bash
|
||||
gemini --version
|
||||
```
|
||||
|
||||
### 2. Check Latest Version
|
||||
Query the npm registry for the latest available version:
|
||||
```bash
|
||||
npm show @google/gemini-cli version
|
||||
```
|
||||
|
||||
### 3. Perform Manual Update
|
||||
Use `npm` with `sudo` to update the global package:
|
||||
```bash
|
||||
sudo npm install -g @google/gemini-cli@latest
|
||||
```
|
||||
|
||||
### 4. Confirm Update
|
||||
Verify that the new version is active:
|
||||
```bash
|
||||
gemini --version
|
||||
```
|
||||
|
||||
## 🛠️ Troubleshooting Update Failures
|
||||
|
||||
### Permissions Issues
|
||||
If you encounter `EACCES` errors without `sudo`, ensure your user has permissions or use `sudo` as shown above.
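
If `sudo` is not an option, a user-level npm prefix avoids `EACCES` entirely (a sketch: the `~/.npm-global` directory is an arbitrary choice, not a Gemini CLI requirement).

```bash
# Install global npm packages under $HOME instead of a root-owned prefix
mkdir -p "$HOME/.npm-global"
npm config set prefix "$HOME/.npm-global"
export PATH="$HOME/.npm-global/bin:$PATH"  # add this line to ~/.bashrc to persist
```

After this, `npm install -g @google/gemini-cli@latest` works without `sudo`.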

### Registry Connectivity

If `npm` cannot reach the registry, check your internet connection or any local firewall/proxy settings.

### Cache Issues

If the version doesn't update, try clearing the npm cache:

```bash
npm cache clean --force
```

84
05-troubleshooting/gitea-runner-boot-race-network-target.md
Normal file
@@ -0,0 +1,84 @@
# Gitea Actions Runner: Boot Race Condition Fix

If your `gitea-runner` (act_runner) service fails to start on boot — crash-looping and eventually hitting systemd's restart rate limit — the service is likely starting before DNS is available.

## Symptoms

- `gitea-runner.service` enters a crash loop on boot
- `journalctl -u gitea-runner` shows connection/DNS errors on startup:
  ```
  dial tcp: lookup git.example.com: no such host
  ```
  or similar resolution failures
- Service eventually stops retrying (systemd restart rate limit reached)
- `systemctl status gitea-runner` shows `(Result: start-limit-hit)` after reboot
- Service works fine if started manually after boot completes

## Why It Happens

`After=network.target` only guarantees that the network **interfaces are configured** — not that DNS resolution is functional. systemd-resolved (or your local resolver) starts slightly later. `act_runner` tries to connect to the Gitea instance by hostname on startup, the DNS lookup fails, and the process exits.

With the default `Restart=always` and no `RestartSec`, systemd restarts the service immediately. After five rapid failures inside systemd's default rate-limit window (`StartLimitBurst=5` within `StartLimitIntervalSec=10s`), systemd hits the limit and stops restarting the unit.

## Fix

### 1. Update the Service File

Edit `/etc/systemd/system/gitea-runner.service`:

```ini
[Unit]
Description=Gitea Actions Runner
After=network-online.target
Wants=network-online.target

[Service]
User=deploy
WorkingDirectory=/opt/gitea-runner
ExecStart=/opt/gitea-runner/act_runner daemon
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Key changes:

- `After=network-online.target` + `Wants=network-online.target` — waits for full network stack including DNS
- `RestartSec=10` — adds a 10-second delay between restart attempts, preventing rapid failure bursts from hitting the rate limit

### 2. Add a Local /etc/hosts Entry (Optional but Recommended)

If your Gitea instance is on the same local network or reachable via Tailscale, add an entry to `/etc/hosts` so act_runner can resolve it without depending on external DNS:

```
127.0.0.1 git.example.com
```

Replace `git.example.com` with your Gitea hostname and the IP with the correct local address. This makes resolution instantaneous and eliminates the DNS dependency entirely for startup.
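
Whichever address you use, confirm the lookup now bypasses DNS (`getent` consults `/etc/hosts` first under the default nsswitch ordering):

```bash
# Should print the /etc/hosts entry instantly, even with DNS down
getent hosts git.example.com || echo "no local entry yet"
```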

### 3. Reload and Restart

```bash
sudo systemctl daemon-reload
sudo systemctl restart gitea-runner
sudo systemctl status gitea-runner
```

Verify it shows `active (running)` and stays that way. Then reboot and confirm it comes up automatically.

## Why `network-online.target` and Not `network.target`

| Target | What it guarantees |
|---|---|
| `network.target` | Network interfaces are configured (IP assigned) |
| `network-online.target` | Network is fully operational (DNS resolvers reachable) |

Services that need to make outbound network connections (especially DNS lookups) on startup should always use `network-online.target`. This includes mail servers, monitoring agents, CI runners, and anything that connects to an external host by name.

> [!note] `network-online.target` can add a few seconds to boot time since systemd waits for the network stack to fully initialize. For server contexts this is always the right tradeoff.

## Related

- [Managing Linux Services with systemd](../01-linux/process-management/managing-linux-services-systemd-ansible.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)

58
05-troubleshooting/gpu-display/qwen-14b-oom-3080ti.md
Normal file
@@ -0,0 +1,58 @@
# Qwen2.5-14B OOM on RTX 3080 Ti (12GB)

## Problem

When attempting to run or fine-tune **Qwen2.5-14B** on an NVIDIA RTX 3080 Ti with 12GB of VRAM, the process fails with an Out of Memory (OOM) error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X GiB (GPU 0; 12.00 GiB total capacity; Y GiB already allocated; Z GiB free; ...)
```

The 12GB VRAM limit is hit during the initial model load or immediately upon starting the first training step.

## Root Causes

1. **Model Size:** A 14B parameter model in FP16/BF16 requires ~28GB of VRAM just for the weights.
2. **Context Length:** High context lengths (e.g., 4096+) significantly increase VRAM usage during training.
3. **Training Overhead:** Even with QLoRA (4-bit quantization), the overhead of gradients, optimizer states, and activations can exceed 12GB for a 14B model.
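
A back-of-the-envelope check of cause 1 makes the gap concrete (weights only; KV cache, activations, and CUDA context come on top):

```bash
# Weight-only VRAM in GB = parameters (billions) x bytes per parameter
for cfg in "fp16 2" "int8 1" "4bit 0.5"; do
  set -- $cfg
  awk -v p=14 -v b="$2" -v n="$1" 'BEGIN { printf "%s: ~%.0f GB\n", n, p * b }'
done
```

This prints `fp16: ~28 GB`, `int8: ~14 GB`, `4bit: ~7 GB`: only the 4-bit weights leave any headroom on a 12GB card.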

---

## Solutions

### 1. Pivot to a 7B Model (Recommended)

For a 12GB GPU, a 7B parameter model (like **Qwen2.5-7B-Instruct**) is the sweet spot. It provides excellent performance while leaving enough VRAM for high context lengths and larger batch sizes.

- **VRAM Usage (7B QLoRA):** ~6-8GB
- **Pros:** Stable, fast, supports long context.
- **Cons:** Slightly lower reasoning capability than 14B.

### 2. Aggressive Quantization

If you MUST run 14B, use 4-bit quantization (GGUF or EXL2) for inference only. Training 14B on 12GB is not reliably possible even with extreme offloading.

```bash
# Example Ollama run (uses 4-bit quantization by default)
ollama run qwen2.5:14b
```

### 3. Training Optimizations (if attempting 14B)

If you have no choice but to try 14B training:

- Set `max_seq_length` to 512 or 1024.
- Use `Unsloth` (it is highly memory-efficient).
- Enable `gradient_checkpointing`.
- Set `per_device_train_batch_size = 1`.

---

## Maintenance

Keep your NVIDIA drivers and CUDA toolkit updated. On Windows (MajorRig), ensure WSL2 has sufficient memory allocation in `.wslconfig`.

---

## Tags

#gpu #cuda #oom #qwen #majortwin #llm #fine-tuning

@@ -8,13 +8,27 @@ Practical fixes for common Linux, networking, and application problems.

## 🌐 Networking & Web
- [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
- [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
- [Windows OpenSSH Server (sshd) Stops After Reboot](networking/windows-sshd-stops-after-reboot.md)
- [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
- [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)

## 📦 Docker & Systems
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
- [Gitea Actions Runner: Boot Race Condition Fix](gitea-runner-boot-race-network-target.md)
- [Systemd Session Scope Fails at Login (`session-cN.scope`)](systemd/session-scope-failure-at-login.md)
- [MajorWiki Setup & Publishing Pipeline](majwiki-setup-and-pipeline.md)

## 🔒 SELinux
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](selinux-dovecot-vmail-context.md)

## 💾 Storage
- [mdadm RAID Recovery After USB Hub Disconnect](storage/mdadm-usb-hub-disconnect-recovery.md)

## 📝 Application Specific
- [Obsidian Vault Recovery — Loading Cache Hang](obsidian-cache-hang-recovery.md)
- [Gemini CLI Manual Update](gemini-cli-manual-update.md)

## 🤖 AI / Local LLM
- [Ollama Drops Off Tailscale When Mac Sleeps](ollama-macos-sleep-tailscale-disconnect.md)

@@ -119,3 +119,20 @@ The webhook runs as a systemd service so it survives reboots:
systemctl status majwiki-webhook
systemctl restart majwiki-webhook
```

---

*Updated 2026-03-13: Obsidian Git plugin dropped. See canonical workflow below.*

## Canonical Publishing Workflow

The Obsidian Git plugin was evaluated but dropped — too convoluted for a simple push. Manual git from the terminal is the canonical workflow.

```bash
cd ~/Documents/MajorVault
git add 20-Projects/MajorTwin/08-Wiki/
git commit -m "wiki: describe your changes"
git push
```

From there: Gitea receives the push → fires webhook → majorlab pulls → MkDocs rebuilds → `notes.majorshouse.com` updates.
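
If the three-command dance gets old, a git alias can collapse it into one (the alias name `wikipush` is arbitrary, not part of the pipeline):

```bash
git config --global alias.wikipush \
  '!git add 20-Projects/MajorTwin/08-Wiki/ && git commit -m "wiki: update" && git push'
```

Then publishing is just `git wikipush` from anywhere in the vault.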

186
05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md
Normal file
@@ -0,0 +1,186 @@
# Apache Outage: Fail2ban Self-Ban + Missing iptables Rules

## 🛑 Problem

A web server running Apache2 becomes completely unreachable (`ERR_CONNECTION_TIMED_OUT`) despite Apache running normally. SSH access via Tailscale is unaffected.

---

## 🔍 Diagnosis

### Step 1 — Confirm Apache is running

```bash
sudo systemctl status apache2
```

If Apache is `active (running)`, the problem is at the firewall layer, not the application.

---

### Step 2 — Test the public IP directly

```bash
curl -I --max-time 5 http://<PUBLIC_IP>
```

A **timeout** means traffic is being dropped by the firewall. A **connection refused** means Apache is down.
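
curl's exit code makes the same distinction scriptable (per the curl man page, exit 7 is "couldn't connect" and exit 28 is "operation timed out"):

```bash
rc=0
curl -sI --max-time 5 "http://<PUBLIC_IP>" >/dev/null || rc=$?
case $rc in
  0)  echo "reachable" ;;
  7)  echo "connection refused: check Apache" ;;
  28) echo "timed out: check the firewall" ;;
  *)  echo "other curl failure ($rc)" ;;
esac
```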

---

### Step 3 — Check the iptables INPUT chain

```bash
sudo iptables -L INPUT -n -v
```

Look for ACCEPT rules on ports 80 and 443. If they're missing and the chain policy is `DROP`, HTTP/HTTPS traffic is being silently dropped.

**Example of broken state:**

```
Chain INPUT (policy DROP)
ACCEPT tcp -- lo * ...                     # loopback only
ACCEPT tcp -- tailscale0 * ... tcp dpt:22
# no rules for port 80 or 443
```

---

### Step 4 — Check the nftables ruleset for Fail2ban

```bash
sudo nft list tables
```

Look for `table inet f2b-table` — this is Fail2ban's nftables table. It operates at **priority `filter - 1`**, meaning it is evaluated *before* the main iptables INPUT chain.

```bash
sudo nft list ruleset | grep -A 10 'f2b-table'
```

Fail2ban rejects banned IPs with rules like:

```
tcp dport { 80, 443 } ip saddr @addr-set-wordpress-hard reject with icmp port-unreachable
```

A banned admin IP will be rejected here regardless of any ACCEPT rules downstream.

---

### Step 5 — Check if your IP is banned

```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
  echo "=== $jail ==="; sudo fail2ban-client get $jail banip | tr ',' '\n' | grep <YOUR_IP>
done
```

---

## ✅ Solution

### Fix 1 — Add missing iptables ACCEPT rules for HTTP/HTTPS

If ports 80/443 are absent from the INPUT chain:

```bash
sudo iptables -I INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -i eth0 -p tcp --dport 443 -j ACCEPT
```

Persist the rules:

```bash
sudo netfilter-persistent save
```

If `netfilter-persistent` is not installed:

```bash
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
```

---

### Fix 2 — Unban your IP from all Fail2ban jails

```bash
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
  sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```

---

### Fix 3 — Add your IP to Fail2ban's ignore list

Edit `/etc/fail2ban/jail.local`:

```bash
sudo nano /etc/fail2ban/jail.local
```

Add or update the `[DEFAULT]` section:

```ini
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 <YOUR_IP>
```

Restart Fail2ban:

```bash
sudo systemctl restart fail2ban
```

---

## 🔁 Why This Happens

| Issue | Root Cause |
|---|---|
| Missing port 80/443 rules | iptables INPUT chain left incomplete after a manual firewall rework (e.g., SSH lockdown) |
| Still blocked after adding iptables rules | Fail2ban uses a separate nftables table at higher priority — iptables ACCEPT rules are never reached for banned IPs |
| Admin IP gets banned | Automated WordPress/Apache probes trigger Fail2ban jails against the admin's own IP |

---

## ⚠️ Key Architecture Note

On servers running both iptables and Fail2ban, the evaluation order is:

1. **`inet f2b-table`** (nftables, priority `filter - 1`) — Fail2ban ban sets; evaluated first
2. **`ip filter` INPUT chain** (iptables/nftables, policy DROP) — explicit ACCEPT rules
3. **UFW chains** — IP-specific rules; evaluated last

A banned IP is stopped at step 1 and never reaches the ACCEPT rules in step 2. Always check Fail2ban *after* confirming iptables looks correct.

---

## 🔎 Quick Diagnostic Commands

```bash
# Check Apache
sudo systemctl status apache2

# Test public connectivity
curl -I --max-time 5 http://<PUBLIC_IP>

# Check iptables INPUT chain
sudo iptables -L INPUT -n -v

# List nftables tables (look for inet f2b-table)
sudo nft list tables

# Check Fail2ban jail status
sudo fail2ban-client status

# Check a specific jail's banned IPs
sudo fail2ban-client status wordpress-hard

# Unban an IP from all jails
for jail in $(sudo fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
  sudo fail2ban-client set $jail unbanip <YOUR_IP> 2>/dev/null && echo "Unbanned from $jail"
done
```

158
05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md
Normal file
@@ -0,0 +1,158 @@
# Fail2ban & UFW Rule Bloat: 30k Rules Slowing Down a VPS

## 🛑 Problem

A small VPS (1–2 GB RAM) running Fail2ban with permanent bans (`bantime = -1`) gradually accumulates thousands of UFW DENY rules or nftables entries. Over time this causes:

- High memory usage from Fail2ban (100+ MB RSS)
- Bloated nftables ruleset (30k+ rules) — every incoming packet must traverse the full list
- Netdata alerts flapping on RAM/swap thresholds
- Degraded packet processing performance

---

## 🔍 Diagnosis

### Step 1 — Check Fail2ban memory and thread count

```bash
grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status
```

On a small VPS, Fail2ban RSS over 80 MB is a red flag. Thread count scales with jail count (roughly 2 threads per jail + overhead).

---

### Step 2 — Count nftables/UFW rules

```bash
# Total drop/reject rules in nftables
nft list ruleset | grep -c "reject\|drop"

# UFW rule file size
wc -l /etc/ufw/user.rules
```

A healthy UFW setup has 10–30 rules. Thousands means manual `ufw deny` commands or permanent Fail2ban bans have accumulated.

---

### Step 3 — Identify dead jails

```bash
for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
  total=$(fail2ban-client status $jail | grep "Total banned" | awk '{print $NF}')
  echo "$jail: $total total bans"
done
```

Jails with zero total bans are dead weight — burning threads and regex cycles for nothing.

---

### Step 4 — Check ban policy

```bash
grep bantime /etc/fail2ban/jail.local
```

`bantime = -1` means permanent. On a public-facing server, scanner IPs rotate constantly — permanent bans just pile up with no benefit.

---

## ✅ Solution

### Fix 1 — Disable dead jails

Edit `/etc/fail2ban/jail.local` and set `enabled = false` for any jail with zero historical bans.

### Fix 2 — Switch to time-limited bans

```ini
[DEFAULT]
bantime = 30d

[recidive]
bantime = 90d
```

30 days is long enough to block active campaigns; repeat offenders get 90 days via recidive. Scanner IPs rarely persist beyond a week.
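
On Fail2ban 0.11+, incremental banning is a further refinement: repeat offenders get exponentially longer bans without any jail ever being permanent (a sketch; the cap value is illustrative, not tuned):

```ini
[DEFAULT]
# Repeat offenders get exponentially longer bans, capped at 60 days
bantime.increment = true
bantime.maxtime   = 60d
```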

### Fix 3 — Flush accumulated bans

```bash
fail2ban-client unban --all
```

### Fix 4 — Reset bloated UFW rules

**Back up first:**

```bash
cp /etc/ufw/user.rules /etc/ufw/user.rules.bak
cp /etc/ufw/user6.rules /etc/ufw/user6.rules.bak
```

**Reset and re-add only legitimate ALLOW rules:**

```bash
ufw --force reset
ufw default deny incoming
ufw default allow outgoing
ufw allow 443/tcp
ufw allow 80/tcp
ufw allow in on tailscale0 to any port 22 comment "SSH via Tailscale"
# Add any other ALLOW rules specific to your server
ufw --force enable
```

**Restart Fail2ban** so it re-creates its nftables chains:

```bash
systemctl restart fail2ban
```

---

## 🔁 Why This Happens

| Cause | Effect |
|---|---|
| `bantime = -1` (permanent) | Banned IP list grows forever; nftables rules never expire |
| Manual `ufw deny from <IP>` | Each adds a persistent rule to `user.rules`; survives reboots |
| Many jails with no hits | Each jail spawns 2+ threads, runs regex against logs continuously |
| Small VPS (1–2 GB RAM) | Fail2ban + nftables overhead becomes a significant fraction of total RAM |

---

## ⚠️ Key Notes

- **Deleting UFW rules one-by-one is impractical** at scale — `ufw delete` with 30k rules takes hours. A full reset + re-add is the only efficient path.
- **`ufw --force reset` also resets `before.rules` and `after.rules`** — UFW auto-backs these up, but verify your custom chains if any exist.
- **After flushing bans, expect a brief spike in 4xx responses** as scanners that were previously blocked hit Apache again. Fail2ban will re-ban them within minutes.
- **The Netdata `web_log_1m_successful` alert may fire** during this window — it will self-clear once bans repopulate.

---

## 🔎 Quick Diagnostic Commands

```bash
# Fail2ban memory usage
grep -E "VmRSS|VmSwap|Threads" /proc/$(pgrep -ox fail2ban-server)/status

# Count nftables rules
nft list ruleset | grep -c "reject\|drop"

# UFW rule count
ufw status numbered | tail -1

# List all jails with ban counts
for jail in $(fail2ban-client status | grep "Jail list" | sed 's/.*://;s/,/ /g'); do
  banned=$(fail2ban-client status $jail | grep "Currently banned" | awk '{print $NF}')
  total=$(fail2ban-client status $jail | grep "Total banned" | awk '{print $NF}')
  echo "$jail: $banned current / $total total"
done

# Flush all bans
fail2ban-client unban --all
```

70
05-troubleshooting/networking/firewalld-mail-ports-reset.md
Normal file
@@ -0,0 +1,70 @@
# firewalld: Mail Ports Wiped After Reload (IMAP + Webmail Outage)

If IMAP, SMTP, and webmail all stop working simultaneously on a Fedora/RHEL mail server, firewalld may have reloaded and lost its mail port configuration.

## Symptoms

- `openssl s_client -connect mail.example.com:993` returns `Connection refused`
- Webmail returns connection refused or times out
- SSH still works (port 22 is typically in the persisted config)
- `firewall-cmd --list-services --zone=public` shows only `ssh dhcpv6-client mdns` or similar — no mail services
- Mail was working before a service restart or system event

## Why It Happens

firewalld uses two layers of configuration:

- **Runtime** — active rules in memory (lost on reload or restart)
- **Permanent** — written to `/etc/firewalld/zones/public.xml` (survives reloads)

If mail ports were added with `firewall-cmd --add-service=imaps` (without `--permanent`), they exist only in the runtime config. Any event that triggers a `firewall-cmd --reload` — including Fail2ban restarting, a system update, or a manual reload — wipes the runtime config back to the permanent state, dropping all non-permanent rules.
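
Two standard `firewall-cmd` idioms help avoid a repeat: persist a known-good runtime state wholesale, and diff the two layers to spot rules that the next reload would drop.

```bash
# Persist the current runtime configuration as the permanent one
sudo firewall-cmd --runtime-to-permanent

# Anything present only in the runtime column disappears on the next reload
diff <(firewall-cmd --list-services) <(firewall-cmd --permanent --list-services)
```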

## Diagnosis

```bash
# Check what's currently allowed
firewall-cmd --list-services --zone=public

# Check nftables for catch-all reject rules
nft list ruleset | grep -E '(reject|accept|993|143)'

# Test port 993 from an external machine
openssl s_client -connect mail.example.com:993 -brief
```

If the only services listed are `ssh` and the port test shows `Connection refused`, the rules are gone.

## Fix

Add all mail services permanently and reload:

```bash
firewall-cmd --permanent \
  --add-service=smtp \
  --add-service=smtps \
  --add-service=smtp-submission \
  --add-service=imap \
  --add-service=imaps \
  --add-service=http \
  --add-service=https
firewall-cmd --reload

# Verify
firewall-cmd --list-services --zone=public
```

Expected output:

```
dhcpv6-client http https imap imaps mdns smtp smtp-submission smtps ssh
```

## Key Notes

- **Always use `--permanent`** when adding services to firewalld on a server. Without it, the rule exists only until the next reload.
- **Fail2ban + firewalld**: Fail2ban uses firewalld as its ban backend (`firewallcmd-rich-rules`). When Fail2ban restarts or crashes, it may trigger a `firewall-cmd --reload`, resetting any runtime-only rules.
- **Verify after any firewall event**: After Fail2ban restarts, system reboots, or `firewall-cmd --reload`, always confirm mail services are still present with `firewall-cmd --list-services --zone=public`.
- **Check the permanent config directly**: `cat /etc/firewalld/zones/public.xml` — if mail services aren't in this file, they'll be lost on next reload.

## Related

- [Linux Server Hardening Checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](fail2ban-imap-self-ban-mail-client.md)

66
05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md
Normal file
@@ -0,0 +1,66 @@
# Tailscale SSH: Unexpected Re-Authentication Prompt

If a Tailscale SSH connection unexpectedly presents a browser authentication URL mid-session, the first instinct is to check the ACL policy. However, this is often a one-off Tailscale hiccup rather than a misconfiguration.

## Symptoms

- SSH connection to a fleet node displays a Tailscale auth URL:
  ```
  To authenticate, visit: https://login.tailscale.com/a/xxxxxxxx
  ```
- The prompt appears even though the node worked fine previously
- Other nodes in the fleet connect without prompting

## What Causes It

Tailscale SSH supports two ACL `action` values:

| Action | Behavior |
|---|---|
| `accept` | Trusts Tailscale identity — no additional auth required |
| `check` | Requires periodic browser-based re-authentication |

If `action: "check"` is set, every session (or after token expiry) will prompt for browser auth. However, even with `action: "accept"`, a one-off prompt can appear due to a Tailscale daemon glitch or key refresh event.

## How to Diagnose

### 1. Verify the ACL policy

In the Tailscale admin console (or via `tailscale debug acl`), inspect the SSH rules. For a trusted homelab fleet, the rule should use `accept`:

```json
{
  "src": ["autogroup:member"],
  "dst": ["autogroup:self"],
  "users": ["autogroup:nonroot", "root"],
  "action": "accept",
}
```

If `action` is `check`, that is the root cause — change it to `accept` for trusted source/destination pairs.

### 2. Confirm it was a one-off

If the ACL already shows `accept`, the prompt was transient. Test with:

```bash
ssh <hostname> "echo ok"
```

No auth prompt + `ok` output = resolved. Note that this test is only meaningful if the previous session's auth token has expired, or you test from a different device that hasn't recently authenticated.

## Fix

**If ACL shows `check`:** Change to `accept` in the Tailscale admin console under Access Controls. Takes effect immediately — no server changes needed.

**If ACL already shows `accept`:** No action required. The prompt was a one-off Tailscale event (daemon restart, key refresh, etc.). Monitor for recurrence.

## Notes

- Port 2222 on **MajorRig** exists as a hard bypass for Tailscale SSH browser auth — regular SSH over the Tailscale network, bypassing Tailscale SSH entirely. This is an alternative approach if `check` mode is required for compliance but browser auth is too disruptive.
- The `autogroup:self` destination means the rule applies when connecting from your own devices to your own devices — appropriate for a personal homelab fleet.

## Related

- [[Network Overview]] — Tailscale fleet inventory and SSH access model
- [[SSH-Aliases]] — Fleet SSH access shortcuts

@@ -0,0 +1,68 @@
# Windows OpenSSH Server (sshd) Stops After Reboot

## 🛑 Problem

SSH connections to MajorRig from a mobile device or Tailscale client time out on port 22. No connection refused error — just a timeout. The OpenSSH Server service is installed but not running.

---

## 🔍 Diagnosis

From an **elevated** PowerShell on MajorRig:

```powershell
Get-Service sshd
```

If the output shows `Stopped`, the service is not running. This is the cause of the timeout.

---

## ✅ Fix

Run the following from an **elevated** PowerShell (Win+X → Terminal (Admin)):

```powershell
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic
Get-Service sshd
```

The final command should confirm `Running`. SSH connections will resume immediately — no reboot required.

---

## 🔄 Why This Happens

| Trigger | Reason |
|---|---|
| Windows Update reboot | If `sshd` startup type is Manual, it won't restart after a reboot |
| WSL2 export/import/rebuild | WSL2 reinstall operations often involve reboots that expose the same issue |
| Fresh Windows install | OpenSSH Server is installed but startup type defaults to Manual |

The Windows OpenSSH Server is installed as a Windows Feature (`Add-WindowsCapability`), not a WSL2 package. It runs entirely on the Windows side. However, its **default startup type is Manual**, meaning it will not survive a reboot unless explicitly set to Automatic.

---

## ⚠️ Key Notes

- **This is a Windows-side issue** — WSL2 itself is unaffected. The service must be started and configured from Windows, not from within WSL2.
- **Elevated PowerShell required** — `Start-Service` and `Set-Service` for sshd will return "Access is denied" if run without Administrator privileges.
- **Port 2222 was retired (2026-03-25)** — the bypass port 2222 on MajorRig is no longer in use. The entire fleet now uses port 22 uniformly after the Tailscale SSH auth fix. Only port 22 needs to be verified when troubleshooting sshd.
- **Default shell still works once fixed** — MajorRig's sshd is configured to use `C:\Windows\System32\wsl.exe` as the default shell, dropping SSH sessions directly into WSL2/Bash. This config is preserved across service restarts.

---

## 🔎 Quick Reference

```powershell
# Check status (run as Admin)
Get-Service sshd

# Start and set to auto-start (run as Admin)
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic

# Verify firewall rule exists
Get-NetFirewallRule -DisplayName "*ssh*" | Select DisplayName, Enabled, Direction, Action
```

@@ -0,0 +1,68 @@
---
title: "Ollama Drops Off Tailscale When Mac Sleeps"
domain: troubleshooting
category: ai-inference
tags: [ollama, tailscale, macos, sleep, open-webui, majormac]
status: published
created: 2026-03-17
updated: 2026-03-17
---

# Ollama Drops Off Tailscale When Mac Sleeps

Open WebUI loses its Ollama connection when the host Mac goes to sleep. Models stop appearing, and curl to the Ollama API times out from other machines on the tailnet.

## The Short Answer

Disable sleep when plugged into AC power:

```bash
sudo pmset -c sleep 0
```

Or via **System Settings → Energy → Prevent automatic sleeping when the display is off**.

## Background

macOS suspends network interfaces when the machine sleeps, which drops the Tailscale tunnel. Ollama becomes unreachable over the tailnet even though it was running fine before sleep. Open WebUI doesn't reconnect automatically — it just shows no models until the connection is manually refreshed after the Mac wakes.

The `-c` flag in `pmset` limits the setting to AC power only, so the machine will still sleep normally on battery.

## Diagnosis

From any other machine on the tailnet:

```bash
tailscale status | grep majormac
```

If it shows `offline, last seen Xm ago` or is routing through a relay instead of direct, the Mac is asleep or the tunnel is degraded.

```bash
curl http://100.74.124.81:11434/api/tags
```

Timeout = Ollama unreachable. After waking the Mac, this should return a JSON list of models immediately.
|
||||
|
||||
## Fix
|
||||
|
||||
```bash
|
||||
# Disable sleep on AC power (run on MajorMac)
|
||||
sudo pmset -c sleep 0
|
||||
|
||||
# Verify
|
||||
pmset -g | grep sleep
|
||||
```
|
||||
|
||||
The display can still sleep — only system sleep needs to be off for Ollama and Tailscale to stay available.
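The verify step can be scripted. A minimal sketch, run here against sample `pmset -g` output (the sample text and its exact column layout are assumptions; on a real Mac you would feed in the live command's output instead):

```shell
#!/bin/sh
# Sketch: warn if system sleep is still enabled on AC power.
# pmset_output is a synthetic stand-in for real `pmset -g` output.
pmset_output=' displaysleep         15
 sleep                0
 disksleep            10'

# First field must be exactly "sleep" (so "displaysleep" is skipped);
# second field is the timeout in minutes (0 = never sleep)
sleep_val=$(printf '%s\n' "$pmset_output" | awk '$1 == "sleep" {print $2}')
if [ "$sleep_val" = "0" ]; then
    echo "OK: system sleep disabled"
else
    echo "WARN: system sleep set to ${sleep_val} minutes"
fi
```

The exact-match on the first awk field matters: a plain `grep sleep` would also match the `displaysleep` and `disksleep` lines.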

## Gotchas & Notes

- **Display sleep is fine** — `pmset -c displaysleep 15` or whatever you prefer won't affect Ollama availability.
- **Battery behavior unchanged** — the `-c` flag means AC only; normal sleep on battery is preserved.
- **Open WebUI won't auto-reconnect** — after waking the Mac, go to Settings → Connections and hit the verify button, or just reload the page.
- This affects any service bound to the Tailscale interface on MajorMac, not just Ollama.

## See Also

- [[MajorMac]] — device config and known issues
113
05-troubleshooting/security/apache-dirscan-fail2ban-jail.md
Normal file
@@ -0,0 +1,113 @@
# Custom Fail2ban Jail: Apache Directory Scanning & Junk Methods

## 🛑 Problem

Bots and vulnerability scanners enumerate WordPress directories (`/wp-admin/`, `/wp-includes/`, `/wp-content/`), probe for access-denied paths, or send junk HTTP methods (e.g., `YQEILVHZ`, `DUTEDCEM`). These generate Apache error log entries but are not caught by any default Fail2ban jail:

- `AH01276` — directory index forbidden (autoindex:error)
- `AH01630` — client denied by server configuration (authz_core:error)
- `AH00135` — invalid method in request (core:error)

The result is a low success ratio on Netdata's `web_log_1m_successful` metric and wasted server resources processing scanner requests.

---

## ✅ Solution

### Step 1 — Create the filter

Create `/etc/fail2ban/filter.d/apache-dirscan.conf`:

```ini
# Fail2ban filter for Apache scanning/probing
# Catches: directory enumeration (AH01276), access denied (AH01630), invalid methods (AH00135)

[Definition]
failregex = ^\[.*\] \[autoindex:error\] \[pid \d+\] \[client <HOST>:\d+\] AH01276:
            ^\[.*\] \[authz_core:error\] \[pid \d+\] \[client <HOST>:\d+\] AH01630:
            ^\[.*\] \[core:error\] \[pid \d+\] \[client <HOST>:\d+\] AH00135:

ignoreregex =
```

### Step 2 — Add the jail

Add to `/etc/fail2ban/jail.local`:

```ini
[apache-dirscan]
enabled  = true
port     = http,https
filter   = apache-dirscan
logpath  = /var/log/apache2/error.log
maxretry = 3
findtime = 60
```

Three hits in 60 seconds is aggressive enough to catch active scanners while avoiding false positives from legitimate 403s.

### Step 3 — Test the regex

```bash
fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf
```

This shows match counts per regex line and any missed lines.
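Before involving fail2ban at all, the pattern can be sanity-checked with plain `grep`. A sketch: the log line below is synthetic, and the pattern is an ERE translation of the failregex (`\d` → `[0-9]`, and the `<HOST>` placeholder — which fail2ban substitutes internally — replaced by an IP-matching group):

```shell
# Sketch: check the AH01276 pattern against a synthetic error-log line.
# Both the sample line and the ERE translation are illustrative assumptions.
sample='[Wed Mar 18 04:12:01.123456 2026] [autoindex:error] [pid 1234] [client 203.0.113.50:51234] AH01276: Cannot serve directory /var/www/html/wp-content/: No matching DirectoryIndex'
pattern='^\[.*\] \[autoindex:error\] \[pid [0-9]+\] \[client [0-9.]+:[0-9]+\] AH01276:'

if printf '%s\n' "$sample" | grep -Eq "$pattern"; then
    echo "match"
else
    echo "no match"
fi
```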
### Step 4 — Reload Fail2ban

```bash
fail2ban-client reload
fail2ban-client status apache-dirscan
```

---

## 🔍 What Each Pattern Catches

| Error Code | Apache Module | Trigger |
|---|---|---|
| `AH01276` | `autoindex:error` | Bot requests a directory with no index file and `Options -Indexes` is set. Classic WordPress/CMS directory enumeration. |
| `AH01630` | `authz_core:error` | Request denied by `<Directory>` or `<Location>` rules (e.g., probing `/wp-content/plugins/`). |
| `AH00135` | `core:error` | Request uses a garbage HTTP method that Apache can't parse. Scanners use these to fingerprint servers. |

---

## 🔁 Why Default Jails Miss This

| Default Jail | What It Catches | Gap |
|---|---|---|
| `apache-badbots` | Bad User-Agent strings in access log | Doesn't look at error log; many scanners use normal UAs |
| `apache-botsearch` | 404s for common exploit paths | Only matches access log 404s, not error log entries |
| `apache-noscript` | Requests for non-existent scripts | Narrow regex, doesn't cover directory probes |
| `apache-overflows` | Long request URIs | Only catches buffer overflow attempts |
| `apache-invaliduri` | `AH10244` invalid URI encoding | Different error code — catches URL-encoded traversal, not directory scanning |

The `apache-dirscan` filter fills the gap by monitoring the error log for the three most common scanner signatures that slip through all default jails.

---

## ⚠️ Key Notes

- **`logpath` must point to the error log**, not the access log. All three patterns are logged to `error.log`.
- **Adjust `logpath`** for your distribution: Debian/Ubuntu uses `/var/log/apache2/error.log`, RHEL/Fedora uses `/var/log/httpd/error_log`.
- **The `allowipv6` warning** on reload is cosmetic (Fail2ban 1.0+) and can be ignored.
- **Pair with `recidive`** to escalate repeat offenders to longer bans.

---

## 🔎 Quick Diagnostic Commands

```bash
# Test filter against current error log
fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-dirscan.conf

# Check jail status
fail2ban-client status apache-dirscan

# Watch bans in real time
tail -f /var/log/fail2ban.log | grep apache-dirscan

# Count current error types
grep -c "AH01276\|AH01630\|AH00135" /var/log/apache2/error.log
```
@@ -0,0 +1,73 @@
# ClamAV Safe Scheduling on Live Servers

Running `clamscan` unthrottled on a live server will peg the CPU until completion. On a small VPS (1 vCPU), a full recursive scan can sustain 70–100% CPU for an hour or more, degrading or taking down hosted services.

## The Problem

A common out-of-the-box ClamAV cron setup looks like this:

```cron
0 1 * * 0 clamscan --infected --recursive / --exclude=/sys
```

This runs at Linux's default scheduling priority (`nice 0`) with normal I/O priority. On a live server it will:

- Monopolize the CPU for the scan duration
- Cause high I/O wait, degrading web serving, databases, and other services
- Trigger monitoring alerts (e.g., Netdata `10min_cpu_usage`)

## The Fix

Throttle the scan with `nice` and `ionice`:

```cron
0 1 * * 0 nice -n 19 ionice -c 3 clamscan --infected --recursive / --exclude=/sys
```

| Flag | Meaning |
|------|---------|
| `nice -n 19` | Lowest CPU scheduling priority (range: -20 to 19) |
| `ionice -c 3` | Idle I/O class — only uses disk when no other process needs it |

The scan will take longer but will not impact server performance.
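A quick way to see the wrapper take effect without running a real scan: `nice` invoked with no command prints the current niceness, so wrapping it shows the priority a throttled `clamscan` would inherit (a sketch; the printed value assumes the shell starts at the default niceness of 0):

```shell
# Sketch: `nice` alone prints the current nice value. Wrapped in
# `nice -n 19`, it prints the raised niceness that any command placed
# after the wrapper (e.g. clamscan) would run at.
nice -n 19 nice
```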
## Applying the Fix

Edit root's crontab:

```bash
crontab -e
```

Or apply non-interactively:

```bash
crontab -l | sed 's|clamscan|nice -n 19 ionice -c 3 clamscan|' | crontab -
```

Verify:

```bash
crontab -l | grep clam
```

## Diagnosing a Runaway Scan

If CPU is already pegged, identify and kill the process:

```bash
ps aux --sort=-%cpu | head -15
# Look for clamscan
kill <PID>
```

## Notes

- `ionice -c 3` (Idle) requires Linux kernel ≥ 2.6.13 and the CFQ/BFQ I/O scheduler. It works on most Ubuntu/Debian/Fedora systems.
- On multi-core servers, consider also using `cpulimit` for a hard cap: `cpulimit -l 30 -- clamscan ...`
- Always keep `--exclude=/sys` (and optionally `--exclude=/proc`, `--exclude=/dev`) to avoid scanning virtual filesystems.

## Related

- [ClamAV Documentation](https://docs.clamav.net/)
- [[02-selfhosting/security/linux-server-hardening-checklist|Linux Server Hardening Checklist]]
103
05-troubleshooting/selinux-dovecot-vmail-context.md
Normal file
@@ -0,0 +1,103 @@
# SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)

If Dovecot is generating SELinux AVC denials and mail delivery or retrieval is broken on a Fedora/RHEL system with SELinux enforcing, the `/var/vmail` directory tree likely has incorrect file contexts.

## Symptoms

- Thousands of AVC denials in `/var/log/audit/audit.log` for Dovecot processes
- Denials reference the `var_t` context on files under `/var/vmail/`
- Mail delivery may fail silently; IMAP folders may appear empty or inaccessible
- `ausearch -m avc -ts recent` shows denials like:
  ```
  type=AVC msg=audit(...): avc: denied { write } for pid=... comm="dovecot" name="..." scontext=system_u:system_r:dovecot_t:s0 tcontext=system_u:object_r:var_t:s0
  ```

## Why It Happens

SELinux requires files to have the correct security context for the process that accesses them. When Postfix/Dovecot are installed on a fresh system and `/var/vmail` is created manually (or by the mail stack installer), the directory may inherit the default `var_t` context from `/var/` rather than the mail-specific `mail_spool_t` context Dovecot expects.

The correct context for the entire `/var/vmail` tree is `mail_spool_t` — including the `tmp/` subdirectories inside each Maildir folder.

> [!warning] Do NOT apply `dovecot_tmp_t` to Maildir `tmp/` directories
> `dovecot_tmp_t` is for Dovecot's own process-level temp files, not for Maildir `tmp/` folders. Postfix's virtual delivery agent writes to `tmp/` when delivering new mail. Applying `dovecot_tmp_t` will block Postfix from delivering any mail, silently deferring all messages with `Permission denied`.

## Fix

### 1. Check Current Context

```bash
ls -Zd /var/vmail/
ls -Z /var/vmail/example.com/user/
ls -Zd /var/vmail/example.com/user/tmp/
```

If you see `var_t` instead of `mail_spool_t`, the contexts need to be set. If you see `dovecot_tmp_t` on `tmp/`, that needs to be corrected too.
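On a large tree, eyeballing `ls -Z` output gets tedious. A sketch that flags every path whose type is not `mail_spool_t`, run here against a synthetic stand-in for `ls -Z`-style "context path" lines:

```shell
# Sketch: list paths whose SELinux type field is not mail_spool_t.
# `listing` is a synthetic stand-in for real `ls -Z` output.
listing='system_u:object_r:mail_spool_t:s0 /var/vmail/
unconfined_u:object_r:var_t:s0 /var/vmail/example.com/
system_u:object_r:dovecot_tmp_t:s0 /var/vmail/example.com/user/tmp/'

# $1 is the full context string, $2 the path; print paths whose
# context does not contain the mail_spool_t type
bad=$(printf '%s\n' "$listing" | awk '$1 !~ /:mail_spool_t:/ {print $2}')
printf '%s\n' "$bad"
```

Here the second and third sample lines are flagged — exactly the `var_t` and `dovecot_tmp_t` mislabels described above.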
### 2. Define the Correct File Context Rule

One rule covers everything — including `tmp/`:

```bash
sudo semanage fcontext -a -t mail_spool_t "/var/vmail(/.*)?"
```

If you previously added a `dovecot_tmp_t` rule for `tmp/` directories, remove it:

```bash
# Check for an erroneous dovecot_tmp_t rule
sudo semanage fcontext -l | grep vmail

# If you see one like "/var/vmail(/.*)*/tmp(/.*)?" with dovecot_tmp_t, delete it:
sudo semanage fcontext -d "/var/vmail(/.*)*/tmp(/.*)?"
```

### 3. Apply the Labels

```bash
sudo restorecon -Rv /var/vmail
```

This relabels all existing files. On a mail server with many users and messages, this may take a moment and will print every relabeled path.

### 4. Verify

```bash
ls -Zd /var/vmail/
ls -Zd /var/vmail/example.com/user/tmp/
```

Both should show `mail_spool_t`:

```
system_u:object_r:mail_spool_t:s0 /var/vmail/
system_u:object_r:mail_spool_t:s0 /var/vmail/example.com/user/tmp/
```

### 5. Flush Deferred Mail

If mail was queued while the context was wrong, flush it:

```bash
postqueue -f
postqueue -p   # should be empty shortly
```

### 6. Check That Denials Stopped

```bash
ausearch -m avc -ts recent | grep dovecot
```

No output = no new denials.

## Key Notes

- **One rule is enough** — `"/var/vmail(/.*)?"` with `mail_spool_t` covers every file and directory under `/var/vmail`, including all `tmp/` subdirectories.
- **`semanage fcontext` is persistent** — the rules survive reboots and `restorecon` calls. You only need to run `semanage` once.
- **`restorecon` applies current rules to existing files** — run it after any `semanage` change and any time you manually create directories.
- **New mail directories are labeled automatically** — SELinux applies the registered `semanage` rules to any new files created under `/var/vmail`.
- **`var_t` context is the default for `/var/`** — any directory created under `/var/` without a specific `semanage` rule will inherit `var_t`. This is almost never correct for service data directories.

## Related

- [Linux Server Hardening Checklist](../02-selfhosting/security/linux-server-hardening-checklist.md)
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
105
05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md
Normal file
@@ -0,0 +1,105 @@
# mdadm RAID Recovery After USB Hub Disconnect

A software RAID array managed by mdadm can appear to catastrophically fail when the drives are connected via USB rather than SATA. The array is fine — the hub dropped out. Here's how to diagnose and recover.

## Symptoms

- rsync or other I/O to the RAID mount returns `Input/output error`
- `cat /proc/mdstat` shows `broken raid0` or `FAILED`
- `mdadm --detail /dev/md0` shows `State: broken, FAILED`
- `lsblk` no longer lists the RAID member drives (e.g. `sdd`, `sde` gone)
- XFS (or other filesystem) logs in dmesg:
  ```
  XFS (md0): log I/O error -5
  XFS (md0): Filesystem has been shut down due to log error (0x2).
  ```
- `smartctl -H /dev/sdd` returns `No such device`

## Why It Happens

If your RAID drives are in a USB enclosure (e.g. TerraMaster via ASMedia hub), a USB disconnect — triggered by a power fluctuation, plugging in another device, or a hub reset — causes mdadm to see the drives disappear. mdadm cannot distinguish a USB dropout from a physical drive failure, so it declares the array failed.

The failure message in dmesg will show `hostbyte=DID_ERROR` rather than a drive-level error:

```
md/raid0:md0: Disk failure on sdd1 detected, failing array.
sd X:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
```

`DID_ERROR` means the SCSI host adapter (USB controller) reported the error — the drives themselves are likely fine.

## Diagnosis

### 1. Check if the USB hub recovered

```bash
lsblk -o NAME,SIZE,TYPE,FSTYPE,MODEL
```

After a hub reconnects, drives will reappear — often with **new device names** (e.g. `sdd`/`sde` become `sdg`/`sdh`). Look for drives with the `linux_raid_member` filesystem type.

```bash
dmesg | grep -iE 'usb|disconnect|DID_ERROR' | tail -30
```

A hub dropout looks like multiple devices disconnecting at the same time on the same USB port.
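Finding the renamed members can be scripted. A sketch that filters `lsblk -rn -o NAME,FSTYPE`-style output for RAID member partitions, run here against a synthetic sample (the sample lines and the `-rn` raw/no-header flags are illustrative assumptions):

```shell
# Sketch: extract RAID member partitions from lsblk-style "NAME FSTYPE"
# output after a reconnect, when device names may have shifted.
# `lsblk_out` is a synthetic stand-in for the real command's output.
lsblk_out='sda ext4
sdg1 linux_raid_member
sdh1 linux_raid_member'

# Keep only partitions whose filesystem type marks them as md members
members=$(printf '%s\n' "$lsblk_out" | awk '$2 == "linux_raid_member" {print "/dev/" $1}')
printf '%s\n' "$members"
```

The resulting device list is what the `mdadm --examine` and `mdadm --assemble` steps below operate on.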
### 2. Confirm drives have intact superblocks

```bash
mdadm --examine /dev/sdg1
mdadm --examine /dev/sdh1
```

If the superblocks are present and show matching UUID/array info, the data is intact.

## Recovery

### 1. Unmount and stop the degraded array

```bash
umount /majorRAID   # or wherever md0 is mounted
mdadm --stop /dev/md0
```

If umount fails due to a busy mount or an already-failed filesystem, it may already be unmounted by the kernel. Proceed with `--stop`.

### 2. Reassemble with the new device names

```bash
mdadm --assemble /dev/md0 /dev/sdg1 /dev/sdh1
```

mdadm matches drives by their superblock UUID, not device name. As long as both drives are present, the assembly will succeed regardless of what they're called.

### 3. Mount and verify

```bash
mount /dev/md0 /majorRAID
df -h /majorRAID
ls /majorRAID
```

If the filesystem mounts and data is visible, recovery is complete.

### 4. Create or update /etc/mdadm.conf

If `/etc/mdadm.conf` doesn't exist (or references old device names), update it:

```bash
mdadm --detail --scan > /etc/mdadm.conf
cat /etc/mdadm.conf
```

The output uses UUIDs rather than device names — the array will reassemble correctly on reboot even if drive letters change again.

## Prevention

The root cause is drives on USB rather than SATA. Short of moving the drives to a SATA controller, options are limited. When planning a migration off the RAID array (e.g. to SnapRAID + MergerFS), prioritize getting drives onto SATA connections.

> [!warning] RAID 0 has no redundancy. A USB dropout that causes the array to fail mid-write could corrupt data even if the drives themselves are healthy. Keep current backups before any maintenance involving the enclosure.

## Related

- [SnapRAID & MergerFS Storage Setup](../../01-linux/storage/snapraid-mergerfs-setup.md)
- [rsync Backup Patterns](../../02-selfhosting/storage-backup/rsync-backup-patterns.md)
93
05-troubleshooting/systemd/session-scope-failure-at-login.md
Normal file
@@ -0,0 +1,93 @@
# Systemd Session Scope Fails at Login (`session-cN.scope`)

After SSH login, systemd reports a failed transient unit like `session-c1.scope`. The MOTD or login banner shows `Failed Units: 1 — session-c1.scope`. This is a harmless race condition, not a real service failure.

## Symptoms

- Login banner or MOTD displays:
  ```
  Failed Units: 1
    session-c1.scope
  ```
- `systemctl list-units --failed` shows one or more `session-cN.scope` units in a failed state
- The system is otherwise healthy — no services are actually broken

## What Causes It

A transient session scope is created by systemd-logind every time a user logs in (SSH, console, etc.). The scope tracks the login session's process group via cgroups.

The failure occurs when a login process (PID) exits before systemd can move it into the target cgroup. This is a race condition triggered by:

- **Short-lived SSH connections** — automated probes, health checks, or monitoring tools that connect and immediately disconnect
- **Sessions that disconnect before PAM completes** — network interruptions or aggressive client timeouts
- **Cron jobs or scripts** that create transient SSH sessions

systemd logs the sequence:

1. `PID N vanished before we could move it to target cgroup`
2. `No PIDs left to attach to the scope's control group, refusing.`
3. Unit enters the `failed (Result: resources)` state

Because session scopes are transient (not backed by a unit file), the failed state lingers until manually cleared.

## How to Diagnose

### 1. Check the failed unit

```bash
systemctl status session-c1.scope
```

Look for:

```
Active: failed (Result: resources)
```

And in the log output:

```
PID <N> vanished before we could move it to target cgroup
No PIDs left to attach to the scope's control group, refusing.
```

### 2. Confirm no real failures

```bash
systemctl list-units --failed
```

If the only failed units are `session-cN.scope` entries, the system is healthy.
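That triage check can be automated in a login script or monitoring hook. A sketch that classifies unit names, run here against a synthetic sample (on a real host you would feed in `systemctl list-units --failed --plain --no-legend` output; the sample unit names are assumptions):

```shell
# Sketch: given failed unit names (one per line, synthetic sample below),
# report whether anything other than harmless session scopes has failed.
failed='session-c1.scope
session-c7.scope'

# Strip session-cN.scope entries; whatever survives is a real failure.
# `|| true` keeps the pipeline safe under `set -e` when grep finds nothing.
real=$(printf '%s\n' "$failed" | grep -v '^session-c[0-9][0-9]*\.scope$' || true)
if [ -z "$real" ]; then
    echo "only session scopes failed"
else
    echo "real failures present:"
    printf '%s\n' "$real"
fi
```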
## Fix

Reset the failed unit:

```bash
systemctl reset-failed session-c1.scope
```

To clear all failed session scopes at once:

```bash
systemctl reset-failed 'session-*.scope'
```

Verify:

```bash
systemctl list-units --failed
```

This should report 0 failed units.

## Notes

- This is a known systemd behavior and not indicative of a real problem. It can be safely ignored or cleared whenever it appears.
- If it recurs frequently, investigate what is creating short-lived SSH sessions — common culprits include monitoring agents (Netdata, Nagios), automated backup scripts, or SSH brute-force attempts.
- The `c` in `session-c1.scope` indicates a **console/SSH session** (as opposed to graphical sessions, which use different prefixes). The number increments with each new session.
- Applies to **Fedora, Ubuntu, and any systemd-based Linux distribution**.

## Related

- [[gitea-runner-boot-race-network-target]] — another systemd race condition involving service startup ordering
@@ -31,7 +31,7 @@ DNS record and Caddy entry have been removed.

## Content

- 37 articles across 5 domains
- 42 articles across 5 domains
- Source of truth: `MajorVault/20-Projects/MajorTwin/08-Wiki/`
- Deployed via Gitea webhook (push from MajorAir → auto-pull on majorlab)

@@ -63,7 +63,7 @@ rsync -av --include="*.md" --include="*/" --exclude="*" \

---

*Updated 2026-03-14*
*Updated 2026-03-15*

## Canonical Update Workflow

@@ -102,3 +102,46 @@ Every time a new article is added, the following **MUST** be updated to maintain
- [[MajorRig|MajorRig]] — alternative git push host (WSL2 path documented)
- [[03-11-2026|Status Update 2026-03-11]] — deployment date journal entry
- [[03-13-2026|Status Update 2026-03-13]] — content expansion and SUMMARY.md sync

---

## Session Update — 2026-03-16

**Article count:** 45 (was 42)

**New articles added:**
- `01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md` — full MajorTwin training env rebuild guide
- `01-linux/distro-specific/wsl2-backup-powershell.md` — WSL2 backup via PowerShell scheduled task
- `02-selfhosting/security/ansible-unattended-upgrades-fleet.md` — standardizing unattended-upgrades across Ubuntu fleet

**SUMMARY.md:** Updated to include all 3 new articles. Run the SUMMARY.md dedup script if duplicate content appears (see board file cleanup pattern).

**Updated:** `updated: 2026-03-16`

## Session Update — 2026-03-17

**Article count:** 47 (was 45)

**New articles added:**
- `05-troubleshooting/networking/windows-sshd-stops-after-reboot.md` — Windows OpenSSH sshd not starting after reboot
- `05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md` — Ollama drops off Tailscale when MajorMac sleeps

**Updated:** `updated: 2026-03-17`

## Session Update — 2026-03-18 (morning)

**Article count:** 48 (was 47)

**New articles added:**
- `02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md` — tuning docker_container_unhealthy alarm to prevent flapping during Nextcloud AIO updates

**Updated:** `updated: 2026-03-18`

## Session Update — 2026-03-18 (afternoon)

**Article count:** 49 (was 48)

**New articles added:**
- `02-selfhosting/monitoring/netdata-new-server-setup.md` — full Netdata deployment guide: install via kickstart.sh, email notification config, Netdata Cloud claim

**Updated:** `updated: 2026-03-18`
39
README.md
@@ -2,18 +2,18 @@

> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
>
**Last updated:** 2026-03-14
**Article count:** 37
**Last updated:** 2026-03-18
**Article count:** 49

## Domains

| Domain | Folder | Articles |
|---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 9 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 8 |
| 🐧 Linux & Sysadmin | `01-linux/` | 11 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 11 |
| 🔓 Open Source Tools | `03-opensource/` | 9 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 1 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 10 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |

---

@@ -41,6 +41,8 @@
### Distro-Specific
- [Linux Distro Guide for Beginners](01-linux/distro-specific/linux-distro-guide-beginners.md) — Ubuntu recommendation, distro comparison, desktop environments
- [WSL2 Instance Migration to Fedora 43](01-linux/distro-specific/wsl2-instance-migration-fedora43.md) — moving WSL2 VHDX from C: to another drive
- [WSL2 Training Environment Rebuild (Fedora 43)](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md) — rebuilding the MajorTwin training env in WSL2 from scratch
- [WSL2 Backup via PowerShell Scheduled Task](01-linux/distro-specific/wsl2-backup-powershell.md) — automating WSL2 exports on a schedule using PowerShell

---

@@ -62,9 +64,12 @@

### Monitoring
- [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
- [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) — install, email notifications, and Netdata Cloud claim for Ubuntu/Debian servers

### Security
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
- [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) — fleet-wide automatic security updates across Ubuntu servers

---

@@ -96,12 +101,16 @@
### OBS Studio
- [OBS Studio Setup & Encoding](04-streaming/obs/obs-studio-setup-encoding.md) — installation, NVENC/x264 settings, scene setup, audio filters, Linux Wayland notes

### Plex
- [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md) — AV1/VP9 vs HEVC, batch conversion script, yt-dlp auto-convert hook

---

## 🔧 General Troubleshooting

- [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md) — diagnosing and fixing Apache outages caused by missing firewall rules and Fail2ban self-bans
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) — diagnosing why one device stops receiving email when the mail server is healthy
- [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) — recovering IMAP and webmail after firewalld reload drops all mail service rules
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md) — fixing docker.socket, SELinux port blocks, and httpd_can_network_connect after reboot
- [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md) — troubleshooting why wiki.majorshouse.com was blocked by Google Fiber
- [Obsidian Cache Hang Recovery](05-troubleshooting/obsidian-cache-hang-recovery.md) — resolving "Loading cache" hang in Obsidian by cleaning Electron app data and ML artifacts
@@ -109,6 +118,11 @@
- [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md) — fixing YouTube JS challenge solver errors and missing formats on Fedora
- [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md) — how to manually update the Gemini CLI when automatic updates fail
- [MajorWiki Setup & Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md) — setting up MajorWiki and the Obsidian → Gitea → MkDocs publishing pipeline
- [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) — fixing act_runner crash loop on boot caused by DNS not ready at startup
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) — fixing thousands of AVC denials when /var/vmail has the wrong SELinux context
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
- [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) — keeping Ollama reachable over Tailscale by disabling macOS sleep on AC power

---

@@ -116,6 +130,19 @@

| Date | Article | Domain |
|---|---|---|
| 2026-03-18 | [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) | Self-Hosting |
| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
| 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
| 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
| 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |
| 2026-03-16 | [WSL2 Training Environment Rebuild (Fedora 43)](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md) | Linux |
| 2026-03-16 | [WSL2 Backup via PowerShell Scheduled Task](01-linux/distro-specific/wsl2-backup-powershell.md) | Linux |
| 2026-03-15 | [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) | Troubleshooting |
| 2026-03-15 | [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md) | Streaming |
| 2026-03-15 | [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) | Troubleshooting |
| 2026-03-15 | [yt-dlp: Video Downloading](03-opensource/media-creative/yt-dlp.md) | Open Source |
| 2026-03-14 | [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) | Troubleshooting |
| 2026-03-14 | [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) | Troubleshooting |
| 2026-03-14 | [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) | Troubleshooting |
| 2026-03-14 | [SearXNG: Private Self-Hosted Search](03-opensource/alternatives/searxng.md) | Open Source |
| 2026-03-14 | [FreshRSS: Self-Hosted RSS Reader](03-opensource/alternatives/freshrss.md) | Open Source |
|
||||
|
||||
21
SUMMARY.md
@@ -9,15 +9,23 @@
    * [SnapRAID & MergerFS Storage Setup](01-linux/storage/snapraid-mergerfs-setup.md)
    * [Linux Distro Guide for Beginners](01-linux/distro-specific/linux-distro-guide-beginners.md)
    * [WSL2 Instance Migration to Fedora 43](01-linux/distro-specific/wsl2-instance-migration-fedora43.md)
    * [WSL2 Training Environment Rebuild](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md)
    * [WSL2 Backup via PowerShell](01-linux/distro-specific/wsl2-backup-powershell.md)
* [Self-Hosting & Homelab](02-selfhosting/index.md)
    * [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md)
    * [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
    * [Debugging Broken Docker Containers](02-selfhosting/docker/debugging-broken-docker-containers.md)
    * [Docker Healthchecks](02-selfhosting/docker/docker-healthchecks.md)
    * [Setting Up Caddy as a Reverse Proxy](02-selfhosting/reverse-proxy/setting-up-caddy-reverse-proxy.md)
    * [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
    * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
    * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
    * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
    * [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)
    * [Netdata SELinux AVC Denial Monitoring](02-selfhosting/monitoring/netdata-selinux-avc-chart.md)
    * [Netdata n8n Enriched Alert Emails](02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md)
    * [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
    * [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
* [Open Source & Alternatives](03-opensource/index.md)
    * [SearXNG: Private Self-Hosted Search](03-opensource/alternatives/searxng.md)
    * [FreshRSS: Self-Hosted RSS Reader](03-opensource/alternatives/freshrss.md)
@@ -30,9 +38,15 @@
    * [yt-dlp: Video Downloading](03-opensource/media-creative/yt-dlp.md)
* [Streaming & Podcasting](04-streaming/index.md)
    * [OBS Studio Setup & Encoding](04-streaming/obs/obs-studio-setup-encoding.md)
    * [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md)
* [Troubleshooting](05-troubleshooting/index.md)
    * [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md)
    * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
    * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
    * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
    * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)
    * [Custom Fail2ban Jail: Apache Directory Scanning](05-troubleshooting/security/apache-dirscan-fail2ban-jail.md)
    * [Nextcloud AIO Unhealthy 20h After Nightly Update](05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md)
    * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md)
    * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md)
    * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)
@@ -40,3 +54,10 @@
    * [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md)
    * [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md)
    * [MajorWiki Setup & Publishing Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md)
    * [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md)
    * [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md)
    * [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md)
    * [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
    * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
    * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
    * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
46
index.md
@@ -2,18 +2,19 @@

> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
>
> **Last updated:** 2026-03-14
> **Article count:** 37
> **Last updated:** 2026-03-27
> **Article count:** 53

## Domains

| Domain | Folder | Articles |
|---|---|---|
| 🐧 Linux & Sysadmin | `01-linux/` | 9 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 8 |
| 🐧 Linux & Sysadmin | `01-linux/` | 11 |
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 11 |
| 🔓 Open Source Tools | `03-opensource/` | 9 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 1 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 10 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
| 🔧 General Troubleshooting | `05-troubleshooting/` | 17 |


---

@@ -41,6 +42,8 @@
### Distro-Specific
- [Linux Distro Guide for Beginners](01-linux/distro-specific/linux-distro-guide-beginners.md) — Ubuntu recommendation, distro comparison, desktop environments
- [WSL2 Instance Migration to Fedora 43](01-linux/distro-specific/wsl2-instance-migration-fedora43.md) — moving WSL2 VHDX from C: to another drive
- [WSL2 Training Environment Rebuild (Fedora 43)](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md) — rebuilding the MajorTwin training env in WSL2 from scratch
- [WSL2 Backup via PowerShell Scheduled Task](01-linux/distro-specific/wsl2-backup-powershell.md) — automating WSL2 exports on a schedule using PowerShell

---

@@ -62,9 +65,12 @@

### Monitoring
- [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md) — tuning web_log_1m_redirects threshold for HTTPS-forcing servers
- [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) — preventing false alerts during nightly Nextcloud AIO container update cycles
- [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) — install, email notifications, and Netdata Cloud claim for Ubuntu/Debian servers

### Security
- [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md) — non-root user, SSH key auth, sshd_config, firewall, fail2ban
- [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) — fleet-wide automatic security updates across Ubuntu servers

---

@@ -96,12 +102,16 @@
### OBS Studio
- [OBS Studio Setup & Encoding](04-streaming/obs/obs-studio-setup-encoding.md) — installation, NVENC/x264 settings, scene setup, audio filters, Linux Wayland notes

### Plex
- [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md) — AV1/VP9 vs HEVC, batch conversion script, yt-dlp auto-convert hook

---

## 🔧 General Troubleshooting

- [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](05-troubleshooting/networking/fail2ban-self-ban-apache-outage.md) — diagnosing and fixing Apache outages caused by missing firewall rules and Fail2ban self-bans
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) — diagnosing why one device stops receiving email when the mail server is healthy
- [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) — recovering IMAP and webmail after firewalld reload drops all mail service rules
- [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md) — fixing docker.socket, SELinux port blocks, and httpd_can_network_connect after reboot
- [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md) — troubleshooting why wiki.majorshouse.com was blocked by Google Fiber
- [Obsidian Cache Hang Recovery](05-troubleshooting/obsidian-cache-hang-recovery.md) — resolving "Loading cache" hang in Obsidian by cleaning Electron app data and ML artifacts
@@ -109,6 +119,13 @@
- [yt-dlp YouTube JS Challenge Fix on Fedora](05-troubleshooting/yt-dlp-fedora-js-challenge.md) — fixing YouTube JS challenge solver errors and missing formats on Fedora
- [Gemini CLI Manual Update](05-troubleshooting/gemini-cli-manual-update.md) — how to manually update the Gemini CLI when automatic updates fail
- [MajorWiki Setup & Pipeline](05-troubleshooting/majwiki-setup-and-pipeline.md) — setting up MajorWiki and the Obsidian → Gitea → MkDocs publishing pipeline
- [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) — fixing act_runner crash loop on boot caused by DNS not ready at startup
- [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) — fixing thousands of AVC denials when /var/vmail has wrong SELinux context
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
- [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) — keeping Ollama reachable over Tailscale by disabling macOS sleep on AC power
- [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) — fixing the missing vault_pass file error when running ansible-playbook


---

@@ -116,6 +133,23 @@

| Date | Article | Domain |
|---|---|---|
| 2026-03-23 | [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) | Troubleshooting |
| 2026-03-18 | [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) | Self-Hosting |
| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
| 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |
| 2026-03-17 | [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) | Troubleshooting |
| 2026-03-16 | [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md) | Self-Hosting |
| 2026-03-16 | [WSL2 Training Environment Rebuild (Fedora 43)](01-linux/distro-specific/wsl2-rebuild-fedora43-training-env.md) | Linux |
| 2026-03-16 | [WSL2 Backup via PowerShell Scheduled Task](01-linux/distro-specific/wsl2-backup-powershell.md) | Linux |
| 2026-03-15 | [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) | Troubleshooting |
| 2026-03-15 | [Plex 4K Codec Compatibility (Apple TV)](04-streaming/plex/plex-4k-codec-compatibility.md) | Streaming |
| 2026-03-15 | [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) | Troubleshooting |
| 2026-03-15 | [yt-dlp: Video Downloading](03-opensource/media-creative/yt-dlp.md) | Open Source |
| 2026-03-14 | [SELinux: Fixing Dovecot Mail Spool Context (/var/vmail)](05-troubleshooting/selinux-dovecot-vmail-context.md) | Troubleshooting |
| 2026-03-14 | [Gitea Actions Runner: Boot Race Condition Fix](05-troubleshooting/gitea-runner-boot-race-network-target.md) | Troubleshooting |
| 2026-03-14 | [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) | Troubleshooting |
| 2026-03-14 | [SearXNG: Private Self-Hosted Search](03-opensource/alternatives/searxng.md) | Open Source |
| 2026-03-14 | [FreshRSS: Self-Hosted RSS Reader](03-opensource/alternatives/freshrss.md) | Open Source |