Compare commits
2 Commits
335c4b57f2...1b801a9590

| Author | SHA1 | Date |
|---|---|---|
| | 1b801a9590 | |
| | 598e6fa26a | |
@@ -154,7 +154,7 @@ alias majorlab='ssh root@100.86.14.126'
alias majormail='ssh root@100.84.165.52'
alias teelia='ssh root@100.120.32.69'
alias tttpod='ssh root@100.84.42.102'
-alias majorrig='ssh -p 2222 majorlinux@100.98.47.29'
+alias majorrig='ssh majorlinux@100.98.47.29' # port 2222 retired 2026-03-25, fleet uses port 22

# DNF5
alias update='sudo dnf upgrade --refresh'

157
02-selfhosting/docker/docker-healthchecks.md
Normal file
@@ -0,0 +1,157 @@
---
title: "Docker Healthchecks"
domain: selfhosting
category: docker
tags: [docker, healthcheck, monitoring, uptime-kuma, compose]
status: published
created: 2026-03-23
updated: 2026-03-23
---

# Docker Healthchecks

A Docker healthcheck tells the daemon (and any monitoring tool) whether a container is actually working, not just running. Without one, a container shows as `Up` even if the app inside has crashed, deadlocked, or is waiting on a dependency.

## Why It Matters

Tools like Uptime Kuma report containers without healthchecks as:

> Container has not reported health and is currently running. As it is running, it is considered UP. Consider adding a health check for better service visibility.

A healthcheck upgrades that to a real `(healthy)` or `(unhealthy)` status, making monitoring meaningful.

## Basic Syntax (docker-compose)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

| Field | Description |
|---|---|
| `test` | Command to run. Exit 0 = healthy, non-zero = unhealthy. |
| `interval` | How often to run the check. |
| `timeout` | How long a single check may run before it counts as a failure. |
| `retries` | Consecutive failures before the container is marked `unhealthy`. |
| `start_period` | Grace period on startup before failures count. |
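
As a rough sanity check on these values, the sketch below estimates an upper bound on how long a failing container can take to flip to `unhealthy` once `start_period` has passed. It assumes every failed check runs to its full `timeout` and that each check is scheduled `interval` seconds after the previous one finishes. The function name is made up for illustration.

```shell
# Hypothetical helper: rough upper bound, in seconds, before Docker marks
# a failing container unhealthy (after start_period has elapsed).
# Assumes each failed check uses its full timeout.
worst_case_unhealthy_seconds() {
  interval=$1
  timeout=$2
  retries=$3
  echo $(( retries * (interval + timeout) ))
}

# Values from the example above: interval 30s, timeout 10s, retries 3
worst_case_unhealthy_seconds 30 10 3   # prints 120
```

If that delay is too long for a given service, lowering `interval` usually has the biggest effect, at the cost of running the check more often.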

## Common Patterns

### HTTP service (wget, available in Alpine)

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### HTTP service (curl)

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

### MySQL / MariaDB

```yaml
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-psecret"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s
```

### PostgreSQL

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5
```

### Redis

```yaml
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 10s
  timeout: 5s
  retries: 3
```

### TCP port check (no curl/wget available)

```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 8080 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```

## Using Healthchecks with `depends_on`

Healthchecks enable proper startup ordering. Instead of a fixed sleep, a dependent container waits until its dependency is actually ready:

```yaml
services:
  app:
    depends_on:
      db:
        condition: service_healthy

  db:
    image: mysql:8.0
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s
```

This prevents the classic race condition where the app starts before the database is ready to accept connections.

## Checking Health Status

```bash
# See health status in container list
docker ps

# Get detailed health info including last check output
docker inspect --format='{{json .State.Health}}' <container> | jq .
```
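
For scripts that need to block until a container reports healthy, the same `docker inspect` call can be polled. This is a minimal sketch rather than a built-in Docker feature: the function name, the default attempt count, and the delay are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll a container's health status until it is
# "healthy", or give up after a number of attempts.
# Usage: wait_for_healthy <container> [attempts] [delay_seconds]
wait_for_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  status=""
  while [ "$i" -lt "$attempts" ]; do
    # .State.Health.Status is "starting", "healthy", or "unhealthy";
    # the call fails (empty output) if the container has no healthcheck
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$container did not become healthy (last status: ${status:-unknown})" >&2
  return 1
}

# Example (container name is a placeholder):
# wait_for_healthy my-app 30 2 && echo "ready"
```

Inside a compose file, `depends_on` with `condition: service_healthy` covers the same need without any scripting. A loop like this is mainly useful in deploy or backup scripts that run outside compose.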

## Ghost Example

Ghost (Alpine-based) uses `wget` rather than `curl`:

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:2368/ghost/api/v4/admin/site/"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```

## Gotchas & Notes

- **Alpine images** don't have `curl` by default. Use `wget` or install `curl` in the image.
- **`start_period`** is critical for slow-starting apps (databases, JVM services). Failures during this window don't count toward `retries`.
- **`CMD` vs `CMD-SHELL`**: use `CMD` for direct exec (no shell needed) and `CMD-SHELL` when you need pipes, `&&`, or shell builtins.
- **Uptime Kuma** picks up Docker healthcheck status automatically when monitoring via the Docker socket, with no extra config needed.

## See Also

- [[debugging-broken-docker-containers]]
- [[netdata-docker-health-alarm-tuning]]

137
02-selfhosting/monitoring/netdata-selinux-avc-chart.md
Normal file
@@ -0,0 +1,137 @@
---
title: "Netdata SELinux AVC Denial Monitoring"
domain: selfhosting
category: monitoring
tags: [netdata, selinux, fedora, monitoring, ausearch, charts.d]
status: published
created: 2026-03-27
updated: 2026-03-27
---

# Netdata SELinux AVC Denial Monitoring

A custom `charts.d` plugin that tracks SELinux AVC denials over time via Netdata. Deployed on all Fedora boxes in the fleet where SELinux is Enforcing.

## What It Does

The plugin runs `ausearch -m avc` every 60 seconds and reports the count of AVC denial events from the last 10 minutes (the window that `-ts recent` selects). This gives a near-real-time chart in Netdata Cloud showing SELinux denial spikes, which is useful for catching misconfigurations after service changes or package updates.

## Where It's Deployed

| Host | OS | SELinux | Chart Installed |
|------|----|---------|-----------------|
| majorhome | Fedora 43 | Enforcing | Yes |
| majorlab | Fedora 43 | Enforcing | Yes |
| majormail | Fedora 43 | Enforcing | Yes |
| majordiscord | Fedora 43 | Enforcing | Yes |

Ubuntu hosts (dca, teelia, tttpod, majortoot, majorlinux) do not run SELinux and do not have this chart.

## Installation

### 1. Create the Chart Plugin

Create `/etc/netdata/charts.d/selinux.chart.sh`:

```bash
cat > /etc/netdata/charts.d/selinux.chart.sh << 'EOF'
# SELinux AVC denial counter for Netdata charts.d
selinux_update_every=60
selinux_priority=90000

selinux_check() {
  which ausearch >/dev/null 2>&1 || return 1
  return 0
}

selinux_create() {
  cat <<CHART
CHART selinux.avc_denials '' 'SELinux AVC Denials (last 10 min)' 'denials' selinux '' line 90000 $selinux_update_every ''
DIMENSION denials '' absolute 1 1
CHART
  return 0
}

selinux_update() {
  local count
  count=$(sudo /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent 2>/dev/null | grep -c "type=AVC")
  echo "BEGIN selinux.avc_denials $1"
  echo "SET denials = ${count}"
  echo "END"
  return 0
}
EOF
```
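
The counting step in `selinux_update` can be tried without Netdata or root by feeding sample text through the same `grep -c` filter. In this sketch the log lines are invented placeholders, not real audit records, and `count_avc` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the same filter the plugin uses.
# Reads ausearch-style text on stdin and prints the number of AVC lines.
count_avc() {
  # grep -c prints 0 but exits 1 when nothing matches, so ignore the exit code
  grep -c "type=AVC" || true
}

# Invented sample input: two AVC lines and one unrelated line
printf '%s\n' \
  'type=AVC msg=audit(0.000:1): avc:  denied  { read } for pid=1 comm="example"' \
  'type=SYSCALL msg=audit(0.000:1): arch=c000003e syscall=0' \
  'type=AVC msg=audit(0.000:2): avc:  denied  { write } for pid=2 comm="example"' \
  | count_avc   # prints 2
```

Because `grep -c` still prints `0` when there are no matches, the plugin always has a number to send in its `SET` line, even on a quiet host.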

### 2. Grant Netdata Sudo Access to ausearch

`ausearch` requires root to read the audit log. Add a sudoers entry for the `netdata` user:

```bash
echo 'netdata ALL=(root) NOPASSWD: /usr/bin/ausearch -m avc -if /var/log/audit/audit.log -ts recent' > /etc/sudoers.d/netdata-selinux
chmod 440 /etc/sudoers.d/netdata-selinux
visudo -c
```

`visudo -c` validates the syntax. If it reports errors, fix the file before proceeding, because a broken sudoers file can lock out sudo entirely.

### 3. Restart Netdata

```bash
systemctl restart netdata
```

### 4. Verify

Check that the chart is collecting data:

```bash
curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f'Chart: {d[\"id\"]}')
print(f'Update every: {d[\"update_every\"]}s')
print(f'Type: {d[\"chart_type\"]}')
"
```

If the chart doesn't appear, check that `charts.d` is enabled in `/etc/netdata/netdata.conf` and that the plugin file is readable by the `netdata` user.

## Known Side Effect: pam_systemd Log Noise

Because the `netdata` user calls `sudo ausearch` every 60 seconds, `pam_systemd` logs a warning each time:

```
pam_systemd(sudo:session): Failed to check if /run/user/0/bus exists, ignoring: Permission denied
```

This is cosmetic. The `sudo` command succeeds; `pam_systemd` just can't find a D-Bus user session for the `netdata` service account, which is expected. The message volume scales with the collection interval (1,440/day at 60-second intervals).
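
The daily figure is simply the number of seconds in a day divided by the collection interval. A small sketch of that arithmetic follows; the function name is made up, and the 300-second case is only a hypothetical comparison, not a setting used in the fleet.

```shell
# Hypothetical helper: log messages per day for a given collection
# interval in seconds (one sudo call, and so one warning, per collection).
pam_warnings_per_day() {
  echo $(( 86400 / $1 ))
}

pam_warnings_per_day 60    # prints 1440, the interval used by this plugin
pam_warnings_per_day 300   # prints 288, if update_every were raised to 5 minutes
```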

**No PAM changes are needed.** The `system-auth` PAM config on Fedora already lists `pam_systemd.so` as `-session optional`: `optional` means a module error does not fail the session, and the `-` prefix only silences logging when the module itself is missing. The messages are informational log noise, not actual failures.

If the log volume is a concern for log analysis or monitoring, filter it with an rsyslog rule (journald itself still records the message):

```ini
# /etc/rsyslog.d/suppress-pam-systemd.conf
:msg, contains, "pam_systemd(sudo:session): Failed to check" stop
```

Alternatively, exclude the pattern from any log-based alerts in Netdata's log alert config.

## Fleet Audit

To verify the chart is deployed and functioning on all Fedora hosts:

```bash
for host in majorhome majorlab majormail majordiscord; do
  echo -n "=== $host: "
  ssh "root@$host" "curl -s 'http://localhost:19999/api/v1/chart?chart=selinux.avc_denials' 2>/dev/null | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d[\"id\"], \"every\", str(d[\"update_every\"])+\"s\")' 2>/dev/null || echo 'NOT FOUND'"
done
```

## Related

- [Deploying Netdata to a New Server](netdata-new-server-setup.md)
- [Tuning Netdata Web Log Alerts](tuning-netdata-web-log-alerts.md)
- [Tuning Netdata Docker Health Alarms](netdata-docker-health-alarm-tuning.md)
- [SELinux: Fixing Dovecot Mail Spool Context](/05-troubleshooting/selinux-dovecot-vmail-context.md)

59
05-troubleshooting/ansible-vault-password-file-missing.md
Normal file
@@ -0,0 +1,59 @@
# Ansible: Vault Password File Not Found

## Error

```
[WARNING]: Error getting vault password file (default): The vault password file /Users/majorlinux/.ansible/vault_pass was not found
[ERROR]: The vault password file /Users/majorlinux/.ansible/vault_pass was not found
```

## Cause

Ansible is configured to look for a vault password file at `~/.ansible/vault_pass`, but the file does not exist. This is typically set in `ansible.cfg` via the `vault_password_file` directive.

## Solutions

### Option 1: Remove the vault config (if you're not using Vault)

Check your `ansible.cfg` for this line and remove it if Vault is not needed:

```ini
[defaults]
vault_password_file = ~/.ansible/vault_pass
```

### Option 2: Create the vault password file

```bash
echo 'your_vault_password' > ~/.ansible/vault_pass
chmod 600 ~/.ansible/vault_pass
```

> **Security note:** Keep permissions tight (`600`) so only your user can read the file. The actual vault password is stored in Bitwarden under the "Ansible Vault Password" entry.
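
One way to confirm the file is in place with the expected permissions is a small check like the sketch below. The function name is hypothetical, and the default path is taken from the error message above. It uses `find -perm` rather than `stat` because `stat` flags differ between macOS and Linux.

```shell
# Hypothetical helper: succeed only if the vault password file exists
# as a regular file with mode exactly 600.
check_vault_pass_file() {
  file=${1:-$HOME/.ansible/vault_pass}
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # find prints the path only when the permission bits are exactly 600
  if [ -z "$(find "$file" -prune -perm 600)" ]; then
    echo "permissions on $file are not 600" >&2
    return 1
  fi
  return 0
}

# Example:
# check_vault_pass_file ~/.ansible/vault_pass && echo "ok"
```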

### Option 3: Pass the password at runtime (no file needed)

```bash
ansible-playbook test.yml --ask-vault-pass
```

## Diagnosing the Source of the Config

To find which config file is setting `vault_password_file`, run:

```bash
ansible-config dump --only-changed
```

This shows all non-default config values and their source files. Ansible uses the first config file it finds, searching in this order:

1. `ANSIBLE_CONFIG` environment variable
2. `./ansible.cfg` (current directory)
3. `~/.ansible.cfg`
4. `/etc/ansible/ansible.cfg`
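
The search order above can be sketched as a short shell function that prints the first candidate that exists. This is a simplified illustration, not Ansible's actual implementation: the function name is made up, and real Ansible applies extra rules (for example, it skips `./ansible.cfg` when the current directory is world-writable).

```shell
# Hypothetical helper: print the first config file that exists, following
# the search order listed above. Simplified compared to real Ansible.
resolve_ansible_cfg() {
  for candidate in \
    "${ANSIBLE_CONFIG:-}" \
    "./ansible.cfg" \
    "$HOME/.ansible.cfg" \
    "/etc/ansible/ansible.cfg"
  do
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no ansible.cfg found" >&2
  return 1
}

# Show which file would win in the current directory and environment
resolve_ansible_cfg || true
```

`ansible --version` also reports the config file actually in use, which is a quick way to cross-check the result.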

## Related

- [Ansible Getting Started](../01-linux/shell-scripting/ansible-getting-started.md)
- Vault password is stored in Bitwarden under **"Ansible Vault Password"**
- Ansible playbooks live at `~/MajorAnsible` on MajorAir/MajorMac

@@ -48,7 +48,7 @@ The Windows OpenSSH Server is installed as a Windows Feature (`Add-WindowsCapabi

- **This is a Windows-side issue**: WSL2 itself is unaffected. The service must be started and configured from Windows, not from within WSL2.
- **Elevated PowerShell required**: `Start-Service` and `Set-Service` for sshd will return "Access is denied" if run without Administrator privileges.
-- **Port 2222 is also affected**: both the standard port 22 and the bypass port 2222 on MajorRig are served by the same `sshd` service.
+- **Port 2222 was retired (2026-03-25)**: the bypass port 2222 on MajorRig is no longer in use. The entire fleet now uses port 22 uniformly after the Tailscale SSH auth fix. Only port 22 needs to be verified when troubleshooting sshd.
- **Default shell still works once fixed**: MajorRig's sshd is configured to use `C:\Windows\System32\wsl.exe` as the default shell, dropping SSH sessions directly into WSL2/Bash. This config is preserved across service restarts.

---

@@ -15,12 +15,15 @@
* [Self-Hosting Starter Guide](02-selfhosting/docker/self-hosting-starter-guide.md)
* [Docker vs VMs for the Homelab](02-selfhosting/docker/docker-vs-vms-homelab.md)
* [Debugging Broken Docker Containers](02-selfhosting/docker/debugging-broken-docker-containers.md)
* [Docker Healthchecks](02-selfhosting/docker/docker-healthchecks.md)
* [Setting Up Caddy as a Reverse Proxy](02-selfhosting/reverse-proxy/setting-up-caddy-reverse-proxy.md)
* [Tailscale for Homelab Remote Access](02-selfhosting/dns-networking/tailscale-homelab-remote-access.md)
* [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
* [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
* [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
* [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)
* [Netdata SELinux AVC Denial Monitoring](02-selfhosting/monitoring/netdata-selinux-avc-chart.md)
* [Netdata n8n Enriched Alert Emails](02-selfhosting/monitoring/netdata-n8n-enriched-alerts.md)
* [Linux Server Hardening Checklist](02-selfhosting/security/linux-server-hardening-checklist.md)
* [Standardizing unattended-upgrades with Ansible](02-selfhosting/security/ansible-unattended-upgrades-fleet.md)
* [Open Source & Alternatives](03-opensource/index.md)
@@ -54,3 +57,4 @@
* [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md)
* [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
* [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
* [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)

13
index.md
@@ -2,8 +2,8 @@

> A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
>
-> **Last updated:** 2026-03-18
-> **Article count:** 49
+> **Last updated:** 2026-03-27
+> **Article count:** 53

## Domains

@@ -13,7 +13,8 @@
| 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 11 |
| 🔓 Open Source Tools | `03-opensource/` | 9 |
| 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
-| 🔧 General Troubleshooting | `05-troubleshooting/` | 16 |
+| 🔧 General Troubleshooting | `05-troubleshooting/` | 17 |

---

@@ -123,6 +124,8 @@
- [mdadm RAID Recovery After USB Hub Disconnect](05-troubleshooting/storage/mdadm-usb-hub-disconnect-recovery.md) — diagnosing and recovering a failed mdadm array caused by a USB hub dropout
- [Windows OpenSSH Server (sshd) Stops After Reboot](05-troubleshooting/networking/windows-sshd-stops-after-reboot.md) — fixing sshd not running after reboot due to Manual startup type
- [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) — keeping Ollama reachable over Tailscale by disabling macOS sleep on AC power
- [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) — fixing the missing vault_pass file error when running ansible-playbook
|
||||
|
||||
|
||||
---
|
||||
|
||||
@@ -130,6 +133,10 @@

| Date | Article | Domain |
|---|---|---|
| 2026-03-23 | [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md) | Troubleshooting |
| 2026-03-18 | [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md) | Self-Hosting |
| 2026-03-18 | [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) | Self-Hosting |
| 2026-03-17 | [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md) | Troubleshooting |