From 155651c373fe1d3f0d5520e18b1df1c64cafc2c7 Mon Sep 17 00:00:00 2001
From: MajorLinux
Date: Sun, 31 May 2026 11:08:24 -0400
Subject: [PATCH] wiki: ssh.socket wait-ready gate + mastodon post-install hardening
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.

ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service + Requires=tailscaled.service only order
  against the service becoming active, not against tailscale0 having an
  IPv4 address. Hosts kept losing SSH intermittently after reboots
  (incident: majorlinux + majortoot-hetzner 2026-05-31, during the
  cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
  `ip -4 -o addr show tailscale0` until an address is present, with
  ssh.socket After=/Requires= that service. Document the full evolution
  (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
  future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.

mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
   traverse → every static asset 403s → unstyled "purple screen" in the
   browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon:setup default) → DB_PASS,
   SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc;
   even when chained, Ubuntu's .bashrc returns early for non-interactive
   shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
   .bashrc, so it works for both interactive and non-interactive logins.
All four are codified in MajorAnsible's configure_mastodon_permissions.yml
with self-asserting verification steps.

02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 02-selfhosting/index.md                       |  13 +-
 .../mastodon-post-install-hardening.md        | 174 ++++++++++++++++++
 .../ssh-socket-tailscale-race-condition.md    |  45 ++++-
 SUMMARY.md                                    |   1 +
 4 files changed, 222 insertions(+), 11 deletions(-)
 create mode 100644 02-selfhosting/services/mastodon-post-install-hardening.md

diff --git a/02-selfhosting/index.md b/02-selfhosting/index.md
index 362544e..8ff230c 100644
--- a/02-selfhosting/index.md
+++ b/02-selfhosting/index.md
@@ -1,6 +1,6 @@
 ---
 created: 2026-04-13T10:15
-updated: 2026-04-30T05:21
+updated: 2026-05-31
 ---
 
 # 🏠 Self-Hosting & Homelab
@@ -30,6 +30,17 @@ Guides for running your own services at home, including Docker, reverse proxies,
 - [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
 - [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)
 
+## Services
+
+- [Mastodon Instance Tuning](services/mastodon-instance-tuning.md)
+- [Mastodon Post-Install Hardening (Permissions + Account)](services/mastodon-post-install-hardening.md)
+- [Mastodon DB Maintenance](services/mastodon-db-maintenance.md)
+- [Mastodon Federation](services/mastodon-federation.md)
+- [Mastodon `--prune-profiles` Trap](services/mastodon-prune-profiles-trap.md)
+- [Ghost SMTP via Mailgun](services/ghost-smtp-mailgun-setup.md)
+- [Updating n8n Docker](services/updating-n8n-docker.md)
+- [Claude Code Remote Control](services/claude-code-remote-control.md)
+
 ## Security
 
 - [Linux Server Hardening Checklist](security/linux-server-hardening-checklist.md)
diff --git a/02-selfhosting/services/mastodon-post-install-hardening.md b/02-selfhosting/services/mastodon-post-install-hardening.md
new file mode 100644
index 0000000..56b77f0
--- /dev/null
+++ b/02-selfhosting/services/mastodon-post-install-hardening.md
@@ -0,0 +1,174 @@
+---
+title: Mastodon Post-Install Hardening (Permissions + Account)
+domain: selfhosting
+category: services
+tags:
+  - mastodon
+  - fediverse
+  - self-hosting
+  - hardening
+  - ansible
+  - nginx
+  - rbenv
+status: published
+created: 2026-05-31
+updated: 2026-05-31
+---
+
+# Mastodon Post-Install Hardening (Permissions + Account)
+
+Four gaps that the upstream Mastodon install guide doesn't lock down — each silently breaks something or leaves a credential exposed. Found on majortoot-hetzner during its 2026-05-31 cutover; codified in MajorAnsible's `configure_mastodon_permissions.yml`.
+
+---
+
+## Gap 1: `/home/mastodon` is `0750` — nginx 403s every asset
+
+### Symptom
+
+The browser loads `https://<your-domain>/` and shows an unstyled **purple background with no content** (Mastodon's React entry HTML loaded, but every JS / CSS / manifest request 403'd). API endpoints like `/api/v1/instance` still return 200 because they fall through nginx's `try_files` to the puma proxy, whereas static assets need direct filesystem access.
+
+### Cause
+
+Debian/Ubuntu's `useradd` creates `/home/<user>` as `0750` (owner + group only); the mode comes from `HOME_MODE` in `/etc/login.defs`, or is derived from `UMASK` when `HOME_MODE` is unset. nginx runs as `www-data`, which is neither the owner nor a member of the group, so it cannot **traverse** into `/home/mastodon/live/public/` to serve `packs/assets/*.js`, `manifest.json`, etc. The errors land in `/var/log/nginx/error.log`:
+
+```
+[crit] stat() "/home/mastodon/live/public/packs/assets/foo.js" failed (13: Permission denied)
+```
+
+### Fix
+
+```bash
+chmod 0751 /home/mastodon
+```
+
+`0751` gives `other` execute (traversal) only, **not read**: `www-data` can open paths it already knows but cannot list the directory, and files inside that aren't world-readable stay private. Files that *are* world-readable become reachable, though, so lock down `.env.production` (Gap 2) at the same time.
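To check the same condition without switching to `www-data`, here is a minimal sketch that walks every directory above a given path and reports any that lack the `other` execute bit. It assumes GNU coreutils (`readlink -f`, `stat -c`) and looks only at the `other` permission digit, so it deliberately ignores owner/group membership and ACLs. The function name `check_other_traverse` is illustrative, not part of MajorAnsible.

```bash
# Hypothetical helper: list every ancestor directory of a path that lacks the
# "other" execute (traverse) bit. A user such as www-data that is neither
# owner nor group member needs that bit on *every* ancestor to reach a file.
# Owner/group bits and ACLs are deliberately ignored.
check_other_traverse() {
  local path dir mode status
  path=$(readlink -f -- "$1") || return 2
  dir=$(dirname -- "$path")
  status=0
  while :; do
    mode=$(stat -c '%a' -- "$dir") || return 2
    # %a prints the octal mode; its last digit holds the "other" bits,
    # and an odd digit means the execute bit is set.
    case $mode in
      *[1357]) ;;
      *) printf 'blocked: %s (mode %s)\n' "$dir" "$mode"; status=1 ;;
    esac
    [ "$dir" = / ] && break
    dir=$(dirname -- "$dir")
  done
  return "$status"
}
```

Run as root, `check_other_traverse /home/mastodon/live/public/packs` would be expected to flag `/home/mastodon (mode 750)` before the `chmod 0751` and print nothing for it afterwards; it returns 1 whenever any ancestor is blocked.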
+
+---
+
+## Gap 2: `.env.production` is `0644` — DB_PASS and SECRET_KEY_BASE are world-readable
+
+### Symptom
+
+Once Gap 1 is fixed and `/home/mastodon` is traversable, any local user, and any compromised process running under another account (for example nginx's `www-data`, or something that has escaped a container), can `cat /home/mastodon/live/.env.production` and read every Mastodon secret.
+
+### Cause
+
+The interactive `mastodon:setup` wizard writes `.env.production` with default file permissions, which come out as `0644` under the usual `022` umask. The file contains:
+
+- `DB_PASS` — PostgreSQL password
+- `SECRET_KEY_BASE` — session cookie signing key
+- `OTP_SECRET` — 2FA encryption key
+- SMTP credentials
+- S3 / object-storage credentials if configured
+
+### Fix
+
+```bash
+chmod 0600 /home/mastodon/live/.env.production
+chown mastodon:mastodon /home/mastodon/live/.env.production
+```
+
+No service restart needed: Rails reads `.env.production` at process boot, not per-request, so the existing `puma`, `sidekiq`, and `streaming` services keep running. The upstream systemd units run these services as the `mastodon` user, so they can still read the file on their next restart.
+
+---
+
+## Gap 3: `mastodon` user shell is `/usr/sbin/nologin` — `su - mastodon` fails
+
+### Symptom
+
+```
+root@majortoot:~# su - mastodon
+This account is currently not available.
+```
+
+This blocks the usual way of running `tootctl` and the Rails console as the `mastodon` user over SSH.
+
+### Cause
+
+If the user was created as a system account (for example with `adduser --system mastodon`), its shell defaults to `/usr/sbin/nologin`. The upstream install guide creates an ordinary user, which typically gets `/bin/bash`, but a manual / Ansible / Packer build path may have created a system account instead.
+
+### Fix
+
+```bash
+usermod -s /bin/bash mastodon
+```
+
+Verify with `getent passwd mastodon | cut -d: -f7` → `/bin/bash`.
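Gaps 2 and 3 can be spot-checked from a root shell before or after running the playbook. A minimal sketch, assuming GNU `stat` and `getent` are available; both function names are hypothetical and are not taken from `configure_mastodon_permissions.yml`.

```bash
# Hypothetical read-only checks for Gap 2 and Gap 3.

# Gap 2: succeed only if the group and other permission digits are both 0
# (e.g. 600 or 400); any group/other bit means the secrets are exposed.
env_file_is_private() {
  local mode
  mode=$(stat -c '%a' -- "$1") || return 2
  case $mode in
    *00 | 0) return 0 ;;
    *) return 1 ;;
  esac
}

# Gap 3: print the login shell recorded for a user in the passwd database.
login_shell_of() {
  local entry
  entry=$(getent passwd "$1") || return 2
  # The login shell is the last colon-separated field of the entry.
  printf '%s\n' "${entry##*:}"
}

# Example (run as root on the Mastodon host):
#   env_file_is_private /home/mastodon/live/.env.production || echo "Gap 2 still open"
#   [ "$(login_shell_of mastodon)" = /bin/bash ] || echo "Gap 3 still open"
```

Both functions return 2 when the file or user cannot be looked up, so a missing `.env.production` is not mistaken for a private one.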
+
+---
+
+## Gap 4: Login shells don't load rbenv — `tootctl` can't find `ruby`
+
+### Symptom
+
+After fixing Gap 3, `su - mastodon` succeeds, but:
+
+```
+mastodon@majortoot:~$ which ruby
+(no output, exit 1)
+mastodon@majortoot:~$ cd /home/mastodon/live && bin/tootctl version
+/usr/bin/env: 'ruby': No such file or directory
+```
+
+### Cause
+
+A typical Mastodon install puts rbenv init in `~/.bashrc`. But bash **login** shells (which `su -` and `ssh user@host` open) read only the first of `.bash_profile`, `.bash_login`, and `.profile` that exists, and never read `.bashrc` directly. If `.bash_profile` doesn't exist and `.profile` doesn't init rbenv, the login shell never gets rbenv on `PATH`.
+
+Even when a login file chains `.bashrc` (Ubuntu's default `.profile` does), Ubuntu's default `.bashrc` has a guard at the top:
+
+```bash
+case $- in
+    *i*) ;;
+      *) return;;
+esac
+```
+
+This **returns early for non-interactive shells**, which is exactly what `su - mastodon -c "<command>"` starts, so the rbenv init lines later in `.bashrc` are never reached.
+
+### Fix
+
+Drop a `.bash_profile` that sets up rbenv **before** sourcing `.profile` and `.bashrc`, so it works for both interactive and non-interactive login shells:
+
+```bash
+# /home/mastodon/.bash_profile (mode 0644, owned by mastodon:mastodon)
+export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"
+if command -v rbenv >/dev/null 2>&1; then
+  eval "$(rbenv init -)"
+fi
+
+# Then load POSIX login env + bash interactive config
+[ -f ~/.profile ] && . ~/.profile
+[ -f ~/.bashrc ] && . ~/.bashrc
+```
+
+Verify:
+
+```bash
+su - mastodon -c "ruby -v"   # → ruby 3.x.x …
+su - mastodon -c "cd /home/mastodon/live && RAILS_ENV=production bin/tootctl version"
+```
+
+---
+
+## Codified
+
+All four gaps are handled by `configure_mastodon_permissions.yml` in MajorAnsible. The playbook is idempotent, requires no service restart, and includes self-asserting verification steps:
+
+| Assertion | What it catches |
+|---|---|
+| `sudo -u www-data stat /home/mastodon/live/public/packs` must succeed | Gap 1 regression |
+| `sudo -u www-data cat /home/mastodon/live/.env.production` must fail | Gap 2 regression |
+| `su - mastodon -c "ruby -v"` must succeed and its output must contain `ruby` | Gap 3 or 4 regression |
+
+Apply to all Mastodon hosts:
+
+```bash
+ansible-playbook configure_mastodon_permissions.yml
+```
+
+## References
+
+- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
+- [[majortoot#tootctl CLI Note]]
+- MajorAnsible: `configure_mastodon_permissions.yml`
+- Related: [[mastodon-instance-tuning|Mastodon Instance Tuning]] · [[mastodon-db-maintenance|Mastodon DB Maintenance]]
diff --git a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
index 656bf0e..f5eb371 100644
--- a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
+++ b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
@@ -27,38 +27,61 @@ journalctl -b -1 -u ssh # likely empty — sshd never spawned
 journalctl -b -1 -u ssh.socket # socket started before tailscaled
 ```
 
-### Fix
+### Fix (current — 2026-05-31)
 
-Add Tailscale dependency to the socket override:
+`After=tailscaled.service` orders against the service becoming `active`, **not** against the `tailscale0` interface actually having an IPv4 address. tailscaled flips to `active` within a second of starting, but `tailscale0` doesn't get its address until tailscaled has heard back from the control plane and configured the interface. If ssh.socket tries to bind `ListenStream=<tailscale-ip>:22` in that window, the bind fails with `Cannot assign requested address`, the socket goes into a failed state, and nothing retries it.
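The window described above can be observed, and waited out, with a bounded poll on `ip -4 -o addr show`. A minimal sketch of that loop as a shell function, handy for timing the gap by hand; it assumes iproute2's `ip` is on `PATH`, and `wait_for_ipv4` with its 120-second default is an illustrative name, not something the Ansible role installs.

```bash
# Hypothetical helper: block until an interface has at least one IPv4 address,
# or give up after a number of one-second polls (default 120).
wait_for_ipv4() {
  local iface="$1" timeout="${2:-120}" i=0
  while [ "$i" -lt "$timeout" ]; do
    # -4: IPv4 only; -o: one line per address, so any " inet " line
    # means an address is already bound to the interface.
    if ip -4 -o addr show dev "$iface" 2>/dev/null | grep -q ' inet '; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example: measure the gap after restarting tailscaled (bash `time` keyword).
#   systemctl restart tailscaled && time wait_for_ipv4 tailscale0
```

It returns 0 as soon as an address appears and 1 on timeout, mirroring the exit codes a `Type=oneshot` readiness unit would report.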
+
+The proper gate is a dedicated readiness service that **waits for the tailscale0 IPv4 address to exist** before letting ssh.socket bind:
+
+```ini
+# /etc/systemd/system/tailscale-wait-ready.service
+[Unit]
+Description=Wait until tailscale0 has an IPv4 address
+After=tailscaled.service
+Requires=tailscaled.service
+ConditionPathExists=/usr/sbin/ip
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+TimeoutStartSec=120
+ExecStart=/usr/bin/bash -c 'for i in $(seq 1 120); do ip -4 -o addr show tailscale0 2>/dev/null | grep -q "inet " && exit 0; sleep 1; done; exit 1'
+
+[Install]
+WantedBy=multi-user.target
+```
 
 ```ini
 # /etc/systemd/system/ssh.socket.d/override.conf
 [Unit]
-After=tailscaled.service
-Requires=tailscaled.service
+After=tailscale-wait-ready.service
+Requires=tailscale-wait-ready.service
 
 [Socket]
 ListenStream=
 ListenStream=<tailscale-ip>:22
 ```
 
-Then reload and restart:
+Reload + restart:
 
 ```bash
 systemctl daemon-reload
+systemctl enable tailscale-wait-ready.service
 systemctl restart ssh.socket
-systemctl status ssh.socket # verify Listen: shows correct IP
+ss -tlnp | grep :22 # verify bound to Tailscale IP
 ```
 
-- `After=` ensures the socket waits for Tailscale to start
-- `Requires=` ensures tailscaled must be running for the socket to activate
+!!! note "Evolution of this fix"
+    - **2026-05-19 v1** — `After=tailscaled.service` + `BindsTo=tailscaled.service`. Worked initially but caused a shutdown-time ordering cycle.
+    - **2026-05-23 v2** — `BindsTo` swapped for `Requires` to break the cycle. Fixed the cycle but did **not** wait for `tailscale0` to actually have an IP, only for `tailscaled` to be active. Hosts continued losing SSH after some reboots (intermittent, depending on who won the race).
+    - **2026-05-31 v3** — Added `tailscale-wait-ready.service` to gate ssh.socket on the interface having an address. This is the current canonical fix.
 
 !!! warning "Do NOT use BindsTo"
-    `BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot — leaving SSH dead until manual intervention. This was discovered on 2026-05-23 after the original fix (2026-05-19) used `BindsTo` and caused a second outage on dcaprod-hetzner. `Requires` provides the startup dependency without the dangerous bidirectional lifecycle coupling.
+    `BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot. Use `Requires=` for startup ordering without the bidirectional lifecycle coupling.
 
 ### Affected Hosts
 
-Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
+Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
 
 ---
 
@@ -120,4 +143,6 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
 - [[majordiscord#2026-05-19 — Tailscale boot race: unreachable after Ansible reboot]]
 - [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
 - [[dcaprod#2026-05-23 — SSH unreachable again: BindsTo ordering cycle in ssh.socket override]]
+- [[majorlinux#2026-05-31 — ssh.socket race recurrence post-reboot (Requires= insufficient; added wait-ready gate)]]
+- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
 - Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`
diff --git a/SUMMARY.md b/SUMMARY.md
index 71d265d..f664b60 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -38,6 +38,7 @@ updated: 2026-05-15T09:00
   * [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md)
   * [Updating n8n Running in Docker](02-selfhosting/services/updating-n8n-docker.md)
   * [Mastodon Instance Tuning](02-selfhosting/services/mastodon-instance-tuning.md)
+  * [Mastodon Post-Install Hardening (Permissions + Account)](02-selfhosting/services/mastodon-post-install-hardening.md)
   * [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
   * [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
   * [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)