wiki: ssh.socket wait-ready gate + mastodon post-install hardening
Two related additions covering the 2026-05-31 cutover-night incidents on majorlinux and majortoot-hetzner.

ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service Requires=tailscaled.service orders against the service becoming active, not against tailscale0 having an IPv4 — hosts kept losing SSH intermittently after reboots (incident: majorlinux + majortoot-hetzner 2026-05-31, during cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls `ip -4 -o addr show tailscale0` until an address is present, with ssh.socket After=/Requires= that service. Document the full evolution (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.

mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't traverse → every static asset 403s → unstyled "purple screen" in the browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS, SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even when chained, Ubuntu's .bashrc returns early for non-interactive shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile + .bashrc, so it works for both interactive and non-interactive logins.
All four codified in MajorAnsible configure_mastodon_permissions.yml with self-asserting verification steps.

02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the mastodon-post-install-hardening article (and the other orphaned services/ entries while there). SUMMARY.md gains one new entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 73c10111e0
commit 155651c373
4 changed files with 222 additions and 11 deletions
|
|
@ -1,6 +1,6 @@
|
|||
---
|
||||
created: 2026-04-13T10:15
|
||||
-updated: 2026-04-30T05:21
+updated: 2026-05-31
|
||||
---
|
||||
# 🏠 Self-Hosting & Homelab
|
||||
|
||||
|
|
@ -30,6 +30,17 @@ Guides for running your own services at home, including Docker, reverse proxies,
|
|||
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
|
||||
- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)
|
||||
|
||||
## Services
|
||||
|
||||
- [Mastodon Instance Tuning](services/mastodon-instance-tuning.md)
|
||||
- [Mastodon Post-Install Hardening (Permissions + Account)](services/mastodon-post-install-hardening.md)
|
||||
- [Mastodon DB Maintenance](services/mastodon-db-maintenance.md)
|
||||
- [Mastodon Federation](services/mastodon-federation.md)
|
||||
- [Mastodon `--prune-profiles` Trap](services/mastodon-prune-profiles-trap.md)
|
||||
- [Ghost SMTP via Mailgun](services/ghost-smtp-mailgun-setup.md)
|
||||
- [Updating n8n Docker](services/updating-n8n-docker.md)
|
||||
- [Claude Code Remote Control](services/claude-code-remote-control.md)
|
||||
|
||||
## Security
|
||||
|
||||
- [Linux Server Hardening Checklist](security/linux-server-hardening-checklist.md)
|
||||
|
|
|
|||
174
02-selfhosting/services/mastodon-post-install-hardening.md
Normal file
|
|
@ -0,0 +1,174 @@
|
|||
---
|
||||
title: Mastodon Post-Install Hardening (Permissions + Account)
|
||||
domain: selfhosting
|
||||
category: services
|
||||
tags:
|
||||
- mastodon
|
||||
- fediverse
|
||||
- self-hosting
|
||||
- hardening
|
||||
- ansible
|
||||
- nginx
|
||||
- rbenv
|
||||
status: published
|
||||
created: 2026-05-31
|
||||
updated: 2026-05-31
|
||||
---
|
||||
|
||||
# Mastodon Post-Install Hardening (Permissions + Account)
|
||||
|
||||
Four gaps that the upstream Mastodon install guide doesn't lock down — each silently breaks something or leaves a credential exposed. Found on majortoot-hetzner during its 2026-05-31 cutover; codified in MajorAnsible's `configure_mastodon_permissions.yml`.
|
||||
|
||||
---
|
||||
|
||||
## Gap 1: `/home/mastodon` is `0750` — nginx 403s every asset
|
||||
|
||||
### Symptom
|
||||
|
||||
Browser loads `https://<your-instance>/` and shows an unstyled **purple background with no content** (Mastodon's React entry HTML loaded, but every JS / CSS / manifest request 403'd). API endpoints like `/api/v1/instance` still return 200 because they fall through nginx's `try_files` to the puma proxy — but static assets need direct filesystem access.
|
||||
|
||||
### Cause
|
||||
|
||||
On Ubuntu, `useradd` creates `/home/<user>` as `0750` (owner+group only) by default; the mode comes from the distribution's `HOME_MODE` setting in `/etc/login.defs`. nginx runs as `www-data`, which is neither the owner nor a member of the `mastodon` group, so it cannot **traverse** into `/home/mastodon/live/public/` to serve `packs/assets/*.js`, `manifest.json`, etc. The errors land in `/var/log/nginx/error.log`:
|
||||
|
||||
```
|
||||
[crit] stat() "/home/mastodon/live/public/packs/assets/foo.js" failed (13: Permission denied)
|
||||
```
|
||||
|
||||
### Fix
|
||||
|
||||
```bash
|
||||
chmod 0751 /home/mastodon
|
||||
```
|
||||
|
||||
`0751` gives `other` execute (traversal) only, **not read** — files inside that aren't world-readable stay private. Take the opportunity to lock `.env.production` in the next gap.
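
To find which directory blocks traversal, it can help to check the `other` execute bit on each path component. The following is a minimal sketch, assuming GNU `stat` (as on Ubuntu) and that `www-data` is neither owner nor group member, so only the `other` bits matter. `has_other_exec` is a hypothetical helper name, and ACLs are ignored.

```bash
# has_other_exec DIR: succeed if DIR grants execute (traverse) to "other".
# Assumes GNU stat; hypothetical helper, not part of the playbook.
has_other_exec() {
  _mode=$(stat -c '%a' "$1") || return 2
  # The last octal digit holds the "other" bits; an odd digit means
  # the execute bit is set.
  case "$_mode" in
    *[1357]) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example: report every component nginx would fail to traverse.
# for d in /home/mastodon /home/mastodon/live /home/mastodon/live/public; do
#   has_other_exec "$d" || echo "other cannot traverse: $d"
# done
```

Checking each component separately matters because a single directory without the bit anywhere along the path is enough to produce the `stat()` permission error shown in the nginx log.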
|
||||
|
||||
---
|
||||
|
||||
## Gap 2: `.env.production` is `0644` — DB_PASS and SECRET_KEY_BASE are world-readable
|
||||
|
||||
### Symptom
|
||||
|
||||
Once Gap 1 is fixed and `/home/mastodon` is traversable, any local user (and any compromised process running as nginx, sidekiq under reduced privileges, a container escape, etc.) can `cat /home/mastodon/live/.env.production` and read every Mastodon secret.
|
||||
|
||||
### Cause
|
||||
|
||||
The `mastodon-setup` interactive wizard writes `.env.production` with default `0644` permissions. The file contains:
|
||||
|
||||
- `DB_PASS` — PostgreSQL password
|
||||
- `SECRET_KEY_BASE` — session cookie signing key
|
||||
- `OTP_SECRET` — 2FA encryption key
|
||||
- SMTP credentials
|
||||
- S3 / object-storage credentials if configured
|
||||
|
||||
### Fix
|
||||
|
||||
```bash
|
||||
chmod 0600 /home/mastodon/live/.env.production
|
||||
chown mastodon:mastodon /home/mastodon/live/.env.production
|
||||
```
|
||||
|
||||
No service restart needed — Rails reads `.env.production` at process boot, not per-request. Existing `puma`, `sidekiq`, and `streaming` services keep running.
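
A quick audit for this gap can be sketched with `find`. This assumes GNU `find` (for `-perm /MODE`), and `list_exposed_env_files` is a hypothetical helper name. It prints any `.env*` file in a directory that has a group or other permission bit set, so empty output means the files are owner-only.

```bash
# list_exposed_env_files DIR: print .env* files directly inside DIR that
# have any group or other permission bit set (mode & 077 != 0).
# Assumes GNU find; hypothetical helper, not part of the playbook.
list_exposed_env_files() {
  find "$1" -maxdepth 1 -type f -name '.env*' -perm /077
}

# Example (path from this article):
# list_exposed_env_files /home/mastodon/live
```

Matching on `.env*` rather than the exact filename also catches backup copies such as `.env.production.bak`, which tend to keep the original permissive mode.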
|
||||
|
||||
---
|
||||
|
||||
## Gap 3: `mastodon` user shell is `/usr/sbin/nologin` — `su - mastodon` fails
|
||||
|
||||
### Symptom
|
||||
|
||||
```
|
||||
root@majortoot:~# su - mastodon
|
||||
This account is currently not available.
|
||||
```
|
||||
|
||||
Blocks all `tootctl` and Rails console admin via SSH.
|
||||
|
||||
### Cause
|
||||
|
||||
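If the user was created as a system account, the shell usually ends up as `/usr/sbin/nologin`: Debian/Ubuntu's `adduser --system` defaults to it, while `useradd --system` takes whatever `SHELL` is set in `/etc/default/useradd` or passed with `-s`. Mastodon's own installer typically sets `/bin/bash`, but a manual / Ansible / Packer build path may have taken the system-account route.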
|
||||
|
||||
### Fix
|
||||
|
||||
```bash
|
||||
usermod -s /bin/bash mastodon
|
||||
```
|
||||
|
||||
Verify with `getent passwd mastodon | cut -d: -f7` → `/bin/bash`.
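
The same check can be scripted. This is a small sketch with hypothetical helper names (`login_shell_of`, `shell_blocks_login`); it treats any shell path ending in `nologin` or `false` as login-blocking, which is an assumption covering the common placeholders rather than an exhaustive list.

```bash
# login_shell_of USER: print field 7 (the login shell) of USER's passwd entry.
login_shell_of() {
  getent passwd "$1" | cut -d: -f7
}

# shell_blocks_login SHELL_PATH: succeed if the shell is one of the usual
# "no login" placeholders (assumption: nologin or false, in any directory).
shell_blocks_login() {
  case "$1" in
    */nologin|*/false) return 0 ;;
    *)                 return 1 ;;
  esac
}

# Example:
# shell_blocks_login "$(login_shell_of mastodon)" && echo "su - mastodon will be refused"
```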
|
||||
|
||||
---
|
||||
|
||||
## Gap 4: Login shells don't load rbenv — `tootctl` reports "ruby: command not found"
|
||||
|
||||
### Symptom
|
||||
|
||||
After fixing Gap 3, `su - mastodon` succeeds, but:
|
||||
|
||||
```
|
||||
mastodon@majortoot:~$ which ruby
|
||||
(no output, exit 1)
|
||||
mastodon@majortoot:~$ cd /home/mastodon/live && bin/tootctl version
|
||||
/usr/bin/env: 'ruby': No such file or directory
|
||||
```
|
||||
|
||||
### Cause
|
||||
|
||||
A typical Mastodon install puts rbenv init in `~/.bashrc`. But a bash **login** shell (which `su -` and `ssh user@host` open) reads only the first of `.bash_profile`, `.bash_login`, and `.profile` that exists, in that order, and does **not** read `.bashrc` on its own. If `.bash_profile` doesn't exist and `.profile` doesn't init rbenv, the login shell never gets rbenv on PATH.
|
||||
|
||||
Even when `.bash_profile` chains `.bashrc`, Ubuntu's default `.bashrc` has a guard at the top:
|
||||
|
||||
```bash
|
||||
case $- in
|
||||
*i*) ;;
|
||||
*) return;;
|
||||
esac
|
||||
```
|
||||
|
||||
This **returns early for non-interactive shells**, which is exactly what `su - mastodon -c "<command>"` opens — so the rbenv init lines later in `.bashrc` are never reached.
|
||||
|
||||
### Fix
|
||||
|
||||
Drop a `.bash_profile` that sets up rbenv **before** sourcing `.bashrc`, so it works for both interactive and non-interactive login shells:
|
||||
|
||||
```bash
|
||||
# /home/mastodon/.bash_profile (mode 0644, owned by mastodon:mastodon)
|
||||
export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"
|
||||
if command -v rbenv >/dev/null 2>&1; then
|
||||
eval "$(rbenv init -)"
|
||||
fi
|
||||
|
||||
# Then load POSIX login env + bash interactive config
|
||||
[ -f ~/.profile ] && . ~/.profile
|
||||
[ -f ~/.bashrc ] && . ~/.bashrc
|
||||
```
|
||||
|
||||
Verify:
|
||||
|
||||
```bash
|
||||
su - mastodon -c "ruby -v" # → ruby 3.x.x …
|
||||
su - mastodon -c "cd /home/mastodon/live && RAILS_ENV=production bin/tootctl version"
|
||||
```
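
The ordering effect can be reproduced without touching the real account. The sketch below uses a throwaway `HOME` and two hypothetical marker variables (`FROM_PROFILE`, `FROM_BASHRC`) standing in for the rbenv setup; it assumes `bash` is installed. The marker set in `.bash_profile` before `.bashrc` is sourced survives, while the one placed after the guard in `.bashrc` is never reached.

```bash
# Build a throwaway HOME so the real dotfiles are untouched.
demo_home=$(mktemp -d)

# .bashrc with the Ubuntu-style interactivity guard, then a marker export.
cat > "$demo_home/.bashrc" <<'EOF'
case $- in
    *i*) ;;
      *) return;;
esac
export FROM_BASHRC=1
EOF

# .bash_profile exports its marker BEFORE sourcing .bashrc.
cat > "$demo_home/.bash_profile" <<'EOF'
export FROM_PROFILE=1
[ -f ~/.bashrc ] && . ~/.bashrc
EOF

# Non-interactive login shell: the kind `su - mastodon -c "<command>"` opens.
demo_result=$(HOME="$demo_home" bash -lc 'echo "profile=${FROM_PROFILE:-0} bashrc=${FROM_BASHRC:-0}"')
echo "$demo_result"   # profile=1 bashrc=0
rm -rf "$demo_home"
```

Anything a non-interactive login needs (here, the rbenv PATH entries) therefore has to live in `.bash_profile` ahead of the `.bashrc` line, or above the guard.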
|
||||
|
||||
---
|
||||
|
||||
## Codified
|
||||
|
||||
All four gaps are handled by `configure_mastodon_permissions.yml` in MajorAnsible. The playbook is idempotent, requires no service restart, and includes self-asserting verification steps:
|
||||
|
||||
| Assertion | What it catches |
|
||||
|---|---|
|
||||
| `sudo -u www-data stat /home/mastodon/live/public/packs` must succeed | Gap 1 regression |
|
||||
| `sudo -u www-data cat .env.production` must fail | Gap 2 regression |
|
||||
| `su - mastodon -c "ruby -v"` must succeed and output "ruby" | Gap 3 or 4 regression |
|
||||
|
||||
Apply to all Mastodon hosts:
|
||||
|
||||
```bash
|
||||
ansible-playbook configure_mastodon_permissions.yml
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
|
||||
- [[majortoot#tootctl CLI Note]]
|
||||
- MajorAnsible: `configure_mastodon_permissions.yml`
|
||||
- Related: [[mastodon-instance-tuning|Mastodon Instance Tuning]] · [[mastodon-db-maintenance|Mastodon DB Maintenance]]
|
||||
|
|
@ -27,38 +27,61 @@ journalctl -b -1 -u ssh # likely empty — sshd never spawned
|
|||
journalctl -b -1 -u ssh.socket # socket started before tailscaled
|
||||
```
|
||||
|
||||
-### Fix
+### Fix (current — 2026-05-31)
|
||||
|
||||
-Add Tailscale dependency to the socket override:
+`After=tailscaled.service` orders against the service becoming `active` — **not** against the `tailscale0` interface actually having an IPv4 address. tailscaled flips to active within a second of starting, but the kernel doesn't have the address bound to the interface until DERP relays connect and the control plane confirms the node. ssh.socket attempting `ListenStream=<TS IP>:22` in that window fails with `Cannot assign requested address`, the socket goes into a failed state, and there is no automatic retry.
|
||||
|
||||
The proper gate is a dedicated readiness service that **waits for the tailscale0 IPv4 address to exist** before letting ssh.socket bind:
|
||||
|
||||
```ini
|
||||
# /etc/systemd/system/tailscale-wait-ready.service
|
||||
[Unit]
|
||||
Description=Wait until tailscale0 has an IPv4 address
|
||||
After=tailscaled.service
|
||||
Requires=tailscaled.service
|
||||
ConditionPathExists=/usr/sbin/ip
|
||||
|
||||
[Service]
|
||||
Type=oneshot
|
||||
RemainAfterExit=yes
|
||||
TimeoutStartSec=120
|
||||
ExecStart=/usr/bin/bash -c 'for i in $(seq 1 120); do ip -4 -o addr show tailscale0 2>/dev/null | grep -q "inet " && exit 0; sleep 1; done; exit 1'
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
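
The one-line `ExecStart` loop is easier to test when pulled out into a script. The following is a rough equivalent, not the unit's actual contents: `has_ipv4` and `wait_until` are hypothetical helper names, and the probe is the same `ip -4 -o addr show` pipeline used in the unit.

```bash
# has_ipv4 IFACE: succeed once IFACE carries at least one IPv4 address.
has_ipv4() {
  ip -4 -o addr show "$1" 2>/dev/null | grep -q "inet "
}

# wait_until ATTEMPTS CMD...: run CMD once per second until it succeeds,
# giving up (exit status 1) after ATTEMPTS tries.
wait_until() {
  _left=$1
  shift
  while [ "$_left" -gt 0 ]; do
    "$@" && return 0
    _left=$((_left - 1))
    sleep 1
  done
  return 1
}

# Example: the same gate as the unit's ExecStart, with a 120-try budget.
# wait_until 120 has_ipv4 tailscale0
```

Keeping the probe separate from the retry loop means the gate can be exercised on a workstation with a stand-in command before it is deployed to a host where a failure costs SSH access.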
|
||||
|
||||
```ini
|
||||
# /etc/systemd/system/ssh.socket.d/override.conf
|
||||
[Unit]
|
||||
-After=tailscaled.service
-Requires=tailscaled.service
+After=tailscale-wait-ready.service
+Requires=tailscale-wait-ready.service
|
||||
|
||||
[Socket]
|
||||
ListenStream=
|
||||
ListenStream=<TAILSCALE_IP>:22
|
||||
```
|
||||
|
||||
-Then reload and restart:
+Reload + restart:
|
||||
|
||||
```bash
|
||||
systemctl daemon-reload
|
||||
systemctl enable tailscale-wait-ready.service
|
||||
systemctl restart ssh.socket
|
||||
systemctl status ssh.socket # verify Listen: shows correct IP
|
||||
ss -tlnp | grep :22 # verify bound to Tailscale IP
|
||||
```
|
||||
|
||||
-- `After=` ensures the socket waits for Tailscale to start
-- `Requires=` ensures tailscaled must be running for the socket to activate
|
||||
!!! note "Evolution of this fix"
|
||||
- **2026-05-19 v1** — `After=tailscaled.service` + `BindsTo=tailscaled.service`. Worked initially but caused a shutdown-time ordering cycle.
|
||||
- **2026-05-23 v2** — `BindsTo` swapped for `Requires` to break the cycle. Fixed the cycle but did **not** wait for `tailscale0` to actually have an IP — just for `tailscaled` to be active. Hosts continued losing SSH after some reboots (intermittent, depending on whether the race won).
|
||||
- **2026-05-31 v3** — Added `tailscale-wait-ready.service` to gate ssh.socket on the interface having an address. This is the current canonical fix.
|
||||
|
||||
!!! warning "Do NOT use BindsTo"
|
||||
-`BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot — leaving SSH dead until manual intervention. This was discovered on 2026-05-23 after the original fix (2026-05-19) used `BindsTo` and caused a second outage on dcaprod-hetzner. `Requires` provides the startup dependency without the dangerous bidirectional lifecycle coupling.
+`BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot. Use `Requires=` for startup ordering without the bidirectional lifecycle coupling.
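
A regression back to the v1 override can be caught with a simple text check. This is a sketch with a hypothetical helper name (`uses_bindsto`); it only looks for an uncommented `BindsTo=` line in one file and does not evaluate systemd's merged unit configuration.

```bash
# uses_bindsto FILE: succeed if FILE contains an uncommented BindsTo= line.
# Hypothetical helper; a plain text match, not a systemd-aware check.
uses_bindsto() {
  grep -Eq '^[[:space:]]*BindsTo=' "$1"
}

# Example (path from this article):
# uses_bindsto /etc/systemd/system/ssh.socket.d/override.conf \
#   && echo "BindsTo present: replace with Requires="
```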
|
||||
|
||||
### Affected Hosts
|
||||
|
||||
-Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
+Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -120,4 +143,6 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
|
|||
- [[majordiscord#2026-05-19 — Tailscale boot race: unreachable after Ansible reboot]]
|
||||
- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
|
||||
- [[dcaprod#2026-05-23 — SSH unreachable again: BindsTo ordering cycle in ssh.socket override]]
|
||||
- [[majorlinux#2026-05-31 — ssh.socket race recurrence post-reboot (Requires= insufficient; added wait-ready gate)]]
|
||||
- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
|
||||
- Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`
|
||||
|
|
|
|||
|
|
@ -38,6 +38,7 @@ updated: 2026-05-15T09:00
|
|||
* [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md)
|
||||
* [Updating n8n Running in Docker](02-selfhosting/services/updating-n8n-docker.md)
|
||||
* [Mastodon Instance Tuning](02-selfhosting/services/mastodon-instance-tuning.md)
|
||||
* [Mastodon Post-Install Hardening (Permissions + Account)](02-selfhosting/services/mastodon-post-install-hardening.md)
|
||||
* [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
|
||||
* [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
|
||||
* [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)
|
||||
|
|
|
|||