wiki: ssh.socket wait-ready gate + mastodon post-install hardening

Two related additions covering the 2026-05-31 cutover-night incidents on
majorlinux and majortoot-hetzner.

ssh-socket-tailscale-race-condition.md (update Race 1 fix):
- After=tailscaled.service + Requires=tailscaled.service order against
  the service becoming active, not against tailscale0 having an IPv4
  address, so hosts kept losing SSH intermittently after reboots
  (incident: majorlinux + majortoot-hetzner 2026-05-31, during the
  cutover-night Ansible reboot).
- Canonical fix: a oneshot tailscale-wait-ready.service that polls
  `ip -4 -o addr show tailscale0` until an address is present, with
  ssh.socket After=/Requires= that service. Document the full evolution
  (2026-05-19 BindsTo → 2026-05-23 Requires → 2026-05-31 wait-ready) so
  future readers don't try the half-fixes thinking they're sufficient.
- Add majortoot-hetzner to affected hosts.

mastodon-post-install-hardening.md (new):
Four upstream-install gaps that bit during the majortoot-hetzner cutover:
1. /home/mastodon at 0750 (useradd default) → nginx www-data can't
   traverse → every static asset 403s → unstyled "purple screen" in the
   browser while API/HTML still work through the puma proxy.
2. .env.production at 0644 (mastodon-setup default) → DB_PASS,
   SECRET_KEY_BASE, OTP_SECRET world-readable once gap (1) is fixed.
3. mastodon user shell at /usr/sbin/nologin → `su - mastodon` blocked.
4. rbenv init in .bashrc only → login shells don't source .bashrc; even
   when chained, Ubuntu's .bashrc returns early for non-interactive
   shells. Fix: .bash_profile sets up rbenv BEFORE sourcing .profile +
   .bashrc, so it works for both interactive and non-interactive logins.

All four codified in MajorAnsible configure_mastodon_permissions.yml
with self-asserting verification steps.

02-selfhosting/index.md + SUMMARY.md:
Add a "Services" section to the selfhosting index linking the
mastodon-post-install-hardening article (and the other orphaned
services/ entries while there). SUMMARY.md gains one new entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marcus Summers 2026-05-31 11:08:24 -04:00
parent 73c10111e0
commit 155651c373
4 changed files with 222 additions and 11 deletions

@ -1,6 +1,6 @@
---
created: 2026-04-13T10:15
updated: 2026-04-30T05:21
updated: 2026-05-31
---
# 🏠 Self-Hosting & Homelab
@ -30,6 +30,17 @@ Guides for running your own services at home, including Docker, reverse proxies,
- [Tuning Netdata Docker Health Alarms](monitoring/netdata-docker-health-alarm-tuning.md)
- [Deploying Netdata to a New Server](monitoring/netdata-new-server-setup.md)
## Services
- [Mastodon Instance Tuning](services/mastodon-instance-tuning.md)
- [Mastodon Post-Install Hardening (Permissions + Account)](services/mastodon-post-install-hardening.md)
- [Mastodon DB Maintenance](services/mastodon-db-maintenance.md)
- [Mastodon Federation](services/mastodon-federation.md)
- [Mastodon `--prune-profiles` Trap](services/mastodon-prune-profiles-trap.md)
- [Ghost SMTP via Mailgun](services/ghost-smtp-mailgun-setup.md)
- [Updating n8n Docker](services/updating-n8n-docker.md)
- [Claude Code Remote Control](services/claude-code-remote-control.md)
## Security
- [Linux Server Hardening Checklist](security/linux-server-hardening-checklist.md)

@ -0,0 +1,174 @@
---
title: Mastodon Post-Install Hardening (Permissions + Account)
domain: selfhosting
category: services
tags:
- mastodon
- fediverse
- self-hosting
- hardening
- ansible
- nginx
- rbenv
status: published
created: 2026-05-31
updated: 2026-05-31
---
# Mastodon Post-Install Hardening (Permissions + Account)
Four gaps that the upstream Mastodon install guide doesn't lock down — each silently breaks something or leaves a credential exposed. Found on majortoot-hetzner during its 2026-05-31 cutover; codified in MajorAnsible's `configure_mastodon_permissions.yml`.
---
## Gap 1: `/home/mastodon` is `0750` — nginx 403s every asset
### Symptom
Browser loads `https://<your-instance>/` and shows an unstyled **purple background with no content** (Mastodon's React entry HTML loaded, but every JS / CSS / manifest request 403'd). API endpoints like `/api/v1/instance` still return 200 because they fall through nginx's `try_files` to the puma proxy — but static assets need direct filesystem access.
### Cause
On recent Ubuntu releases, `useradd` creates `/home/<user>` as `0750` (owner and group only); the mode comes from `HOME_MODE` in `/etc/login.defs` rather than from the umask. nginx runs as `www-data`, which is neither the owner nor in the group, so it cannot **traverse** into `/home/mastodon/live/public/` to serve `packs/assets/*.js`, `manifest.json`, etc. The errors land in `/var/log/nginx/error.log`:
```
[crit] stat() "/home/mastodon/live/public/packs/assets/foo.js" failed (13: Permission denied)
```
### Fix
```bash
chmod 0751 /home/mastodon
```
`0751` gives `other` execute (traversal) only, **not read** — files inside that aren't world-readable stay private. Take the opportunity to lock `.env.production` in the next gap.
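To see why one directory mode blocks every asset, it can help to walk the path the way the kernel does. The following is a minimal sketch, assuming GNU `stat`; `can_other_traverse` is a hypothetical helper name, and the demo runs against a throwaway directory tree rather than the real `/home/mastodon`.

```shell
#!/usr/bin/env bash
# Sketch: walk upward from a file and report the first directory that lacks
# the "other" execute (traverse) bit. can_other_traverse is a hypothetical
# helper for this example. The optional second argument is where to stop.
can_other_traverse() {
    local dir stop=${2:-/} mode
    dir=$(dirname "$1")
    while [ "$dir" != "$stop" ] && [ "$dir" != "/" ]; do
        mode=$(stat -c '%a' "$dir") || return 2
        # The last octal digit holds the "other" bits; odd means x is set.
        if [ $(( ${mode: -1} & 1 )) -eq 0 ]; then
            echo "blocked at $dir (mode $mode)"
            return 1
        fi
        dir=$(dirname "$dir")
    done
    return 0
}

# Demo on a scratch tree that mimics the layout.
demo=$(mktemp -d)
mkdir -p "$demo/home/mastodon/live/public"
chmod -R 0755 "$demo"
chmod 0750 "$demo/home/mastodon"
before=$(can_other_traverse "$demo/home/mastodon/live/public/app.js" "$demo" || true)
chmod 0751 "$demo/home/mastodon"
after=$(can_other_traverse "$demo/home/mastodon/live/public/app.js" "$demo" && echo "traversable")
echo "0750: $before"
echo "0751: $after"
```

On a real host the same question is answered by `namei -l /home/mastodon/live/public/packs`, which lists the mode of every path component.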
---
## Gap 2: `.env.production` is `0644` — DB_PASS and SECRET_KEY_BASE are world-readable
### Symptom
Once Gap 1 is fixed and `/home/mastodon` is traversable, any local user can `cat /home/mastodon/live/.env.production` and read every Mastodon secret. So can any compromised process running under another account: an nginx worker, a sidekiq running under reduced privileges, a process that escaped a container, and so on.
### Cause
The `mastodon-setup` interactive wizard writes `.env.production` with default `0644` permissions. The file contains:
- `DB_PASS` — PostgreSQL password
- `SECRET_KEY_BASE` — session cookie signing key
- `OTP_SECRET` — 2FA encryption key
- SMTP credentials
- S3 / object-storage credentials if configured
### Fix
```bash
chmod 0600 /home/mastodon/live/.env.production
chown mastodon:mastodon /home/mastodon/live/.env.production
```
No service restart needed — Rails reads `.env.production` at process boot, not per-request. Existing `puma`, `sidekiq`, and `streaming` services keep running.
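A quick way to confirm the result is to test the mode bits rather than eyeball `ls -l`. This is a minimal sketch, assuming GNU `stat`; `is_private` is a hypothetical helper name, and the demo uses a temporary file in place of the real `.env.production`.

```shell
#!/usr/bin/env bash
# Sketch: succeed only when group and other have no permission bits at all.
# is_private is a hypothetical helper name for this example.
is_private() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2
    # Leading 0 makes bash read the mode as octal; 077 masks group + other.
    [ $(( 0$mode & 077 )) -eq 0 ]
}

# Demo on a scratch file standing in for .env.production.
demo_env=$(mktemp)
chmod 0644 "$demo_env"
is_private "$demo_env" && before=private || before=exposed
chmod 0600 "$demo_env"
is_private "$demo_env" && after=private || after=exposed
echo "0644: $before, 0600: $after"
rm -f "$demo_env"
```

On the host itself the call would be `is_private /home/mastodon/live/.env.production`.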
---
## Gap 3: `mastodon` user shell is `/usr/sbin/nologin`, so `su - mastodon` fails
### Symptom
```
root@majortoot:~# su - mastodon
This account is currently not available.
```
Blocks all `tootctl` and Rails console admin via SSH.
### Cause
If the user was created with `useradd --system mastodon`, the system-account default is shell `/usr/sbin/nologin`. Mastodon's own installer typically sets `/bin/bash` but a manual / Ansible / Packer build path may have used `--system`.
### Fix
```bash
usermod -s /bin/bash mastodon
```
Verify with `getent passwd mastodon | cut -d: -f7`, which should print `/bin/bash`.
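The same check can be scripted so that a regression is caught automatically. The sketch below works on a passwd-format line; both function names are hypothetical, and the two sample entries (including the UID and GID) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decide from a passwd(5) entry whether `su - <user>` can open a shell.
# login_shell_of and allows_login are hypothetical helpers for this example.
login_shell_of() {
    # Field 7 of a passwd entry is the login shell.
    printf '%s\n' "$1" | cut -d: -f7
}

allows_login() {
    case "$(login_shell_of "$1")" in
        */nologin | */false) return 1 ;;
        *) return 0 ;;
    esac
}

# Made-up entries; on a real host use: entry=$(getent passwd mastodon)
broken='mastodon:x:998:998::/home/mastodon:/usr/sbin/nologin'
fixed='mastodon:x:998:998::/home/mastodon:/bin/bash'

allows_login "$broken" && echo "broken: login ok" || echo "broken: login blocked"
allows_login "$fixed"  && echo "fixed: login ok"  || echo "fixed: login blocked"
```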
---
## Gap 4: Login shells don't load rbenv — `tootctl` reports "ruby: command not found"
### Symptom
After fixing Gap 3, `su - mastodon` succeeds, but:
```
mastodon@majortoot:~$ which ruby
(no output, exit 1)
mastodon@majortoot:~$ cd /home/mastodon/live && bin/tootctl version
/usr/bin/env: 'ruby': No such file or directory
```
### Cause
A typical Mastodon install puts rbenv init in `~/.bashrc`. But bash **login** shells (which `su -` and `ssh user@host` open) read only the first file that exists among `.bash_profile`, `.bash_login`, and `.profile`, checked in that order, and do **not** read `.bashrc` on their own. If `.bash_profile` doesn't exist and `.profile` doesn't init rbenv, the login shell never gets rbenv on PATH.
Even when `.bash_profile` chains `.bashrc`, Ubuntu's default `.bashrc` has a guard at the top:
```bash
case $- in
    *i*) ;;
      *) return;;
esac
```
This **returns early for non-interactive shells**, which is exactly what `su - mastodon -c "<command>"` opens — so the rbenv init lines later in `.bashrc` are never reached.
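The early return is easy to reproduce without touching the real account. The following sketch writes a stand-in file containing the same guard, then sources it from a non-interactive `bash -c`; `REACHED_RBENV_INIT` is a hypothetical marker variable standing in for the rbenv init lines.

```shell
#!/usr/bin/env bash
# Sketch: show that the interactivity guard stops a non-interactive shell
# before it reaches anything later in the file. REACHED_RBENV_INIT is a
# hypothetical marker standing in for the real rbenv init lines.
demo_dir=$(mktemp -d)
cat > "$demo_dir/bashrc" <<'EOF'
case $- in
    *i*) ;;
      *) return;;
esac
export REACHED_RBENV_INIT=yes
EOF

# bash -c is non-interactive, so $- has no "i" and the guard returns early.
guard_result=$(bash -c '. "$1"; echo "${REACHED_RBENV_INIT:-no}"' _ "$demo_dir/bashrc")
echo "rbenv init reached: $guard_result"
rm -rf "$demo_dir"
```

The script prints `rbenv init reached: no`, which is the same situation `su - mastodon -c "ruby -v"` ends up in when rbenv is only set up after the guard.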
### Fix
Drop a `.bash_profile` that sets up rbenv **before** sourcing `.bashrc`, so it works for both interactive and non-interactive login shells:
```bash
# /home/mastodon/.bash_profile (mode 0644, owned by mastodon:mastodon)
export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"
if command -v rbenv >/dev/null 2>&1; then
  eval "$(rbenv init -)"
fi
# Then load POSIX login env + bash interactive config
[ -f ~/.profile ] && . ~/.profile
[ -f ~/.bashrc ] && . ~/.bashrc
```
Verify:
```bash
su - mastodon -c "ruby -v" # → ruby 3.x.x …
su - mastodon -c "cd /home/mastodon/live && RAILS_ENV=production bin/tootctl version"
```
---
## Codified
All four gaps are handled by `configure_mastodon_permissions.yml` in MajorAnsible. The playbook is idempotent, requires no service restart, and includes self-asserting verification steps:
| Assertion | What it catches |
|---|---|
| `sudo -u www-data stat /home/mastodon/live/public/packs` must succeed | Gap 1 regression |
| `sudo -u www-data cat .env.production` must fail | Gap 2 regression |
| `su - mastodon -c "ruby -v"` must succeed and output "ruby" | Gap 3 or 4 regression |
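The playbook itself is Ansible, but the assertion pattern in the table can be sketched in plain shell for a one-off manual check. The helper names `must_succeed` and `must_fail` and the `failures` counter are hypothetical; the demo uses `true` and `false` as stand-ins so it runs anywhere, with the real commands shown in comments.

```shell
#!/usr/bin/env bash
# Sketch of the "must succeed" / "must fail" pattern in plain shell.
# must_succeed, must_fail and failures are hypothetical names; the real
# checks live in the Ansible playbook.
failures=0

must_succeed() {
    if "$@" >/dev/null 2>&1; then
        echo "ok:   $*"
    else
        echo "FAIL: expected success: $*"
        failures=$((failures + 1))
    fi
}

must_fail() {
    if "$@" >/dev/null 2>&1; then
        echo "FAIL: expected failure: $*"
        failures=$((failures + 1))
    else
        echo "ok:   $* (failed as expected)"
    fi
}

# Stand-ins so the sketch runs anywhere. On a Mastodon host, as root:
#   must_succeed sudo -u www-data stat /home/mastodon/live/public/packs
#   must_fail    sudo -u www-data cat /home/mastodon/live/.env.production
#   must_succeed su - mastodon -c "ruby -v"
must_succeed true
must_fail false
echo "failures: $failures"
```

Treating "must fail" as a first-class assertion matters for Gap 2: a check that only looks for successes would never notice that `www-data` can read the secrets again.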
Apply to all Mastodon hosts:
```bash
ansible-playbook configure_mastodon_permissions.yml
```
## References
- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
- [[majortoot#tootctl CLI Note]]
- MajorAnsible: `configure_mastodon_permissions.yml`
- Related: [[mastodon-instance-tuning|Mastodon Instance Tuning]] · [[mastodon-db-maintenance|Mastodon DB Maintenance]]

@ -27,38 +27,61 @@ journalctl -b -1 -u ssh # likely empty — sshd never spawned
journalctl -b -1 -u ssh.socket # socket started before tailscaled
```
### Fix
### Fix (current — 2026-05-31)
Add Tailscale dependency to the socket override:
`After=tailscaled.service` orders against the service becoming `active`, **not** against the `tailscale0` interface actually having an IPv4 address. tailscaled flips to active within a second of starting, but the kernel doesn't have the address bound to the interface until DERP relays connect and the control plane confirms the node. If ssh.socket attempts `ListenStream=<TS IP>:22` in that window, the bind fails with `Cannot assign requested address`, the socket goes into a failed state, and there is no automatic retry.
The proper gate is a dedicated readiness service that **waits for the tailscale0 IPv4 address to exist** before letting ssh.socket bind:
```ini
# /etc/systemd/system/tailscale-wait-ready.service
[Unit]
Description=Wait until tailscale0 has an IPv4 address
After=tailscaled.service
Requires=tailscaled.service
ConditionPathExists=/usr/sbin/ip
[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStartSec=120
ExecStart=/usr/bin/bash -c 'for i in $(seq 1 120); do ip -4 -o addr show tailscale0 2>/dev/null | grep -q "inet " && exit 0; sleep 1; done; exit 1'
[Install]
WantedBy=multi-user.target
```
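The one-line loop in `ExecStart` is compact but hard to read. An expanded sketch of the same poll-until-ready logic is below; `wait_until` and `iface_has_ipv4` are hypothetical helper names, and the demo probes with `test -d /` so that it runs on a machine without Tailscale.

```shell
#!/usr/bin/env bash
# Sketch: retry a probe command once per second until it succeeds or the
# timeout (in seconds) runs out. Both function names are hypothetical.
wait_until() {
    local timeout=$1 elapsed=0
    shift
    while [ "$elapsed" -lt "$timeout" ]; do
        "$@" && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Same probe the unit uses: does the interface have an IPv4 address yet?
iface_has_ipv4() {
    ip -4 -o addr show "$1" 2>/dev/null | grep -q 'inet '
}

# On a Tailscale host the gate would be: wait_until 120 iface_has_ipv4 tailscale0
# Stand-in probe so the sketch runs anywhere:
if wait_until 2 test -d /; then
    echo "ready"
fi
```

Exiting non-zero on timeout is the important part: with `Requires=` in the socket override, a failed wait keeps ssh.socket from attempting a bind that cannot succeed.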
```ini
# /etc/systemd/system/ssh.socket.d/override.conf
[Unit]
After=tailscaled.service
Requires=tailscaled.service
After=tailscale-wait-ready.service
Requires=tailscale-wait-ready.service
[Socket]
ListenStream=
ListenStream=<TAILSCALE_IP>:22
```
Then reload and restart:
Reload + restart:
```bash
systemctl daemon-reload
systemctl enable tailscale-wait-ready.service
systemctl restart ssh.socket
systemctl status ssh.socket # verify Listen: shows correct IP
ss -tlnp | grep :22 # verify bound to Tailscale IP
```
- `After=` ensures the socket waits for Tailscale to start
- `Requires=` ensures tailscaled must be running for the socket to activate
!!! note "Evolution of this fix"
- **2026-05-19 v1:** `After=tailscaled.service` + `BindsTo=tailscaled.service`. Worked initially but caused a shutdown-time ordering cycle.
- **2026-05-23 v2:** `BindsTo` swapped for `Requires` to break the cycle. That fixed the cycle but did **not** wait for `tailscale0` to actually have an IP, only for `tailscaled` to be active. Hosts continued losing SSH after some reboots (intermittent, depending on which side won the race).
- **2026-05-31 v3:** Added `tailscale-wait-ready.service` to gate ssh.socket on the interface having an address. This is the current canonical fix.
!!! warning "Do NOT use BindsTo"
`BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot — leaving SSH dead until manual intervention. This was discovered on 2026-05-23 after the original fix (2026-05-19) used `BindsTo` and caused a second outage on dcaprod-hetzner. `Requires` provides the startup dependency without the dangerous bidirectional lifecycle coupling.
`BindsTo=tailscaled.service` creates a **systemd ordering cycle** during shutdown: `basic.target → sockets.target → ssh.socket → tailscaled.service → basic.target`. Systemd breaks the cycle by deleting jobs unpredictably, which can prevent `ssh.socket` from starting on the next boot. Use `Requires=` for startup ordering without the bidirectional lifecycle coupling.
### Affected Hosts
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
---
@ -120,4 +143,6 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
- [[majordiscord#2026-05-19 — Tailscale boot race: unreachable after Ansible reboot]]
- [[majorlinux#2026-05-19 — ssh.socket override patched: added Tailscale dependency]]
- [[dcaprod#2026-05-23 — SSH unreachable again: BindsTo ordering cycle in ssh.socket override]]
- [[majorlinux#2026-05-31 — ssh.socket race recurrence post-reboot (Requires= insufficient; added wait-ready gate)]]
- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
- Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`

@ -38,6 +38,7 @@ updated: 2026-05-15T09:00
* [Logwatch Fleet Setup — Surviving Package Upgrades](02-selfhosting/monitoring/logwatch-fleet-setup.md)
* [Updating n8n Running in Docker](02-selfhosting/services/updating-n8n-docker.md)
* [Mastodon Instance Tuning](02-selfhosting/services/mastodon-instance-tuning.md)
* [Mastodon Post-Install Hardening (Permissions + Account)](02-selfhosting/services/mastodon-post-install-hardening.md)
* [Mastodon — The `--prune-profiles` Trap and How to Recover](02-selfhosting/services/mastodon-prune-profiles-trap.md)
* [Ghost Email Configuration with Mailgun](02-selfhosting/services/ghost-smtp-mailgun-setup.md)
* [Claude Code Remote Control — Mobile Access to a Persistent Host Session](02-selfhosting/services/claude-code-remote-control.md)