docs: point Ansible references at the new roles (clamav/ssh_hardening/tailscale)

Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration (clamav.yml, ssh_hardening.yml, tailscale.yml). Historical
incident narrative (dated callouts, commit refs) preserved verbatim.

- clamav-fleet-deployment: override + re-run command -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note that this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
  -> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora)
Marcus Summers 2026-06-11 10:49:54 -04:00
parent 2e58c4625c
commit 50556b7da3
4 changed files with 26 additions and 10 deletions

@ -57,8 +57,8 @@ Every server in the fleet should have these. Check each one after migration:
| Firewall | `firewalld` | `ufw` | `configure_firewall_*.yml` | Verify fail2ban banaction matches |
| Cron | `cronie` | `cron` | — (usually pre-installed) | Required by logwatch |
| Auto-updates | `dnf-automatic` | `unattended-upgrades` | `ansible-unattended-upgrades-fleet` | Security patches only |
| Antivirus | `clamav` | `clamav` | `configure_clamav.yml` | Internet-facing hosts only |
| SSH hardening | `openssh-server` | `openssh-server` | `configure_ssh_hardening.yml` | Key-only, no root password |
| Antivirus | `clamav` | `clamav` | `clamav.yml` (clamav role) | Internet-facing hosts only |
| SSH hardening | `openssh-server` | `openssh-server` | `ssh_hardening.yml` (ssh_hardening role) | Key-only, no root password |
| Timezone | — | — | — | US servers: `America/New_York`; UK: `Europe/London`. Hetzner defaults to UTC. |
| CA bundle (Fedora) | `ca-certificates` | `ca-certificates` | — | Verify `/etc/pki/tls/certs/ca-bundle.crt` symlink exists — see [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md) |
| Syslog (Fedora) | `rsyslog` | — (pre-installed) | — | Fedora 44 Hetzner images have journald only. Logwatch needs `/var/log/messages` + `/var/log/secure`. |

@ -31,6 +31,10 @@ ClamAV is the standard open-source antivirus for Linux servers. For internet-fac
## Ansible Playbook
> On the MajorsHouse fleet this is packaged as the **`clamav` role** (`roles/clamav/`,
> tasks split install → service → scan → verify) and run via `clamav.yml` or `site.yml`.
> The standalone playbook below is the illustrative equivalent.
```yaml
- name: Deploy ClamAV to internet-facing hosts
  hosts: internet_facing # dca, majorlinux, teelia, tttpod, majortoot, majormail
@ -240,16 +244,16 @@ On hosts with ≤2 GB RAM, running `clamd` continuously is often counterproducti
**The fix: `clamav_use_daemon: false` host_var**
`configure_clamav.yml` supports a per-host override. Add to the host's `host_vars/<hostname>/vars.yml`:
The `clamav` role supports a per-host override. Add to the host's `host_vars/<hostname>/vars.yml`:
```yaml
clamav_use_daemon: false
```
Then re-run the playbook:
Then re-run the role:
```bash
ansible-playbook configure_clamav.yml --limit <hostname>
ansible-playbook clamav.yml --limit <hostname>
```
This will:

@ -31,6 +31,9 @@ Rather than editing `/etc/ssh/sshd_config` directly (which may be managed by the
## Ansible Playbook
> On the MajorsHouse fleet this is packaged as the **`ssh_hardening` role** (`roles/ssh_hardening/`)
> and run via `ssh_hardening.yml` or `site.yml`. The standalone playbook below is the illustrative equivalent.
```yaml
- name: Harden SSH daemon fleet-wide
  hosts: all:!raspbian

@ -81,7 +81,15 @@ ss -tlnp | grep :22 # verify bound to Tailscale IP
### Affected Hosts
Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
Ubuntu hosts locked via the `tailscale` role (`ssh_only_ubuntu` task, formerly `configure_tailscale_ssh_only.yml`): majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner.
> [!danger] The Ubuntu playbook shipped the cycle pattern until 2026-06-07
> Despite the 2026-06-04 resolution above, `configure_tailscale_ssh_only.yml` in the repo kept deploying the `[Unit] Requires=tailscale-wait-ready.service` gate on **ssh.socket** (the cycle-causer) and never added the ssh.service gate — so re-running it *re-armed* the ordering cycle. Caught 2026-06-07: it clobbered majorlinux's hand-fix, and **majortoot-hetzner was found already armed** with the latent cycle (would have lost SSH on its next reboot). Both restored/defused; playbook corrected in MajorAnsible `e0d35aa` (gate on ssh.service, dependency-free socket).
>
> **Fleet audited & reconciled 2026-06-07:** dcaprod-hetzner + tttpod-hetzner had the dependency-free socket already but were **missing `tailscale-wait-ready.service`** (their ssh.service gate referenced a non-existent unit → inert → latent *bind* race, not a cycle); the corrected playbook was applied to both, deploying the service and activating the gate. teelia uses **Tailscale SSH** (no sshd, ssh.socket/ssh.service disabled) — immune to both races. All Ubuntu hosts now run the same pattern: dependency-free `ssh.socket` bind + `ssh.service` readiness gate + `tailscale-wait-ready.service`.
> [!warning] Fedora hosts are NOT automatically immune (corrected 2026-06-07)
> The firewalld method (`configure_tailscale_ssh_only_fedora.yml`) binds sshd on `0.0.0.0:22` and enforces Tailscale-only via the firewall, so it has no dependency on the Tailscale address — **unless** a host also carries a leftover manual `ListenAddress <tailscale-ip>` drop-in (`/etc/ssh/sshd_config.d/tailscale-only.conf`) from the pre-firewall lockdown. Then sshd.service hits the same boot bind-race (`Bind to port 22 on <ts-ip> failed: Cannot assign requested address`) and flaps every reboot. Hit on **majordiscord 2026-06-07**; fixed by removing the redundant drop-in (firewall stays the enforcing layer). The Fedora playbook now removes it automatically (MajorAnsible `b4a9090`).
---
@ -133,9 +141,10 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
## Prevention
- Set root passwords on all VPS hosts for emergency console access
- Ansible playbooks deploy both fixes automatically:
  - `configure_tailscale_network_wait.yml` — tailscaled network-online dependency (all hosts)
  - `configure_tailscale_ssh_only.yml` — ssh.socket Tailscale dependency (Ubuntu only)
- The `tailscale` role deploys all fixes automatically (run via `tailscale.yml` / `site.yml`):
  - `network_wait` task — tailscaled network-online dependency (all hosts)
  - `ssh_only_ubuntu` task — dependency-free ssh.socket bind + ssh.service readiness gate + `tailscale-wait-ready.service` (Ubuntu group)
  - `ssh_only_fedora` task — firewalld Tailscale-only lockdown; removes any leftover `ListenAddress` drop-in (Fedora group)
## References
@ -145,4 +154,4 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
- [[dcaprod#2026-05-23 — SSH unreachable again: BindsTo ordering cycle in ssh.socket override]]
- [[majorlinux#2026-05-31 — ssh.socket race recurrence post-reboot (Requires= insufficient; added wait-ready gate)]]
- [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
- Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`
- Ansible: the `tailscale` role (`tailscale.yml`) — `network_wait` + `ssh_only_ubuntu`/`ssh_only_fedora` tasks, consolidated from the former `configure_tailscale_*` playbooks (MajorAnsible `656302e`)
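
The task split named above could be wired together in the role's `tasks/main.yml` roughly as follows. This is a minimal sketch, not the contents of the MajorAnsible repo: the task file names (`network_wait.yml`, `ssh_only_ubuntu.yml`, `ssh_only_fedora.yml`) and the inventory group names `ubuntu` and `fedora` are assumptions chosen to mirror the task names in this note.

```yaml
# roles/tailscale/tasks/main.yml (hypothetical layout; the real role may differ)

# Runs on every host: make tailscaled wait for network-online.
- name: Apply tailscaled network-online dependency
  ansible.builtin.include_tasks: network_wait.yml

# Assumed inventory group "ubuntu": ssh.socket bind + ssh.service readiness gate.
- name: Lock SSH to Tailscale on Ubuntu hosts
  ansible.builtin.include_tasks: ssh_only_ubuntu.yml
  when: "'ubuntu' in group_names"

# Assumed inventory group "fedora": firewalld lockdown and drop-in cleanup.
- name: Lock SSH to Tailscale on Fedora hosts
  ansible.builtin.include_tasks: ssh_only_fedora.yml
  when: "'fedora' in group_names"
```

Under a layout like this, `ansible-playbook tailscale.yml --limit <hostname>` would run the shared `network_wait` tasks plus only the branch matching that host's group.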