From 50556b7da39aff8186e01d484f973d105de91530 Mon Sep 17 00:00:00 2001
From: Marcus Summers
Date: Thu, 11 Jun 2026 10:49:54 -0400
Subject: [PATCH] docs: point Ansible references at the new roles
 (clamav/ssh_hardening/tailscale)

Operational/how-to references updated to the role entry playbooks after the
ADR-0001 migration (clamav.yml, ssh_hardening.yml, tailscale.yml). Historical
incident narrative (dated callouts, commit refs) preserved verbatim.

- clamav-fleet-deployment: override + re-run command -> clamav.yml; role note
- ssh-hardening-ansible-fleet: note that this is now the ssh_hardening role
- vps-migration-baseline-checklist: table -> clamav.yml / ssh_hardening.yml
- ssh-socket-tailscale-race-condition: Affected Hosts + Prevention + References
  -> tailscale role tasks (network_wait/ssh_only_ubuntu/ssh_only_fedora);
  Affected Hosts also gains 2026-06-07 callouts (Ubuntu playbook re-armed the
  ordering cycle; Fedora leftover ListenAddress drop-in bind race)
---
 .../cloud/vps-migration-baseline-checklist.md |  4 ++--
 .../security/clamav-fleet-deployment.md       | 10 +++++++---
 .../security/ssh-hardening-ansible-fleet.md   |  3 +++
 .../ssh-socket-tailscale-race-condition.md    | 19 ++++++++++++++-----
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/02-selfhosting/cloud/vps-migration-baseline-checklist.md b/02-selfhosting/cloud/vps-migration-baseline-checklist.md
index 3062d3e..d1ac251 100644
--- a/02-selfhosting/cloud/vps-migration-baseline-checklist.md
+++ b/02-selfhosting/cloud/vps-migration-baseline-checklist.md
@@ -57,8 +57,8 @@ Every server in the fleet should have these. Check each one after migration:
 | Firewall | `firewalld` | `ufw` | `configure_firewall_*.yml` | Verify fail2ban banaction matches |
 | Cron | `cronie` | `cron` | — (usually pre-installed) | Required by logwatch |
 | Auto-updates | `dnf-automatic` | `unattended-upgrades` | `ansible-unattended-upgrades-fleet` | Security patches only |
-| Antivirus | `clamav` | `clamav` | `configure_clamav.yml` | Internet-facing hosts only |
-| SSH hardening | `openssh-server` | `openssh-server` | `configure_ssh_hardening.yml` | Key-only, no root password |
+| Antivirus | `clamav` | `clamav` | `clamav.yml` (clamav role) | Internet-facing hosts only |
+| SSH hardening | `openssh-server` | `openssh-server` | `ssh_hardening.yml` (ssh_hardening role) | Key-only, no root password |
 | Timezone | — | — | — | US servers: `America/New_York`; UK: `Europe/London`. Hetzner defaults to UTC. |
 | CA bundle (Fedora) | `ca-certificates` | `ca-certificates` | — | Verify `/etc/pki/tls/certs/ca-bundle.crt` symlink exists — see [Fedora CA bundle fix](../../05-troubleshooting/security/fedora-ca-bundle-missing-symlink.md) |
 | Syslog (Fedora) | `rsyslog` | — (pre-installed) | — | Fedora 44 Hetzner images have journald only. Logwatch needs `/var/log/messages` + `/var/log/secure`. |
diff --git a/02-selfhosting/security/clamav-fleet-deployment.md b/02-selfhosting/security/clamav-fleet-deployment.md
index f4a2888..553d783 100644
--- a/02-selfhosting/security/clamav-fleet-deployment.md
+++ b/02-selfhosting/security/clamav-fleet-deployment.md
@@ -31,6 +31,10 @@ ClamAV is the standard open-source antivirus for Linux servers. For internet-fac
 
 ## Ansible Playbook
 
+> On the MajorsHouse fleet this is packaged as the **`clamav` role** (`roles/clamav/`,
+> tasks split install → service → scan → verify) and run via `clamav.yml` or `site.yml`.
+> The standalone playbook below is the illustrative equivalent.
+
 ```yaml
 - name: Deploy ClamAV to internet-facing hosts
   hosts: internet_facing # dca, majorlinux, teelia, tttpod, majortoot, majormail
@@ -240,16 +244,16 @@ On hosts with ≤2 GB RAM, running `clamd` continuously is often counterproducti
 
 **The fix: `clamav_use_daemon: false` host_var**
 
-`configure_clamav.yml` supports a per-host override. Add to the host's `host_vars//vars.yml`:
+The `clamav` role supports a per-host override. Add to the host's `host_vars//vars.yml`:
 
 ```yaml
 clamav_use_daemon: false
 ```
 
-Then re-run the playbook:
+Then re-run the role:
 
 ```bash
-ansible-playbook configure_clamav.yml --limit
+ansible-playbook clamav.yml --limit
 ```
 
 This will:
diff --git a/02-selfhosting/security/ssh-hardening-ansible-fleet.md b/02-selfhosting/security/ssh-hardening-ansible-fleet.md
index 2f32a71..a3f2801 100644
--- a/02-selfhosting/security/ssh-hardening-ansible-fleet.md
+++ b/02-selfhosting/security/ssh-hardening-ansible-fleet.md
@@ -31,6 +31,9 @@ Rather than editing `/etc/ssh/sshd_config` directly (which may be managed by the
 
 ## Ansible Playbook
 
+> On the MajorsHouse fleet this is packaged as the **`ssh_hardening` role** (`roles/ssh_hardening/`)
+> and run via `ssh_hardening.yml` or `site.yml`. The standalone playbook below is the illustrative equivalent.
+
 ```yaml
 - name: Harden SSH daemon fleet-wide
   hosts: all:!raspbian
diff --git a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
index f5eb371..5d16555 100644
--- a/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
+++ b/05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md
@@ -81,7 +81,15 @@ ss -tlnp | grep :22 # verify bound to Tailscale IP
 
 ### Affected Hosts
 
-Ubuntu hosts using `configure_tailscale_ssh_only.yml`: majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner. Fedora hosts (majordiscord) use firewall rules for SSH restriction — not affected by this race.
+Ubuntu hosts locked via the `tailscale` role (`ssh_only_ubuntu` task, formerly `configure_tailscale_ssh_only.yml`): majorlinux, dcaprod-hetzner, tttpod-hetzner, majortoot-hetzner.
+
+> [!danger] The Ubuntu playbook shipped the cycle pattern until 2026-06-07
+> Despite the 2026-06-04 resolution above, `configure_tailscale_ssh_only.yml` in the repo kept deploying the `[Unit] Requires=tailscale-wait-ready.service` gate on **ssh.socket** (the cycle-causer) and never added the ssh.service gate, so re-running it *re-armed* the ordering cycle. Caught 2026-06-07: it clobbered majorlinux's hand-fix, and **majortoot-hetzner was found already armed** with the latent cycle (it would have lost SSH on its next reboot). Both restored/defused; playbook corrected in MajorAnsible `e0d35aa` (gate on ssh.service, dependency-free socket).
+>
+> **Fleet audited & reconciled 2026-06-07:** dcaprod-hetzner + tttpod-hetzner already had the dependency-free socket but were **missing `tailscale-wait-ready.service`** (their ssh.service gate referenced a non-existent unit → inert → latent *bind* race, not a cycle); the corrected playbook was applied to both, deploying the service and activating the gate. teelia uses **Tailscale SSH** (no sshd; ssh.socket/ssh.service disabled), so it is immune to both races. All sshd-based Ubuntu hosts now run the same pattern: dependency-free `ssh.socket` bind + `ssh.service` readiness gate + `tailscale-wait-ready.service`.
+
+> [!warning] Fedora hosts are NOT automatically immune (corrected 2026-06-07)
+> The firewalld method (`configure_tailscale_ssh_only_fedora.yml`) binds sshd on `0.0.0.0:22` and enforces Tailscale-only via the firewall, so it has no dependency on the Tailscale address, **unless** a host also carries a leftover manual `ListenAddress ` drop-in (`/etc/ssh/sshd_config.d/tailscale-only.conf`) from the pre-firewall lockdown. Then sshd.service hits the same boot bind-race (`Bind to port 22 on failed: Cannot assign requested address`) and flaps on every reboot. Hit on **majordiscord 2026-06-07**; fixed by removing the redundant drop-in (the firewall stays the enforcing layer). The Fedora playbook now removes it automatically (MajorAnsible `b4a9090`).
 
 ---
 
@@ -133,9 +141,10 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
 ## Prevention
 
 - Set root passwords on all VPS hosts for emergency console access
-- Ansible playbooks deploy both fixes automatically:
-  - `configure_tailscale_network_wait.yml` — tailscaled network-online dependency (all hosts)
-  - `configure_tailscale_ssh_only.yml` — ssh.socket Tailscale dependency (Ubuntu only)
+- The `tailscale` role deploys all fixes automatically (run via `tailscale.yml` / `site.yml`):
+  - `network_wait` task — tailscaled network-online dependency (all hosts)
+  - `ssh_only_ubuntu` task — dependency-free ssh.socket bind + ssh.service readiness gate + `tailscale-wait-ready.service` (Ubuntu group)
+  - `ssh_only_fedora` task — firewalld Tailscale-only lockdown; removes any leftover `ListenAddress` drop-in (Fedora group)
 
 ## References
 
@@ -145,4 +154,4 @@ All hosts where Tailscale is the primary access path. Particularly impactful on
 - [[dcaprod#2026-05-23 — SSH unreachable again: BindsTo ordering cycle in ssh.socket override]]
 - [[majorlinux#2026-05-31 — ssh.socket race recurrence post-reboot (Requires= insufficient; added wait-ready gate)]]
 - [[majortoot#2026-05-31 — ssh.socket race post-reboot on majortoot-hetzner (during cutover night)]]
-- Ansible: `configure_tailscale_ssh_only.yml`, `configure_tailscale_network_wait.yml`
+- Ansible: the `tailscale` role (`tailscale.yml`) — `network_wait` + `ssh_only_ubuntu`/`ssh_only_fedora` tasks, consolidated from the former `configure_tailscale_*` playbooks (MajorAnsible `656302e`)