From d88a209e0b968f9bd172dd90739eb8993ad66147 Mon Sep 17 00:00:00 2001
From: majorlinux
Date: Sat, 28 Mar 2026 11:21:18 -0400
Subject: [PATCH] Add Ansible SSH timeout troubleshooting article

Documents the SSH keepalive fix for dnf upgrade timeouts on Fedora hosts,
plus the do-agent task guard fix. Also adds Ansible & Fleet Management
section to the troubleshooting index.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../ansible-ssh-timeout-dnf-upgrade.md        | 72 +++++++++++++++++++
 05-troubleshooting/index.md                   |  4 ++
 SUMMARY.md                                    |  1 +
 3 files changed, 77 insertions(+)
 create mode 100644 05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md

diff --git a/05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md b/05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md
new file mode 100644
index 0000000..5ff254e
--- /dev/null
+++ b/05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md
@@ -0,0 +1,72 @@
+---
+title: Ansible SSH Timeout During dnf upgrade on Fedora Hosts
+domain: troubleshooting
+category: ansible
+tags:
+  - ansible
+  - ssh
+  - fedora
+  - dnf
+  - timeout
+  - fleet-management
+status: published
+created: '2026-03-28'
+updated: '2026-03-28'
+---
+
+# Ansible SSH Timeout During dnf upgrade on Fedora Hosts
+
+## Symptom
+
+Running `ansible-playbook update.yml` against Fedora/CentOS hosts fails with:
+
+```
+fatal: [hostname]: UNREACHABLE! => {"changed": false,
+  "msg": "Failed to connect to the host via ssh: Shared connection to closed."}
+```
+
+The failure occurs specifically during `ansible.builtin.dnf` tasks that upgrade all packages (`name: '*'`, `state: latest`), because the operation takes long enough for the SSH connection to drop.
+
+## Root Cause
+
+Without explicit SSH keepalive settings in `ansible.cfg`, OpenSSH defaults apply. Long-running tasks like full `dnf upgrade` across a fleet can exceed idle timeouts, causing the control connection to close mid-task.
+
+## Fix
+
+Add a `[ssh_connection]` section to `ansible.cfg`:
+
+```ini
+[ssh_connection]
+ssh_args = -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -o ControlMaster=auto -o ControlPersist=60s
+```
+
+| Setting | Purpose |
+|---------|---------|
+| `ServerAliveInterval=30` | Send a keepalive every 30 seconds |
+| `ServerAliveCountMax=10` | Allow 10 missed keepalives before disconnect (~5 min tolerance) |
+| `ControlMaster=auto` | Reuse SSH connections across tasks |
+| `ControlPersist=60s` | Keep the master connection open 60s after last use |
+
+## Related Fix: do-agent Task Guard
+
+In the same playbook run, a second failure surfaced on hosts where the `ansible.builtin.uri` task to fetch the latest `do-agent` release was **skipped** (non-RedHat hosts or hosts without do-agent installed). The registered variable existed but contained a skipped result with no `.json` attribute, causing:
+
+```
+object of type 'dict' has no attribute 'json'
+```
+
+Fix: add guards to downstream tasks that reference the URI result:
+
+```yaml
+when:
+  - do_agent_release is defined
+  - do_agent_release is not skipped
+  - do_agent_release.json is defined
+```
+
+## Environment
+
+- **Controller:** macOS (MajorAir)
+- **Targets:** Fedora 43 (majorlab, majormail, majorhome, majordiscord)
+- **Ansible:** community edition via Homebrew
+- **Committed:** `d9c6bdb` in MajorAnsible repo
diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index 2b813bd..ff6c2fe 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -13,6 +13,10 @@ Practical fixes for common Linux, networking, and application problems.
 - [ISP SNI Filtering & Caddy](isp-sni-filtering-caddy.md)
 - [yt-dlp YouTube JS Challenge Fix](yt-dlp-fedora-js-challenge.md)
 
+## ⚙️ Ansible & Fleet Management
+- [SSH Timeout During dnf upgrade on Fedora Hosts](ansible-ssh-timeout-dnf-upgrade.md)
+- [Vault Password File Missing](ansible-vault-password-file-missing.md)
+
 ## 📦 Docker & Systems
 - [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](docker-caddy-selinux-post-reboot-recovery.md)
 - [Gitea Actions Runner: Boot Race Condition Fix](gitea-runner-boot-race-network-target.md)
diff --git a/SUMMARY.md b/SUMMARY.md
index 5dd62da..e31aefc 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -62,3 +62,4 @@
   * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
   * [ClamAV CPU Spike: Safe Scheduling with nice/ionice](05-troubleshooting/security/clamscan-cpu-spike-nice-ionice.md)
   * [Ansible: Vault Password File Not Found](05-troubleshooting/ansible-vault-password-file-missing.md)
+  * [Ansible: SSH Timeout During dnf upgrade on Fedora Hosts](05-troubleshooting/ansible-ssh-timeout-dnf-upgrade.md)
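
Review note on the settings table in the new article: the "~5 min tolerance" figure is `ServerAliveInterval` multiplied by `ServerAliveCountMax`. The sketch below checks that arithmetic against the `ssh_args` value from the patch. The helper names `parse_ssh_options` and `keepalive_tolerance` are hypothetical and exist only for illustration. The fallback values (0 and 3) are the OpenSSH client defaults, and the parser assumes options are written as `-o Key=Value`.

```python
import shlex


def parse_ssh_options(ssh_args):
    """Collect `-o Key=Value` pairs from an ssh argument string (hypothetical helper)."""
    tokens = shlex.split(ssh_args)
    options = {}
    # Walk adjacent token pairs and keep those of the form ("-o", "Key=Value").
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-o" and "=" in value:
            key, _, val = value.partition("=")
            options[key] = val
    return options


def keepalive_tolerance(ssh_args):
    """Approximate seconds an unresponsive server is tolerated before ssh gives up.

    Returns 0 when ServerAliveInterval is unset or 0, which is the OpenSSH
    default and means the client sends no keepalives at all.
    """
    options = parse_ssh_options(ssh_args)
    interval = int(options.get("ServerAliveInterval", 0))
    count_max = int(options.get("ServerAliveCountMax", 3))
    return interval * count_max


# ssh_args value copied from the [ssh_connection] section in the patch.
SSH_ARGS = (
    "-o ServerAliveInterval=30 -o ServerAliveCountMax=10 "
    "-o ControlMaster=auto -o ControlPersist=60s"
)

print(keepalive_tolerance(SSH_ARGS))  # 30 * 10 = 300 seconds, i.e. 5 minutes
```

Passing a string with no keepalive options, such as `-o ControlMaster=auto`, returns 0, which matches the root cause described in the article: without these settings the client never probes the connection.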