| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| Ansible SSH Timeout During dnf upgrade on Fedora Hosts | troubleshooting | ansible | | published | 2026-03-28 | 2026-03-28 |
Ansible SSH Timeout During dnf upgrade on Fedora Hosts
Symptom
Running ansible-playbook update.yml against Fedora/CentOS hosts fails with:
fatal: [hostname]: UNREACHABLE! => {"changed": false,
"msg": "Failed to connect to the host via ssh: Shared connection to <IP> closed."}
The failure occurs specifically during ansible.builtin.dnf tasks that upgrade all packages (name: '*', state: latest), because the operation takes long enough for the SSH connection to drop.
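
For orientation, the kind of task that triggers the timeout might look like the sketch below. The play layout, task names, and the `os_family` guard are assumptions for illustration; only the module name and the `name: '*'`, `state: latest` arguments come from the description above.

```yaml
# Hypothetical excerpt of update.yml; structure and names are assumed.
- name: Update Fedora hosts
  hosts: all
  become: true
  tasks:
    - name: Upgrade all packages
      ansible.builtin.dnf:
        name: "*"        # every installed package
        state: latest    # a full upgrade; this is the long-running step
      when: ansible_facts['os_family'] == "RedHat"
```

While dnf downloads and installs packages, the task can produce no SSH traffic for minutes at a time, which is the quiet window in which the connection drops.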
Root Cause
Without explicit SSH keepalive settings in ansible.cfg, OpenSSH defaults apply, and the default ServerAliveInterval of 0 means the client sends no keepalive messages. Long-running tasks such as a full dnf upgrade across a fleet can then exceed idle timeouts, causing the control connection to close mid-task.
Fix
Add a [ssh_connection] section to ansible.cfg:
[ssh_connection]
ssh_args = -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -o ControlMaster=auto -o ControlPersist=60s
| Setting | Purpose |
|---|---|
| ServerAliveInterval=30 | Send a keepalive every 30 seconds |
| ServerAliveCountMax=10 | Allow 10 missed keepalives before disconnect (~5 min tolerance) |
| ControlMaster=auto | Reuse SSH connections across tasks |
| ControlPersist=60s | Keep the master connection open 60s after last use |
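
If the keepalive options should apply only to some hosts rather than the whole controller, one possible alternative is to set them as an inventory variable. This is a sketch, not what the fix above does: the file path and group name are hypothetical. It relies on `ansible_ssh_common_args`, which Ansible appends to the ssh command line, so the ControlMaster and ControlPersist options from `ssh_args` are left as they are.

```yaml
# group_vars/fedora.yml -- hypothetical path and group name
# 30 s interval x 10 allowed misses = 300 s (about 5 minutes) of tolerance
ansible_ssh_common_args: "-o ServerAliveInterval=30 -o ServerAliveCountMax=10"
```

The ansible.cfg approach keeps the setting in one place for every play; the inventory variable is worth considering only when different host groups need different tolerances.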
Related Fix: do-agent Task Guard
In the same playbook run, a second failure surfaced on hosts where the ansible.builtin.uri task to fetch the latest do-agent release was skipped (non-RedHat hosts or hosts without do-agent installed). The registered variable existed but contained a skipped result with no .json attribute, causing:
object of type 'dict' has no attribute 'json'
Fix: add guards to downstream tasks that reference the URI result:
when:
- do_agent_release is defined
- do_agent_release is not skipped
- do_agent_release.json is defined
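
Put together, the fetch task and a guarded consumer might look like the sketch below. The release URL, the task names, the `os_family` condition, and the `tag_name` field are assumptions for illustration; only the `ansible.builtin.uri` module, the `do_agent_release` variable, and the three guards come from the description above.

```yaml
# Hypothetical tasks; URL, names, and the tag_name field are assumed.
- name: Fetch latest do-agent release metadata
  ansible.builtin.uri:
    url: https://api.github.com/repos/digitalocean/do-agent/releases/latest
    return_content: true
  register: do_agent_release   # still registered when the task is skipped
  when: ansible_facts['os_family'] == "RedHat"

- name: Report latest do-agent release
  ansible.builtin.debug:
    msg: "Latest do-agent release: {{ do_agent_release.json.tag_name }}"
  when:
    - do_agent_release is defined
    - do_agent_release is not skipped
    - do_agent_release.json is defined
```

The order of the guards matters: a `when` list is evaluated top to bottom and stops at the first false condition, so `.json` is only dereferenced after the skipped check has passed.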
Environment
- Controller: macOS (MajorAir)
- Targets: Fedora 43 (majorlab, majormail, majorhome, majordiscord)
- Ansible: community edition via Homebrew
- Committed: d9c6bdb in MajorAnsible repo