majorwiki/05-troubleshooting/ansible-check-mode-false-positives.md
majorlinux 4126656c05 wiki: update fail2ban digest + netdata docker health + 3 new articles
- fail2ban-digest-mode-fleet: recidive-only email model, sshd now silent,
  defaults-debian.conf gotcha added
- netdata-docker-health-alarm-tuning: 30m/10m config, tuning history table
- New: wp-fail2ban-logpath-debian-ubuntu, lora-adapter-gguf-conversion-fails,
  tailscale-status-json-hostname-localhost-ios
- Various article updates and nav index refreshes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 14:58:07 -04:00

5.9 KiB

title domain category tags status created updated
Ansible Check Mode False Positives in Verify/Assert Tasks selfhosting troubleshooting
ansible
check-mode
dry-run
assert
handlers
troubleshooting
published 2026-04-18 2026-04-30T05:21

Ansible Check Mode False Positives in Verify/Assert Tasks

The Problem

ansible-playbook --check (dry-run mode) reports failures on verify and assert tasks that depend on handler-triggered side effects — even when the playbook is correct and would succeed on a real run.

Symptom: Running --check produces errors like:

TASK [Assert hardened settings are active] ***
fatal: [host]: FAILED! => {
    "assertion": "'permitrootlogin without-password' in sshd_effective.stdout",
    "msg": "One or more SSH hardening settings not effective"
}

But a real run (ansible-playbook without --check) succeeds cleanly.

Why It Happens

In check mode, Ansible simulates tasks but does not execute handlers. This means:

  1. A config file task reports changed (it would deploy the file)
  2. The handler (reload sshd, reload firewalld, etc.) is never fired
  3. A subsequent verify task runs sshd -T or ufw status verbose against the current live state (pre-change)
  4. The assert compares the current state against the expected post-change state and fails

The verify task is reading reality accurately — the change hasn't happened yet — but the failure is misleading. It suggests the playbook is broken when it's actually correct.

The Fix

Guard verify and assert tasks that depend on handler side effects with when: not ansible_check_mode:

- name: Verify effective SSH settings post-reload
  ansible.builtin.command:
    cmd: sshd -T
  register: sshd_effective
  changed_when: false
  when: not ansible_check_mode   # sshd hasn't reloaded in check mode

- name: Assert hardened settings are active
  ansible.builtin.assert:
    that:
      - "'permitrootlogin without-password' in sshd_effective.stdout"
      - "'x11forwarding no' in sshd_effective.stdout"
    fail_msg: "SSH hardening settings not effective — check for conflicting config"
  when: not ansible_check_mode   # result would be pre-change state

This skips the verify/assert during check mode (where they'd produce false failures) while keeping them active on real runs (where they catch actual misconfigurations).

When to Apply This Guard

Apply when: not ansible_check_mode to any task that:

  • Reads the active/effective state of a service after a config change (sshd -T, ufw status verbose, firewall-cmd --list-all, nginx -T)
  • Asserts that the post-change state matches expectations
  • Depends on a handler having fired first (service reload, daemon restart)

Don't apply it to tasks that check pre-existing state (e.g., verifying a file exists before modifying it) — those are valid in check mode.

Common Patterns

SSH daemon verify

- name: Verify effective sshd settings
  ansible.builtin.command: sshd -T
  register: sshd_out
  changed_when: false
  when: not ansible_check_mode

- name: Assert sshd hardening active
  ansible.builtin.assert:
    that:
      - "'maxauthtries 3' in sshd_out.stdout"
  when: not ansible_check_mode

UFW status verify

- name: Show UFW status
  ansible.builtin.command: ufw status verbose
  register: ufw_status
  changed_when: false
  when: not ansible_check_mode

- name: Confirm default deny incoming
  ansible.builtin.assert:
    that:
      - "'Default: deny (incoming)' in ufw_status.stdout"
  when: not ansible_check_mode

nginx config verify

- name: Test nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  when: not ansible_check_mode

The inverse problem shows up when a playbook registers output from ansible.builtin.command (or shell) and uses that output in a downstream conditional or set_fact. In check mode, command is skipped by default — the registered variable comes back with no stdout, and any in / containment check against it silently evaluates to False.

Saw this 2026-04-19 while writing fix_ebtables_usrmerge.yml. The play queried update-alternatives --display ebtables to detect whether a host ran the nft or legacy backend, then branched on that fact. Under --check, the query was skipped, the fact defaulted to legacy on every host, and the next task's existence check failed on the nft hosts (/usr/bin/ebtables-legacy not found). A real run was fine — but --check output looked like the playbook was broken.

Fix: force the detection task to run even in check mode, since it's a read-only query with no side effects.

- name: Query current ebtables alternative
  ansible.builtin.command: update-alternatives --display ebtables
  register: alt_query
  changed_when: false
  failed_when: false
  check_mode: false        # force execution in --check so downstream conditionals see real data

Apply this to any command/shell task whose output feeds a when:, set_fact, or similar logic. Only safe when the task is genuinely read-only.

Trade-off

Guarding with when: not ansible_check_mode means check mode won't validate these assertions. The benefit — no false failures — outweighs the gap because:

  • Check mode is showing you what would change, not whether the result is valid
  • Real runs still assert and will catch actual misconfigurations
  • The alternative (failing check runs) erodes trust in --check output

If you need to verify the effective post-change state in check mode, consider splitting the playbook into a deploy pass and a separate verify-only playbook run without --check.

See Also