majorwiki/05-troubleshooting/ansible-ubuntu-reboot-detection-kernel-mismatch.md
majorlinux 52ca8a0413 wiki: batch update — 4 new articles + 4 updates
2026-05-25 13:55:10 -04:00

---
title: "Ansible: Ubuntu Reboot Detection Misses Kernel Upgrades"
domain: troubleshooting
category: ansible
tags: [ansible, ubuntu, kernel, reboot, needrestart, apt]
status: published
created: 2026-05-19
updated: 2026-05-19
---
# Ansible: Ubuntu Reboot Detection Misses Kernel Upgrades
## Problem
`update.yml` runs across the Ubuntu fleet, a kernel package is upgraded, but the executive summary reports `No reboot needed` — even though a reboot is genuinely required. Running `uname -r` on the host confirms it's still on the old kernel.
Example: majortoot had `linux-image-6.8.0-117-generic` installed on May 16 after a Tailscale update triggered `needrestart`, but the playbook kept reporting clean.
## Root Cause
The standard check for Ubuntu reboot state is:
```yaml
- name: Check if a reboot is required for Ubuntu servers
  ansible.builtin.stat:
    path: /var/run/reboot-required
  register: ubuntu_reboot_flag
```
`/var/run/reboot-required` is written by `update-notifier-common`'s `notify-reboot-required` script, called by `/etc/kernel/postinst.d/update-notifier` when a kernel package is installed via `apt`.
The problem is `needrestart`. It runs after every `apt` invocation via a `DPkg::Post-Invoke` hook (`apt-pinvoke -m u`). In **unattended mode** (`-m u`), needrestart detects the pending kernel upgrade and calls `announce_ver()` in `NeedRestart::UI::Ubuntu` — but that function only prints to stdout. It does **not** call `_write_reboot_file()`. Only `announce_ucode()` (microcode upgrades) calls `_write_reboot_file()`.
So the sequence is:
1. `apt` installs kernel → `notify-reboot-required` creates `/run/reboot-required`
2. Some later `apt` run (e.g. Ansible installs Tailscale) → `needrestart -m u` runs → detects kernel mismatch → calls `announce_ver()` → prints to stdout (suppressed in Ansible) → **does not** recreate the sentinel file
3. Next Ansible run: stat check finds no file → reports `No reboot needed`
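`/var/run` is a symlink to `/run`, which is a tmpfs and is cleared on reboot, so the sentinel file is only expected to vanish at that point. In practice it can also be missing between reboots, and because needrestart in unattended mode never recreates it, nothing restores the flag once it is gone.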
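To confirm this state on a suspect host, it helps to look at both signals side by side. The following is a minimal sketch, assuming it is run directly on the host; `sentinel_state` is a hypothetical helper name, not part of any package.

```bash
# Hypothetical helper: report whether a reboot sentinel exists at the given path.
sentinel_state() {
    if [ -e "$1" ]; then
        echo "present"
    else
        echo "absent"
    fi
}

# Print the two signals the playbook cares about.
echo "sentinel: $(sentinel_state /run/reboot-required)"
echo "running:  $(uname -r)"
```

If the sentinel is reported absent while `uname -r` prints a release older than the newest installed versioned `linux-image` package, the host is in the state described above.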
## Fix — Dual Check in update.yml
Add a parallel kernel comparison task after the existing stat check:
```yaml
- name: Check running kernel vs installed kernel (Ubuntu)
  ansible.builtin.shell: |
    RUNNING=$(uname -r)
    INSTALLED=$(dpkg -l 'linux-image-[0-9]*-generic' 2>/dev/null \
      | awk '/^ii/{print $2}' \
      | sed 's/linux-image-//' \
      | sort -V | tail -1)
    if [ -n "$INSTALLED" ] && [ "$RUNNING" != "$INSTALLED" ]; then
      echo "KERNEL_MISMATCH"
    fi
  register: kernel_mismatch_check
  changed_when: false
  when: ansible_facts['os_family'] == "Debian"
```
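To exercise the comparison outside Ansible, the same pipeline can be split into two small shell functions that take their input as arguments or on stdin. This is a sketch rather than the playbook's actual code: `newest_installed_kernel` and `kernel_state` are hypothetical names, and `sort -V` is assumed to be the GNU coreutils version.

```bash
# Hypothetical helpers; the pipeline mirrors the shell in the task above.

# Read `dpkg -l` output on stdin and print the newest installed kernel release.
# Only rows in state "ii" (installed) are considered.
newest_installed_kernel() {
    awk '/^ii/ {print $2}' \
        | sed 's/^linux-image-//' \
        | sort -V | tail -n 1
}

# Compare the running release ($1) with the newest installed release ($2).
# An empty $2 (no matching packages) is treated as "nothing to report".
kernel_state() {
    if [ -n "$2" ] && [ "$1" != "$2" ]; then
        echo "KERNEL_MISMATCH"
    else
        echo "OK"
    fi
}

# On a real host:
#   installed=$(dpkg -l 'linux-image-[0-9]*-generic' 2>/dev/null | newest_installed_kernel)
#   kernel_state "$(uname -r)" "$installed"
```

Feeding the functions canned `dpkg -l` rows makes it easy to check edge cases, such as removed-but-not-purged (`rc`) packages, without touching a production host.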
Then update the `host_summary` Jinja2 template to OR both conditions:
```jinja2
{%- if ansible_facts['os_family'] == 'Debian' and (
      (ubuntu_reboot_flag is defined and ubuntu_reboot_flag.stat is defined and ubuntu_reboot_flag.stat.exists)
      or
      (kernel_mismatch_check is defined and 'KERNEL_MISMATCH' in (kernel_mismatch_check.stdout | default('')))
    ) -%}
  {%- set _ = parts.append('REBOOT REQUIRED') -%}
```
## Common Mistake — Comparing the Wrong dpkg Field
An initial version of this fix used `$3` (the package version) and `cut`:
```bash
# WRONG — version field never matches uname -r
INSTALLED=$(dpkg -l 'linux-image-*-generic' | awk '/^ii/{print $3}' | sort -V | tail -1 | cut -d- -f1-4)
```
| Field | Example value |
|-------|--------------|
| `dpkg $3` (version) after cut | `6.8.0-57.59` |
| `uname -r` | `6.8.0-57-generic` |
These formats never match. Every Ubuntu host permanently reports `KERNEL_MISMATCH`. Always use the **name column (`$2`)**, strip the `linux-image-` prefix, and compare directly to `uname -r`.
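Also use `linux-image-[0-9]*-generic` (not `linux-image-*-generic`) so that only versioned kernel image packages are matched. The looser glob also matches names such as `linux-image-unsigned-<version>-generic`, which do not reduce to a `uname -r` string and would distort the sort.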
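The difference between the two columns can be reproduced with a single canned row. The sketch below builds an illustrative `dpkg -l` line from the values in the table above (the architecture and description fields are assumptions) and extracts both candidates.

```bash
# One illustrative `dpkg -l` row; name and version come from the table above,
# the remaining fields are assumed for the example.
row='ii  linux-image-6.8.0-57-generic  6.8.0-57.59  amd64  Signed kernel image generic'

# Column 3 is the package version: it carries the upload number, not "-generic".
from_version=$(printf '%s\n' "$row" | awk '/^ii/ {print $3}' | cut -d- -f1-4)

# Column 2 is the package name: stripping the prefix leaves the uname -r form.
from_name=$(printf '%s\n' "$row" | awk '/^ii/ {print $2}' | sed 's/^linux-image-//')

echo "from version column: $from_version"
echo "from name column:    $from_name"
```

Only the value taken from the name column has the same shape as `uname -r`, which is why the comparison has to start from `$2`.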
## Verification
Run against a known-pending host before and after reboot:
```bash
ansible-playbook update.yml --limit majortoot
```
- Before reboot: `majortoot: 0 pkg(s) upgraded | REBOOT REQUIRED`
- After reboot: `majortoot: 0 pkg(s) upgraded | No reboot needed`
## Related
- [[ansible-regex-search-set-fact-capture-group]] — companion Jinja2 gotcha in the same `host_summary` task
- [[ansible-unattended-upgrades-fleet]] — managing the Ubuntu auto-upgrade stack
- [[ansible-check-mode-false-positives]] — another Ansible reporting quirk