| title | domain | category | tags | status | created | updated |
|---|---|---|---|---|---|---|
| Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update | troubleshooting | docker | | published | 2026-03-28 | 2026-03-28 |
Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update
Symptom
Netdata alert docker_nextcloud_unhealthy fired on majorlab and stayed in Warning for 20 hours. The nextcloud-aio-nextcloud container was running but its Docker healthcheck kept failing. No user-facing errors were visible in nextcloud.log.
Investigation
Timeline (2026-03-27, all UTC)
| Time | Event |
|---|---|
| 04:00 | Nightly backup script started, mastercontainer update kicked off |
| 04:03 | nextcloud-aio-nextcloud container recreated |
| 04:05 | Backup finished |
| 07:25 | Mastercontainer logged "Initial startup of Nextcloud All-in-One complete!" (3h20m after the backup finished) |
| 10:22 | First entry in nextcloud.log (deprecation warnings only — no errors) |
| 04:00 (Mar 28) | Next nightly backup replaced the container; new container came up healthy in ~25 minutes |
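The gaps in the timeline can be recomputed from the timestamps. This is a small sketch, assuming GNU date is available; the helper name elapsed is illustrative and was not part of the investigation.

```bash
#!/bin/bash
# elapsed START END: print the gap between two timestamps as XhYYm.
# Assumes GNU date; both timestamps are interpreted as UTC.
elapsed() {
    local start end diff
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    diff=$(( end - start ))
    printf '%dh%02dm\n' $(( diff / 3600 )) $(( diff % 3600 / 60 ))
}

elapsed "2026-03-27 04:05" "2026-03-27 07:25"   # backup finished -> AIO startup complete: 3h20m
elapsed "2026-03-27 04:03" "2026-03-27 10:22"   # container recreated -> first log entry: 6h19m
```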
Key findings
- No image update: the container image dated to Feb 26, so this was not caused by a version change.
- No app-level errors: nextcloud.log contained only files_rightclick deprecation warnings (level 3). No level 2/4 entries.
- PHP-FPM never stabilized: the healthcheck (/healthcheck.sh) tests nc -z 127.0.0.1 9000 (PHP-FPM). The container was running but FPM wasn't responding to the port check.
- 6-hour log gap: no nextcloud.log entries between container start (04:03) and the first log entry (10:22), suggesting the AIO init scripts (occ upgrade, app updates, cron jobs) ran for hours before the app became partially responsive.
- RestartCount 0: the container never restarted on its own. It sat there unhealthy for the full 20 hours.
- Disk space fine: 40% used on /.
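One way to confirm the log-level finding is to tally entries per level. This is a minimal sketch, assuming nextcloud.log is in Nextcloud's default JSON-lines format with a numeric "level" field and no whitespace after the colon; count_log_levels is a hypothetical helper, not a Nextcloud or AIO command.

```bash
#!/bin/bash
# count_log_levels FILE: print how many log entries exist per "level" value.
# Assumption: one JSON object per line containing "level":N (N is a single digit).
count_log_levels() {
    grep -o '"level":[0-9]' "$1" \
        | cut -d: -f2 \
        | sort \
        | uniq -c \
        | awk '{ print "level " $2 ": " $1 }'
}
```

Calling the function with the path to nextcloud.log prints one line per level that occurs. For this incident, only a level 3 line would be expected.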
Healthcheck details
```bash
#!/bin/bash
# /healthcheck.sh inside nextcloud-aio-nextcloud
nc -z "$POSTGRES_HOST" "$POSTGRES_PORT" || exit 0   # postgres down = pass (graceful)
nc -z 127.0.0.1 9000 || exit 1                      # PHP-FPM down = fail
```
If PostgreSQL is unreachable, the check passes (exits 0). The only failure path is PHP-FPM not listening on port 9000.
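The three possible outcomes can be exercised without a running stack. This sketch wraps the same two probes in a function; the PROBE variable is a hypothetical override added here for testing and does not exist in the real /healthcheck.sh.

```bash
#!/bin/bash
# healthcheck: same decision logic as the two-line script, as a function.
# PROBE (hypothetical) names the command used for port checks; default is "nc -z".
healthcheck() {
    local probe="${PROBE:-nc -z}"
    # Database unreachable: report healthy (exit status 0) and skip the FPM check
    $probe "$POSTGRES_HOST" "$POSTGRES_PORT" || return 0
    # Database reachable but nothing listening on the PHP-FPM port: unhealthy
    $probe 127.0.0.1 9000 || return 1
    return 0
}
```

With PROBE pointed at a stub that fails only for 127.0.0.1, the function returns 1. With a stub that fails for the database host, it returns 0, which is why a PostgreSQL outage alone never marks the container unhealthy.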
Root Cause
The AIO nightly update cycle recreated the container, but the startup/migration process hung or ran extremely long, preventing PHP-FPM from fully initializing. The container sat in this state for 20 hours with no self-recovery mechanism until the next nightly cycle replaced it.
The exact migration or occ command that stalled could not be confirmed — the old container's entrypoint logs were lost when the Mar 28 backup cycle replaced it.
Fix
Two changes deployed on 2026-03-28:
1. Dedicated Netdata alarm with lenient window
Split nextcloud-aio-nextcloud into its own Netdata alarm (docker_nextcloud_unhealthy) with a 10-minute lookup and 10-minute delay, separate from the general container alarm. See Tuning Netdata Docker Health Alarms.
2. Watchdog cron for auto-restart
Deployed /etc/cron.d/nextcloud-health-watchdog on majorlab:
```
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```
- Checks every 15 minutes
- Only restarts if the container has been running >1 hour (avoids interfering with normal startup)
- Logs to syslog: journalctl -t nextcloud-watchdog
This caps future unhealthy stretches at roughly 60 to 75 minutes (the 1-hour uptime threshold plus up to one 15-minute cron interval) instead of letting them persist until the next nightly cycle.
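The watchdog one-liner is dense, so the same decision can be written out as a script for readability. This is a sketch under assumptions, not the file deployed on majorlab: the function names and the CONTAINER and MIN_UPTIME variables are illustrative, and GNU date is assumed for parsing Docker's StartedAt timestamp.

```bash
#!/bin/bash
# Script form of the watchdog logic (illustrative, not the deployed cron entry).
CONTAINER="${CONTAINER:-nextcloud-aio-nextcloud}"
MIN_UPTIME="${MIN_UPTIME:-3600}"   # seconds the container must have been up

# should_restart STATUS STARTED_EPOCH NOW_EPOCH
# Succeeds only if the status is "unhealthy" and the uptime exceeds MIN_UPTIME.
should_restart() {
    local status="$1" started="$2" now="$3"
    [ "$status" = "unhealthy" ] && [ $(( now - started )) -gt "$MIN_UPTIME" ]
}

main() {
    local status started_at started
    # Bail out quietly if docker or the container is unavailable
    status=$(docker inspect --format '{{.State.Health.Status}}' "$CONTAINER" 2>/dev/null) || return 0
    started_at=$(docker inspect --format '{{.State.StartedAt}}' "$CONTAINER" 2>/dev/null) || return 0
    started=$(date -d "$started_at" +%s) || return 0
    if should_restart "$status" "$started" "$(date +%s)"; then
        docker restart "$CONTAINER" \
            && logger -t nextcloud-watchdog "Restarted unhealthy $CONTAINER"
    fi
}

# Run main only when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Keeping the decision in should_restart makes the threshold testable without Docker: an unhealthy container that started 3601 seconds ago qualifies, while one that started 3600 seconds ago does not. A restart resets StartedAt, so the next restart cannot happen for at least another hour.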