---
title: "Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update"
domain: troubleshooting
category: docker
tags: [nextcloud, docker, healthcheck, netdata, php-fpm, aio]
status: published
created: 2026-03-28
updated: 2026-03-28
---

# Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update

## Symptom

Netdata alert `docker_nextcloud_unhealthy` fired on majorlab and stayed in Warning for 20 hours. The `nextcloud-aio-nextcloud` container was running but its Docker healthcheck kept failing. No user-facing errors were visible in `nextcloud.log`.

## Investigation

### Timeline (2026-03-27, all UTC)

| Time | Event |
|---|---|
| 04:00 | Nightly backup script started, mastercontainer update kicked off |
| 04:03 | `nextcloud-aio-nextcloud` container recreated |
| 04:05 | Backup finished |
| 07:25 | Mastercontainer logged "Initial startup of Nextcloud All-in-One complete!" (3h20m delay) |
| 10:22 | First entry in `nextcloud.log` (deprecation warnings only, no errors) |
| 04:00 (Mar 28) | Next nightly backup replaced the container; new container came up healthy in ~25 minutes |

### Key findings

- **No image update**: the container image dated to Feb 26, so this was not caused by a version change.
- **No app-level errors**: `nextcloud.log` contained only `files_rightclick` deprecation warnings (level 3). No level 2/4 entries.
- **PHP-FPM never stabilized**: the healthcheck (`/healthcheck.sh`) tests `nc -z 127.0.0.1 9000` (PHP-FPM). The container was running but FPM wasn't responding to the port check.
- **6-hour log gap**: no `nextcloud.log` entries between container start (04:03) and first log (10:22), suggesting the AIO init scripts (occ upgrade, app updates, cron jobs) ran for hours before the app became partially responsive.
- **RestartCount: 0**: the container never restarted on its own. It sat there unhealthy for the full 20 hours.
- **Disk space fine**: 40% used on `/`.

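
The durations quoted in the timeline and findings can be re-derived from the timestamps. The helper below is a minimal sketch for doing that; the function name `gap_hm` is hypothetical, and it assumes GNU `date` (for `-d`) with timestamps given in UTC.

```bash
#!/bin/bash
# gap_hm START END
# Prints the elapsed time between two UTC timestamps as "<hours>h<minutes>m".
# Hypothetical helper for illustration; requires GNU date for the -d option.
gap_hm() {
    local start end mins
    start=$(date -u -d "$1" +%s) || return 1   # parse START as UTC epoch seconds
    end=$(date -u -d "$2" +%s) || return 1     # parse END as UTC epoch seconds
    mins=$(( (end - start) / 60 ))
    printf '%dh%02dm\n' $(( mins / 60 )) $(( mins % 60 ))
}

# Backup finished (04:05) to "Initial startup ... complete!" (07:25):
#   gap_hm "2026-03-27 04:05" "2026-03-27 07:25"   # prints 3h20m
# Container recreated (04:03) to first nextcloud.log entry (10:22):
#   gap_hm "2026-03-27 04:03" "2026-03-27 10:22"   # prints 6h19m
```

The same function works on the `StartedAt` value reported by `docker inspect`, since GNU `date -d` accepts ISO 8601 timestamps.
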
### Healthcheck details

```bash
#!/bin/bash
# /healthcheck.sh inside nextcloud-aio-nextcloud
nc -z "$POSTGRES_HOST" "$POSTGRES_PORT" || exit 0  # postgres down = pass (graceful)
nc -z 127.0.0.1 9000 || exit 1                     # PHP-FPM down = fail
```

If PostgreSQL is unreachable, the check passes (exits 0). The only failure path is PHP-FPM not listening on port 9000.

## Root Cause

The AIO nightly update cycle recreated the container, but the startup/migration process hung or ran extremely long, preventing PHP-FPM from fully initializing. The container sat in this state for 20 hours with no self-recovery mechanism until the next nightly cycle replaced it.

The exact migration or occ command that stalled could not be confirmed, because the old container's entrypoint logs were lost when the Mar 28 backup cycle replaced it.

## Fix

Two changes deployed on 2026-03-28:

### 1. Dedicated Netdata alarm with lenient window

Split `nextcloud-aio-nextcloud` into its own Netdata alarm (`docker_nextcloud_unhealthy`) with a 10-minute lookup and 10-minute delay, separate from the general container alarm. See [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md).

### 2. Watchdog cron for auto-restart

Deployed `/etc/cron.d/nextcloud-health-watchdog` on majorlab:

```bash
# Restart nextcloud-aio-nextcloud if it is unhealthy and has been up for more than 1 hour
*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud"
```

- Checks every 15 minutes
- Only restarts if the container has been running >1 hour (avoids interfering with normal startup)
- Logs to syslog: `journalctl -t nextcloud-watchdog`

This caps future unhealthy outages at roughly 1 hour 15 minutes (the 1-hour grace period plus up to one 15-minute cron interval) instead of persisting until the next nightly cycle.

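
The cron one-liner is dense, so the same decision logic could be sketched as a small script that is easier to read and test. This is not the deployed file: the function names `should_restart` and `watchdog` and the `GRACE_SECONDS` variable are hypothetical, and the sketch assumes GNU `date` and the same container name as above.

```bash
#!/bin/bash
# Hypothetical, more readable equivalent of the cron one-liner (not the deployed file).

GRACE_SECONDS=3600   # assumption: same 1-hour grace period as the cron entry

# should_restart STATUS STARTED_EPOCH NOW_EPOCH
# Succeeds (returns 0) only when the container is unhealthy AND has been
# running for longer than the grace period.
should_restart() {
    local status="$1" started="$2" now="$3"
    [ "$status" = "unhealthy" ] || return 1
    [ $(( now - started )) -gt "$GRACE_SECONDS" ]
}

# watchdog CONTAINER
# Reads health status and start time from Docker, then restarts if needed.
watchdog() {
    local name="$1" status started_at started now
    # If the container cannot be inspected, do nothing.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || return 0
    started_at=$(docker inspect --format '{{.State.StartedAt}}' "$name") || return 0
    started=$(date -d "$started_at" +%s) || return 0
    now=$(date +%s)
    if should_restart "$status" "$started" "$now"; then
        docker restart "$name" &&
            logger -t nextcloud-watchdog "Restarted unhealthy $name"
    fi
}
```

Under these assumptions, cron would call `watchdog nextcloud-aio-nextcloud` every 15 minutes. Keeping the decision in `should_restart` separate from the Docker calls means the grace-period logic can be checked without a running container.
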
## See Also

- [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
- [Debugging Broken Docker Containers](../../02-selfhosting/docker/debugging-broken-docker-containers.md)
- [Docker Healthchecks](../../02-selfhosting/docker/docker-healthchecks.md)