diff --git a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md index 01a9e2f..b5fc1cf 100644 --- a/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md +++ b/02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md @@ -5,7 +5,7 @@ category: monitoring tags: [netdata, docker, nextcloud, alarms, health, monitoring] status: published created: 2026-03-18 -updated: 2026-03-22 +updated: 2026-03-28 --- # Tuning Netdata Docker Health Alarms to Prevent Update Flapping @@ -28,8 +28,13 @@ A single container being unhealthy for 10 seconds triggers it. No grace period, Create a custom override at `/etc/netdata/health.d/docker.conf` (maps to the Netdata config volume if running in Docker). This file takes precedence over the stock config in `/usr/lib/netdata/conf.d/health.d/docker.conf`. +### General Container Alarm + +This alarm covers all containers **except** `nextcloud-aio-nextcloud`, which gets its own dedicated alarm (see below). + ```ini # Custom override — reduces flapping during nightly container updates. +# General container unhealthy alarm — all containers except nextcloud-aio-nextcloud template: docker_container_unhealthy on: docker.container_health_status @@ -39,6 +44,7 @@ component: Docker units: status every: 30s lookup: average -5m of unhealthy +chart labels: container_name=!nextcloud-aio-nextcloud * warn: $this > 0 delay: up 3m down 5m multiplier 1.5 max 30m summary: Docker container ${label:container_name} health @@ -53,7 +59,47 @@ component: Docker | `delay: up 3m` | none | 3m | Won't fire until unhealthy condition persists for 3 continuous minutes | | `delay: down 5m` | none | 5m (max 30m) | Grace period after recovery before clearing | -The `up` delay is the critical addition. Nextcloud AIO's `nextcloud-aio-nextcloud` container checks both PostgreSQL (port 5432) and PHP-FPM (port 9000). 
PHP-FPM takes ~90 seconds to warm up after a restart, causing 2–3 failing health checks before the container becomes healthy. With `delay: up 3m`, Netdata waits for 3 continuous minutes of unhealthy status before firing — absorbing the ~90 second startup window with margin to spare. A genuinely broken container will still trigger the alert. +### Dedicated Nextcloud AIO Alarm + +Added 2026-03-23, updated 2026-03-28. The `nextcloud-aio-nextcloud` container needs a more lenient window than other containers. Its healthcheck (`/healthcheck.sh`) verifies PostgreSQL connectivity (port 5432) and PHP-FPM (port 9000). PHP-FPM takes ~90 seconds to warm up after a normal restart — but during nightly AIO update cycles, the full startup (occ upgrade, app updates, migrations) can take 5+ minutes. On 2026-03-27, a startup hung and left the container unhealthy for 20 hours until the next nightly cycle replaced it. + +The dedicated alarm uses a 10-minute lookup window and 10-minute delay to absorb normal startup, while still catching sustained failures: + +```ini +# Dedicated alarm for nextcloud-aio-nextcloud — lenient window to absorb nightly update cycle +# PHP-FPM can take 5+ minutes to warm up; only alert on sustained failure + +template: docker_nextcloud_unhealthy + on: docker.container_health_status + class: Errors + type: Containers +component: Docker + units: status + every: 30s + lookup: average -10m of unhealthy +chart labels: container_name=nextcloud-aio-nextcloud + warn: $this > 0 + delay: up 10m down 5m multiplier 1.5 max 30m + summary: Nextcloud container health sustained + info: nextcloud-aio-nextcloud has been unhealthy for a sustained period — not a transient update blip + to: sysadmin +``` + +## Watchdog Cron: Auto-Restart on Sustained Unhealthy + +If the Nextcloud container stays unhealthy for more than 1 hour (well past any normal startup window), a cron watchdog on majorlab auto-restarts it and logs the event. 
This was added 2026-03-28 after an incident where the container sat unhealthy for 20 hours until the next nightly backup cycle replaced it. + +**File:** `/etc/cron.d/nextcloud-health-watchdog` + +```bash +# Restart nextcloud-aio-nextcloud if unhealthy for >1 hour +*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud" +``` + +- Runs every 15 minutes as root +- Only restarts if the container has been running for >1 hour (avoids interfering with normal startup) +- Logs to syslog as `nextcloud-watchdog` — check with `journalctl -t nextcloud-watchdog` +- Netdata will still fire the `docker_nextcloud_unhealthy` alert during the unhealthy window, but the outage is capped at ~1 hour instead of persisting until the next nightly cycle ## Also: Suppress `docker_container_down` for Normally-Exiting Containers diff --git a/05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md b/05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md new file mode 100644 index 0000000..d5f7b0e --- /dev/null +++ b/05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md @@ -0,0 +1,82 @@ +--- +title: "Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update" +domain: troubleshooting +category: docker +tags: [nextcloud, docker, healthcheck, netdata, php-fpm, aio] +status: published +created: 2026-03-28 +updated: 2026-03-28 +--- + +# Nextcloud AIO Container Unhealthy for 20 Hours After Nightly Update + +## Symptom + +Netdata alert `docker_nextcloud_unhealthy` fired on majorlab and stayed in Warning for 20 hours. The `nextcloud-aio-nextcloud` container was running but its Docker healthcheck kept failing. 
No user-facing errors were visible in `nextcloud.log`. + +## Investigation + +### Timeline (2026-03-27, all UTC) + +| Time | Event | +|---|---| +| 04:00 | Nightly backup script started, mastercontainer update kicked off | +| 04:03 | `nextcloud-aio-nextcloud` container recreated | +| 04:05 | Backup finished | +| 07:25 | Mastercontainer logged "Initial startup of Nextcloud All-in-One complete!" (3h20m delay) | +| 10:22 | First entry in `nextcloud.log` (deprecation warnings only — no errors) | +| 04:00 (Mar 28) | Next nightly backup replaced the container; new container came up healthy in ~25 minutes | + +### Key findings + +- **No image update** — the container image dated to Feb 26, so this was not caused by a version change. +- **No app-level errors** — `nextcloud.log` contained only `files_rightclick` deprecation warnings (level 3). No level 2/4 entries. +- **PHP-FPM never stabilized** — the healthcheck (`/healthcheck.sh`) tests `nc -z 127.0.0.1 9000` (PHP-FPM). The container was running but FPM wasn't responding to the port check. +- **6-hour log gap** — no `nextcloud.log` entries between container start (04:03) and first log (10:22), suggesting the AIO init scripts (occ upgrade, app updates, cron jobs) ran for hours before the app became partially responsive. +- **RestartCount: 0** — the container never restarted on its own. It sat there unhealthy for the full 20 hours. +- **Disk space fine** — 40% used on `/`. + +### Healthcheck details + +```bash +#!/bin/bash +# /healthcheck.sh inside nextcloud-aio-nextcloud +nc -z "$POSTGRES_HOST" "$POSTGRES_PORT" || exit 0 # postgres down = pass (graceful) +nc -z 127.0.0.1 9000 || exit 1 # PHP-FPM down = fail +``` + +If PostgreSQL is unreachable, the check passes (exits 0). The only failure path is PHP-FPM not listening on port 9000. + +## Root Cause + +The AIO nightly update cycle recreated the container, but the startup/migration process hung or ran extremely long, preventing PHP-FPM from fully initializing. 
The container sat in this state for 20 hours with no self-recovery mechanism until the next nightly cycle replaced it. + +The exact migration or occ command that stalled could not be confirmed — the old container's entrypoint logs were lost when the Mar 28 backup cycle replaced it. + +## Fix + +Two changes deployed on 2026-03-28: + +### 1. Dedicated Netdata alarm with lenient window + +Split `nextcloud-aio-nextcloud` into its own Netdata alarm (`docker_nextcloud_unhealthy`) with a 10-minute lookup and 10-minute delay, separate from the general container alarm. See [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md). + +### 2. Watchdog cron for auto-restart + +Deployed `/etc/cron.d/nextcloud-health-watchdog` on majorlab: + +```bash +*/15 * * * * root docker inspect --format={{.State.Health.Status}} nextcloud-aio-nextcloud 2>/dev/null | grep -q unhealthy && [ "$(docker inspect --format={{.State.StartedAt}} nextcloud-aio-nextcloud | xargs -I{} date -d {} +\%s)" -lt "$(date -d "1 hour ago" +\%s)" ] && docker restart nextcloud-aio-nextcloud && logger -t nextcloud-watchdog "Restarted unhealthy nextcloud-aio-nextcloud" +``` + +- Checks every 15 minutes +- Only restarts if the container has been running >1 hour (avoids interfering with normal startup) +- Logs to syslog: `journalctl -t nextcloud-watchdog` + +This caps future unhealthy outages at ~1 hour instead of persisting until the next nightly cycle. 
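The age guard in the watchdog one-liner is easy to get backwards, so here is a dry-run of just the comparison logic with fixed timestamps — a sketch only, no Docker required. The `started` and `now` values are illustrative (loosely based on the incident timeline); on a live host, `started` would come from `docker inspect --format '{{.State.StartedAt}}'`:

```shell
# Dry-run of the watchdog's ">1 hour uptime" guard using fixed timestamps.
# started: RFC3339 value as docker inspect would print it (illustrative).
# now: pretend wall-clock time for the dry run (illustrative).
started="2026-03-27T04:03:00Z"
now="2026-03-27T06:00:00Z"

started_epoch=$(date -u -d "$started" +%s)
cutoff_epoch=$(( $(date -u -d "$now" +%s) - 3600 ))   # "now" minus 1 hour

# Restart only when the container started BEFORE the cutoff,
# i.e. it has been up longer than the 1-hour grace window.
if [ "$started_epoch" -lt "$cutoff_epoch" ]; then
  echo "would restart: container started >1h ago"
else
  echo "would skip: still inside the 1h startup grace window"
fi
```

The `-lt` direction is the part to double-check: an *older* start time means a *smaller* epoch value, so "started before the cutoff" is the condition under which a restart is safe.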
+ +## See Also + +- [Tuning Netdata Docker Health Alarms](../../02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md) +- [Debugging Broken Docker Containers](../../02-selfhosting/docker/debugging-broken-docker-containers.md) +- [Docker Healthchecks](../../02-selfhosting/docker/docker-healthchecks.md) diff --git a/SUMMARY.md b/SUMMARY.md index 42db083..b8c1628 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -44,6 +44,7 @@ * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md) * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md) * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md) + * [Nextcloud AIO Unhealthy 20h After Nightly Update](05-troubleshooting/docker/nextcloud-aio-unhealthy-20h-stuck.md) * [Docker & Caddy Recovery After Reboot (Fedora + SELinux)](05-troubleshooting/docker-caddy-selinux-post-reboot-recovery.md) * [ISP SNI Filtering with Caddy](05-troubleshooting/isp-sni-filtering-caddy.md) * [Obsidian Vault Recovery — Loading Cache Hang](05-troubleshooting/obsidian-cache-hang-recovery.md)