diff --git a/05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md b/05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md
new file mode 100644
index 0000000..9d75b7d
--- /dev/null
+++ b/05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md
@@ -0,0 +1,190 @@
+---
+title: "Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot"
+domain: troubleshooting
+category: troubleshooting
+tags:
+  - claude-desktop
+  - mcp
+  - wsl
+  - wsl2
+  - ssh
+  - reboot
+  - troubleshooting
+  - hang
+  - transport
+status: published
+created: 2026-05-10
+updated: 2026-05-10
+---
+
+# Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot
+
+> **TL;DR** — Issuing a synchronous `ssh host reboot` through Claude Desktop's shell MCP can hang the MCP transport when the target dies mid-session. Eventually the MCP manager force-disconnects **every** MCP at once. Recovery is a full Claude Desktop restart. Prevention is a fire-and-forget reboot pattern that lets the SSH session close cleanly before the target goes down.
+
+---
+
+## Symptom
+
+You're running Claude Desktop with several MCPs configured (shell, filesystem, mail, etc.), most launched via `wsl.exe` against your WSL2 distro. You ask Claude to reboot a remote host through the shell MCP — typically something like `ssh fleethost reboot` or `ssh fleethost sudo systemctl reboot`. Things appear to succeed. Then, anywhere from immediately to ~30 minutes later:
+
+- **Every MCP disconnects within tens of milliseconds of each other** — not in the order you'd expect from independent failures
+- Claude Desktop's main panel shows all MCP servers as failed/disconnected
+- The app itself is still running but cannot reconnect MCPs cleanly until you fully restart it
+- New chats can't use any MCP tools
+
+The MCP server logs (`%APPDATA%\Claude\logs\mcp-server-*.log`) end with the standard *"Server transport closed unexpectedly, this is likely due to the process exiting early"* message — but they end at the **same instant** for every server.
+
+---
+
+## Why this happens
+
+Claude Desktop launches each MCP server as a stdio child process (commonly `wsl.exe npx -y ` or `wsl.exe `). The MCP manager owns the stdio pipes and a transport per server. When you ask Claude to run a synchronous `ssh remote reboot` via the shell MCP:
+
+1. The shell MCP calls SSH and waits for the remote process to exit so it can return stdout/stderr to Claude Desktop
+2. The remote `reboot` (or `systemctl reboot`) executes on the target — but reboot is special: the target severs its own SSH session as part of going down, often **without** sending a clean TCP FIN
+3. The local SSH client sits there waiting for a response that never comes
+4. The shell MCP's stdio pipe stays open, blocked on the SSH child
+5. Claude Desktop's MCP manager waits on the shell MCP's stdio pipe
+6. After some watchdog/timeout interval, the manager force-tears-down — and because of how the manager is wired, it tears down **all** MCP transports together, not just the wedged one
+
+The blast radius is "every MCP in the session," not just the one that issued the reboot.
+
+---
+
+## Diagnostic chain
+
+Use this exact order — it lets you rule out each layer cleanly.
+
+### 1. Are the disconnect timestamps clustered?
+
+Open `%APPDATA%\Claude\logs\mcp.log` (or each per-server log) and find the *Server transport closed* lines for each MCP. Are they within tens or hundreds of milliseconds of each other?
+
+```
+2026-05-10T04:10:17.167Z [shell] Server transport closed unexpectedly
+2026-05-10T04:10:17.175Z [mail] Server transport closed unexpectedly
+2026-05-10T04:10:17.177Z [majorvault] Server transport closed unexpectedly
+2026-05-10T04:10:17.202Z [filesystem] Server transport closed unexpectedly
+```
+
+If yes → a parent killed the children. This is **not** independent MCP failures.
+
+### 2. Is there a Crashpad minidump?
+
+```powershell
+dir "$env:APPDATA\Claude\Crashpad\reports"
+dir "$env:APPDATA\Claude\Crashpad\pending"
+```
+
+Empty directories (or directories with no files newer than the disconnect time) = **Claude Desktop did not crash, it hung**. A real crash would have written a minidump.
+
+### 3. Are the MCP child processes still alive in WSL?
+
+```bash
+ps -eo pid,etime,cmd | grep -E 'mcp|claude' | grep -v grep
+```
+
+If you see your MCP server processes still running with elapsed times spanning the disconnect (or fresh respawns from auto-recovery attempts), the WSL side is healthy. The damage is on the Claude Desktop ↔ MCP transport, not the MCP servers themselves.
+
+### 4. What was the shell MCP doing right before the disconnect?
+
+Check `%APPDATA%\Claude\logs\main.log` for the last `mcp__shell__shell_exec` permission grants and tool calls, and `%APPDATA%\Claude\logs\mcp-server-shell.log` for the last commands invoked. If you see an SSH command issued against a host that you also know to be currently rebooting / unreachable, you've found the trigger.
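The log check in step 4 can be roughed out as a grep over a copy of the shell MCP log. This is a minimal sketch under stated assumptions, not a tool that ships with Claude Desktop: `find_sync_reboots` is a hypothetical helper, you supply the log path yourself, and it assumes each command is recorded on a single line, which may not match your log format.

```bash
# Hypothetical helper: list log lines that look like a synchronous
# "ssh ... reboot" or "ssh ... shutdown -r now" call.
# Assumption: the command text appears on one line of the log file.
find_sync_reboots() {
  local logfile="$1"
  # -n prefixes each hit with its line number so it can be found in the log.
  grep -nE 'ssh[[:space:]].*(reboot|shutdown[[:space:]]+-r[[:space:]]+now)' "$logfile"
}
```

Usage from WSL would look like `find_sync_reboots ./mcp-server-shell.log | tail -n 5`. The pattern also matches scheduled variants that merely contain the word `reboot`, so treat each hit as a lead to read in context, not as a verdict.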
+
+Confirm with a separate health probe of the remote host (do this in **WSL or a fresh terminal**, not through the wedged Claude Desktop):
+
+```bash
+ping -c 3 -W 2
+ssh -o ConnectTimeout=5 -o BatchMode=yes uptime
+tailscale status | grep
+```
+
+100% packet loss + missing tailnet entry + SSH timeout = the target is genuinely down or hung mid-reboot.
+
+---
+
+## Recovery
+
+1. **Fully quit Claude Desktop** — system tray icon → *Quit*. Closing the window is not enough; you must terminate the main process so the MCP manager state is cleared.
+2. *(Optional)* If you want a clean slate in WSL, kill orphaned MCP child processes:
+   ```bash
+   pkill -f mcp-shell
+   pkill -f mail-mcp
+   pkill -f mcp-majorvault
+   # ...etc for any other MCP binaries you run
+   ```
+   This is rarely necessary — fresh spawns will replace them on next launch.
+3. **Reopen Claude Desktop**. Watch `mcp.log` and `main.log`:
+   ```
+   [LocalMcpServerManager] Connected to shell (1 tools)
+   [LocalMcpServerManager] Connected to filesystem (14 tools)
+   [LocalMcpServerManager] Connected to mail (30 tools)
+   ...
+   ```
+   Tool counts should match your `claude_desktop_config.json`. The "UtilityProcess Check: Extension X not found in installed extensions" warnings are benign — Claude Desktop just notes that your MCPs aren't bundled built-in extensions (because they're WSL-launched).
+
+---
+
+## Prevention — fire-and-forget reboot patterns
+
+Don't hand the MCP shell a command that intentionally severs its own SSH session and expects the shell to wait for clean closure. Instead, schedule the reboot to happen **after** SSH disconnects:
+
+### Option A — `nohup` + background (most portable)
+
+```bash
+ssh host 'nohup shutdown -r +1 >/dev/null 2>&1 &'
+```
+
+Schedules a reboot 1 minute out, returns immediately, SSH closes cleanly. The minute delay gives you time to cancel (`ssh host 'sudo shutdown -c'`) if you change your mind.
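Option A can be wrapped in a small shell function so that the fire-and-forget form is the only one that ever gets typed. A minimal sketch, assuming a bash-like shell on the WSL side; `safe_reboot` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, not part of `ssh` or of any MCP server.

```bash
# Hypothetical wrapper around Option A. safe_reboot and DRY_RUN are
# illustrative names, not part of ssh or any MCP server.
safe_reboot() {
  local host="$1"
  local delay_min="${2:-1}"   # minutes until reboot; default matches "+1"
  # Same remote command as Option A: schedule, detach, discard output.
  local remote="nohup shutdown -r +${delay_min} >/dev/null 2>&1 &"

  if [ -n "${DRY_RUN:-}" ]; then
    # Print what would run instead of contacting the host.
    printf "ssh %s '%s'\n" "$host" "$remote"
    return 0
  fi
  ssh "$host" "$remote"
}
```

Running `DRY_RUN=1 safe_reboot fleethost` prints the command without touching the host, which is a cheap way to check the quoting before handing it to the shell MCP.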
+
+### Option B — bounded keepalive timeout
+
+```bash
+ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'
+```
+
+If the remote drops without responding within 10 s of keepalives, the local SSH client hangs up — bounding the worst case to ~10 s instead of "until something kills the MCP." Less elegant than Option A but works for one-shot situations.
+
+### Option C — schedule on the box itself
+
+Use a cron `@reboot` reschedule, a `systemd` oneshot timer, or `at` on the box:
+
+```bash
+ssh host 'echo "systemctl reboot" | at now + 1 minute'
+```
+
+### Anti-pattern (don't do this)
+
+```bash
+# ❌ Synchronous reboot through MCP shell
+ssh host reboot
+ssh host sudo reboot
+ssh host 'shutdown -r now'
+```
+
+These all hold the MCP stdio pipe open waiting for a session that is being severed at the kernel level on the remote side.
+
+---
+
+## Worked example — 2026-05-10 majorhome reboot
+
+| Time (EDT) | Event |
+|---|---|
+| 00:41:06 | Claude Desktop emits permission prompt for `mcp__shell__shell_exec` |
+| 00:41:08 | Shell MCP disconnect+reconnect cycle (transient, recovered in 2 s) |
+| 00:41:10 | `[LocalMcpServerManager] Connected to shell (1 tools)` |
+| 00:41:26 | Permission granted — likely the `ssh majorhome reboot` call |
+| 00:42:16 | `[Result] Turn succeeded` → session marked `running → idle` |
+| 00:42 | `main.log` goes silent |
+| 04:10:17 UTC (00:10:17 EDT *prior* — note timezone delta in mcp.log vs main.log) | All 5 MCPs disconnect within 35 ms |
+| 01:00–01:10 | majorhome physically recovers, comes back up clean (`uptime` 19 min, `systemctl is-system-running` = `running`) |
+| 01:13:42 | After full Claude Desktop restart, all 5 MCPs respawn |
+| 01:15:22 | All 5 MCPs reconnected, tools registered |
+
+majorhome itself was never the problem — the reboot succeeded. The damage was the SSH session that never closed cleanly, which poisoned the local Claude Desktop MCP transport.
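A disconnect spread like the 35 ms in the table can be measured rather than eyeballed. The sketch below is one way to do it, assuming log lines shaped like the sample in diagnostic step 1: an ISO-8601 UTC timestamp with exactly three fractional digits in the first field, and all matching lines on the same calendar day. `disconnect_spread_ms` is a hypothetical helper name.

```bash
# Hypothetical helper: spread in milliseconds between the earliest and
# latest "Server transport closed" lines in a log file.
# Assumptions: field 1 is an ISO-8601 UTC timestamp with exactly three
# fractional digits, and all matching lines fall on the same calendar day.
disconnect_spread_ms() {
  grep 'Server transport closed' "$1" | awk '
    {
      split($1, dt, "T")            # dt[2] = "04:10:17.167Z"
      sub(/Z$/, "", dt[2])
      split(dt[2], t, ":")          # t[1]=HH, t[2]=MM, t[3]="SS.mmm"
      split(t[3], s, "[.]")         # s[1]=SS, s[2]=mmm
      ms = ((t[1] * 60 + t[2]) * 60 + s[1]) * 1000 + s[2]
      if (NR == 1 || ms < min) min = ms
      if (NR == 1 || ms > max) max = ms
    }
    END { if (NR > 0) print max - min }
  '
}
```

Run against the four sample lines from diagnostic step 1, it prints `35`. A result in the tens of milliseconds points at a single parent-side teardown; independent failures would normally be spread over seconds or more.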
+
+---
+
+## See also
+
+- [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](wsl-env-claude-desktop-mcp.md) — different failure mode (start-up env passing) on the same Claude Desktop + WSL stack
+- [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md) — another Claude Desktop transport-layer failure
+- [Windows OpenSSH: WSL as Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) — related WSL/SSH stdio behavior
diff --git a/SUMMARY.md b/SUMMARY.md
index 80881c9..d83c6dd 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -96,6 +96,7 @@ updated: 2026-05-10T00:10
   * [Windows OpenSSH: WSL Default Shell Breaks Remote Commands](05-troubleshooting/networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md)
   * [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](05-troubleshooting/networking/pihole-blocks-claude-desktop.md)
   * [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](05-troubleshooting/wsl-env-claude-desktop-mcp.md)
+  * [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md)
   * [Ollama Drops Off Tailscale When Mac Sleeps](05-troubleshooting/ollama-macos-sleep-tailscale-disconnect.md)
   * [Ollama: `ollama run` with Piped Stdin Bypasses Chat Template + SYSTEM Prompt](05-troubleshooting/ollama-chat-template-pipe-stdin-bypass.md)
   * [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](05-troubleshooting/networking/rsync-tailscale-teardown-stall.md)
diff --git a/index.md b/index.md
index d10b82a..7c37d11 100644
--- a/index.md
+++ b/index.md
@@ -1,13 +1,13 @@
 ---
 created: 2026-04-06T09:52
-updated: 2026-05-10T00:10
+updated: 2026-05-10T01:30
 ---
 # MajorLinux Tech Wiki — Index
 
 > A growing reference of Linux, self-hosting, open source, streaming, and troubleshooting guides. Written by MajorLinux. Used by MajorTwin.
 >
 > **Last updated:** 2026-05-10
-> **Article count:** 109
+> **Article count:** 110
 
 ## Domains
 
@@ -17,7 +17,7 @@ updated: 2026-05-10T00:10
 | 🏠 Self-Hosting & Homelab | `02-selfhosting/` | 39 |
 | 🔓 Open Source Tools | `03-opensource/` | 10 |
 | 🎙️ Streaming & Podcasting | `04-streaming/` | 2 |
-| 🔧 General Troubleshooting | `05-troubleshooting/` | 46 |
+| 🔧 General Troubleshooting | `05-troubleshooting/` | 47 |
 
 ---
 
@@ -217,6 +217,7 @@ updated: 2026-05-10T00:10
 
 | Date | Article | Domain |
 |---|---|---|
+| 2026-05-10 | [Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot](05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md) | Troubleshooting |
 | 2026-05-10 | [Castopod Posts Don't Appear on Mastodon — Diagnosing the Federation Path](05-troubleshooting/security/castopod-broadcast-not-on-mastodon.md) | Troubleshooting |
 | 2026-05-08 | [Castopod: Stale Federated Avatar URLs After Remote Profile Updates](05-troubleshooting/security/castopod-stale-federated-avatar.md) | Troubleshooting |
 | 2026-05-08 | [Tuning Netdata `web_log_1m_successful` for Redirect-Heavy WordPress Sites](05-troubleshooting/security/netdata-web-log-successful-redirect-heavy-tuning.md) | Troubleshooting |