majorwiki/05-troubleshooting/claude-desktop-mcp-mass-disconnect-blocking-reboot.md
MajorLinux 545df9f5c6 Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot
Documents the failure mode where issuing a synchronous `ssh host reboot`
through Claude Desktop's shell MCP poisons the local MCP transport when
the target severs its session before responding cleanly — eventually
force-disconnecting every MCP at once. Covers diagnostic chain, recovery,
fire-and-forget reboot patterns, and worked example from the 2026-05-10
majorhome AMD-card reboot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:28:11 -04:00

---
title: "Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot"
domain: troubleshooting
category: troubleshooting
tags:
- claude-desktop
- mcp
- wsl
- wsl2
- ssh
- reboot
- troubleshooting
- hang
- transport
status: published
created: 2026-05-10
updated: 2026-05-10
---
# Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot
> **TL;DR** — Issuing a synchronous `ssh host reboot` through Claude Desktop's shell MCP can hang the MCP transport when the target dies mid-session. Eventually the MCP manager force-disconnects **every** MCP at once. Recovery is a full Claude Desktop restart. Prevention is a fire-and-forget reboot pattern that lets the SSH session close cleanly before the target goes down.
---
## Symptom
You're running Claude Desktop with several MCPs configured (shell, filesystem, mail, etc.), most launched via `wsl.exe` against your WSL2 distro. You ask Claude to reboot a remote host through the shell MCP — typically something like `ssh fleethost reboot` or `ssh fleethost sudo systemctl reboot`. Things appear to succeed. Then, anywhere from immediately to ~30 minutes later:
- **Every MCP disconnects within tens of milliseconds of each other** — not in the order you'd expect from independent failures
- Claude Desktop's main panel shows all MCP servers as failed/disconnected
- The app itself is still running but cannot reconnect MCPs cleanly until you fully restart it
- New chats can't use any MCP tools

The MCP server logs (`%APPDATA%\Claude\logs\mcp-server-*.log`) end with the standard *"Server transport closed unexpectedly, this is likely due to the process exiting early"* message, but they end at the **same instant** for every server.
---
## Why this happens
Claude Desktop launches each MCP server as a stdio child process (commonly `wsl.exe npx -y <server>` or `wsl.exe <binary>`). The MCP manager owns the stdio pipes and a transport per server. When you ask Claude to run a synchronous `ssh remote reboot` via the shell MCP:
1. The shell MCP calls SSH and waits for the remote process to exit so it can return stdout/stderr to Claude Desktop
2. The remote `reboot` (or `systemctl reboot`) executes on the target — but reboot is special: the target severs its own SSH session as part of going down, often **without** sending a clean TCP FIN
3. The local SSH client sits there waiting for a response that never comes
4. The shell MCP's stdio pipe stays open, blocked on the SSH child
5. Claude Desktop's MCP manager waits on the shell MCP's stdio pipe
6. After some watchdog/timeout interval, the manager forces a teardown. In the behavior observed here, it tears down **all** MCP transports together, not just the wedged one

The blast radius is "every MCP in the session," not just the one that issued the reboot.
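Steps 3 to 5 rest on a general property of pipes: a reader sees end-of-file only once every process holding the write end has exited or closed it. The sketch below is a local stand-in and not the shell MCP's actual code. `sleep 2` plays the SSH client that does not return, bash command substitution plays the caller waiting on the pipe, and the function names `held_pipe` and `released_pipe` are hypothetical.

```shell
#!/usr/bin/env bash
# Local illustration only: `sleep 2` stands in for an SSH client that never returns.

# The background child inherits the write end of the pipe, so the caller
# does not see EOF until the child exits (about 2 seconds).
held_pipe() {
  local start end
  start=$(date +%s)
  : "$( (sleep 2 &) ; echo 'command finished' )"
  end=$(date +%s)
  echo $(( end - start ))
}

# Same child, but with its output detached from the pipe: the caller
# returns as soon as the foreground command finishes.
released_pipe() {
  local start end
  start=$(date +%s)
  : "$( (sleep 2 >/dev/null 2>&1 &) ; echo 'command finished' )"
  end=$(date +%s)
  echo $(( end - start ))
}

echo "held: $(held_pipe)s, released: $(released_pipe)s"
```

The first call takes about as long as the child lives; the second returns almost immediately even though the same child is still running.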
---
## Diagnostic chain
Use this exact order — it lets you rule out each layer cleanly.
### 1. Are the disconnect timestamps clustered?
Open `%APPDATA%\Claude\logs\mcp.log` (or each per-server log) and find the *Server transport closed* lines for each MCP. Are they within tens or hundreds of milliseconds of each other?
```
2026-05-10T04:10:17.167Z [shell] Server transport closed unexpectedly
2026-05-10T04:10:17.175Z [mail] Server transport closed unexpectedly
2026-05-10T04:10:17.177Z [majorvault] Server transport closed unexpectedly
2026-05-10T04:10:17.202Z [filesystem] Server transport closed unexpectedly
```
If yes → a parent killed the children (or closed their transports) all at once. These are **not** independent MCP failures.
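To put a number on "clustered", a minimal sketch like the following prints the spread between the earliest and latest close line. It assumes each line starts with an ISO-8601 UTC timestamp in the format of the excerpt above and that all matches fall on the same UTC day. The function name `spread_ms` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the spread, in milliseconds, between the earliest and latest
# "Server transport closed" line in a log file.
# Assumption: field 1 looks like 2026-05-10T04:10:17.167Z and all
# matching lines share the same UTC date.
spread_ms() {
  awk '
    /Server transport closed/ {
      split(substr($1, 12, 12), t, ":")              # HH, MM, SS.mmm
      ms = (t[1] * 3600 + t[2] * 60 + t[3]) * 1000   # milliseconds since midnight
      if (n == 0 || ms < min) min = ms
      if (n == 0 || ms > max) max = ms
      n++
    }
    END { if (n) printf "%.0f\n", max - min }
  ' "$1"
}
```

From WSL the log would typically be reachable under the Windows profile mount, for example `spread_ms /mnt/c/Users/<you>/AppData/Roaming/Claude/logs/mcp.log` (the exact path is an assumption). For the four sample lines above it prints `35`.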
### 2. Is there a Crashpad minidump?
```powershell
dir "$env:APPDATA\Claude\Crashpad\reports"
dir "$env:APPDATA\Claude\Crashpad\pending"
```
Empty directories (or directories with no files newer than the disconnect time) = **Claude Desktop did not crash, it hung**. A real crash would have written a minidump.
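The "newer than the disconnect time" check can also be run from WSL. This is a sketch under two assumptions: GNU `find` is available (true for common WSL distros such as Ubuntu), and the Windows profile is mounted under `/mnt/c`. The function name `files_since` is hypothetical.

```shell
#!/usr/bin/env bash
# List regular files under a directory that were modified after a given time.
# Requires GNU find for -newermt.
files_since() {
  local dir=$1 since=$2
  find "$dir" -type f -newermt "$since" -print
}
```

A call such as `files_since "/mnt/c/Users/<you>/AppData/Roaming/Claude/Crashpad" '2026-05-10 04:10:17 UTC'` that prints nothing is consistent with a hang and not a crash.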
### 3. Are the MCP child processes still alive in WSL?
```bash
ps -eo pid,etime,cmd | grep -E 'mcp|claude' | grep -v grep
```
If you see your MCP server processes still running with elapsed times spanning the disconnect (or fresh respawns from auto-recovery attempts), the WSL side is healthy. The damage is on the Claude Desktop ↔ MCP transport, not the MCP servers themselves.
### 4. What was the shell MCP doing right before the disconnect?
Check `%APPDATA%\Claude\logs\main.log` for the last `mcp__shell__shell_exec` permission grants and tool calls, and `%APPDATA%\Claude\logs\mcp-server-shell.log` for the last commands invoked. If you see an SSH command issued against a host that you also know to be currently rebooting / unreachable, you've found the trigger.
Confirm with a separate health probe of the remote host (do this in **WSL or a fresh terminal**, not through the wedged Claude Desktop):
```bash
ping -c 3 -W 2 <host-or-tailscale-ip>
ssh -o ConnectTimeout=5 -o BatchMode=yes <host> uptime
tailscale status | grep <host>
```
100% packet loss + missing tailnet entry + SSH timeout = the target is genuinely down or hung mid-reboot.
---
## Recovery
1. **Fully quit Claude Desktop**: system tray icon → *Quit*. Closing the window is not enough; you must terminate the main process so the MCP manager state is cleared.
2. *(Optional)* If you want a clean slate in WSL, kill orphaned MCP child processes:
   ```bash
   pkill -f mcp-shell
   pkill -f mail-mcp
   pkill -f mcp-majorvault
   # ...etc for any other MCP binaries you run
   ```
   This is rarely necessary, because fresh spawns will replace them on next launch.
3. **Reopen Claude Desktop**. Watch `mcp.log` and `main.log`:
   ```
   [LocalMcpServerManager] Connected to shell (1 tools)
   [LocalMcpServerManager] Connected to filesystem (14 tools)
   [LocalMcpServerManager] Connected to mail (30 tools)
   ...
   ```

Every server listed in your `claude_desktop_config.json` should appear, with the tool count it normally reports. The "UtilityProcess Check: Extension X not found in installed extensions" warnings are benign: Claude Desktop just notes that your MCPs aren't bundled built-in extensions (because they're WSL-launched).
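To check the reconnect without reading the log by eye, a small sketch like this lists any expected server that never logged a `Connected to` line. The function name `missing_servers` is hypothetical, the server names are passed in by hand and not parsed from the config, and the match assumes the log line format shown above.

```shell
#!/usr/bin/env bash
# Print each expected server name that has no "Connected to <name> (" line
# in the given log. Usage: missing_servers LOGFILE NAME...
missing_servers() {
  local log=$1 name
  shift
  for name in "$@"; do
    grep -qF "Connected to ${name} (" "$log" || echo "$name"
  done
}
```

Empty output means every named server reconnected. Because the log accumulates across launches, run it against only the lines written since the restart, or older entries may mask a server that is still down.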
---
## Prevention — fire-and-forget reboot patterns
Don't hand the MCP shell a command that intentionally severs its own SSH session and expects the shell to wait for clean closure. Instead, schedule the reboot to happen **after** SSH disconnects:
### Option A — `nohup` + background (most portable)
```bash
ssh host 'nohup shutdown -r +1 >/dev/null 2>&1 &'
```
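This schedules a reboot 1 minute out and returns immediately. Because the command is backgrounded with its output redirected, nothing on the remote holds the session's stdout or stderr open, so SSH closes cleanly before the target goes down. The redirect also hides errors, so a permission failure is silent; use `sudo shutdown` if you do not log in as root. The minute delay gives you time to cancel (`ssh host 'sudo shutdown -c'`) if you change your mind.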
### Option B — bounded keepalive timeout
```bash
ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'
```
If the remote stops responding, the local SSH client sends a keepalive every 5 s and hangs up once 2 of them go unanswered. That bounds the worst case to roughly 10 s instead of "until something kills the MCP." Less elegant than Option A, but it works for one-shot situations.
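As an extra outer bound that does not depend on SSH's own keepalive logic, the call can be wrapped in GNU coreutils `timeout`. This is an additional hedge, not one of the options above, and the wrapper name `run_bounded` is hypothetical.

```shell
#!/usr/bin/env bash
# Run a command with a hard wall-clock limit in seconds.
# Sends TERM when the limit expires, then KILL 5 seconds later if needed.
# Usage: run_bounded SECONDS COMMAND [ARG...]
run_bounded() {
  local limit=$1 rc
  shift
  timeout --kill-after=5 "$limit" "$@"
  rc=$?
  # GNU timeout exits 124 when it had to stop the command.
  if [ "$rc" -eq 124 ]; then
    echo "gave up after ${limit}s: $*" >&2
  fi
  return "$rc"
}
```

A call might look like `run_bounded 20 ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'`. Exit status 124 means the outer limit fired; 255 is what `ssh` itself returns when the connection fails or drops.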
### Option C — schedule on the box itself
Use a one-off cron entry, a `systemd` oneshot timer, or `at` on the box, so the reboot fires after the SSH session has already closed:
```bash
ssh host 'echo "systemctl reboot" | at now + 1 minute'
```
### Anti-pattern (don't do this)
```bash
# ❌ Synchronous reboot through MCP shell
ssh host reboot
ssh host sudo reboot
ssh host 'shutdown -r now'
```
These all hold the MCP stdio pipe open waiting for a session that is being severed at the kernel level on the remote side.
---
## Worked example — 2026-05-10 majorhome reboot
| Time (EDT) | Event |
|---|---|
| 00:41:06 | Claude Desktop emits permission prompt for `mcp__shell__shell_exec` |
| 00:41:08 | Shell MCP disconnect+reconnect cycle (transient, recovered in 2 s) |
| 00:41:10 | `[LocalMcpServerManager] Connected to shell (1 tools)` |
| 00:41:26 | Permission granted — likely the `ssh majorhome reboot` call |
| 00:42:16 | `[Result] Turn succeeded` → session marked `running → idle` |
| 00:42 | `main.log` goes silent |
| 04:10:17 UTC (00:10:17 EDT, *prior*; note the timezone delta between `mcp.log` and `main.log`) | All 5 MCPs disconnect within 35 ms |
| 01:00 to 01:10 | majorhome physically recovers, comes back up clean (`uptime` 19 min, `systemctl is-system-running` = `running`) |
| 01:13:42 | After full Claude Desktop restart, all 5 MCPs respawn |
| 01:15:22 | All 5 MCPs reconnected, tools registered |

majorhome itself was never the problem: the reboot succeeded. The damage was the SSH session that never closed cleanly, which poisoned the local Claude Desktop MCP transport.
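When lining up `mcp.log` entries (UTC, per the `Z` suffix) against the EDT times in the table, it helps to convert instead of subtracting by hand. The sketch below assumes GNU `date`; it uses a POSIX TZ rule string for US Eastern time so it does not depend on the system timezone database. The function name `utc_to_eastern` is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a "YYYY-MM-DD HH:MM:SS" UTC timestamp to US Eastern wall-clock time.
# The TZ value encodes EST/EDT with the current US daylight-saving rules.
# Requires GNU date for -d.
utc_to_eastern() {
  TZ='EST5EDT,M3.2.0,M11.1.0' date -d "$1 UTC" '+%Y-%m-%d %H:%M:%S %Z'
}

utc_to_eastern '2026-05-10 04:10:17'   # prints 2026-05-10 00:10:17 EDT
```

The converted time falls about 31 minutes before the 00:41:06 permission prompt, which is the offset the table row flags as *prior*.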
---
## See also
- [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](wsl-env-claude-desktop-mcp.md) — different failure mode (start-up env passing) on the same Claude Desktop + WSL stack
- [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md) — another Claude Desktop transport-layer failure
- [Windows OpenSSH: WSL as Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md) — related WSL/SSH stdio behavior