Documents the failure mode where issuing a synchronous `ssh host reboot` through Claude Desktop's shell MCP poisons the local MCP transport when the target severs its session before responding cleanly — eventually force-disconnecting every MCP at once. Covers diagnostic chain, recovery, fire-and-forget reboot patterns, and worked example from the 2026-05-10 majorhome AMD-card reboot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
title: "Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot"
domain: troubleshooting
category: troubleshooting
tags:
  - claude-desktop
  - mcp
  - wsl
  - wsl2
  - ssh
  - reboot
  - troubleshooting
  - hang
  - transport
status: published
created: 2026-05-10
updated: 2026-05-10
---

# Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot

> **TL;DR:** Issuing a synchronous `ssh host reboot` through Claude Desktop's shell MCP can hang the MCP transport when the target dies mid-session. Eventually the MCP manager force-disconnects **every** MCP at once. Recovery is a full Claude Desktop restart. Prevention is a fire-and-forget reboot pattern that lets the SSH session close cleanly before the target goes down.

---

## Symptom

You're running Claude Desktop with several MCPs configured (shell, filesystem, mail, etc.), most launched via `wsl.exe` against your WSL2 distro. You ask Claude to reboot a remote host through the shell MCP, typically with something like `ssh fleethost reboot` or `ssh fleethost sudo systemctl reboot`. The command appears to succeed. Then, anywhere from immediately to ~30 minutes later:

- **Every MCP disconnects within tens of milliseconds of the others**, far too tightly clustered to be independent failures
- Claude Desktop's main panel shows all MCP servers as failed/disconnected
- The app itself is still running but cannot reconnect MCPs cleanly until you fully restart it
- New chats can't use any MCP tools

The MCP server logs (`%APPDATA%\Claude\logs\mcp-server-*.log`) end with the standard *"Server transport closed unexpectedly, this is likely due to the process exiting early"* message, but they end at the **same instant** for every server.

---

## Why this happens

Claude Desktop launches each MCP server as a stdio child process (commonly `wsl.exe npx -y <server>` or `wsl.exe <binary>`). The MCP manager owns the stdio pipes and a transport per server. When you ask Claude to run a synchronous `ssh remote reboot` via the shell MCP:

1. The shell MCP calls SSH and waits for the remote process to exit so it can return stdout/stderr to Claude Desktop
2. The remote `reboot` (or `systemctl reboot`) executes on the target, but reboot is special: the target severs its own SSH session as part of going down, often **without** sending a clean TCP FIN
3. The local SSH client sits there waiting for a response that never comes
4. The shell MCP's stdio pipe stays open, blocked on the SSH child
5. Claude Desktop's MCP manager waits on the shell MCP's stdio pipe
6. After some watchdog/timeout interval, the manager forces a teardown. Because of how the manager is wired, it tears down **all** MCP transports together, not just the wedged one

The blast radius is "every MCP in the session," not just the one that issued the reboot.

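Steps 3 to 6 can be modelled locally without any remote host. This is only a sketch of the general shape (a caller blocked on a child that never finishes, released only when a watchdog fires); it says nothing about Claude Desktop's actual internals. `sleep 30` stands in for the hung SSH client and GNU coreutils `timeout` for the watchdog; both stand-ins are assumptions made for illustration.

```bash
# Stand-in for the hung SSH client: a child that produces no output and
# would not exit on its own for 30 s. `timeout` plays the watchdog
# (assumption: GNU coreutils timeout, which exits 124 when it kills the child).
start=$(date +%s)
timeout 1 sleep 30
rc=$?
waited=$(( $(date +%s) - start ))

# The caller only regains control once the watchdog fires.
echo "caller was blocked for ~${waited}s, child exit status: ${rc}"
```
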
---

## Diagnostic chain

Work through these checks in order; each one rules out a layer.

### 1. Are the disconnect timestamps clustered?

Open `%APPDATA%\Claude\logs\mcp.log` (or each per-server log) and find the *Server transport closed* lines for each MCP. Are they within tens or hundreds of milliseconds of each other?

```
2026-05-10T04:10:17.167Z [shell] Server transport closed unexpectedly
2026-05-10T04:10:17.175Z [mail] Server transport closed unexpectedly
2026-05-10T04:10:17.177Z [majorvault] Server transport closed unexpectedly
2026-05-10T04:10:17.202Z [filesystem] Server transport closed unexpectedly
```

If yes, a parent killed the children. These are **not** independent MCP failures.

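Eyeballing the timestamps works, but the spread can also be computed. The helper below is a minimal sketch: `disconnect_spread_ms` is a hypothetical name, and it assumes each line starts with an ISO-8601 UTC timestamp in the format shown above and that all matching lines fall on the same UTC day.

```bash
# Hypothetical helper: read log lines on stdin and report how far apart the
# "Server transport closed" events are, in milliseconds.
# Assumes field 1 looks like 2026-05-10T04:10:17.167Z and all lines share a day.
disconnect_spread_ms() {
  grep 'Server transport closed' |
  awk '{
    split($1, dt, "T")            # dt[2] = 04:10:17.167Z
    t = dt[2]; sub(/Z$/, "", t)
    split(t, p, ":")              # p[1]=hours, p[2]=minutes, p[3]=seconds
    ms = (p[1] * 3600 + p[2] * 60 + p[3]) * 1000
    if (n == 0 || ms < min) min = ms
    if (n == 0 || ms > max) max = ms
    n++
  }
  END {
    # +0.5 rounds to the nearest millisecond before %d truncates.
    if (n) printf "%d disconnect lines, spread %d ms\n", n, max - min + 0.5
  }'
}
```

Piping the four sample lines above into `disconnect_spread_ms` reports a spread of 35 ms. A spread that small across unrelated servers points at a shared parent rather than at the servers themselves.
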
### 2. Is there a Crashpad minidump?

```powershell
dir "$env:APPDATA\Claude\Crashpad\reports"
dir "$env:APPDATA\Claude\Crashpad\pending"
```

Empty directories (or directories with no files newer than the disconnect time) = **Claude Desktop did not crash, it hung**. A real crash would have written a minidump.

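The "no files newer than the disconnect time" check can also be done from the WSL side. The sketch below assumes GNU `find` (for `-newermt`) and that the Crashpad directory is reachable from WSL through the Windows drive mount; `dumps_newer_than` is a hypothetical helper name, and the directory you pass in is your own path, not something this article can confirm.

```bash
# Hypothetical helper: list files under a directory that were modified after
# a given time. No output means no dump was written after the disconnect,
# which is the "hung, not crashed" case described above.
# Usage: dumps_newer_than <crashpad-dir> "2026-05-10 04:10:17 UTC"
dumps_newer_than() {
  find "$1" -type f -newermt "$2" 2>/dev/null
}
```

Pass the disconnect timestamp from `mcp.log` as the second argument, including its timezone, so the comparison does not silently use local time.
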
### 3. Are the MCP child processes still alive in WSL?

```bash
ps -eo pid,etime,cmd | grep -E 'mcp|claude' | grep -v grep
```

If you see your MCP server processes still running with elapsed times spanning the disconnect (or fresh respawns from auto-recovery attempts), the WSL side is healthy. The damage is on the Claude Desktop ↔ MCP transport, not the MCP servers themselves.

### 4. What was the shell MCP doing right before the disconnect?

Check `%APPDATA%\Claude\logs\main.log` for the last `mcp__shell__shell_exec` permission grants and tool calls, and `%APPDATA%\Claude\logs\mcp-server-shell.log` for the last commands invoked. If you see an SSH command issued against a host that you also know to be currently rebooting / unreachable, you've found the trigger.

Confirm with a separate health probe of the remote host (do this in **WSL or a fresh terminal**, not through the wedged Claude Desktop):

```bash
ping -c 3 -W 2 <host-or-tailscale-ip>
ssh -o ConnectTimeout=5 -o BatchMode=yes <host> uptime
tailscale status | grep <host>
```

100% packet loss + missing tailnet entry + SSH timeout = the target is genuinely down or hung mid-reboot.

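The three probes can be folded into one verdict. This is a sketch, not a definitive health check: `host_verdict` and `probe_host` are hypothetical names, and the "up" and "inconclusive" outcomes are assumptions added for illustration. Only the all-three-fail case comes from the rule above.

```bash
# Hypothetical helper: map the exit statuses of the three probes to a verdict.
# Arguments: status of ping, ssh, and the tailscale grep (0 = success).
host_verdict() {
  if [ "$1" -ne 0 ] && [ "$2" -ne 0 ] && [ "$3" -ne 0 ]; then
    echo "down-or-mid-reboot"   # the rule stated above
  elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "up"                   # assumption: ping and ssh both answering
  else
    echo "inconclusive"         # assumption: mixed results need a human look
  fi
}

# Run the same probes as above and classify the result.
# Call this from WSL or a fresh terminal, not through Claude Desktop.
probe_host() {
  ping -c 3 -W 2 "$1" >/dev/null 2>&1; p=$?
  ssh -o ConnectTimeout=5 -o BatchMode=yes "$1" uptime >/dev/null 2>&1; s=$?
  tailscale status 2>/dev/null | grep -q -- "$1"; t=$?
  host_verdict "$p" "$s" "$t"
}
```
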
---

## Recovery

1. **Fully quit Claude Desktop**: system tray icon → *Quit*. Closing the window is not enough; you must terminate the main process so the MCP manager state is cleared.
2. *(Optional)* If you want a clean slate in WSL, kill orphaned MCP child processes:

   ```bash
   pkill -f mcp-shell
   pkill -f mail-mcp
   pkill -f mcp-majorvault
   # ...etc for any other MCP binaries you run
   ```

   This is rarely necessary, since fresh spawns will replace them on next launch.
3. **Reopen Claude Desktop**. Watch `mcp.log` and `main.log`:

   ```
   [LocalMcpServerManager] Connected to shell (1 tools)
   [LocalMcpServerManager] Connected to filesystem (14 tools)
   [LocalMcpServerManager] Connected to mail (30 tools)
   ...
   ```

   Tool counts should match your `claude_desktop_config.json`. The "UtilityProcess Check: Extension X not found in installed extensions" warnings are benign; Claude Desktop is just noting that your MCPs aren't bundled built-in extensions (because they're WSL-launched).

---

## Prevention: fire-and-forget reboot patterns

Don't hand the shell MCP a command that intentionally severs its own SSH session and then expects the shell to wait for a clean close. Instead, schedule the reboot to happen **after** SSH disconnects:

### Option A: `nohup` + background (most portable)

```bash
# Prefix shutdown with sudo if the SSH user is not root; with output sent to
# /dev/null, a permission failure would otherwise go unnoticed.
ssh host 'nohup shutdown -r +1 >/dev/null 2>&1 &'
```

This schedules a reboot one minute out and returns immediately, so SSH closes cleanly. The one-minute delay also gives you time to cancel (`ssh host 'sudo shutdown -c'`) if you change your mind.

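Why the `>/dev/null 2>&1` in Option A matters can be shown locally. A reader waiting on a pipe only sees end-of-file once every process holding the write end has closed it, so a backgrounded job that still inherits stdout keeps the caller waiting. The sketch below uses command substitution as a stand-in for the SSH session's output channel; that stand-in is an assumption for illustration, and no SSH is involved.

```bash
# Backgrounded child still holds the pipe's write end: the caller waits
# for it to exit even though the foreground command finished at once.
start=$(date +%s)
out=$( sleep 2 & echo scheduled )
held=$(( $(date +%s) - start ))

# Same child with its output detached: the caller returns immediately.
start=$(date +%s)
out=$( sleep 2 >/dev/null 2>&1 & echo scheduled )
detached=$(( $(date +%s) - start ))

echo "inherited stdout: ~${held}s, redirected: ~${detached}s"
```

The first form blocks for the full lifetime of the background job; the second does not. Dropping the redirect from Option A risks reintroducing exactly the wait the pattern is meant to avoid.
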
### Option B: bounded keepalive timeout

```bash
ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'
```

If the remote stops answering, the local SSH client gives up after two missed keepalives sent 5 s apart, bounding the worst case to roughly 10 s instead of "until something kills the MCP." This is less elegant than Option A, but it works for one-shot situations.

### Option C: schedule on the box itself

Let the target's own scheduler trigger the reboot, for example a one-off cron entry, a transient `systemd` timer, or `at`, so nothing depends on the SSH session staying open. With `at` (this assumes `atd` is installed and running on the target):

```bash
ssh host 'echo "systemctl reboot" | at now + 1 minute'
```

### Anti-pattern (don't do this)

```bash
# ❌ Synchronous reboot through MCP shell
ssh host reboot
ssh host sudo reboot
ssh host 'shutdown -r now'
```

These all hold the MCP stdio pipe open waiting for a session that is being severed at the kernel level on the remote side.

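If you wrap or review commands before they reach the shell MCP, the anti-pattern can be caught mechanically. The guard below is a rough sketch: `is_blocking_reboot` is a hypothetical name, the patterns are deliberately narrow, and it is not part of Claude Desktop or any MCP server. It also flags Option B style commands, which are still synchronous, just bounded.

```bash
# Hypothetical guard: exit status 0 when a command line looks like an
# immediate reboot issued over SSH, non-zero otherwise.
is_blocking_reboot() {
  # Not an ssh invocation: nothing to flag.
  printf '%s\n' "$1" | grep -Eq '(^|[[:space:]])ssh[[:space:]]' || return 1
  # Delayed shutdowns (+N minutes) and `at` jobs let the session close first.
  printf '%s\n' "$1" | grep -Eq 'shutdown[[:space:]].*\+[0-9]+' && return 1
  printf '%s\n' "$1" | grep -Eq '\|[[:space:]]*at[[:space:]]' && return 1
  # Immediate reboot verbs as whole words, or an explicit "shutdown ... now".
  printf '%s\n' "$1" |
    grep -Eq '(^|[^[:alnum:]_-])(reboot|poweroff|halt)($|[^[:alnum:]_-])|shutdown[[:space:]].*now'
}
```
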
---

## Worked example: 2026-05-10 majorhome reboot

| Time (EDT) | Event |
|---|---|
| 00:41:06 | Claude Desktop emits permission prompt for `mcp__shell__shell_exec` |
| 00:41:08 | Shell MCP disconnect+reconnect cycle (transient, recovered in 2 s) |
| 00:41:10 | `[LocalMcpServerManager] Connected to shell (1 tools)` |
| 00:41:26 | Permission granted, likely for the `ssh majorhome reboot` call |
| 00:42:16 | `[Result] Turn succeeded` → session marked `running → idle` |
| 00:42 | `main.log` goes silent |
| 04:10:17 UTC (00:10:17 EDT, *prior* to the entries above; note the timezone delta between `mcp.log` and `main.log`) | All 5 MCPs disconnect within 35 ms |
| 01:00–01:10 | majorhome physically recovers, comes back up clean (`uptime` 19 min, `systemctl is-system-running` = `running`) |
| 01:13:42 | After full Claude Desktop restart, all 5 MCPs respawn |
| 01:15:22 | All 5 MCPs reconnected, tools registered |

majorhome itself was never the problem: the reboot succeeded. The damage came from the SSH session that never closed cleanly, which poisoned the local Claude Desktop MCP transport.

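Because the `mcp.log` sample lines carry UTC (`Z`) timestamps while the table is in EDT, it helps to convert before lining events up. The sketch below assumes GNU `date` and uses a fixed UTC offset (EDT is UTC-4) rather than a named zone, so it does not depend on tzdata being installed; `utc_to_offset` is a hypothetical helper name.

```bash
# Hypothetical helper: shift a UTC wall-clock time by a fixed number of hours.
# Usage: utc_to_offset "<YYYY-MM-DD HH:MM:SS in UTC>" <offset-hours>
utc_to_offset() {
  epoch=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$(( epoch + $2 * 3600 ))" '+%Y-%m-%d %H:%M:%S'
}

# The mass-disconnect timestamp from mcp.log, expressed in EDT (UTC-4).
utc_to_offset '2026-05-10 04:10:17' -4   # prints 2026-05-10 00:10:17
```
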
---

## See also

- [Claude Desktop MCP Server Started via wsl.exe Sees Empty Environment (WSLENV)](wsl-env-claude-desktop-mcp.md): a different failure mode (start-up env passing) on the same Claude Desktop + WSL stack
- [Pi-hole AI Blocklist Blocks Claude Desktop (ERR_CONNECTION_REFUSED)](networking/pihole-blocks-claude-desktop.md): another Claude Desktop transport-layer failure
- [Windows OpenSSH: WSL as Default Shell Breaks Remote Commands](networking/windows-openssh-wsl-default-shell-breaks-remote-commands.md): related WSL/SSH stdio behavior