MajorLinux 545df9f5c6 Add troubleshooting article: Claude Desktop MCP mass-disconnect from blocking SSH reboot

Documents the failure mode where issuing a synchronous `ssh host reboot`
through Claude Desktop's shell MCP poisons the local MCP transport when
the target severs its session before responding cleanly — eventually
force-disconnecting every MCP at once. Covers diagnostic chain, recovery,
fire-and-forget reboot patterns, and worked example from the 2026-05-10
majorhome AMD-card reboot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-10 01:28:11 -04:00

Claude Desktop MCP Mass-Disconnect After Blocking SSH Reboot

TL;DR — Issuing a synchronous ssh host reboot through Claude Desktop's shell MCP can hang the MCP transport when the target dies mid-session. Eventually the MCP manager force-disconnects every MCP at once. Recovery is a full Claude Desktop restart. Prevention is a fire-and-forget reboot pattern that lets the SSH session close cleanly before the target goes down.

Symptom

You're running Claude Desktop with several MCPs configured (shell, filesystem, mail, etc.), most launched via wsl.exe against your WSL2 distro. You ask Claude to reboot a remote host through the shell MCP — typically something like ssh fleethost reboot or ssh fleethost sudo systemctl reboot. Things appear to succeed. Then, anywhere from immediately to ~30 minutes later:

- Every MCP disconnects within tens of milliseconds of the others, not in the staggered order you'd expect from independent failures
- Claude Desktop's main panel shows all MCP servers as failed/disconnected
- The app itself is still running but cannot reconnect MCPs cleanly until you fully restart it
- New chats can't use any MCP tools

The MCP server logs (%APPDATA%\Claude\logs\mcp-server-*.log) end with the standard "Server transport closed unexpectedly, this is likely due to the process exiting early" message — but they end at the same instant for every server.

Why this happens

Claude Desktop launches each MCP server as a stdio child process (commonly wsl.exe npx -y <server> or wsl.exe <binary>). The MCP manager owns the stdio pipes and a transport per server. When you ask Claude to run a synchronous ssh remote reboot via the shell MCP:

1. The shell MCP calls SSH and waits for the remote process to exit so it can return stdout/stderr to Claude Desktop
2. The remote reboot (or systemctl reboot) executes on the target, but reboot is special: the target severs its own SSH session as part of going down, often without sending a clean TCP FIN
3. The local SSH client sits there waiting for a response that never comes
4. The shell MCP's stdio pipe stays open, blocked on the SSH child
5. Claude Desktop's MCP manager waits on the shell MCP's stdio pipe
6. After some watchdog/timeout interval, the manager force-tears-down, and because of how the manager is wired, it tears down all MCP transports together, not just the wedged one

The blast radius is "every MCP in the session," not just the one that issued the reboot.

Diagnostic chain

Use this exact order — it lets you rule out each layer cleanly.

1. Are the disconnect timestamps clustered?

Open %APPDATA%\Claude\logs\mcp.log (or each per-server log) and find the Server transport closed lines for each MCP. Are they within tens or hundreds of milliseconds of each other?

2026-05-10T04:10:17.167Z [shell] Server transport closed unexpectedly
2026-05-10T04:10:17.175Z [mail] Server transport closed unexpectedly
2026-05-10T04:10:17.177Z [majorvault] Server transport closed unexpectedly
2026-05-10T04:10:17.202Z [filesystem] Server transport closed unexpectedly

If yes → a parent killed the children. These are not independent MCP failures.
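
The clustering check can be scripted rather than eyeballed. The following is a minimal sketch, assuming each matching line begins with an ISO-8601 UTC timestamp with millisecond precision (as in the excerpt above) and that you feed it only the lines around a single incident on one UTC day. The function name `disconnect_spread_ms` is hypothetical.

```shell
#!/usr/bin/env bash
# Report how many "Server transport closed" lines were seen and the gap,
# in milliseconds, between the earliest and the latest one.
# Reads the named files, or stdin when no file is given.
disconnect_spread_ms() {
  awk '
    /Server transport closed/ {
      split($1, dt, "T")        # dt[2]  = "04:10:17.167Z"
      split(dt[2], hms, ":")    # hms[3] = "17.167Z"
      split(hms[3], sf, ".")    # sf[1]  = "17", sf[2] = "167Z"
      sub(/Z$/, "", sf[2])
      # Integer arithmetic only, to avoid floating-point rounding.
      ms = ((hms[1] * 60 + hms[2]) * 60 + sf[1]) * 1000 + sf[2]
      if (n == 0 || ms < min) min = ms
      if (n == 0 || ms > max) max = ms
      n++
    }
    END {
      if (n == 0) { print "no disconnect lines found"; exit 1 }
      printf "%d disconnects, spread %d ms\n", n, max - min
    }
  ' "$@"
}
```

Run it as `disconnect_spread_ms /path/to/mcp.log`, or pipe in a `grep` of the relevant time window. A spread in the tens of milliseconds points to a shared parent teardown; a spread of seconds or minutes suggests unrelated failures.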

2. Is there a Crashpad minidump?

dir "$env:APPDATA\Claude\Crashpad\reports"
dir "$env:APPDATA\Claude\Crashpad\pending"

Empty directories (or directories with no files newer than the disconnect time) mean Claude Desktop did not crash; it hung. A real crash would have written a minidump.
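
If you would rather run this check from WSL, a sketch along these lines counts files written after the disconnect. It assumes GNU `find` and that the Windows profile is reachable from WSL under `/mnt/c`; the function name `dumps_since` and the example path are illustrative, not taken from Claude Desktop's documentation.

```shell
#!/usr/bin/env bash
# Print the number of files under DIR modified after SINCE.
# SINCE is any timestamp GNU find accepts, e.g. "2026-05-10 00:10:17".
# A missing directory counts as zero files.
dumps_since() {
  local dir=$1 since=$2
  [ -d "$dir" ] || { echo 0; return 0; }
  find "$dir" -type f -newermt "$since" | wc -l | tr -d ' '
}

# Example (path is an assumption; substitute your Windows user name):
# dumps_since "/mnt/c/Users/<you>/AppData/Roaming/Claude/Crashpad/reports" "2026-05-10 00:10:17"
```

A result of `0` for both Crashpad directories matches the "hung, not crashed" reading.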

3. Are the MCP child processes still alive in WSL?

ps -eo pid,etime,cmd | grep -E 'mcp|claude' | grep -v grep

If you see your MCP server processes still running with elapsed times spanning the disconnect (or fresh respawns from auto-recovery attempts), the WSL side is healthy. The damage is on the Claude Desktop ↔ MCP transport, not the MCP servers themselves.

4. What was the shell MCP doing right before the disconnect?

Check %APPDATA%\Claude\logs\main.log for the last mcp__shell__shell_exec permission grants and tool calls, and %APPDATA%\Claude\logs\mcp-server-shell.log for the last commands invoked. If you see an SSH command issued against a host that you also know to be currently rebooting / unreachable, you've found the trigger.

Confirm with a separate health probe of the remote host (do this in WSL or a fresh terminal, not through the wedged Claude Desktop):

ping -c 3 -W 2 <host-or-tailscale-ip>
ssh -o ConnectTimeout=5 -o BatchMode=yes <host> uptime
tailscale status | grep <host>

100% packet loss + missing tailnet entry + SSH timeout = the target is genuinely down or hung mid-reboot.
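
The probes above can be folded into one bounded check. This is a sketch, not a definitive health check: the function names are hypothetical, the Tailscale probe is left out to keep it short, and the middle verdict ("reachable, sshd not answering") is an interpretation added here rather than something the incident logs showed.

```shell
#!/usr/bin/env bash
# Map two probe results (1 = success, 0 = failure) to a short verdict.
host_verdict() {
  local ping_ok=$1 ssh_ok=$2
  if [ "$ssh_ok" -eq 1 ]; then
    echo "up"
  elif [ "$ping_ok" -eq 1 ]; then
    echo "reachable, sshd not answering"
  else
    echo "down or hung mid-reboot"
  fi
}

# Run both probes with the same bounded timeouts used above.
# BatchMode=yes prevents a password prompt from blocking the check.
probe_host() {
  local host=$1 ping_ok=0 ssh_ok=0
  ping -c 3 -W 2 "$host" >/dev/null 2>&1 && ping_ok=1
  ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" uptime >/dev/null 2>&1 && ssh_ok=1
  host_verdict "$ping_ok" "$ssh_ok"
}
```

Call `probe_host <host-or-tailscale-ip>` from WSL or a fresh terminal. Every step has its own timeout, so the check itself cannot wedge.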

Recovery

1. Fully quit Claude Desktop: system tray icon → Quit. Closing the window is not enough; you must terminate the main process so the MCP manager state is cleared.
2. (Optional) If you want a clean slate in WSL, kill orphaned MCP child processes:
   ```
   pkill -f mcp-shell
   pkill -f mail-mcp
   pkill -f mcp-majorvault
   # ...etc for any other MCP binaries you run
   ```
   This is rarely necessary; fresh spawns will replace them on next launch.
3. Reopen Claude Desktop. Watch mcp.log and main.log:
   ```
   [LocalMcpServerManager] Connected to shell (1 tools)
   [LocalMcpServerManager] Connected to filesystem (14 tools)
   [LocalMcpServerManager] Connected to mail (30 tools)
   ...
   ```
   Every server in your claude_desktop_config.json should appear, with its usual tool count. The "UtilityProcess Check: Extension X not found in installed extensions" warnings are benign: Claude Desktop just notes that your MCPs aren't bundled built-in extensions (because they're WSL-launched).

Prevention — fire-and-forget reboot patterns

Don't hand the MCP shell a command that intentionally severs its own SSH session and expects the shell to wait for clean closure. Instead, schedule the reboot to happen after SSH disconnects:

Option A — `nohup` + background (most portable)

ssh host 'nohup shutdown -r +1 >/dev/null 2>&1 &'

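Schedules a reboot 1 minute out and returns immediately, so SSH closes cleanly. The one-minute delay gives you time to cancel (ssh host 'sudo shutdown -c') if you change your mind. If you do not log in as root, the scheduling command needs sudo as well, just as the cancel command does.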
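
Option A can be wrapped so that a zero or non-numeric delay is rejected before anything is sent, since `shutdown -r now` would recreate the blocking case. This is a sketch under assumptions: the function name `schedule_reboot` and the `SSH_CMD` override (useful for dry runs) are hypothetical, and the remote command omits `sudo`, matching Option A as written.

```shell
#!/usr/bin/env bash
# Schedule a delayed reboot on HOST and return as soon as SSH closes.
# DELAY_MIN must be a whole number >= 1.
# SSH_CMD (hypothetical override) replaces the ssh binary for dry runs.
schedule_reboot() {
  local host=$1 delay_min=${2:-1}
  case $delay_min in
    ''|*[!0-9]*)
      echo "delay must be a whole number of minutes" >&2
      return 2 ;;
  esac
  if [ "$delay_min" -lt 1 ]; then
    echo "delay must be at least 1 minute" >&2
    return 2
  fi
  # Prefix shutdown with sudo here if you do not log in as root.
  "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" \
    "nohup shutdown -r +${delay_min} >/dev/null 2>&1 &"
}
```

For example, `schedule_reboot fleethost 2` schedules the reboot two minutes out. `BatchMode` and `ConnectTimeout` keep the call from blocking on a prompt or an unreachable host.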

Option B — bounded keepalive timeout

ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'

With these options the local SSH client probes the server after 5 s of silence and hangs up after 2 unanswered probes, which bounds the worst case to about 10 s instead of "until something kills the MCP." Less elegant than Option A, but it works for one-shot situations.
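
A hard wall-clock ceiling can back up the keepalive bound. The sketch below assumes GNU coreutils `timeout` is available where the shell MCP runs; the function name `run_bounded` is hypothetical.

```shell
#!/usr/bin/env bash
# Run a command with a hard time limit in seconds.
# Returns the command's own exit status, or 124 if the limit was hit.
# --kill-after sends SIGKILL if the command ignores the initial SIGTERM.
run_bounded() {
  local limit=$1 rc
  shift
  timeout --kill-after=5 "$limit" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

# Example:
# run_bounded 20 ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 host 'systemctl reboot'
```

The outer limit should sit comfortably above the keepalive bound, so it only fires when the keepalive logic itself fails to end the session.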

Option C — schedule on the box itself

Hand the reboot to a scheduler on the box itself, such as a one-off cron entry, a systemd oneshot timer, or at, so it runs independently of the SSH session:

ssh host 'echo "systemctl reboot" | at now + 1 minute'
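
For the systemd route, a transient timer created with `systemd-run --on-active=` is one possible equivalent of the `at` line above. The helper below only builds the remote command string. Its name `remote_reboot_cmd` is hypothetical, and it assumes the remote account is allowed to create system timers (typically root or sudo).

```shell
#!/usr/bin/env bash
# Print a remote command that schedules a reboot DELAY_MIN minutes out
# using the chosen on-box scheduler ("at" or "systemd").
remote_reboot_cmd() {
  local scheduler=$1 delay_min=${2:-1}
  case $scheduler in
    at)
      printf 'echo "systemctl reboot" | at now + %d minutes\n' "$delay_min" ;;
    systemd)
      printf 'systemd-run --on-active=%dmin systemctl reboot\n' "$delay_min" ;;
    *)
      echo "unknown scheduler: $scheduler" >&2
      return 2 ;;
  esac
}

# Example:
# ssh host "$(remote_reboot_cmd systemd 1)"
```

Either way, the scheduling command returns immediately and the reboot fires after the SSH session has already closed.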

Anti-pattern (don't do this)

# ❌ Synchronous reboot through MCP shell
ssh host reboot
ssh host sudo reboot
ssh host 'shutdown -r now'

These all hold the MCP stdio pipe open waiting for a session that is being severed at the kernel level on the remote side.
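
A rough lint can flag the anti-pattern before a command reaches the MCP shell. This is a heuristic sketch only: the function name `is_blocking_reboot` is hypothetical, and the string matching recognises just the three safe patterns from Options A to C, so it will misjudge commands phrased differently.

```shell
#!/usr/bin/env bash
# Return 0 if CMD looks like a synchronous reboot over SSH, 1 otherwise.
is_blocking_reboot() {
  local cmd=$1
  # Only ssh invocations are of interest.
  case $cmd in
    ssh\ *) ;;
    *) return 1 ;;
  esac
  # Treat the fire-and-forget and bounded patterns as safe.
  case $cmd in
    *nohup*|*"| at "*|*ServerAliveInterval*) return 1 ;;
  esac
  # Anything left that reboots immediately is the anti-pattern.
  case $cmd in
    *reboot*|*"shutdown -r now"*) return 0 ;;
  esac
  return 1
}
```

For example, `is_blocking_reboot "ssh host sudo reboot" && echo "use a fire-and-forget pattern instead"`.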

Worked example — 2026-05-10 majorhome reboot

| Time (EDT) | Event |
| --- | --- |
| 00:41:06 | Claude Desktop emits permission prompt for `mcp__shell__shell_exec` |
| 00:41:08 | Shell MCP disconnect+reconnect cycle (transient, recovered in 2 s) |
| 00:41:10 | `[LocalMcpServerManager] Connected to shell (1 tools)` |
| 00:41:26 | Permission granted, likely the `ssh majorhome reboot` call |
| 00:42:16 | `[Result] Turn succeeded` → session marked `running → idle` |
| 00:42 | `main.log` goes silent |
| 04:10:17 UTC (00:10:17 EDT prior; note timezone delta in mcp.log vs main.log) | All 5 MCPs disconnect within 35 ms |
| 01:00–01:10 | majorhome physically recovers, comes back up clean (`uptime` 19 min, `systemctl is-system-running` = `running`) |
| 01:13:42 | After full Claude Desktop restart, all 5 MCPs respawn |
| 01:15:22 | All 5 MCPs reconnected, tools registered |

majorhome itself was never the problem — the reboot succeeded. The damage was the SSH session that never closed cleanly, which poisoned the local Claude Desktop MCP transport.
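
Because mcp.log timestamps are UTC while the rest of the timeline is local time, it helps to convert before comparing entries. A minimal sketch, assuming GNU `date`; the function name `utc_to_zone` is hypothetical. `EDT4` is a POSIX fixed-offset zone string for UTC-4, and a zone name such as `America/New_York` also works where the tz database is installed.

```shell
#!/usr/bin/env bash
# Convert an ISO-8601 UTC timestamp (e.g. 2026-05-10T04:10:17.167Z)
# to wall-clock time in ZONE. Fractional seconds are dropped.
utc_to_zone() {
  local ts=$1 zone=$2
  ts=$(printf '%s' "$ts" | sed -E 's/\.[0-9]+Z$/Z/')
  TZ=$zone date -d "$ts" '+%Y-%m-%d %H:%M:%S'
}

# Example:
# utc_to_zone 2026-05-10T04:10:17.167Z EDT4
```

Converting every mcp.log entry this way before merging it with main.log avoids misordering events across the two logs.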
