-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix process termination in stdio client #496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix process termination in stdio client #496
Conversation
@jerome3o-anthropic @jspahrsummers This is my first pull request. I know am supposed to add the two of you as reviewers, but I haven't been able to figure out how to do that. I never got the gear icon that is shown in online tutorials |
src/mcp/client/stdio/__init__.py
Outdated
else: | ||
process.terminate() | ||
tg.cancel_scope.cancel() | ||
process.terminate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if the process doesn't react to SIGTERM. We need a timeout and a fallback on SIGKILL.
Please test it with https://github.com/modelcontextprotocol/servers/blob/main/src/everything/index.ts#L13 (windows & unix)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ref: #555
…) and infinite hang (modelcontextprotocol#552) by recursively terminating all spawned child processes before primary subprocess using psutil
I've found a solution that works without breaking the termination of unresponsive servers. I have added psutil as a dependency and am using it to kill the child processes attached to the main subprocess before process.terminate() is called. This should negate the need for terminate_windows_process() again, but this time I kept it as a fallback. But in my tests against all of the servers in the MCP servers repo and a dozen others installed on my system on both Windows 11 and Ubuntu 22.04.5 via WSL, using Python versions 3.10-3.13, I was able to terminate all of the servers, even non-responsive ones, cleanly even when terminate_windows_process() was removed. By terminating the child processes first, we eliminate the race condition between the Windows kernel and the Python garbage collector when process.terminate() was closing those child processes. I also wrapped the process termination in the suppression of the ProcessLookupError, because one failing MCP server would find that exception being raise while terminating the process that was never fully realized. This behavior was observed in both Windows and Linux, but is now fixed. |
Github-Issue: #391
tg.cancel_scope.cancel()
in thefinally
block ofstdio_client
to ensure thatstdin_writer
andstdout_reader
tasks are properly cancelledprocess.terminate()
terminate_windows_process()
functionMotivation and Context
This change addresses a race condition specific to Windows' Proactor event loop, where ignored exceptions would occur during shutdown of the stdio client.
The underlying problem was that subprocess transports (stdin/stdout) remained open when
process.terminate()
was called. On Windows, this could trigger a race during garbage collection and transport teardown, resulting in uncatchable exceptions.The previous workaround,
terminate_windows_process()
, attempted to mitigate this by waiting before forcefully terminating the process. However, on Windows,kill()
is functionally identical toterminate()
(https://docs.python.org/3/library/asyncio-subprocess.html#asyncio.subprocess.Process.kill), so the method provided no real benefit. Explicitly cancelling the stdio tasks ensures a clean and predictable shutdown.How Has This Been Tested?
Tested on Windows 11 by repeatedly connecting and disconnecting multiple clients over stdio transport to multiple servers found both within the repository and using installed locally.
The ignored exception could not be captured via standard exception handling (
try/except
,sys.unraisablehook
, or pytest'scapsys
/capfd
). As a result, an automated test was not feasible for this specific issue. All testing was conducted through repeated manual runs and verification of clean shutdown behavior.Breaking Changes
No breaking changes are expected. The removed function,
terminate_windows_process()
, was internal and not designed for external use. Unless it was being used directly (which would be a misuse), no user code should be affected.Types of changes
Checklist
Additional context
There are two failing tests that predate this change:
test_messages_are_executed_concurrently
fails with a timing difference of less than 5mstest_desktop
fails due to assumptions about POSIX-style file pathsNo changes to documentation or error handling were necessary for this fix.