Skip to content

Fix process termination in stdio client #496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ashrobertsdragon
Copy link

Github-Issue: #391

  • Added tg.cancel_scope.cancel() in the finally block of stdio_client to ensure that stdin_writer and stdout_reader tasks are properly cancelled
  • This ensures associated transports are closed before invoking process.terminate()
  • Removed the now-unnecessary terminate_windows_process() function

Motivation and Context

This change addresses a race condition specific to Windows' Proactor event loop, where ignored exceptions would occur during shutdown of the stdio client.

The underlying problem was that subprocess transports (stdin/stdout) remained open when process.terminate() was called. On Windows, this could trigger a race during garbage collection and transport teardown, resulting in uncatchable exceptions.

The previous workaround, terminate_windows_process(), attempted to mitigate this by waiting before forcefully terminating the process. However, on Windows, kill() is functionally identical to terminate()(https://docs.python.org/3/library/asyncio-subprocess.html#asyncio.subprocess.Process.kill), so the method provided no real benefit. Explicitly cancelling the stdio tasks ensures a clean and predictable shutdown.

How Has This Been Tested?

Tested on Windows 11 by repeatedly connecting and disconnecting multiple clients over stdio transport to multiple servers found both within the repository and using installed locally.

The ignored exception could not be captured via standard exception handling (try/except, sys.unraisablehook, or pytest's capsys / capfd). As a result, an automated test was not feasible for this specific issue. All testing was conducted through repeated manual runs and verification of clean shutdown behavior.

Breaking Changes

No breaking changes are expected. The removed function, terminate_windows_process(), was internal and not designed for external use. Unless it was being used directly (which would be a misuse), no user code should be affected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

There are two failing tests that predate this change:

  • test_messages_are_executed_concurrently fails with a timing difference of less than 5ms
  • test_desktop fails due to assumptions about POSIX-style file paths

No changes to documentation or error handling were necessary for this fix.

@ashrobertsdragon ashrobertsdragon marked this pull request as draft April 12, 2025 05:20
@ashrobertsdragon ashrobertsdragon marked this pull request as ready for review April 12, 2025 05:21
@ashrobertsdragon ashrobertsdragon marked this pull request as draft April 12, 2025 05:21
@ashrobertsdragon ashrobertsdragon marked this pull request as ready for review April 12, 2025 05:26
@ashrobertsdragon ashrobertsdragon marked this pull request as draft April 14, 2025 04:35
@ashrobertsdragon ashrobertsdragon marked this pull request as ready for review April 14, 2025 04:46
@ashrobertsdragon
Copy link
Author

@jerome3o-anthropic @jspahrsummers This is my first pull request. I know am supposed to add the two of you as reviewers, but I haven't been able to figure out how to do that. I never got the gear icon that is shown in online tutorials

else:
process.terminate()
tg.cancel_scope.cancel()
process.terminate()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work if the process doesn't react to SIGTERM. We need a timeout and a fallback on SIGKILL.
Please test it with https://github.com/modelcontextprotocol/servers/blob/main/src/everything/index.ts#L13 (windows & unix)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: #555

@ihrpr ihrpr added this to the r-05-25 milestone Apr 29, 2025
…) and infinite hang (modelcontextprotocol#552) by recursively terminating all spawned child processes before primary subprocess using psutil
@ashrobertsdragon
Copy link
Author

I've found a solution that works without breaking the termination of unresponsive servers. I have added psutil as a dependency and am using it to kill the child processes attached to the main subprocess before process.terminate() is called. This should negate the need for terminate_windows_process() again, but this time I kept it as a fallback. But in my tests against all of the servers in the MCP servers repo and a dozen others installed on my system on both Windows 11 and Ubuntu 22.04.5 via WSL, using Python versions 3.10-3.13, I was able to terminate all of the servers, even non-responsive ones, cleanly even when terminate_windows_process() was removed.

By terminating the child processes first, we eliminate the race condition between the Windows kernel and the Python garbage collector when process.terminate() was closing those child processes.

I also wrapped the process termination in the suppression of the ProcessLookupError, because one failing MCP server would find that exception being raise while terminating the process that was never fully realized. This behavior was observed in both Windows and Linux, but is now fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants