Skip to content

Unify process termination on POSIX & Windows (+ tests) #1044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

felixweinberger
Copy link
Contributor

@felixweinberger felixweinberger commented Jun 26, 2025

Motivation and Context

#555, #559 address important but different ways MCP servers can hang or fail to successfully terminate on Windows & POSIX systems.

This PR re-implements #555 by aligning process termination between Windows and Unix to make sure SIGTERM-ignoring processes don't inadvertently cause hangs.

How Has This Been Tested?

New regression tests added for #555 and #559. These were developed and tested on both POSIX and Windows systems.

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 2 times, most recently from 9f88ea7 to a27fcef Compare June 26, 2025 21:31
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 6 times, most recently from caa5628 to 1d495ca Compare June 30, 2025 14:55
@felixweinberger felixweinberger changed the title Add regression tests for #559 and #555 Add regression tests for #559, #555, #765 Jun 30, 2025
@felixweinberger felixweinberger changed the title Add regression tests for #559, #555, #765 Add regression tests for #555, #559, #765 Jun 30, 2025
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 9e8d74b to 3f6c472 Compare June 30, 2025 15:39
@felixweinberger felixweinberger changed the title Add regression tests for #555, #559, #765 [WIP] Add regression tests for #555, #559, #765, #729, #850 Jul 1, 2025
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 15 times, most recently from cbaee2e to e8954dd Compare July 1, 2025 17:45
@felixweinberger felixweinberger changed the title [WIP] Add regression tests for #555, #559, #765, #729, #850 Unify process termination on POSIX & Windows and add comprehensive regression testing Jul 1, 2025
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
@jingx8885
Copy link

jingx8885 commented Jul 2, 2025

Verification of Windows Process Tree Termination Fix
(Originally encountered in #729 context)

Scenario:
In a Python MCP server (uv-pyproject structure) using stdio mode launched via:
uv run --env-file=xxx mcp_server.py

We observed the following process hierarchy:

Python parent process  
  → Python subprocess (middle layer)  
    → uv.exe  
      → MCP stdio service  

Failure:
Terminating the middle Python subprocess failed to terminate its descendants, leaving:

  1. uv.exe process
  2. MCP stdio service process

...as orphaned/zombie processes on Windows.

Validation:
➡️ This specific failure mode was resolved by the process termination logic **originally implemented in #729 **.
➡️ The equivalent solution now exists in #1044's unified approach.
➡️ This logic has been production-validated on Windows 11 across:

  • Live production systems
  • Daily development/testing cycles
  • High-frequency restart scenarios

Conclusion:
#1044 successfully addresses this Windows-specific process tree termination flaw. The child process cleanup behavior required for this scenario is correctly implemented.

@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 50559f7 to 84c4b0f Compare July 3, 2025 11:28
@felixweinberger
Copy link
Contributor Author

felixweinberger commented Jul 3, 2025

After receiving offline feedback from @dsp-ant I'm splitting this PR into 2:

FYI for @jingx8885 @surya-prakash-susarla, it's currently in draft and still needs work, I'll tag you once its ready

@Kludex
Copy link
Member

Kludex commented Jul 4, 2025

@agronholm Should we have to create OSS specific logic for termination when using anyio.open_process?

@agronholm
Copy link

@agronholm Should we have to create OSS specific logic for termination when using anyio.open_process?

I'm not sure I follow. Can you give me more context? I'm very new to this project.

@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 84c4b0f to 4d1804a Compare July 4, 2025 12:16
@felixweinberger felixweinberger marked this pull request as draft July 4, 2025 14:39
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 11 times, most recently from 7dff9c1 to 0205aa0 Compare July 4, 2025 17:08
The stdio cleanup was hanging indefinitely when processes ignored
termination signals or took too long to exit. This caused the MCP
client to freeze during shutdown, especially with servers that don't
handle SIGTERM properly.

This was already being handled on Windows, but not Unix systems.
This Commit unifies the two approaches, removing special logic
for windows process termination.

The fix introduces a 2-second timeout for process termination. If a
process doesn't exit gracefully within this window, it's forcefully
killed. This ensures the client always completes cleanup in bounded
time while still giving well-behaved servers a chance to exit cleanly.

This resolves hanging issues reported when MCP servers ignore standard
termination signals.

resolves #555

Also adds regression tests for #559.

Co-authored-by: Cristian Pufu <cristian.pufu@uipath.com>
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 0205aa0 to 2be6f09 Compare July 4, 2025 17:13
This re-establishes behavior before #596 in the default case.

- Attempt to use anyio's native open_process function on Windows
- Fall back to subprocess.Popen only if NotImplementedError is raised
- This improves compatibility with event loops that support async subprocesses
- Extract fallback logic into separate function for clarity
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from e817dfb to 7af9e65 Compare July 4, 2025 17:39
@felixweinberger
Copy link
Contributor Author

felixweinberger commented Jul 4, 2025

Hi @Kludex, @agronholm thanks for taking a look!

For context, we actually don't spawn windows processes with anyio.open_process on Windows anymore since #596. We now use subprocess.Popen - the reason was that trying to spawn processes with anyio way was causing NotImplementedError on Windows when using the SelectorEventLoop (apparently the default for Python 3.13+ and e.g. in Streamlit, Google AgentKit google/adk-python#1321), which was blocking Windows development of MCP servers in those environments.

However, your comment made me realize that we should probably explicitly guard for that case and still use anyio by default if possible as a first try, so I added a commit on top of this PR to attempt with anyio first and only in the NotImplementedError case use the FallbackProcess wrapper.

If this could somehow be fixed upstream in anyio that would of course be fantastic - though we did get feedback from Microsoft that Python's asyncio is relatively underdeveloped on Windows, which might be a root cause why we ran into this issue in the first place and might make it difficult to address there?

@felixweinberger felixweinberger marked this pull request as ready for review July 4, 2025 17:41
@agronholm
Copy link

Any idea why SelectorEventLoop would be the default there? It stopped being the Python default on Windows on Python 3.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants