Add terminate_workers to ProcessPoolExecutor #128041
Comments
Provides a way to forcefully stop all the workers in the pool. Typically this would be used as a last resort to stop all workers when unable to shut down / join them in the expected way.
…essPoolExecutor (GH-128043)

This adds two new methods to `multiprocessing`'s `ProcessPoolExecutor`:

- **`terminate_workers()`**: forcefully terminates worker processes using `Process.terminate()`
- **`kill_workers()`**: forcefully kills worker processes using `Process.kill()`

These methods provide users with a direct way to stop worker processes without `shutdown()` or relying on implementation details, addressing situations where immediate termination is needed.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Commit-message-mostly-authored-by: Claude Sonnet 3.7 (because why not -greg)
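A minimal usage sketch of the two methods described above. This is illustrative only: the helper names (`busy`, `force_stop`), pool size, and timings are made up here, and the new methods are assumed to exist (they are gated behind a `hasattr` check so the sketch is a no-op on older Pythons).

```python
import time
from concurrent.futures import ProcessPoolExecutor


def busy(seconds):
    # Stand-in for long-running work that ignores cooperative cancellation.
    time.sleep(seconds)
    return seconds


def force_stop(executor, kill=False):
    # terminate_workers() uses Process.terminate() (SIGTERM on POSIX);
    # kill_workers() uses Process.kill() (SIGKILL on POSIX).
    if kill:
        executor.kill_workers()
    else:
        executor.terminate_workers()


if __name__ == "__main__" and hasattr(ProcessPoolExecutor, "terminate_workers"):
    executor = ProcessPoolExecutor(max_workers=2)
    executor.submit(busy, 60)
    time.sleep(1)             # give the workers time to start
    force_stop(executor)      # last resort instead of a blocking shutdown()/join
    executor.shutdown(wait=False)
```

Note that after forcefully stopping workers, the pool is broken; pending futures will not complete normally, which is the expected trade-off for immediate termination.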
thanks for the feature contribution!
Re-opening because we have some buildbots failing (I'm not sure whether the macOS failure is related, but the other seems to be): https://github.com/python/cpython/actions/runs/13631446353/job/38099943353#step:11:686
Opened #130812 to try to fix the transiency. I can't seem to add the skip-news label, but it should get that.
…ethods to ProcessPoolExecutor (pythonGH-128043)"

The test_concurrent_futures.test_process_pool test is failing in CI. This reverts commit f97e409.
Thanks, @colesbury! I'll dig more to figure out the issue before opening a fresh PR.
Current issue:
Interestingly enough it was suggested that commenting out
makes the test pass. It does… most of the time. It still transiently fails the same way, with a leaked child process (though much more rarely). So I guess the timing change caused by that line makes it more likely to happen, but we're still dealing with a race condition.
I think I figured it out. Will open a PR in a few minutes.
…o ProcessPoolExecutor

Add some fixes to tests to make them no longer transient.
Opened a new PR: #130849. I believe the transient issues are resolved.
FWIW, on the original PR I suspected we might have flakiness issues… I was way underestimating that, apparently. I should've done the buildbot tests before merging. Sorry for the churn! :)
…essPoolExecutor (GH-130849)

This adds two new methods to `multiprocessing`'s `ProcessPoolExecutor`:

- **`terminate_workers()`**: forcefully terminates worker processes using `Process.terminate()`
- **`kill_workers()`**: forcefully kills worker processes using `Process.kill()`

These methods provide users with a direct way to stop worker processes without `shutdown()` or relying on implementation details, addressing situations where immediate termination is needed.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Sam Gross @colesbury
Commit-message-mostly-authored-by: Claude Sonnet 3.7 (because why not -greg)
merged again with the improvements. 🤞🏾
Tiny related fix: #130900 (docs only)
This bug is caused by race conditions in the poll implementations (which are called by join/wait) where, if multiple threads try to reap the dead process, only one "wins" and gets the exit code, while the others get an error.

In the forkserver implementation the losing thread(s) set the code to an error, possibly overwriting the correct code set by the winning thread. This is relatively easy to fix: we can just take a lock before waiting for the process, since at that point we know the call should not block.

In the fork and spawn implementations the losers of the race return before the exit code is set, meaning the process may still report itself as alive after join returns. Fixing this is trickier, as we have to support a mixture of blocking and non-blocking calls to poll, and we cannot have the latter waiting to take a lock held by the former.

The approach taken is to split the blocking and non-blocking call variants. The non-blocking variant does its work with the lock held: since it won't block, this should be safe. The blocking variant releases the lock before making the blocking operating system call. It then retakes the lock and either sets the code if it wins, or waits for a potentially racing thread to do so otherwise.

If a non-blocking call is racing with the unlocked part of a blocking call it may still "lose" the race and return None instead of the exit code, even though the process is dead. However, as the process could be alive at the time the call is made but die immediately afterwards, this situation should already be handled by correctly written code.

To verify the behaviour, a test is added which reliably triggers failures for all three implementations. A work-around for this bug in a test added for pythongh-128041 is also reverted.
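The split between blocking and non-blocking reaping described above can be sketched in pure Python. This is a hedged illustration of the locking scheme, not the actual CPython `Popen` code; the class and method names are invented here, and it is POSIX-only since it relies on `os.waitpid`.

```python
import os
import threading


class PopenSketch:
    """Toy illustration of the split blocking/non-blocking reaping scheme."""

    def __init__(self, pid):
        self.pid = pid
        self.returncode = None
        self._lock = threading.Lock()
        # Condition sharing the lock, used by race "losers" to wait for
        # the winning thread to publish the exit code.
        self._reaped = threading.Condition(self._lock)

    def poll(self):
        # Non-blocking variant: done entirely under the lock, which is
        # safe because WNOHANG cannot block.
        with self._lock:
            if self.returncode is None:
                try:
                    pid, status = os.waitpid(self.pid, os.WNOHANG)
                except ChildProcessError:
                    # Lost a race with the unlocked part of wait():
                    # report "still alive"; the waiter will set the code.
                    pid = 0
                else:
                    if pid == self.pid:
                        self.returncode = os.waitstatus_to_exitcode(status)
            return self.returncode

    def wait(self):
        # Blocking variant: release the lock around the blocking OS call,
        # then settle the race under the lock afterwards.
        with self._lock:
            if self.returncode is not None:
                return self.returncode
        try:
            pid, status = os.waitpid(self.pid, 0)  # blocking, lock NOT held
        except ChildProcessError:
            pid, status = 0, 0  # another thread won the race and reaped it
        with self._lock:
            if self.returncode is None and pid == self.pid:
                self.returncode = os.waitstatus_to_exitcode(status)
                self._reaped.notify_all()
            while self.returncode is None:
                # Lost the race: wait for the winning thread to set the code.
                self._reaped.wait()
            return self.returncode


if __name__ == "__main__":
    pid = os.fork()
    if pid == 0:
        os._exit(3)
    proc = PopenSketch(pid)
    print(proc.wait())  # prints 3
```

With this structure every caller of `wait()` observes the same exit code, even when several threads race to reap the same child, which is the property the real fix restores.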
Feature or enhancement
Proposal:
This is an interpretation of the feature ask in https://discuss.python.org/t/cancel-running-work-in-processpoolexecutor/58605/1. It would be a way to stop all the workers running in a ProcessPoolExecutor.

Previously the way to do this was to loop through the ._processes of the ProcessPoolExecutor, though preferably this should be possible without accessing implementation details.

Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/cancel-running-work-in-processpoolexecutor/58605/1
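The pre-existing workaround referred to in the proposal looks roughly like this. It is a hedged sketch: `._processes` is a private attribute (an implementation detail, which is exactly what the feature avoids relying on), and the helper names and timings here are invented.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def sleepy(seconds):
    # Stand-in for stuck work that won't finish on its own.
    time.sleep(seconds)
    return seconds


def terminate_all_workers(executor):
    # Pre-feature workaround: reach into the private ._processes mapping
    # and terminate each worker process directly.
    for proc in list(executor._processes.values()):
        proc.terminate()


if __name__ == "__main__":
    executor = ProcessPoolExecutor(max_workers=2)
    executor.submit(sleepy, 60)
    time.sleep(1)  # let the workers spawn
    terminate_all_workers(executor)
    executor.shutdown(wait=False)
```

Because this loop depends on a private attribute, it can break without warning across Python versions, which is the motivation for an official API.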
Linked PRs

- Revert "Add terminate_workers and kill_workers methods to ProcessPoolExecutor (GH-128043)" #130838
- Add terminate_workers and kill_workers methods to ProcessPoolExecutor #130849