-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
PyErr_CheckSignals should be callable without holding the GIL #133465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Please see https://research.owlfolio.org/pubs/2025-pyext-ctrlc-talk/ for a "decompressed" rationale for this change, with analysis of the status quo, performance measurements, and discussion. |
Discussion in #133466 wound up requesting that design issues be forwarded to the C-API working group; that thread is linked above. I have also posted https://discuss.python.org/t/ergonomics-of-signal-checks-with-detached-thread-state/94355 soliciting feedback from extension developers. |
Add new C-API functions `PyErr_CheckSignalsDetached` and `PyErr_AreSignalsPending`. `PyErr_CheckSignalsDetached` can *only* be called by threads that *don’t* have an attached thread state. It does the same thing as `PyErr_CheckSignals`, except that it guarantees it will only reattach the supplied thread state if necessary in order to run signal handlers. (Also, it never runs the cycle collector.) `PyErr_AreSignalsPending` can be called with or without an attached thread state. It reports to its caller whether signals are pending, but never runs any handlers itself. Rationale: Compiled-code modules that implement time-consuming operations that don’t require manipulating Python objects, are supposed to call PyErr_CheckSignals frequently throughout each such operation, so that if the user interrupts the operation with control-C, it is canceled promptly. In the normal case where no signals are pending, PyErr_CheckSignals is cheap (two atomic memory operations); however, callers must have an attached thread state, and compiled-code modules that implement time-consuming operations are also supposed to detach their thread state during each such operation. The overhead of re-attaching a thread state in order to call PyErr_CheckSignals, and then releasing it again, sufficiently often for reasonable user responsiveness, can be substantial, particularly in traditional (non-free-threaded) builds. These new functions permit compiled-code modules to avoid that extra overhead.
…n the stdlib. This patch demonstrates how `PyErr_CheckSignalsDetached` (python#135358; see also python#133465 and capi-workgroup/decisions#68) might be used for some (small) efficiency improvements in the stdlib. Almost all existing uses of `PyErr_CheckSignals` in the stdlib appear in loops with this general structure: ```c int res = 0; int async_err = 0; do { Py_BEGIN_ALLOW_THREADS res = system_call(arg1, arg2); Py_END_ALLOW_THREADS } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals())); if (res < 0) { return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL; } Py_RETURN_NONE; ``` With the addition of `PyErr_CheckSignalsDetached`, this can become ```c int res = 0; int async_err = 0; PyThreadState *tstate = PyEval_SaveThread(); do { res = system_call(arg1, arg2); } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignalsDetached(tstate))); PyEval_RestoreThread(tstate); if (res < 0) { return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL; } Py_RETURN_NONE; ``` There were also a small number of places where ``` Py_BLOCK_THREADS sig = PyErr_CheckSignals(); Py_UNBLOCK_THREADS ``` could become just ``` sig = PyErr_CheckSignalsDetached(tstate); ``` I chose to use explicit `PyErr_{Save,Restore}Thread` throughout this patch, in order to give the thread-state parameter to `PyErr_CheckSignalsDetached` a visible name in the calling code. This is possibly an argument for adding a wrapper macro (`PyErr_CHECK_SIGNALS_DETACHED()`, perhaps) that is part of the `Py_BEGIN_ALLOW_THREADS` family, and is therefore entitled to use the hidden `_state` variable. Or it could be an argument for adding a new construct like ``` Py_WITH_DETACHED_THREAD_STATE (tstate) { do { res = system_call(arg1, arg2); } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignalsDetached(tstate))); } ``` (This could be implemented in plain C by off-label use of a for loop. I’m not sure if that’s a good idea or not. In particular, `break` and `continue` at the top level of a `Py_WITH_DETACHED_THREAD_STATE` block would have surprising effects.) The patch only tackles the most straightforward conversions. Places in the stdlib that could probably benefit from a conversion, but that I did not touch, are: * `signal.sigtimedwait`, `select.select`, `select.poll`, `select.devpoll`, `select.epoll`, `select.kevent`, `time.sleep`, `_multiprocessing.semaphore`, and the ssl and socket modules. All of these places need to do arithmetic on some combination of `struct timespec`, `struct timeval`, and `Py_time_t` in their loops, and the APIs for doing so are not currently safe to use without an attached thread state (in particular, they may set the error indicator). * The ssl and socket modules also do a bunch of other work in the same loops that detach the thread state and I’m not familiar enough with those APIs to know if it’s safe to expand the regions with detached thread state. This is unfortunate as I think these are the loops with the biggest chance of _substantial_ efficiency gains from enlarging those regions. * `time.sleep` is also a horrible maze of `#ifdef`s and I think I see a race bug in the Windows-specific code (filed separately as python#135407). * The tkinter module goes back and forth between holding a Python thread state and holding a _Tcl_ thread state and I didn’t want to mess with that. There are also many places where no conversion is necessary: * `signal.pause`, `signal.raise`, `signal.signal`, `signal.pthread_sigmask`, `signal.pthread_kill`, `os.kill`, `os._getdiskusage`, `PyObject_Print`, `PyObject_Repr`, `PyObject_Str`, `PyErr_SetFromErrnoWithFilenameObjects`, `builtin_input_impl`: These do not contain a loop, so there’s no reason to mess with them. * The `_sre` module, `longobject.c`, `faulthandler.c`, `traceback.c`: These don’t detach the thread state at all, so again there’s no reason to mess with them. As a final note, it would probably be a good idea to make it a documented guarantee that `PyEval_{Save,Restore}Thread` and `PyErr_CheckSignals{,Detached}` preserve the value of `errno` and all the Windows-specific errno analogues. A lot of this code is already assuming that this is the case, and the code that isn’t making that assumption is substantially clunkier for it.
Uh oh!
There was an error while loading. Please reload this page.
Feature or enhancement
Proposal:
Compiled-code modules that implement time-consuming operations that don’t require manipulating Python objects, are supposed to call
PyErr_CheckSignals
frequently throughout each such operation, so that if the user interrupts the operation with control-C, it is cancelled promptly.In the normal case where no signals are pending,
PyErr_CheckSignals
is cheap; however, callers must hold the GIL, and compiled-code modules that implement time-consuming operations are also supposed to release the GIL during each such operation. The overhead of reclaiming the GIL in order to callPyErr_CheckSignals
,and then releasing it again, sufficiently often for reasonable user responsiveness, can be substantial.
If my understanding of the thread-state rules is correct, PyErr_CheckSignals only needs the GIL if it has work to do. Checking whether there is a pending signal, or a pending request to run the cycle collector, requires only a couple of atomic loads. Therefore, I propose that we should restructure
PyErr_CheckSignals
and its close relatives (_PyErr_CheckSignals
and_PyErr_CheckSignalsTstate
) so that all the “do we have anything to do” checks are done in a batch before anything that needs the GIL. If any of them are true, then the function should acquire the GIL itself, repeat the check (because another thread could have stolen the event while we were waiting for the GIL), and then actually do the work, thus enabling callers to not hold the GIL.I have already prepared and tested an implementation of this change and will submit a PR shortly.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: