-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-133465: Efficient signal checks with detached thread state. #135358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add new C-API functions `PyErr_CheckSignalsDetached` and `PyErr_AreSignalsPending`. `PyErr_CheckSignalsDetached` can *only* be called by threads that *don’t* have an attached thread state. It does the same thing as `PyErr_CheckSignals`, except that it guarantees it will only reattach the supplied thread state if necessary in order to run signal handlers. (Also, it never runs the cycle collector.) `PyErr_AreSignalsPending` can be called with or without an attached thread state. It reports to its caller whether signals are pending, but never runs any handlers itself. Rationale: Compiled-code modules that implement time-consuming operations that don’t require manipulating Python objects, are supposed to call PyErr_CheckSignals frequently throughout each such operation, so that if the user interrupts the operation with control-C, it is canceled promptly. In the normal case where no signals are pending, PyErr_CheckSignals is cheap (two atomic memory operations); however, callers must have an attached thread state, and compiled-code modules that implement time-consuming operations are also supposed to detach their thread state during each such operation. The overhead of re-attaching a thread state in order to call PyErr_CheckSignals, and then releasing it again, sufficiently often for reasonable user responsiveness, can be substantial, particularly in traditional (non-free-threaded) builds. These new functions permit compiled-code modules to avoid that extra overhead.
This pull request semi-supersedes #133466. Given the substantial changes requested in discussion there, I found it easier to start over from scratch. As requested, it has been minimized down to the new API change. Once we are all 100% happy with the new API, then I will make follow-up PRs for the more elaborate documentation changes and the stdlib uses of the new API that were also included in #133466. This is also semi-draft; I would appreciate code review but the API design discussion in capi-workgroup/decisions#68 has not been finalized yet. Please note that I will be offline from Friday of this week (that is, 2025 June 13) through the 22nd. |
Marking as |
…n the stdlib. This patch demonstrates how `PyErr_CheckSignalsDetached` (python#135358; see also python#133465 and capi-workgroup/decisions#68) might be used for some (small) efficiency improvements in the stdlib. Almost all existing uses of `PyErr_CheckSignals` in the stdlib appear in loops with this general structure: ```c int res = 0; int async_err = 0; do { Py_BEGIN_ALLOW_THREADS res = system_call(arg1, arg2); Py_END_ALLOW_THREADS } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals())); if (res < 0) { return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL; } Py_RETURN_NONE; ``` With the addition of `PyErr_CheckSignalsDetached`, this can become ```c int res = 0; int async_err = 0; PyThreadState *tstate = PyEval_SaveThread(); do { res = system_call(arg1, arg2); } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignalsDetached(tstate))); PyEval_RestoreThread(tstate); if (res < 0) { return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL; } Py_RETURN_NONE; ``` There were also a small number of places where ``` Py_BLOCK_THREADS sig = PyErr_CheckSignals(); Py_UNBLOCK_THREADS ``` could become just ``` sig = PyErr_CheckSignalsDetached(tstate); ``` I chose to use explicit `PyErr_{Save,Restore}Thread` throughout this patch, in order to give the thread-state parameter to `PyErr_CheckSignalsDetached` a visible name in the calling code. This is possibly an argument for adding a wrapper macro (`PyErr_CHECK_SIGNALS_DETACHED()`, perhaps) that is part of the `Py_BEGIN_ALLOW_THREADS` family, and is therefore entitled to use the hidden `_state` variable. Or it could be an argument for adding a new construct like ``` Py_WITH_DETACHED_THREAD_STATE (tstate) { do { res = system_call(arg1, arg2); } while (res < 0 && errno == EINTR && !(async_err = PyErr_CheckSignalsDetached(tstate))); } ``` (This could be implemented in plain C by off-label use of a for loop. I’m not sure if that’s a good idea or not. In particular, `break` and `continue` at the top level of a `Py_WITH_DETACHED_THREAD_STATE` block would have surprising effects.) The patch only tackles the most straightforward conversions. Places in the stdlib that could probably benefit from a conversion, but that I did not touch, are: * `signal.sigtimedwait`, `select.select`, `select.poll`, `select.devpoll`, `select.epoll`, `select.kevent`, `time.sleep`, `_multiprocessing.semaphore`, and the ssl and socket modules. All of these places need to do arithmetic on some combination of `struct timespec`, `struct timeval`, and `Py_time_t` in their loops, and the APIs for doing so are not currently safe to use without an attached thread state (in particular, they may set the error indicator). * The ssl and socket modules also do a bunch of other work in the same loops that detach the thread state and I’m not familiar enough with those APIs to know if it’s safe to expand the regions with detached thread state. This is unfortunate as I think these are the loops with the biggest chance of _substantial_ efficiency gains from enlarging those regions. * `time.sleep` is also a horrible maze of `#ifdef`s and I think I see a race bug in the Windows-specific code (filed separately as python#135407). * The tkinter module goes back and forth between holding a Python thread state and holding a _Tcl_ thread state and I didn’t want to mess with that. There are also many places where no conversion is necessary: * `signal.pause`, `signal.raise`, `signal.signal`, `signal.pthread_sigmask`, `signal.pthread_kill`, `os.kill`, `os._getdiskusage`, `PyObject_Print`, `PyObject_Repr`, `PyObject_Str`, `PyErr_SetFromErrnoWithFilenameObjects`, `builtin_input_impl`: These do not contain a loop, so there’s no reason to mess with them. * The `_sre` module, `longobject.c`, `faulthandler.c`, `traceback.c`: These don’t detach the thread state at all, so again there’s no reason to mess with them. As a final note, it would probably be a good idea to make it a documented guarantee that `PyEval_{Save,Restore}Thread` and `PyErr_CheckSignals{,Detached}` preserve the value of `errno` and all the Windows-specific errno analogues. A lot of this code is already assuming that this is the case, and the code that isn’t making that assumption is substantially clunkier for it.
Add new C-API functions
PyErr_CheckSignalsDetached
andPyErr_AreSignalsPending
.PyErr_CheckSignalsDetached
can only be called by threads that don’t have an attached thread state. It does the same thing asPyErr_CheckSignals
, except that it guarantees it will only reattach the supplied thread state if necessary in order to run signal handlers. (Also, it never runs the cycle collector.)PyErr_AreSignalsPending
can be called with or without an attached thread state. It reports to its caller whether signals are pending, but never runs any handlers itself.Rationale: Compiled-code modules that implement time-consuming operations that don’t require manipulating Python objects, are supposed to call PyErr_CheckSignals frequently throughout each such operation, so that if the user interrupts the operation with control-C, it is canceled promptly.
In the normal case where no signals are pending, PyErr_CheckSignals is cheap (two atomic memory operations); however, callers must have an attached thread state, and compiled-code modules that implement time-consuming operations are also supposed to detach their thread state during each such operation. The overhead of re-attaching a thread state in order to call PyErr_CheckSignals, and then releasing it again, sufficiently often for reasonable user responsiveness, can be substantial, particularly in traditional (non-free-threaded) builds. These new functions permit compiled-code modules to avoid that extra overhead.
📚 Documentation preview 📚: https://cpython-previews--135358.org.readthedocs.build/en/135358/c-api/exceptions.html#c.PyErr_CheckSignalsDetached