Skip to content

Commit f3e301d

Browse files
jbmsgpshead
andcommitted
[3.13] gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization (GH-105805)
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option. (cherry picked from commit 8cc5aa4) Co-authored-by: Jeremy Maitin-Shepard <jeremy@jeremyms.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
1 parent 0607fdd commit f3e301d

File tree

10 files changed

+247
-29
lines changed

10 files changed

+247
-29
lines changed

Doc/c-api/init.rst

Lines changed: 60 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -447,7 +447,11 @@ Initializing and finalizing the interpreter
447447
freed. Some memory allocated by extension modules may not be freed. Some
448448
extensions may not work properly if their initialization routine is called more
449449
than once; this can happen if an application calls :c:func:`Py_Initialize` and
450-
:c:func:`Py_FinalizeEx` more than once.
450+
:c:func:`Py_FinalizeEx` more than once. :c:func:`Py_FinalizeEx` must not be
451+
called recursively from within itself. Therefore, it must not be called by
452+
any code that may be run as part of the interpreter shutdown process, such
453+
as :py:mod:`atexit` handlers, object finalizers, or any code that may be run
454+
while flushing the stdout and stderr files.
451455
452456
.. audit-event:: cpython._PySys_ClearAuditHooks "" c.Py_FinalizeEx
453457
@@ -1074,6 +1078,37 @@ thread, where the CPython global runtime was originally initialized.
10741078
The only exception is if :c:func:`exec` will be called immediately
10751079
after.
10761080
1081+
.. _cautions-regarding-runtime-finalization:
1082+
1083+
Cautions regarding runtime finalization
1084+
---------------------------------------
1085+
1086+
In the late stage of :term:`interpreter shutdown`, after attempting to wait for
1087+
non-daemon threads to exit (though this can be interrupted by
1088+
:class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
1089+
is marked as *finalizing*: :c:func:`_Py_IsFinalizing` and
1090+
:func:`sys.is_finalizing` return true. At this point, only the *finalization
1091+
thread* that initiated finalization (typically the main thread) is allowed to
1092+
acquire the :term:`GIL`.
1093+
1094+
If any thread, other than the finalization thread, attempts to acquire the GIL
1095+
during finalization, either explicitly via :c:func:`PyGILState_Ensure`,
1096+
:c:macro:`Py_END_ALLOW_THREADS`, :c:func:`PyEval_AcquireThread`, or
1097+
:c:func:`PyEval_AcquireLock`, or implicitly when the interpreter attempts to
1098+
reacquire it after having yielded it, the thread enters **a permanently blocked
1099+
state** where it remains until the program exits. In most cases this is
1100+
harmless, but this can result in deadlock if a later stage of finalization
1101+
attempts to acquire a lock owned by the blocked thread, or otherwise waits on
1102+
the blocked thread.
1103+
1104+
Gross? Yes. This prevents random crashes and/or unexpectedly skipped C++
1105+
finalizations further up the call stack when such threads were forcibly exited
1106+
here in CPython 3.13 and earlier. The CPython runtime GIL acquiring C APIs
1107+
have never had any error reporting or handling expectations at GIL acquisition
1108+
time that would've allowed for graceful exit from this situation. Changing that
1109+
would require new stable C APIs and rewriting the majority of C code in the
1110+
CPython ecosystem to use those with error handling.
1111+
10771112
10781113
High-level API
10791114
--------------
@@ -1147,11 +1182,14 @@ code, or when embedding the Python interpreter:
11471182
ensues.
11481183
11491184
.. note::
1150-
Calling this function from a thread when the runtime is finalizing
1151-
will terminate the thread, even if the thread was not created by Python.
1152-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1153-
check if the interpreter is in process of being finalized before calling
1154-
this function to avoid unwanted termination.
1185+
Calling this function from a thread when the runtime is finalizing will
1186+
hang the thread until the program exits, even if the thread was not
1187+
created by Python. Refer to
1188+
:ref:`cautions-regarding-runtime-finalization` for more details.
1189+
1190+
.. versionchanged:: next
1191+
Hangs the current thread, rather than terminating it, if called while the
1192+
interpreter is finalizing.
11551193
11561194
.. c:function:: PyThreadState* PyThreadState_Get()
11571195
@@ -1207,11 +1245,14 @@ with sub-interpreters:
12071245
to call arbitrary Python code. Failure is a fatal error.
12081246
12091247
.. note::
1210-
Calling this function from a thread when the runtime is finalizing
1211-
will terminate the thread, even if the thread was not created by Python.
1212-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1213-
check if the interpreter is in process of being finalized before calling
1214-
this function to avoid unwanted termination.
1248+
Calling this function from a thread when the runtime is finalizing will
1249+
hang the thread until the program exits, even if the thread was not
1250+
created by Python. Refer to
1251+
:ref:`cautions-regarding-runtime-finalization` for more details.
1252+
1253+
.. versionchanged:: next
1254+
Hangs the current thread, rather than terminating it, if called while the
1255+
interpreter is finalizing.
12151256
12161257
.. c:function:: void PyGILState_Release(PyGILState_STATE)
12171258
@@ -1503,17 +1544,20 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
15031544
If this thread already has the lock, deadlock ensues.
15041545
15051546
.. note::
1506-
Calling this function from a thread when the runtime is finalizing
1507-
will terminate the thread, even if the thread was not created by Python.
1508-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1509-
check if the interpreter is in process of being finalized before calling
1510-
this function to avoid unwanted termination.
1547+
Calling this function from a thread when the runtime is finalizing will
1548+
hang the thread until the program exits, even if the thread was not
1549+
created by Python. Refer to
1550+
:ref:`cautions-regarding-runtime-finalization` for more details.
15111551
15121552
.. versionchanged:: 3.8
15131553
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
15141554
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
15151555
and terminate the current thread if called while the interpreter is finalizing.
15161556
1557+
.. versionchanged:: next
1558+
Hangs the current thread, rather than terminating it, if called while the
1559+
interpreter is finalizing.
1560+
15171561
:c:func:`PyEval_RestoreThread` is a higher-level function which is always
15181562
available (even when threads have not been initialized).
15191563

Include/internal/pycore_pythread.h

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -152,6 +152,19 @@ PyAPI_FUNC(int) PyThread_join_thread(PyThread_handle_t);
152152
* a non-zero value on failure.
153153
*/
154154
PyAPI_FUNC(int) PyThread_detach_thread(PyThread_handle_t);
155+
/*
156+
* Hangs the thread indefinitely without exiting it.
157+
*
158+
* gh-87135: There is no safe way to exit a thread other than returning
159+
* normally from its start function. This is used during finalization in lieu
160+
* of actually exiting the thread. Since the program is expected to terminate
161+
* soon anyway, it does not matter if the thread stack stays around until then.
162+
*
163+
* This is unfortunate for embedders who may not be terminating their process
164+
* when they're done with the interpreter, but our C API design does not allow
165+
* for safely exiting threads attempting to re-enter Python post finalization.
166+
*/
167+
void _Py_NO_RETURN PyThread_hang_thread(void);
155168

156169
#ifdef __cplusplus
157170
}

Include/pythread.h

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,26 @@ typedef enum PyLockStatus {
1717

1818
PyAPI_FUNC(void) PyThread_init_thread(void);
1919
PyAPI_FUNC(unsigned long) PyThread_start_new_thread(void (*)(void *), void *);
20-
PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
20+
/* Terminates the current thread. Considered unsafe.
21+
*
22+
* WARNING: This function is only safe to call if all functions in the full call
23+
* stack are written to safely allow it. Additionally, the behavior is
24+
* platform-dependent. This function should be avoided, and is no longer called
25+
* by Python itself. It is retained only for compatibility with existing C
26+
* extension code.
27+
*
28+
* With pthreads, calls `pthread_exit` causes some libcs (glibc?) to attempt to
29+
* unwind the stack and call C++ destructors; if a `noexcept` function is
30+
* reached, they may terminate the process. Others (macOS) do unwinding.
31+
*
32+
* On Windows, calls `_endthreadex` which kills the thread without calling C++
33+
* destructors.
34+
*
35+
* In either case there is a risk of invalid references remaining to data on the
36+
* thread stack.
37+
*/
38+
Py_DEPRECATED(3.14) PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
39+
2140
PyAPI_FUNC(unsigned long) PyThread_get_thread_ident(void);
2241

2342
#if (defined(__APPLE__) || defined(__linux__) || defined(_WIN32) \

Lib/test/test_threading.py

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1237,6 +1237,77 @@ def test_native_id_after_fork(self):
12371237
self.assertEqual(len(native_ids), 2)
12381238
self.assertNotEqual(native_ids[0], native_ids[1])
12391239

1240+
@cpython_only
1241+
def test_finalize_daemon_thread_hang(self):
1242+
if support.check_sanitizer(thread=True, memory=True):
1243+
# the thread running `time.sleep(100)` below will still be alive
1244+
# at process exit
1245+
self.skipTest(
1246+
"https://github.com/python/cpython/issues/124878 - Known"
1247+
" race condition that TSAN identifies.")
1248+
# gh-87135: tests that daemon threads hang during finalization
1249+
script = textwrap.dedent('''
1250+
import os
1251+
import sys
1252+
import threading
1253+
import time
1254+
import _testcapi
1255+
1256+
lock = threading.Lock()
1257+
lock.acquire()
1258+
thread_started_event = threading.Event()
1259+
def thread_func():
1260+
try:
1261+
thread_started_event.set()
1262+
_testcapi.finalize_thread_hang(lock.acquire)
1263+
finally:
1264+
# Control must not reach here.
1265+
os._exit(2)
1266+
1267+
t = threading.Thread(target=thread_func)
1268+
t.daemon = True
1269+
t.start()
1270+
thread_started_event.wait()
1271+
# Sleep to ensure daemon thread is blocked on `lock.acquire`
1272+
#
1273+
# Note: This test is designed so that in the unlikely case that
1274+
# `0.1` seconds is not sufficient time for the thread to become
1275+
# blocked on `lock.acquire`, the test will still pass, it just
1276+
# won't be properly testing the thread behavior during
1277+
# finalization.
1278+
time.sleep(0.1)
1279+
1280+
def run_during_finalization():
1281+
# Wake up daemon thread
1282+
lock.release()
1283+
# Sleep to give the daemon thread time to crash if it is going
1284+
# to.
1285+
#
1286+
# Note: If due to an exceptionally slow execution this delay is
1287+
# insufficient, the test will still pass but will simply be
1288+
# ineffective as a test.
1289+
time.sleep(0.1)
1290+
# If control reaches here, the test succeeded.
1291+
os._exit(0)
1292+
1293+
# Replace sys.stderr.flush as a way to run code during finalization
1294+
orig_flush = sys.stderr.flush
1295+
def do_flush(*args, **kwargs):
1296+
orig_flush(*args, **kwargs)
1297+
if not sys.is_finalizing:
1298+
return
1299+
sys.stderr.flush = orig_flush
1300+
run_during_finalization()
1301+
1302+
sys.stderr.flush = do_flush
1303+
1304+
# If the follow exit code is retained, `run_during_finalization`
1305+
# did not run.
1306+
sys.exit(1)
1307+
''')
1308+
assert_python_ok("-c", script)
1309+
1310+
12401311
class ThreadJoinOnShutdown(BaseTestCase):
12411312

12421313
def _run_and_join(self, script):
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
Attempting to acquire the GIL after runtime finalization has begun in a
2+
different thread now causes the thread to hang rather than terminate, which
3+
avoids potential crashes or memory corruption caused by attempting to
4+
terminate a thread that is running code not specifically designed to support
5+
termination. In most cases this hanging is harmless since the process will
6+
soon exit anyway.
7+
8+
The ``PyThread_exit_thread`` function is now deprecated. Its behavior is
9+
inconsistent across platforms, and it can only be used safely in the
10+
unlikely case that every function in the entire call stack has been designed
11+
to support the platform-dependent termination mechanism. It is recommended
12+
that users of this function change their design to not require thread
13+
termination. In the unlikely case that thread termination is needed and can
14+
be done safely, users may migrate to calling platform-specific APIs such as
15+
``pthread_exit`` (POSIX) or ``_endthreadex`` (Windows) directly.

Modules/_testcapimodule.c

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3322,6 +3322,34 @@ test_atexit(PyObject *self, PyObject *Py_UNUSED(args))
33223322
Py_RETURN_NONE;
33233323
}
33243324

3325+
// Used by `finalize_thread_hang`.
3326+
#ifdef _POSIX_THREADS
3327+
static void finalize_thread_hang_cleanup_callback(void *Py_UNUSED(arg)) {
3328+
// Should not reach here.
3329+
Py_FatalError("pthread thread termination was triggered unexpectedly");
3330+
}
3331+
#endif
3332+
3333+
// Tests that finalization does not trigger pthread cleanup.
3334+
//
3335+
// Must be called with a single nullary callable function that should block
3336+
// (with GIL released) until finalization is in progress.
3337+
static PyObject *
3338+
finalize_thread_hang(PyObject *self, PyObject *callback)
3339+
{
3340+
// WASI builds some pthread stuff but doesn't have these APIs today?
3341+
#if defined(_POSIX_THREADS) && !defined(__wasi__)
3342+
pthread_cleanup_push(finalize_thread_hang_cleanup_callback, NULL);
3343+
#endif
3344+
PyObject_CallNoArgs(callback);
3345+
// Should not reach here.
3346+
Py_FatalError("thread unexpectedly did not hang");
3347+
#if defined(_POSIX_THREADS) && !defined(__wasi__)
3348+
pthread_cleanup_pop(0);
3349+
#endif
3350+
Py_RETURN_NONE;
3351+
}
3352+
33253353

33263354
static void
33273355
tracemalloc_track_race_thread(void *data)
@@ -3590,6 +3618,7 @@ static PyMethodDef TestMethods[] = {
35903618
{"test_atexit", test_atexit, METH_NOARGS},
35913619
{"tracemalloc_track_race", tracemalloc_track_race, METH_NOARGS},
35923620
{"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
3621+
{"finalize_thread_hang", finalize_thread_hang, METH_O, NULL},
35933622
{NULL, NULL} /* sentinel */
35943623
};
35953624

Python/ceval_gil.c

Lines changed: 17 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
#include "pycore_pylifecycle.h" // _PyErr_Print()
88
#include "pycore_pymem.h" // _PyMem_IsPtrFreed()
99
#include "pycore_pystats.h" // _Py_PrintSpecializationStats()
10+
#include "pycore_pythread.h" // PyThread_hang_thread()
1011

1112
/*
1213
Notes about the implementation:
@@ -284,10 +285,9 @@ drop_gil(PyInterpreterState *interp, PyThreadState *tstate, int final_release)
284285
/* Take the GIL.
285286
286287
The function saves errno at entry and restores its value at exit.
288+
It may hang rather than return if the interpreter has been finalized.
287289
288-
tstate must be non-NULL.
289-
290-
Returns 1 if the GIL was acquired, or 0 if not. */
290+
tstate must be non-NULL. */
291291
static void
292292
take_gil(PyThreadState *tstate)
293293
{
@@ -300,12 +300,18 @@ take_gil(PyThreadState *tstate)
300300

301301
if (_PyThreadState_MustExit(tstate)) {
302302
/* bpo-39877: If Py_Finalize() has been called and tstate is not the
303-
thread which called Py_Finalize(), exit immediately the thread.
303+
thread which called Py_Finalize(), this thread cannot continue.
304304
305305
This code path can be reached by a daemon thread after Py_Finalize()
306306
completes. In this case, tstate is a dangling pointer: points to
307-
PyThreadState freed memory. */
308-
PyThread_exit_thread();
307+
PyThreadState freed memory.
308+
309+
This used to call a *thread_exit API, but that was not safe as it
310+
lacks stack unwinding and local variable destruction important to
311+
C++. gh-87135: The best that can be done is to hang the thread as
312+
the public APIs calling this have no error reporting mechanism (!).
313+
*/
314+
PyThread_hang_thread();
309315
}
310316

311317
assert(_PyThreadState_CheckConsistency(tstate));
@@ -349,7 +355,9 @@ take_gil(PyThreadState *tstate)
349355
if (drop_requested) {
350356
_Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
351357
}
352-
PyThread_exit_thread();
358+
// gh-87135: hang the thread as *thread_exit() is not a safe
359+
// API. It lacks stack unwind and local variable destruction.
360+
PyThread_hang_thread();
353361
}
354362
assert(_PyThreadState_CheckConsistency(tstate));
355363

@@ -390,7 +398,7 @@ take_gil(PyThreadState *tstate)
390398

391399
if (_PyThreadState_MustExit(tstate)) {
392400
/* bpo-36475: If Py_Finalize() has been called and tstate is not
393-
the thread which called Py_Finalize(), exit immediately the
401+
the thread which called Py_Finalize(), gh-87135: hang the
394402
thread.
395403
396404
This code path can be reached by a daemon thread which was waiting
@@ -400,7 +408,7 @@ take_gil(PyThreadState *tstate)
400408
/* tstate could be a dangling pointer, so don't pass it to
401409
drop_gil(). */
402410
drop_gil(interp, NULL, 1);
403-
PyThread_exit_thread();
411+
PyThread_hang_thread();
404412
}
405413
assert(_PyThreadState_CheckConsistency(tstate));
406414

Python/pylifecycle.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2069,7 +2069,7 @@ _Py_Finalize(_PyRuntimeState *runtime)
20692069
/* Ensure that remaining threads are detached */
20702070
_PyEval_StopTheWorldAll(runtime);
20712071

2072-
/* Remaining daemon threads will automatically exit
2072+
/* Remaining daemon threads will be trapped in PyThread_hang_thread
20732073
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
20742074
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
20752075
_PyRuntimeState_SetFinalizing(runtime, tstate);

Python/thread_nt.h

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,14 @@ PyThread_exit_thread(void)
291291
_endthreadex(0);
292292
}
293293

294+
void _Py_NO_RETURN
295+
PyThread_hang_thread(void)
296+
{
297+
while (1) {
298+
SleepEx(INFINITE, TRUE);
299+
}
300+
}
301+
294302
/*
295303
* Lock support. It has to be implemented as semaphores.
296304
* I [Dag] tried to implement it with mutex but I could find a way to

0 commit comments

Comments
 (0)