Deadlock at shutdown with stop-the-world and daemon threads #137433

@colesbury

Bug report

Bug description:

Reported by @pablogsal / @godlygeek from memray

Stack trace:
https://gist.github.com/pablogsal/513fa8b0c29cda852ce11c86ce3b1345

We have two threads, the main thread (M) and a daemon thread (D). The main thread starts _Py_Finalize() and performs a global stop-the-world. The daemon thread is disabling profiling, and so tries to perform a stop-the-world specific to its interpreter:

cpython/Python/pystate.c, lines 2256 to 2267 at commit 9745976:

    static void
    stop_the_world(struct _stoptheworld_state *stw)
    {
        _PyRuntimeState *runtime = &_PyRuntime;
        PyMutex_Lock(&stw->mutex);
        if (stw->is_global) {
            _PyRWMutex_Lock(&runtime->stoptheworld_mutex);
        }
        else {
            _PyRWMutex_RLock(&runtime->stoptheworld_mutex);
        }
        /* ... (excerpt ends here; the function continues) */

M: _PyEval_StopTheWorldAll():
M: acquires runtime->stoptheworld->mutex
M: acquires RW lock runtime->stoptheworld_mutex in W (exclusive) mode
M: ... waits on threads
D: _PyEval_StopTheWorld(interp):
D: acquires interp->stoptheworld->mutex
D: ... blocks trying to acquire runtime->stoptheworld_mutex in R (shared) mode. Later, the daemon thread will hang in _PyThreadState_HangThread() when it tries to re-attach its thread state.
M: _PyEval_StopTheWorldAll() finishes, marks the interpreter as finalizing
M: ...
M: calls _PyGC_CollectNoFail() which tries to run _PyEval_StopTheWorld(interp)
M: ... blocks trying to acquire interp->stoptheworld->mutex, which is still held by the daemon thread!
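
The interleaving above can be reproduced with a small pthread model. This is a sketch under stated assumptions, not CPython code: all names (`runtime_rw`, `interp_mutex`, `hang_thread()`, `daemon_thread()`, `main_thread_is_stuck()`) are hypothetical. It assumes M releases the runtime lock (restarts the world) after marking the interpreter as finalizing, which is what lets D get its read lock and then hang on re-attach; the re-attach is folded into a check of a `finalizing` flag. Where the real `PyMutex_Lock` would block forever, M polls with `pthread_mutex_trylock()` so the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical pthread stand-ins for the locks in the trace. */
static pthread_rwlock_t runtime_rw = PTHREAD_RWLOCK_INITIALIZER; /* runtime->stoptheworld_mutex */
static pthread_mutex_t interp_mutex = PTHREAD_MUTEX_INITIALIZER; /* interp->stoptheworld->mutex */
static pthread_mutex_t hang_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hang_cond = PTHREAD_COND_INITIALIZER;
static atomic_bool finalizing = false;
static atomic_bool d_holds_interp_mutex = false;

/* Models _PyThreadState_HangThread(): park forever (hang_cond is never signalled). */
static void hang_thread(void)
{
    pthread_mutex_lock(&hang_lock);
    for (;;)
        pthread_cond_wait(&hang_cond, &hang_lock);
}

/* D: a per-interpreter stop-the-world that runs into finalization. */
static void *daemon_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&interp_mutex);         /* acquires the interp mutex */
    atomic_store(&d_holds_interp_mutex, true);
    pthread_rwlock_rdlock(&runtime_rw);        /* blocks while M holds W mode */
    if (atomic_load(&finalizing))
        hang_thread();                         /* hangs, still holding interp_mutex */
    pthread_rwlock_unlock(&runtime_rw);
    pthread_mutex_unlock(&interp_mutex);
    return NULL;
}

/* M: returns true if it still cannot take interp_mutex after about 200 ms. */
static bool main_thread_is_stuck(void)
{
    const struct timespec ms1 = { 0, 1000000 };
    const struct timespec ms10 = { 0, 10000000 };
    pthread_t d;

    pthread_rwlock_wrlock(&runtime_rw);        /* global stop-the-world (W mode) */
    if (pthread_create(&d, NULL, daemon_thread, NULL) != 0) {
        pthread_rwlock_unlock(&runtime_rw);
        return false;
    }
    pthread_detach(d);                         /* daemon thread: never joined */
    while (!atomic_load(&d_holds_interp_mutex))
        nanosleep(&ms1, NULL);
    atomic_store(&finalizing, true);           /* marks the interpreter as finalizing */
    pthread_rwlock_unlock(&runtime_rw);        /* assumed restart: D wakes up and hangs */

    /* GC needs a per-interpreter stop-the-world, i.e. interp_mutex. */
    for (int i = 0; i < 20; i++) {
        if (pthread_mutex_trylock(&interp_mutex) == 0) {
            pthread_mutex_unlock(&interp_mutex);
            return false;
        }
        nanosleep(&ms10, NULL);
    }
    return true;                               /* a blocking lock would never return */
}
```

The outcome does not depend on timing: D takes `interp_mutex` before it signals M, and the only path on which it would release it is skipped once `finalizing` is set.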

Deadlock! Summary:

The daemon thread holds interp->stoptheworld->mutex and is hanging because the interpreter is shutting down.

The main thread is trying to perform the shutdown procedure, including calling the GC a few times, which requires interp->stoptheworld->mutex.

Fix???

  • Release the previously acquired interp->stoptheworld->mutex when hanging the thread, if necessary? This crosses a bunch of abstraction barriers, which is messy and tricky.
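
The idea in the bullet can be sketched in the same pthread model. This is only an illustration, not a proposed patch: `held_stw_mutex` is a hypothetical per-thread slot recording a stop-the-world mutex held mid-request, and `hang_thread()` stands in for `_PyThreadState_HangThread()`. The model omits the runtime RW lock and the actual stop-the-world and tracks only the interpreter mutex named in the issue. The fact that the hang path must consult stop-the-world bookkeeping is the abstraction-crossing the bullet warns about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

static pthread_mutex_t interp_mutex = PTHREAD_MUTEX_INITIALIZER; /* interp->stoptheworld->mutex */
static pthread_mutex_t hang_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hang_cond = PTHREAD_COND_INITIALIZER;
static atomic_bool d_parked = false;

/* Hypothetical: the stop-the-world mutex this thread holds mid-request, if any. */
static _Thread_local pthread_mutex_t *held_stw_mutex = NULL;

/* Models _PyThreadState_HangThread() with the proposed release applied. */
static void hang_thread(void)
{
    if (held_stw_mutex != NULL) {
        pthread_mutex_unlock(held_stw_mutex);  /* release before parking */
        held_stw_mutex = NULL;
    }
    pthread_mutex_lock(&hang_lock);
    atomic_store(&d_parked, true);
    for (;;)
        pthread_cond_wait(&hang_cond, &hang_lock);  /* never signalled */
}

/* D: takes the interp mutex, then hangs because the interpreter is finalizing. */
static void *daemon_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&interp_mutex);
    held_stw_mutex = &interp_mutex;            /* record it for hang_thread() */
    hang_thread();
    return NULL;
}

/* M: once D has parked, can the GC's stop-the-world take interp_mutex? */
static bool main_thread_can_proceed(void)
{
    const struct timespec ms1 = { 0, 1000000 };
    pthread_t d;

    if (pthread_create(&d, NULL, daemon_thread, NULL) != 0)
        return false;
    pthread_detach(d);
    while (!atomic_load(&d_parked))
        nanosleep(&ms1, NULL);
    if (pthread_mutex_trylock(&interp_mutex) != 0)
        return false;
    pthread_mutex_unlock(&interp_mutex);
    return true;
}
```

Because D unlocks `interp_mutex` before it sets `d_parked`, M's `trylock` succeeds deterministically in this model. The unlock happens on the owning thread, as pthread mutexes require.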

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Labels: 3.13, 3.14, 3.15, interpreter-core, topic-free-threading, type-bug