Deadlock when shutting down ThreadPoolExecutor from inside OS Signal handler #121649

Open
bostrt opened this issue Jul 12, 2024 · 2 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

bostrt commented Jul 12, 2024

Bug report

Bug description:

Running the code below and sending SIGTERM to the process results in what looks like a deadlock on ThreadPoolExecutor._shutdown_lock. I guess the issue is that the SIGTERM is handled in the main thread while it is inside the submit function, where _shutdown_lock is already held. When the signal handler then calls shutdown, it waits forever.

TBH I'm not sure if this is a bug or expected behavior but I wanted to file this anyways. Thanks!

from concurrent.futures import ThreadPoolExecutor
import signal


def main():
    with ThreadPoolExecutor(max_workers=2) as executor:

        def __exit(signal, frame):
            executor.shutdown(wait=True, cancel_futures=True)

        signal.signal(signal.SIGTERM, __exit)

        while True:
            executor.submit(lambda: 1 + 1)


if __name__ == "__main__":
    main()

AFTER sending SIGTERM to the process, this is what pystack shows:

(v) rbost@fedora:~/code/CheckDuplicate$ pystack remote 3431749
Traceback for thread 3431751 (python) [] (most recent call last):
    (Python) File "/usr/lib64/python3.12/threading.py", line 1030, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/usr/lib64/python3.12/threading.py", line 1073, in _bootstrap_inner
        self.run()
    (Python) File "/usr/lib64/python3.12/threading.py", line 1010, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 89, in _worker
        work_item = work_queue.get(block=True)

Traceback for thread 3431750 (python) [] (most recent call last):
    (Python) File "/usr/lib64/python3.12/threading.py", line 1030, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/usr/lib64/python3.12/threading.py", line 1073, in _bootstrap_inner
        self.run()
    (Python) File "/usr/lib64/python3.12/threading.py", line 1010, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 89, in _worker
        work_item = work_queue.get(block=True)

Traceback for thread 3431749 (python) [] (most recent call last):
    (Python) File "/home/rbost/code/CheckDuplicate/test.py", line 18, in <module>
        main()
    (Python) File "/home/rbost/code/CheckDuplicate/test.py", line 14, in main
        executor.submit(lambda: 1 + 1)
    (Python) File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 175, in submit
        f = _base.Future()
    (Python) File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 328, in __init__
        def __init__(self):
    (Python) File "/home/rbost/code/CheckDuplicate/test.py", line 9, in __exit
        executor.shutdown(wait=True, cancel_futures=True)
    (Python) File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 220, in shutdown
        with self._shutdown_lock:

CPython versions tested on:

3.12

Operating systems tested on:

Linux

bostrt added the type-bug label Jul 12, 2024
bostrt (Author) commented Jul 12, 2024

For what it's worth, I work around this by using a threading.Event to track whether the signal has been received:

from concurrent.futures import ThreadPoolExecutor
import signal
import threading


def main():
    with ThreadPoolExecutor(max_workers=2) as executor:

        exit_signal = threading.Event()

        def __exit(signal, frame):
            exit_signal.set()

        signal.signal(signal.SIGTERM, __exit)

        while True:
            if exit_signal.is_set():
                executor.shutdown(wait=True, cancel_futures=True)
                break
            executor.submit(lambda: 1 + 1)


if __name__ == "__main__":
    main()

duaneg (Contributor) commented Jun 3, 2025

This is expected: it is not safe to take possibly contended non-recursive locks in signal handling code.

Signal handler code "interrupts" the main thread and runs at an arbitrary point, which means the main thread may be holding arbitrary locks at that moment. In this case the main thread is in a hot loop running code that holds the thread pool's (non-recursive) lock, so when the handler runs and calls shutdown it immediately deadlocks when it tries to acquire the same lock again.
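
To make the "non-recursive" part concrete, here is a minimal sketch that does not involve signals or the executor at all. The helper name `can_reacquire` is made up for illustration; it uses a timeout so the second acquire gives up instead of hanging the way the handler in the report does.

```python
import threading


def can_reacquire(lock, timeout=0.1):
    """Return True if the thread already holding `lock` can acquire it again."""
    with lock:
        # Second acquire from the same thread. A plain Lock blocks here
        # (we give up after `timeout`); an RLock lets its owner back in.
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it


if __name__ == "__main__":
    print("Lock: ", can_reacquire(threading.Lock()))
    print("RLock:", can_reacquire(threading.RLock()))
```

A signal handler that runs while the main thread is inside the locked region of submit is in the same position as the second acquire on the plain Lock, except that shutdown uses no timeout, so it never returns.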

Something along the lines of your work-around is the correct way to implement this sort of functionality. Note that setting the event also takes a lock, but one that your main thread never takes itself. Be careful if that might ever change: e.g. if you had a graceful shutdown path that sets it from the main thread, that would introduce the possibility of a deadlock. Signal handler code needs to be written very carefully!
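
One way to sidestep the concern about the event's lock, sketched here as an assumption rather than as anything the standard library prescribes, is to have the handler set a plain attribute instead of a threading.Event. The names `ShutdownFlag`, `run` and `max_tasks` are hypothetical; `max_tasks` exists only so the sketch terminates when no signal arrives.

```python
from concurrent.futures import ThreadPoolExecutor
import signal


class ShutdownFlag:
    """Plain boolean flag; setting it does not acquire any Python-level lock."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only record the request. Never touch the executor from the handler.
        self.requested = True


def run(flag, max_tasks=1000):
    """Submit work until shutdown is requested or max_tasks is reached."""
    submitted = 0
    with ThreadPoolExecutor(max_workers=2) as executor:
        while not flag.requested and submitted < max_tasks:
            executor.submit(lambda: 1 + 1)
            submitted += 1
        # Ordinary main-thread code: submit has returned, so the executor's
        # lock is not held by this thread and shutdown can acquire it.
        executor.shutdown(wait=True, cancel_futures=True)
    return submitted


def main():
    flag = ShutdownFlag()
    signal.signal(signal.SIGTERM, flag.handler)
    print("submitted", run(flag), "tasks before stopping")


if __name__ == "__main__":
    main()
```

Because the flag can also be set from the main thread without taking a lock, a graceful shutdown path that sets it directly does not reintroduce the deadlock described above.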
