Skip to content

traceback.clear_frames manages to deadlock a background task #79932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tvoinarovskyi mannequin opened this issue Jan 16, 2019 · 13 comments · Fixed by #111792
Closed

traceback.clear_frames manages to deadlock a background task #79932

tvoinarovskyi mannequin opened this issue Jan 16, 2019 · 13 comments · Fixed by #111792
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@tvoinarovskyi
Copy link
Mannequin

tvoinarovskyi mannequin commented Jan 16, 2019

BPO 35751
Nosy @asvetlov, @1st1, @tvoinarovskyi
Files
  • test_async_deadlock.py: Script to reproduce issue
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-01-16.10:34:07.787>
    labels = ['type-bug', '3.7', 'expert-asyncio']
    title = 'traceback.clear_frames manages to deadlock a background task'
    updated_at = <Date 2019-01-17.00:50:59.457>
    user = 'https://github.com/tvoinarovskyi'

    bugs.python.org fields:

    activity = <Date 2019-01-17.00:50:59.457>
    actor = 'tvoinarovskyi'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2019-01-16.10:34:07.787>
    creator = 'tvoinarovskyi'
    dependencies = []
    files = ['48059']
    hgrepos = []
    issue_num = 35751
    keywords = []
    message_count = 4.0
    messages = ['333759', '333770', '333775', '333808']
    nosy_count = 3.0
    nosy_names = ['asvetlov', 'yselivanov', 'tvoinarovskyi']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35751'
    versions = ['Python 3.6', 'Python 3.7']

    Linked PRs

    @tvoinarovskyi
    Copy link
    Mannequin Author

    tvoinarovskyi mannequin commented Jan 16, 2019

    My use case:
    I have a background task, say called "coordination". In that task, I want to catch any errors and push those to the user waiting in the main task and only continue running the background coroutine after the user manually resolves the exception.

    Issue:
    When testing the behaviour with unittest.Case and using assertRaises to catch the exception, the background coroutine manages to just freeze. I have narrowed it down to traceback.clear_frames in assertRaises that causes a GeneratorExit in the background coroutine.

    I believe this issue is a duplicate to https://bugs.python.org/issue29211, but wanted to provide another actual use case where it can pop up. Also even if the generator raises a GeneratorExit, why did the background thread freeze is still a mystery to me.

    Script to reproduce in my case is attached.

    @tvoinarovskyi tvoinarovskyi mannequin added 3.7 (EOL) end of life topic-asyncio type-bug An unexpected behavior, bug, or error labels Jan 16, 2019
    @tvoinarovskyi
    Copy link
    Mannequin Author

    tvoinarovskyi mannequin commented Jan 16, 2019

    So yes, the clear_frames function will force a running generator to close. See https://github.com/python/cpython/blob/3.7/Objects/frameobject.c#L566, it explicitly does a Finalize. Would that be the desired behaviour for assertRaises is not clear. I find it strange that catching an exception is closing my running coroutine.

    The reproduce example can be lowered to something like::

        import asyncio
    
    
        async def background(error_future):
            try:
                raise ValueError
            except Exception as exc:
                error_future.set_exception(exc)
    
    
            await asyncio.sleep(1)
    
        async def main():
            loop = asyncio.get_event_loop()
            error_future = loop.create_future()
            task = asyncio.create_task(background(error_future))
    
            await asyncio.wait([error_future])
            exc = error_future.exception()
            import traceback
            traceback.clear_frames(exc.__traceback__)
    
            # Will block forever, as task will never be waken up
            await task
    
    
        if __name__ == "__main__":
            asyncio.run(main())

    @1st1
    Copy link
    Member

    1st1 commented Jan 16, 2019

    Very interesting. Thanks for reporting this issue. I'll definitely take a look at this before 3.8 is released. For 3.7 you'll likely have to find a workaround.

    @tvoinarovskyi
    Copy link
    Mannequin Author

    tvoinarovskyi mannequin commented Jan 17, 2019

    For now, it seems like copy.copy(exception) before raising seems to prevent this behaviour. So for all exceptions originating from background tasks I raise a copy to the user, rather than the original exception. It prints correct stack, so no real impact on the library.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ezio-melotti ezio-melotti moved this to Todo in asyncio Jul 17, 2022
    @kumaraditya303 kumaraditya303 removed the 3.7 (EOL) end of life label Oct 18, 2022
    @kumaraditya303
    Copy link
    Contributor

    traceback.clear_frames can close generators too so this is not special to coroutines and asyncio. I guess the simple solution here is "don't do that".

    @gvanrossum
    Copy link
    Member

    @iritkatriel ^^

    @iritkatriel
    Copy link
    Member

    I think this is the fix:

    diff --git a/Objects/frameobject.c b/Objects/frameobject.c
    index dd69207571..a512acf3f0 100644
    --- a/Objects/frameobject.c
    +++ b/Objects/frameobject.c
    @@ -949,7 +949,8 @@ frame_clear(PyFrameObject *f, PyObject *Py_UNUSED(ignored))
     {
         if (f->f_frame->owner == FRAME_OWNED_BY_GENERATOR) {
             PyGenObject *gen = _PyFrame_GetGenerator(f->f_frame);
    -        if (gen->gi_frame_state == FRAME_EXECUTING) {
    +        if (gen->gi_frame_state == FRAME_EXECUTING ||
    +            gen->gi_frame_state == FRAME_SUSPENDED) {
                 goto running;
             }
             _PyGen_Finalize((PyObject *)gen);
    

    But when I apply it we get two failures from tests that seem to imply that the current behaviour is a feature:

    ======================================================================
    ERROR: test_clear_executing_generator (test.test_frame.ClearTest.test_clear_executing_generator)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/iritkatriel/src/cpython/Lib/test/test_frame.py", line 112, in test_clear_executing_generator
        f.clear()
    RuntimeError: cannot clear an executing frame
    
    ======================================================================
    ERROR: test_clear_generator (test.test_frame.ClearTest.test_clear_generator)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/iritkatriel/src/cpython/Lib/test/test_frame.py", line 78, in test_clear_generator
        gen.gi_frame.clear()
    RuntimeError: cannot clear an executing frame
    
    ----------------------------------------------------------------------
    

    @kumaraditya303
    Copy link
    Contributor

    But when I apply it we get two failures from tests that seem to imply that the current behaviour is a feature:

    I had already tried this and hence I wrote simple solution here is "don't do that". :)

    @iritkatriel
    Copy link
    Member

    I think we should consider changing it.

    @iritkatriel
    Copy link
    Member

    iritkatriel commented Oct 29, 2022

    Note the confusion in the comments in test_clear_executing_generator:

    # Attempting to clear an executing generator frame is forbidden.
    and then (when the frame is suspended)

    # Clearing the frame closes the generator

    The second one was added later - the bpo issue justifies with just "the variable f wasn't used, let's use it for something".

    Was there ever a conscious decision that clearing a suspended frame should be allowed, and that it closes the generator?

    We could add a close() function Since generator has a close() function, we could deprecate clear() on a suspended frame.

    @markshannon
    Copy link
    Member

    Was there ever a conscious decision that clearing a suspended frame should be allowed, and that it closes the generator?

    That closing a frame closes the generator, yes. Something to do with PEP 442.

    That clearing a suspended frame should be allowed, I don't know.

    Why not change assertRaises() to not close suspended generator/coroutines?

    @iritkatriel
    Copy link
    Member

    Why not change assertRaises() to not close suspended generator/coroutines?

    I think we could either change generator.frame() to disallow it or traceback.clear_frames() to skip those frames. assertRaises is too high level to care about this.

    Currently traceback.clear_frames() swallows exceptions from clear(), and the exception comes from clearing a running (but not a suspended) frame. So either we change frame.clear() to also raise on a suspended frame, or we change the code in traceback.clear_frame() to check for suspended/running instead of relying on the exception.

    @iritkatriel
    Copy link
    Member

    iritkatriel commented Nov 6, 2023

    I've created a PR to raise an exception when the frame is suspended deprecate clearing suspended frames, add an option to raise a RuntimeError in that case, and use this option in traceback.clear_frames.

    #111792

    aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
    Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-asyncio type-bug An unexpected behavior, bug, or error
    Projects
    Status: Done
    5 participants