Skip to content

GH-90997: Wrap yield from/await in a virtual try/except StopIteration #96010

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 19, 2022

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Aug 15, 2022

Supersedes #31968 by setting up a virtual try/except StopIteration to handle the case where a delegated sub-generator returns a value during a close or throw call through a suspended frame. This moves the logic out of the generator throw implementation and into the bytecode, which requires fewer frame hacks. It does, however, require replacing LOAD_ASSERTION_ERROR with a more general LOAD_EXCEPTION_TYPE opcode that can also handle StopIteration.

It also updates the docs for SEND, which are incomplete.

@brandtbucher brandtbucher added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Aug 15, 2022
@brandtbucher brandtbucher self-assigned this Aug 15, 2022
1: c
0: value
1: b
2: c
Variable names:
0: a
1: d"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to also add a test that shows what you're fixing here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we're not really fixing anything (in fact, we want the behavior to stay exactly the same). All we're really doing is moving some of the error handling logic for close and throw into the bytecode, so I'm not really sure how to test that.

I'm confident that the tests in test_yield_from.TestInterestingEdgeCases exercise this path enough to prevent any behavior changes (like we've seen with other approaches to address this issue).

@markshannon
Copy link
Member

This adds a lot of bulk to the bytecode.
Take this minimal function:

async def foo(x):
    await x

main

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

this PR

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
        >>   26 LOAD_EXCEPTION_TYPE      1 (StopIteration)
             28 CHECK_EXC_MATCH
             30 POP_JUMP_FORWARD_IF_TRUE     1 (to 34)
             32 RERAISE                  0
        >>   34 LOAD_ATTR                0 (value)
             54 SWAP                     3
             56 POP_TOP
             58 POP_TOP
             60 POP_TOP
             62 LOAD_CONST               0 (None)
             64 RETURN_VALUE
ExceptionTable:
  14 to 14 -> 26 [2]

What I had envisioned

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
             26 THROW                (to 20)
             28 JUMP_BACKWARD_NO_INTERRUPT     (to 14)
ExceptionTable:
  14 to 14 -> 26 [2]

Where THROW throws into the delegated awaitable.
If that delegated awaitable returns, then jump.
if it yields, then push the value.
If it raises, then propagate.

Much like SEND sends to the delegated awaitable, THROW throws into it.

See also faster-cpython/ideas#448

@brandtbucher
Copy link
Member Author

brandtbucher commented Aug 17, 2022

This adds a lot of bulk to the bytecode.

Yeah, like I said in the meetings: this about triples the size of the bytecode (all in the cold path, though). If all we care about here is the literal size of the bytecode, then we can pack everything in the exception handler into its own dedicated instruction.

This would look a lot like END_ASYNC_FOR, except it would push a value and jump on return. But combining the instructions like that seems to be moving in the wrong direction - in fact, I'd like to see END_ASYNC_FOR broken up like we have here.

What I had envisioned

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
             26 THROW                (to 20)
             28 JUMP_BACKWARD_NO_INTERRUPT     (to 14)
ExceptionTable:
  14 to 14 -> 26 [2]

Where THROW throws into the delegated awaitable. If that delegated awaitable returns, then jump. if it yields, then push the value. If it raises, then propagate.

Much like SEND sends to the delegated awaitable, THROW throws into it.

See also faster-cpython/ideas#448

I've been trying to get something like this to work for the last few weeks in between release blocker firefighting and vacations, but I've just been left convinced that there are too many wrinkles to make it this elegant. Even setting aside the relatively minor annoyance that we need two THROW instructions right now (one forward and one backward), here are just a couple of the issues make this much more complicated than "just SEND but call throw":

  • We actually need to call close if the exception is a GeneratorExit...
  • ...except if we're getting it from an athrow or aclose call, in which case we have different logic for deciding to call athrow or aclose, which have different semantics in the GeneratorExit case. Grep for close_on_genexit in genobject.c for the first breadcrumb in the trail of special async generator logic.
  • We aren't supposed to chain a thrown/closing GeneratorExit to the RuntimeError complaining that we've ignored it, which is a pain to do without changing how literally all other exceptions are chained.
  • The current code has fast paths for closing or throwing into a delegated generator or coroutine if they are exact. Moving all throws and closes into the bytecode would probably make the majority of throws and closes (including generator finalization) a bit slower. If we want to leave these fast paths in the C code, then I don't much see the point of moving the rest into THROW.

For reference, here is my mostly-but-not-quite working THROW branch. If we're really sure that it's worth trying to hack around these issues (and others) in the THROW instructions, then I can keep plugging away at it. But this PR only extracts the problematic parts of the throw/close implementation while leaving all of the other subtle logic alone. That feels like the right separation between the C/Python boundary, in my opinion.

Add to all this that I'm very confident that this PR behaves correctly, and I understand it very well, but am very far from being able to promise that a THROW implementation will preserve the exact semantics for things like async generators and coroutines, of which I know very, very little.

@brandtbucher
Copy link
Member Author

brandtbucher commented Aug 17, 2022

Also note that for many compiler-generated await loops, we can simplify the emitted bytecode to skip extracting and pushing the return value where the compiler knows it's not used. Doing the same for all await and yield from expressions would be possible but harder, since by the point we're emitting it we're buried deep in general-purpose expression-compiling code and don't have much context on how the result is used.

@markshannon
Copy link
Member

markshannon commented Aug 17, 2022

Sure, there are many complications around what is thrown, or whether close is called, etc.
First, move all those complications into a new function, _Py_throw_into().

With this new function:

PySendResult
_Py_throw_into(PyGenObject *subgen, int close_on_genexit, PyObject *typ, PyObject *val, PyObject *tb, PyObject **presult);

We can initially refactor _gen_throw to:

static PyObject *
_gen_throw(PyGenObject *gen, int close_on_genexit,
           PyObject *typ, PyObject *val, PyObject *tb)
{
    PyObject *yf = _PyGen_yf(gen);
    if (yf) {
        PyObect *res;
        PySendResult what = _Py_throw_into(yf, close_on_genexit, type, val, tb, &res);
        switch(what) {
            case PYGEN_RETURN:
                /* Stop SEND loop */
                /* Awkward popping and jumping goes here */
                return gen_send(gen, val);
            case PYGEN_ERROR:
                 return NULL;
            case PYGEN_NEXT:
               return res;
        }
    }
throw_here:
    ...

That doesn't help much by itself, but _Py_throw_into is what we need to implement THROW which, as you point out, cannot jump because it needs close_on_genexit as its oparg.

Instead of generating

  THROW  target
  JUMP_BACKWARD_NO_INTERRUPT back_to_yield

we generate

  THROW close_on_genexit
  POP_JUMP_IF_TRUE target
  JUMP_BACKWARD_NO_INTERRUPT back_to_yield

THROW should be little more than a wrapper around _Py_throw_into, raising on PYGEN_ERROR, pushing True for PYGEN_RETURN and False for PYGEN_NEXT.

@markshannon
Copy link
Member

See also #96040

@brandtbucher
Copy link
Member Author

Closing and reopening to re-run the Azure Pipelines failure.

@brandtbucher
Copy link
Member Author

@markshannon, I've updated the PR based on our discussion this morning. Let me know how it looks.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a definite improvement over the current implementation.

Ultimately we want all the handling of sub-generators to done in bytecode.
This is a good first step, moving the flow control into the bytecode.

@@ -567,6 +567,17 @@ the original TOS1.
.. versionchanged:: 3.11
Exception representation on the stack now consist of one, not three, items.


.. opcode:: END_THROW
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

END suggests that this is the end of a sequence of bytecodes.
Maybe CLEANUP_THROW as it cleans up _gen_throw?

Python/compile.c Outdated
// Set up a virtual try/except to handle when StopIteration is raised during
// a close or throw call. The only way YIELD_VALUE raises if they do!
ADDOP_JUMP(c, SETUP_FINALLY, fail);
RETURN_IF_FALSE(compiler_push_fblock(c, TRY_EXCEPT, send, NO_LABEL, NULL));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be no call to compiler_unwind_fblock before this block is popped, so we don't need to push a block here.

Python/compile.c Outdated
ADDOP_I(c, YIELD_VALUE, 0);
compiler_pop_fblock(c, TRY_EXCEPT, send);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor pop it here

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@brandtbucher
Copy link
Member Author

This is a definite improvement over the current implementation.

Ultimately we want all the handling of sub-generators to done in bytecode. This is a good first step, moving the flow control into the bytecode.

Yeah, this fixes the main issue we were concerned with, but I agree that it makes sense to keep trying to port over more of the throw logic. I plan to return to this whenever I get a few extra cycles.

@brandtbucher brandtbucher merged commit 5bfb3c3 into python:main Aug 19, 2022
@encukou
Copy link
Member

encukou commented Apr 8, 2024

With this PR, async functions with exactly 20 nested blocks started crashing with an assert failure, e.g.:

async def t():
    async with h,t,t,o,f,y,o,t,r,o,f,t,f,r,t,m,r,o,t,l:n
python: Python/compile.c:7185: pop_except_block: Assertion `stack->depth > 0' failed.

see #116767

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants