gh-91048: Add support for reconstructing async call stacks #103976

mpage · 2023-04-28T18:29:54Z

When profiling an async Python application it's useful to see both the
stack for the currently executing task as well as the chain of coroutines
that are transitively awaiting the task. Consider the following example,
where T represents a task, C represents a coroutine, and A '->' B
indicates A is awaiting B.

T0    +---> T1
|     |     |
C0    |     C2
|     |     |
v     |     v
C1    |     C3
|     |
+-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the
synchronous call stack while C3 is executing is only C3, C2. It's
possible to reconstruct this view in most cases using what is
currently available in CPython, however it's difficult to do so
efficiently, and would be very challenging to do so, let alone
efficiently, in an out of process profiler that leverages eBPF.

This introduces a new field onto coroutines and async generators
that makes it easy to efficiently reconstruct the async call stack.
The field stores an owned reference, set by the interpreter, to
the coroutine or async generator that is awaiting the field's owner.
To reconstruct the chain of coroutines/async generators one only
needs to walk the new field backwards.

Intermediate awaitables (e.g. Task, _GatheringFuture) complicate
maintaining a complete chain of awaiters. A special method, __set_awaiter__
is introduced to simplify the process. Types can provide an implementation
of this method to forward the awaiter on child objects.

Issue: Async Call-Stack Reconstruction #91048

When profiling an async Python application it's useful to see both the stack for the currently executing task as well as the chain of coroutines that are transitively awaiting the task. Consider the following example, where T represents a task, C represents a coroutine, and A '->' B indicates A is awaiting B. T0 +---> T1 | | | C0 | C2 | | | v | v C1 | C3 | | +-----| The async stack from C3 would be C3, C2, C1, C0. In contrast, the synchronous call stack while C3 is executing is only C3, C2. It's possible to reconstruct this view in most cases using what is currently available in CPython, however it's difficult to do so efficiently, and would be very challenging to do so, let alone efficiently, in an out of process profiler that leverages eBPF. This introduces a new field onto coroutines and async generators that makes it easy to efficiently reconstruct the async call stack. The field stores an owned reference, set by the interpreter, to the coroutine or async generator that is awaiting the field's owner. To reconstruct the chain of coroutines/async generators one only needs to walk the new field backwards. Intermediate awaitables (e.g. `Task`, `_GatheringFuture`) complicate maintaining a complete chain of awaiters. A special method, `__set_awaiter__` is introduced to simplify the process. Types can provide an implementation of this method to forward the awaiter on child objects.

carljm

Nice! Do you have pyperformance benchmark results?

Python/bytecodes.c

mpage · 2023-05-01T17:25:33Z

Nice! Do you have pyperformance benchmark results?

Nope, will run them.

arhadthedev · 2023-05-01T17:45:47Z

This branch has conflicts that must be resolved
[...]
Python/generated_cases.c.h

I tried to regenerate Python/generated_cases.c.h by porting the make rule:

cpython/Makefile.pre.in

Lines 1490 to 1503 in a679c3d

    
           regen-cases: 
        
           	# Regenerate Python/generated_cases.c.h 
        
           	# and Python/opcode_metadata.h 
        
           	# from Python/bytecodes.c 
        
           	# using Tools/cases_generator/generate_cases.py 
        
           	PYTHONPATH=$(srcdir)/Tools/cases_generator \ 
        
           	$(PYTHON_FOR_REGEN) \ 
        
           	    $(srcdir)/Tools/cases_generator/generate_cases.py \ 
        
           		--emit-line-directives \ 
        
           		-o $(srcdir)/Python/generated_cases.c.h.new \ 
        
           		-m $(srcdir)/Python/opcode_metadata.h.new \ 
        
           		$(srcdir)/Python/bytecodes.c 
        
           	$(UPDATE_FILE) $(srcdir)/Python/generated_cases.c.h $(srcdir)/Python/generated_cases.c.h.new 
        
           	$(UPDATE_FILE) $(srcdir)/Python/opcode_metadata.h $(srcdir)/Python/opcode_metadata.h.new

to the Windows cmd. However, I get an exception:

D:\Oleg\cpython>set PYTHONPATH=D:\Oleg\cpython

D:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h.new Python/opcode_metadata.h.new
Running Release|x64 interpreter...
Traceback (most recent call last):
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>
    main()
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main
    a.parse()  # Raises SyntaxError on failure
    ^^^^^^^^^
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse
    self.parse_file(filename, instrs_idx)
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 568, in parse_file
    with open(filename) as file:
         ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'Python/generated_cases.c.h.new'

D:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h Python/opcode_metadata.h
Running Release|x64 interpreter...
Traceback (most recent call last):
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>
    main()
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main
    a.parse()  # Raises SyntaxError on failure
    ^^^^^^^^^
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse
    self.parse_file(filename, instrs_idx)
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 583, in parse_file
    raise psr.make_syntax_error(
  File "Python/generated_cases.c.h", line 4658
    }
    ^
SyntaxError: Couldn't find '// BEGIN BYTECODES //' in Python/generated_cases.c.h

D:\Oleg\cpython>python
Running Release|x64 interpreter...
Python 3.12.0a5+ (tags/v3.12.0a5-186-gf300a1fa4c:f300a1fa4c, Apr 30 2023, 22:46:31) [MSC v.1929 64 bit (AMD64)] on win32

How should I call it correctly?

…D_GEN`

carljm · 2023-05-01T18:03:39Z

@arhadthedev it would be better to make a new separate issue for difficulties regenerating cases on Windows, rather than conflating it into this PR, which isn't really related.

gvanrossum · 2023-05-01T19:56:21Z

FYI you should be able to run the script without any parameters and it will use the right files.

Also I thought it is run automatically by the Windows build.bat script.

mpage · 2023-05-01T21:49:02Z

PyPerformance results: https://gist.github.com/mpage/b8f712396755c3fee277f84352058608

Looks neutral except for unpack sequence (1.23x faster) and unpickle (1.30x slower). I'm surprised that this change would affect either of these benchmarks. Are they considered to be reliable?

carljm · 2023-05-01T22:03:56Z

Just chatted with @mpage and there are a few things we could do to streamline this more:

The FRAME_OWNED_BY_GENERATOR pre-check shouldn't be necessary for correctness, we can probably just eliminate it?
Since we are doing a runtime check here based on characteristics of the code object (is it an async def) it would make more sense to push that into the compiler. We could extract this into a dedicated SET_AWAITER opcode that the compiler emits only in async defs; then no runtime check is needed.
Alternatively if we don't want it in a separate opcode, we could at least specialize away the check by having two versions of SEND_GEN, one where the caller is known to be an async def and one where it isn't.
We could decide to have this feature enabled only by a runtime flag (since it's for improved debugging/profiling). In that case we probably wouldn't want it handled in the compiler, since then we'd need different sets of pycs for flag on/off.

My inclination is for #2 (do it in the compiler, new opcode.) Would welcome comments from other reviewers (e.g. @markshannon, @gvanrossum, @willingc).

This can happen if task's coro completes eagerly

markshannon · 2023-05-03T18:11:06Z

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines.
It would be more efficient to link the tasks.
So instead of

T0    +---> T1
|     |     |
C0    |     C2
|     |     |
v     |     v
C1    |     C3
|     |
+-----|

We would have

T0  ---> T1
|        |
C0       C2
|        |
v        v
C1       C3

The original could then be reconstructed as needed.

njsmith · 2023-05-03T23:30:37Z

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task does await some_task, then it invokes asyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regular Futures, not just tasks.

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:

async def wait_on(task):
    await task

async def main():
    sleep_task = asyncio.ensure_future(asyncio.sleep(10))
    task1 = asyncio.ensure_future(wait_on(sleep_task))
    task2 = asyncio.ensure_future(wait_on(sleep_task))

I think in this case the link from task1 -> sleep_task would get overwritten and lost?

mpage · 2023-05-08T16:26:51Z

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines. It would be more efficient to link the tasks. So instead of

The original could then be reconstructed as needed.

It's worth mentioning that the choice to maintain the async stacks on coroutines was to make it as simple as possible to reconstruct the stack from an eBPF probe. I think that linking tasks should be doable, it'll mean that we have to recreate the logic of _PyGen_yf in a probe, but that isn't terribly complicated.

Where were you thinking that we would update the links between tasks?

mpage · 2023-05-08T16:34:15Z

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task does await some_task, then it invokes asyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regular Futures, not just tasks.

It probably doesn't matter in practice, but since __await__ is an ordinary method there's nothing that guarantees that whatever calls __await__ will await the awaitable that is returned (wow that was a mouthful!).

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:
async def wait_on(task):
    await task

async def main():
    sleep_task = asyncio.ensure_future(asyncio.sleep(10))
    task1 = asyncio.ensure_future(wait_on(sleep_task))
    task2 = asyncio.ensure_future(wait_on(sleep_task))
I think in this case the link from task1 -> sleep_task would get overwritten and lost?

Yep, this is a limitation of the implementation, the last awaiter wins. This has worked out OK for us so far.

gvanrossum · 2023-05-08T22:52:12Z

I'm not going to be able to give this enough attention any time soon. Would it be okay to consider this for 3.13 instead of 3.12? 3.12 feature freeze (beta 1) was supposed to be today, and while the RM has postponed that by two weeks, he also gave guidance that this is not meant as an encouragement for new features (the SC ruled on a few pending PEPs, allowing some, postponing others to 3.13).

carljm · 2023-05-09T13:51:28Z

I was already assuming this was under discussion for 3.13, not a candidate for 3.12.

jbower-fb · 2023-05-28T05:22:43Z

I'm having another stab at this taking into account what was discussed here. For more details see my comment here: #91048 (comment).

ambv · 2025-01-22T20:14:57Z

Thanks for this, we went with gh-124640 instead.

mpage added 4 commits April 28, 2023 10:35

Fix formatting of blurb

975bd29

Fix style for _PyAwaitable_SetAwaiter

669fe71

Revert newline removal

9e234bb

bedevere-bot mentioned this pull request Apr 28, 2023

Async Call-Stack Reconstruction #91048

Open

bedevere-bot added the awaiting review label Apr 28, 2023

Fix inline literal syntax in blurb

34e674e

mpage marked this pull request as ready for review April 28, 2023 20:33

mpage requested review from markshannon, 1st1, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners April 28, 2023 20:33

carljm reviewed Apr 29, 2023

View reviewed changes

Python/bytecodes.c Outdated Show resolved Hide resolved

Python/bytecodes.c Outdated Show resolved Hide resolved

gvanrossum removed request for 1st1, asvetlov and kumaraditya303 April 29, 2023 01:29

Move awaiter setting after specialization block and fix check in `SEN…

c8c348a

…D_GEN`

arhadthedev mentioned this pull request May 1, 2023

PCBuild needs an analog of make regen-cases #104055

Open

mpage added 2 commits May 1, 2023 15:24

Merge branch 'main' into port-awaiters

67f2394

Handle case where the task's coroutine is NULL

5f63310

This can happen if task's coro completes eagerly

ambv closed this Jan 22, 2025

Uh oh!

gh-91048: Add support for reconstructing async call stacks #103976

gh-91048: Add support for reconstructing async call stacks #103976

Uh oh!

Conversation

mpage commented Apr 28, 2023 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mpage commented May 1, 2023

Uh oh!

arhadthedev commented May 1, 2023

Uh oh!

carljm commented May 1, 2023

Uh oh!

gvanrossum commented May 1, 2023

Uh oh!

mpage commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carljm commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markshannon commented May 3, 2023

Uh oh!

njsmith commented May 3, 2023

Uh oh!

mpage commented May 8, 2023

Uh oh!

mpage commented May 8, 2023

Uh oh!

gvanrossum commented May 8, 2023

Uh oh!

carljm commented May 9, 2023

Uh oh!

jbower-fb commented May 28, 2023

Uh oh!

ambv commented Jan 22, 2025

Uh oh!

Uh oh!

mpage commented Apr 28, 2023 •

edited by bedevere-bot

Loading

mpage commented May 1, 2023 •

edited

Loading

carljm commented May 1, 2023 •

edited

Loading