gh-91048: Add support for reconstructing async call stacks #103976
When profiling an async Python application it's useful to see both the stack for the currently executing task as well as the chain of coroutines that are transitively awaiting the task. Consider the following example, where T represents a task, C represents a coroutine, and A '->' B indicates A is awaiting B.

    T0    +---> T1
    |     |     |
    C0    |     C2
    |     |     |
    v     |     v
    C1    |     C3
    |     |
    +-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the synchronous call stack while C3 is executing is only C3, C2. It's possible to reconstruct this view in most cases using what is currently available in CPython, however it's difficult to do so efficiently, and would be very challenging to do so, let alone efficiently, in an out of process profiler that leverages eBPF.

This introduces a new field onto coroutines and async generators that makes it easy to efficiently reconstruct the async call stack. The field stores an owned reference, set by the interpreter, to the coroutine or async generator that is awaiting the field's owner. To reconstruct the chain of coroutines/async generators one only needs to walk the new field backwards.

Intermediate awaitables (e.g. `Task`, `_GatheringFuture`) complicate maintaining a complete chain of awaiters. A special method, `__set_awaiter__` is introduced to simplify the process. Types can provide an implementation of this method to forward the awaiter on child objects.
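As a rough illustration of the idea described above, the following pure-Python sketch models the awaiter chain for the T0/T1 example. It does not use the interpreter change itself: `Node`, `TaskLike`, `async_stack`, and the attribute name `awaiter` are hypothetical stand-ins chosen for this sketch, and only the method name `__set_awaiter__` comes from the description.

```python
class Node:
    """Hypothetical stand-in for a coroutine or async generator."""

    def __init__(self, name):
        self.name = name
        self.awaiter = None  # assumed field name: whoever is awaiting this object

    def __set_awaiter__(self, awaiter):
        self.awaiter = awaiter


class TaskLike:
    """Hypothetical stand-in for an intermediate awaitable such as a Task."""

    def __init__(self, coro):
        self.coro = coro

    def __set_awaiter__(self, awaiter):
        # Forward the awaiter to the child object so the chain stays unbroken.
        self.coro.__set_awaiter__(awaiter)


def async_stack(node):
    """Walk the awaiter links backwards, innermost frame first."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.awaiter
    return names


c0, c1, c2, c3 = (Node(n) for n in ("C0", "C1", "C2", "C3"))
t1 = TaskLike(c2)          # T1 runs C2

c1.__set_awaiter__(c0)     # C0 awaits C1
t1.__set_awaiter__(c1)     # C1 awaits T1; forwarded to C2
c3.__set_awaiter__(c2)     # C2 awaits C3

print(async_stack(c3))     # ['C3', 'C2', 'C1', 'C0']
```

In the proposal the interpreter would set the link itself; the explicit `__set_awaiter__` calls here only stand in for that step.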
Nice! Do you have pyperformance benchmark results?
Nope, will run them.
I tried to regenerate Lines 1490 to 1503 in a679c3d
to the Windows cmd. However, I get an exception:
How should I call it correctly?
@arhadthedev it would be better to make a new separate issue for difficulties regenerating cases on Windows, rather than conflating it into this PR, which isn't really related.
FYI you should be able to run the script without any parameters and it will use the right files. Also I thought it is run automatically by the Windows build.bat script.
PyPerformance results: https://gist.github.com/mpage/b8f712396755c3fee277f84352058608 Looks neutral except for unpack sequence (1.23x faster) and unpickle (1.30x slower). I'm surprised that this change would affect either of these benchmarks. Are they considered to be reliable?
Just chatted with @mpage and there are a few things we could do to streamline this more:
My inclination is for #2 (do it in the compiler, new opcode.) Would welcome comments from other reviewers (e.g. @markshannon, @gvanrossum, @willingc).
This can happen if task's coro completes eagerly
This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines.
We would have
The original could then be reconstructed as needed.
I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task does Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:
I think in this case the link from task1 -> sleep_task would get overwritten and lost?
It's worth mentioning that the choice to maintain the async stacks on coroutines was to make it as simple as possible to reconstruct the stack from an eBPF probe. I think that linking tasks should be doable, it'll mean that we have to recreate the logic of Where were you thinking that we would update the links between tasks?
It probably doesn't matter in practice, but since
Yep, this is a limitation of the implementation: the last awaiter wins. This has worked out OK for us so far.
I'm not going to be able to give this enough attention any time soon. Would it be okay to consider this for 3.13 instead of 3.12? 3.12 feature freeze (beta 1) was supposed to be today, and while the RM has postponed that by two weeks, he also gave guidance that this is not meant as an encouragement for new features (the SC ruled on a few pending PEPs, allowing some, postponing others to 3.13).
I was already assuming this was under discussion for 3.13, not a candidate for 3.12.
I'm having another stab at this, taking into account what was discussed here. For more details see my comment here: #91048 (comment).
Thanks for this, we went with gh-124640 instead.
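For readers following the multiple-waiters point, here is a small sketch using only standard asyncio calls. The names `waiter`, `main`, `sleep_task`, and `log` are made up for this illustration and are not taken from the PR. It only shows that asyncio lets two tasks await the same task at once, which is the situation where a single awaiter link could record just one of them.

```python
import asyncio


async def waiter(name, shared, log):
    # Both waiters suspend on the same task object.
    await shared
    log.append(name)


async def main():
    log = []
    sleep_task = asyncio.create_task(asyncio.sleep(0.01))
    # gather() wraps each waiter coroutine in its own task, so sleep_task
    # ends up with two tasks awaiting it at the same time.
    await asyncio.gather(
        waiter("task1", sleep_task, log),
        waiter("task2", sleep_task, log),
    )
    return log


result = asyncio.run(main())
print(sorted(result))
```

Both waiters resume once `sleep_task` finishes, so any scheme that stores one awaiter per awaited object has to pick which of the two it keeps.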