GH-91048: Add utils for capturing async call stack for asyncio programs and enable profiling #124640

1st1 · 2024-09-26T22:52:13Z

This PR introduces low-overhead utilities that enable async stack reconstruction for running and suspended tasks and futures. The runtime overhead this PR adds comes from tasks setting and resetting a reference to the futures/tasks they await. Control flow primitives like asyncio.TaskGroup, asyncio.gather(), and asyncio.shield() are also updated to perform this simple bookkeeping.
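To make the "set and reset a reference" bookkeeping concrete, here is a minimal pure-Python sketch. It is not the PR's implementation (which lives in the Task/Future code itself); the `AWAITED_BY` registry and the `tracked_await()` helper are hypothetical names used only for illustration.

```python
import asyncio

# Hypothetical registry: awaited future/task -> set of tasks awaiting it.
AWAITED_BY = {}

async def tracked_await(fut):
    """Await `fut`, recording the current task as its awaiter meanwhile."""
    waiter = asyncio.current_task()
    AWAITED_BY.setdefault(fut, set()).add(waiter)   # the "set" step
    try:
        return await fut
    finally:
        AWAITED_BY[fut].discard(waiter)             # the "reset" step
        if not AWAITED_BY[fut]:
            del AWAITED_BY[fut]

async def child():
    await asyncio.sleep(0)
    # While the child runs, its parent is still recorded as the awaiter.
    me = asyncio.current_task()
    return [task.get_name() for task in AWAITED_BY.get(me, ())]

async def parent():
    task = asyncio.create_task(child(), name="child")
    return await tracked_await(task)

async def main():
    seen = await asyncio.create_task(parent(), name="parent")
    return seen, dict(AWAITED_BY)

observed_awaiters, leftover = asyncio.run(main())
print(observed_awaiters, leftover)
```

The point of the sketch is that the cost per await is one insertion and one removal, and that nothing is left behind once the awaited task finishes.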

Remaining to-do for @pablogsal, @ambv, and myself

- swap_current_task in asynciomodule.c needs to be updated to take care of the new ts->asyncio_running_task
- Add a few tests for eager tasks and how they would interact with this (e.g. running only eager tasks, running an eager task within a task, running a task from an eager task from a task, etc.)
- test_running_loop_within_a_loop segfaults if the test is run twice (I think); a repro is running `./python.exe -m test test_asyncio -R3:3`
- `-R3:3` reports refleaks ~~, might be nothing, but needs to be checked~~ (OK, definitely looks like there's a leak: "test_asyncio.test_unix_events leaked [492, 492, 492] references, sum=1476"; repro: `./python.exe -m test test_asyncio.test_subprocess test_asyncio.test_taskgroups -R3:3`)
- Run the full perf test again; we might have other regressions besides the already fixed gather()
- Cover both the Python and C implementations (right now only C is being tested)

New APIs:

- asyncio.capture_call_graph(*, future=None, depth=1) to capture the call stack for the current task or for the specified task or future
- asyncio.print_call_graph(*, future=None, file=None, depth=1) to print the call stack for the current task or for the specified task or future
- asyncio.future_add_to_awaited_by() and asyncio.future_discard_from_awaited_by() to enable "stitching" together tasks and futures that are awaiting other tasks and futures
- frame.f_generator, a getter that returns the generator object associated with the frame, or None. The implementation is maximally straightforward and does not require any new fields in any of the internal interpreter structs.
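As a small illustration of the frame.f_generator getter from the list above, the sketch below looks up the owner of a generator frame and of a coroutine frame. The `owner_of()` helper is a hypothetical name; it falls back to None on interpreters that do not have the attribute, so the snippet also runs on builds without this change.

```python
import sys

def owner_of(frame):
    """Return the generator/coroutine that owns `frame`, or None.

    Uses getattr so that interpreters without f_generator report None.
    """
    return getattr(frame, "f_generator", None)

def numbers():
    yield 1

async def work():
    return 42

gen = numbers()
coro = work()

supported = hasattr(gen.gi_frame, "f_generator")
gen_owner = owner_of(gen.gi_frame)        # frame owned by a generator
coro_owner = owner_of(coro.cr_frame)      # frame owned by a coroutine
plain_owner = owner_of(sys._getframe())   # ordinary frame: no owner

coro.close()  # avoid a "coroutine was never awaited" warning
print(supported, gen_owner is gen, coro_owner is coro, plain_owner)
```

Mapping a frame back to its owning object is what lets a stack walker step from a suspended frame to the coroutine, and from there to the task.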

New C APIs:

Coroutine and generator structs gain a new pointer, cr_task. It is a borrowed pointer to the task that runs the coroutine. It is meant to let external profilers quickly find the associated task (the alternative would be a rather costly traversal of interpreter state to find the asyncio module state, which requires reading many Python dict structs).
~~A "private" C API function void _PyCoro_SetTask(PyObject *coro, PyObject *task) to set the cr_task field from _asynciomodule.c.~~

We have a complete functional test for out of process introspection in test_external_inspection.py (sampling profilers and debuggers will follow the same approach).

Example

This example program:

import asyncio

async def deep():
    await asyncio.sleep(0)
    import pprint
    pprint.pp(asyncio.capture_call_graph())

async def c1():
    await asyncio.sleep(0)
    await deep()

async def c2():
    await asyncio.sleep(0)

async def main():
    await asyncio.gather(c1(), c2())

asyncio.run(main())

will print:

FutureCallGraph(
    future=<Task pending name='Task-2' ...>,
    call_stack=[
        FrameCallGraphEntry(
            frame=...),
        FrameCallGraphEntry(
            frame=...)
    ],
    awaited_by=[
        FutureCallGraph(
            future=<Task pending name='Task-1' ...>,
            call_stack=[
                FrameCallGraphEntry(
                    frame=...)
            ],
            awaited_by=[]
        )
    ]
)
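A consumer of a structure like the one printed above typically walks it recursively: first the frames in call_stack, then each entry in awaited_by. The sketch below does that with stand-in dataclasses (`Frame` and `Graph` are hypothetical names that only mirror the future / call_stack / awaited_by fields shown above), so it does not depend on the new asyncio APIs being available.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Stand-in for one call-stack entry; holds a name instead of a frame."""
    name: str

@dataclass
class Graph:
    """Stand-in for one future/task node of the captured graph."""
    future: str
    call_stack: list = field(default_factory=list)
    awaited_by: list = field(default_factory=list)

def format_graph(graph, indent=0):
    """Return indented report lines for `graph` and everything awaiting it."""
    pad = "  " * indent
    lines = [f"{pad}* {graph.future}"]
    for entry in graph.call_stack:
        lines.append(f"{pad}  + {entry.name}")
    for awaiter in graph.awaited_by:
        lines.append(f"{pad}  awaited by:")
        lines.extend(format_graph(awaiter, indent + 2))
    return lines

# Same shape as the example: Task-2 runs c1() -> deep(), awaited by Task-1.
graph = Graph(
    "Task-2",
    [Frame("deep"), Frame("c1")],
    [Graph("Task-1", [Frame("main")])],
)
report = format_graph(graph)
print("\n".join(report))
```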

Issue: Async Call-Stack Reconstruction #91048

📚 Documentation preview 📚: https://cpython-previews--124640.org.readthedocs.build/

1st1 · 2024-09-27T21:45:06Z

I've implemented proof of concept out of process async stack reconstruction here: 1st1@7185f8d

mpage

Thanks for doing this! Do you know how much overhead the awaiter tracking adds?

1st1 · 2024-10-02T18:19:27Z

@mpage

> Thanks for doing this! Do you know how much overhead the awaiter tracking adds?

I haven't measured, but I don't expect the overhead to be detectable at all. In the most common scenario the tracking is just assigning / resetting a pointer in Task/Future C structs.

pablogsal · 2024-10-02T20:08:36Z

> @mpage
>
> > Thanks for doing this! Do you know how much overhead the awaiter tracking adds?
>
> I haven't measured, but I don't expect the overhead to be detectable at all. In the most common scenario the tracking is just assigning / resetting a pointer in Task/Future C structs.

I ran some async benchmarks from pyperformance and a small echo TCP server, and the perf impact is below the noise.
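For anyone who wants a quick local check in addition to pyperformance, a rough micro-benchmark along these lines can be run on builds with and without the change. This is only a sketch: `time_gather()` and its parameters are hypothetical, and a best-of-N wall-clock timing like this is far noisier than a proper benchmark suite.

```python
import asyncio
import time

async def noop():
    # One suspension point, so every task is awaited at least once.
    await asyncio.sleep(0)

async def gather_round(n):
    # gather() is one of the primitives that performs awaiter bookkeeping.
    await asyncio.gather(*(noop() for _ in range(n)))

def time_gather(n=200, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        asyncio.run(gather_round(n))
        best = min(best, time.perf_counter() - start)
    return best

elapsed = time_gather()
print(f"best of 3: {elapsed:.6f}s")
```

Comparing the number from two interpreter builds gives a first impression only; differences within run-to-run variance should be treated as noise.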

picnixz · 2025-01-21T22:37:48Z

Lib/asyncio/graph.py

+import dataclasses
+import sys
+import types
+import typing


Didn't we recently remove typing imports from asyncio to speed up import time? I think importing asyncio would now also import typing, since we do from .graph import * in asyncio.__init__. I think we should leave the responsibility for the type hints to typeshed.

I just rebased this PR, so the current source may be out of date with respect to changes we made elsewhere.

First off, I don't think the crusade to remove typing imports from the standard library makes sense. Any non-trivial application will import typing anyway through some third-party dependency. It's a lot of effort for a very brittle outcome.

As for this particular import, we need it to type a file. I'll see what I can do to avoid the import.

Well... whether it makes sense or not is a legitimate question, but since a change was recently approved to explicitly remove typing from asyncio imports, I thought it would be better not to implicitly revert it (7363476).
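One way to check the concern raised in this thread is to import the module in a fresh interpreter and see whether typing ended up in sys.modules. The sketch below does that; `imports_typing()` is a hypothetical helper, and the result depends on the interpreter version and on anything the site configuration imports at startup, which is why a baseline is measured too.

```python
import subprocess
import sys

def imports_typing(module_name):
    """Return True if importing `module_name` in a fresh interpreter
    leaves 'typing' in sys.modules."""
    code = f"import sys, {module_name}; print('typing' in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"

baseline = imports_typing("sys")          # startup alone
asyncio_pulls_typing = imports_typing("asyncio")
sanity = imports_typing("typing")         # must be True by construction
print(baseline, asyncio_pulls_typing, sanity)
```

If the baseline is already True, something at startup imports typing and the asyncio result says nothing about asyncio itself.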

pablogsal · 2025-01-22T00:43:56Z

@1st1 @ambv I fixed the crashes and refleaks, and I did one of the most painful rebases ever, so please let's get this merged as soon as possible to avoid collisions with the other work going on :)

Concerns addressed.

kumaraditya303 · 2025-01-22T15:45:36Z

There have been some changes in asyncio which are relevant for this:

- The C implementation is thread safe, so new code should ideally be thread safe as well. Thread safety is superficial here, since tasks aren't really thread safe; it just shouldn't crash. Mostly, adding @critical_section to getters and methods would make it work.
- Add an object-locked assertion to any internal method if necessary.
- There have been discussions about moving the current task to be per-loop for free-threading, which might break external introspection in the future.

I haven't followed this PR, so I'm not sure how many of those points matter here, but I wrote them down in case anyone else missed them.

pablogsal · 2025-01-22T16:01:36Z

> There have been discussions about moving the current task to be per-loop for free-threading, which might break external introspection in the future.

Then we should find some solution that doesn't break the external introspection in the future ;)

pablogsal · 2025-01-22T16:02:49Z

> The C implementation is thread safe, so new code should ideally be thread safe as well. Thread safety is superficial here, since tasks aren't really thread safe; it just shouldn't crash. Mostly, adding @critical_section to getters and methods would make it work.
>
> Add an object-locked assertion to any internal method if necessary.

This is slightly tricky and, if you recall, we removed the locks we added at your request. This PR is already unwieldy and complicated, so I would prefer that we work together in a separate PR to ensure the lock safety is addressed as you would like.

kumaraditya303 · 2025-01-22T16:24:01Z

> This is slightly tricky and, if you recall, we removed the locks we added at your request. This PR is already unwieldy and complicated, so I would prefer that we work together in a separate PR to ensure the lock safety is addressed as you would like.

Yes, I do remember, but it was a long time ago (2-3 months), and I have made significant changes related to free-threading since then. It's fine by me to work on free-threading later and merge this first "as-is".

ambv · 2025-01-22T16:31:39Z

Thank you, Kumar. I landed this feature. We've received a ton of review here, definitely more than most changes. We addressed all feedback on design, correctness, and performance. The long-lived branch is hard to keep alive for longer. Thank you to everyone involved, let's evolve this forward from the main branch.

We'll help with #128869 and future free-threading compatibility to make sure external introspection keeps working. We'll address the free-threading compatibility of this particular feature in a subsequent PR soon, where we will also add support for external introspection of eager tasks. The new tests added by Kumar in January show some interesting behavior of eager tasks; I'm looking into that. I will provide more information on the subsequent PR.

picnixz · 2025-01-23T10:59:58Z

Not sure if this is directly related, but we're seeing build bot failures now: https://buildbot.python.org/#/builders/125/builds/6894

test_async_gather_remote_stack_trace (test.test_external_inspection.TestGetStackTrace.test_async_gather_remote_stack_trace) ... Fatal Python error: _Py_CheckFunctionResult: a function returned NULL without setting an exception

pablogsal · 2025-01-23T13:18:28Z

> Not sure if this is directly related, but we're seeing build bot failures now: https://buildbot.python.org/#/builders/125/builds/6894
>
> test_async_gather_remote_stack_trace (test.test_external_inspection.TestGetStackTrace.test_async_gather_remote_stack_trace) ... Fatal Python error: _Py_CheckFunctionResult: a function returned NULL without setting an exception

On it

vstinner · 2025-01-23T16:23:30Z

See also my comment #129189 (comment)

#129197) This was missing from gh-124640. It's already covered by the new test_asyncio/test_free_threading.py in combination with the runtime assertion in set_ts_asyncio_running_task. Co-authored-by: Kumar Aditya <kumaraditya@python.org>

encukou · 2025-01-24T12:54:02Z

@pablogsal @ambv, this broke several tier-1 buildbots for more than a day so, by the book, it should be reverted. But I also know there are currently issues with a buildbot update, and reverting and reapplying would be a lot of churn in this case.

I'll leave it to you to handle this one.

FWIW, each working day I check there are no other buildbot failures, but I'll be offline during the weekend.

vstinner · 2025-01-24T13:55:33Z

I wrote a PR to fix the test: #129262

1st1 requested review from ambv, mpage and pablogsal September 26, 2024 22:52

bedevere-app bot mentioned this pull request Sep 26, 2024

Async Call-Stack Reconstruction #91048

Open

pablogsal force-pushed the stack branch 3 times, most recently from 760ad58 to 4f7bf44 Compare September 30, 2024 11:28

1st1 commented Sep 30, 2024

Modules/_testexternalinspection.c

1st1 commented Sep 30, 2024

Modules/_asynciomodule.c

mpage reviewed Sep 30, 2024

1st1 force-pushed the stack branch from c252d04 to 6053f24 Compare October 2, 2024 18:34

1st1 marked this pull request as ready for review October 2, 2024 20:37

1st1 requested review from ericsnowcurrently, markshannon, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners October 2, 2024 20:37

bedevere-app bot added the awaiting core review label Oct 2, 2024

1st1 added 8 commits October 2, 2024 13:40

Add f_generator property to Python frame objects

1b01a91

`f_generator` returns the generator / coroutine / async generator object that owns the frame. For all other kinds of frames it will return `None`. This is useful to reconstruct call stack for async/await code.

Working implementation of asyncio.capture_call_stack()

0fc5511

Address Guido's comments

1d20a51

Add a comment for capture_call_stack()

c8be18e

Add a couple more tests

abf2cb9

Remove setter for C impl of Task._awaited_by

20ceab7

Intoduce cr_task

72d9321

Unbreak shield() and gather()

c9475f6

pablogsal force-pushed the stack branch from d41f62f to 38f061d Compare January 21, 2025 22:30

picnixz reviewed Jan 21, 2025

use private method for the policy

a8dd667

ambv added 2 commits January 22, 2025 14:53

Avoid importing typing

cf8f5e5

Remove debug printing from test_asyncio.test_graph

eda9c7c

ambv added awaiting merge and removed awaiting changes labels Jan 22, 2025

pablogsal mentioned this pull request Jan 22, 2025

gh-128002: use per threads tasks linked list in asyncio #128869

Merged

ambv merged commit 1885988 into python:main Jan 22, 2025
44 of 45 checks passed

bedevere-app bot removed the awaiting merge label Jan 22, 2025

exhaustedreader approved these changes Jan 22, 2025

ambv mentioned this pull request Jan 22, 2025

gh-91048: Add support for reconstructing async call stacks #103976

Closed

ambv mentioned this pull request Jan 22, 2025

gh-91048: Also clear and set ts->asyncio_running_task with eager tasks #129197

Merged

pablogsal mentioned this pull request Jan 23, 2025

The compiler may optimise away globals with debug offsets #129223

Closed

sergey-miryanov mentioned this pull request Feb 21, 2025

gh-130052: Fix some exceptions on error paths in _testexternalinspection #130053

Merged

chris-eibl mentioned this pull request Apr 4, 2025

gh-132132: Upgrade LLVM on tail calling CI #132098

Merged
