-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Better mapping of sys.monitoring
branches back to source code
#122762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Making locations "make sense to someone only looking at the source code" is hard as it is somewhat subjective, but I'll do my best 🙂 It seems that the locations that don't "make sense ..." are branches that exit the scope.
and exiting the loop in:
The problem is that there is no location for the exit from the function, but we need to provide a location for the destination of the branch. PEP 626 says that the line events must reflect the lines executed, so we cannot introduce spurious line changes. Not all locations have been correct in earlier versions and the locations for some parts of compounds statements were overly broad, but that has been fixed now. I believe the locations are all good in 3.13 and 3.14 @iritkatriel is that correct?
The actual offsets should be regarded as an implementation detail. It is the locations that matter. The runtime may skip instructions that it can tell it doesn't need to execute them. |
3.13 and 3.14 are awesome. The only open issue I am aware of re location is #96999 . |
Thanks, Mark. Here's an example with def foo(n):
while n<3:
print(n)
n += 1
return None
foo(0)
In the first iteration, the branch is shown as going into the block, but |
Coverage tools need to be able to enumerate all possible branches... Is it reasonable to do so |
I've created an issue for the inconsistency in while statement locations. |
I'd be careful using Maybe we should add something like |
I would love that :) Coverage tools also need to know about all lines that contain code, which |
I've added |
I've been experimenting with using the AST to identify what the branches do in the code (e.g., from an I am mapping each branch source and destination to the source code using
@markshannon @iritkatriel and possibly others: does this make sense? Does this way of interpreting the branches have a good chance of staying stable across Python releases? Thank you! |
The structure of the AST is not stable, so something like |
It's hard to imagine the AST not having a But in any case, we'd really appreciate help with how to correctly interpret branches. def foo(x):
if x: print(x) produces this
and |
What does exiting the function mean? Unless there is an infinite loop, all functions will exit. To determine that, you'll need to project the execution until the next branch or exit is reached. I don't think that is something we want to support directly in CPython. For prototyping, you could use |
I'm experimenting with |
Doing some more experiments with simple code.. I'm trying to understand what offsets will be reported for BRANCH_LEFT and BRANCH_RIGHT. foo.py: def foo():
for v in [1, 2]:
if v > 0:
x = v
return x
foo() Disassembled:
Here is the data reported by sys.monitoring as shown by run_sysmon.py. Ignore the line numbers (
The two branches from offset 6 are confusing:
What are the rules for how instruction offsets get reported? How can I see the bytecode that will make the most sense as those offsets? |
For mapping back to source code, try using locations not offsets. When tracking which branches have executed, offsets will be best for speed. All |
I don't understand what you mean by locations here? Can you give me an example of mapping the events I showed above? |
The locations provided by
Suppose we have three instructions, at offsets O1, O2 and O3 at locations L1, L2 and L2 respectively. Note that the second and third locations have the same location. In more concrete terms using the example above: why is the branch from Running the example on latest main we have:
|
I see what you are saying. I've adjusted my code to walk the trails from each branch destination: This gives me a way to look up the offsets reported in the events and map them to lines. Thanks. |
Feature or enhancement
Proposal:
sys.monitoring
allows us to record branches taken during execution, reporting them in terms of their source and destination offsets in bytecode. These offsets are meaningless to developers, however, unless they take the time to generate and study the bytecode disassembly. Code objects provideco_positions()
to map bytecode locations to the source code that originated it, but the branches don't always map to locations that make sense to someone only looking at the source code.For example, when executed,
shows a branch from offset 4, which maps to the
if
keyword, to offset 30, which also maps to thatif
.How can/should a coverage tool attempting to report this branch recognize that it is a branch out of the function?
The branch does go to a
RETURN_...
opcode, but is that reliable?Branches from
for
loops also show a bit obfuscated. In the examplethe branch taken into the block containing the
print
call actually shows as a branch to thei
in line 2:The final branch once the loop is done also shows a bit funny, not to offset 54, as expected, but to offset 56:
I've seen unexpected behavior with
while
loops as well... I can probably recreate it if desired.@markshannon, @nedbat and I spoke about this issue previously; Mark asked me to open this issue. It should arguably be a bug report rather than a feature request... it can be your call, Mark.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
I spoke briefly about these issues in my PyCon'24 talk: https://www.youtube.com/watch?v=X9aXWeLC_C0
The text was updated successfully, but these errors were encountered: