gh-59013: Make line number of function breakpoint more precise #110582

Merged
merged 5 commits into from
Oct 27, 2023

Member

@gaogaotiantian commented Oct 9, 2023

Currently, if you set a breakpoint on a function with break foo, pdb reports that the breakpoint is on the line where the function is defined (the def foo() line). However, if you set a breakpoint on that same line by number (break 4, for example), the two breakpoints behave differently even though their descriptions are exactly the same.

In fact, when we set a breakpoint on a function, we do not set it on the line where the function is defined, because there is normally no executable code on that line. The first line where we actually stop is the first executable line in the function.

This patch uses a heuristic: find the line number of the first instruction that is not RESUME. If that fails, fall back to co_firstlineno.

This should cover almost all cases (I can't think of any outliers, but there may be some), and it won't give a worse result than before.

Lib/pdb.py Outdated
    Return code.co_firstlineno if no executable line is found.
    """
    for instr in dis.get_instructions(code):
        if instr.opname != 'RESUME' and instr.positions.lineno is not None:
Member

This might not always work:

>>> def f():
...    yield 42
... 
>>> dis.dis(f)
   1           0 RETURN_GENERATOR

None           2 POP_TOP

   1           4 RESUME                   0

   2           6 LOAD_CONST               1 (42)
               8 YIELD_VALUE              1
              10 RESUME                   1
              12 POP_TOP
              14 RETURN_CONST             0 (None)

None     >>   16 CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
              18 RERAISE                  1
ExceptionTable:
  4 to 14 -> 16 [0] lasti
>>> 

Member Author

True - it's not "worse" than the current solution though.

Actually would it be more reasonable to use the line number of the instruction after RESUME?

Now that I think about it, generators probably have more problems with function breakpoints. When I set a breakpoint on a generator function, I'd hope that the breakpoint is hit every time the function is entered, right? pdb is not able to do that now: it stores the line that is executed first and breaks on that line. We could potentially enter the generator on a different line, so the problem is already more serious for generators.

Member

@iritkatriel commented Oct 11, 2023

> Actually would it be more reasonable to use the line number of the instruction after RESUME?

I think the code object already has that in a field called _co_firsttraceable.

Member

> Now that I think about it, generators probably have more problems with function breakpoints

I can believe that. When you call a generator function it creates a generator object and returns it. Then you repeatedly call the generator object (which executes the same code, past the point of the RETURN_GENERATOR).
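A small sketch of the behavior described here, using a made-up generator function gen and an events list that exists only for this example: calling the generator function runs none of the body, and each later resumption continues from the previous yield.

```python
events = []

def gen():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2

g = gen()                        # creates the generator object only
events_after_create = list(events)

first = next(g)                  # runs the body up to the first yield
second = next(g)                 # continues after the first yield
print(events_after_create, first, second, events)
```

Nothing is recorded at creation time; "started" appears only on the first next() call, and "resumed" only on the second.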

Member Author

> I think the code object already has that in a field called _co_firsttraceable.

Yes, and I don't think it is exposed at the Python level.

> I can believe that. When you call a generator function it creates a generator object and returns it. Then you repeatedly call the generator object (which executes the same code, past the point of the RETURN_GENERATOR).

I hit an assertion error while testing a bit more with generators - I'll look into it.

From a user's point of view, what is the expected behavior when they set a breakpoint on a generator function? Do they want to break when the generator is created (so effectively at RETURN_GENERATOR)? That's a valid expectation. Or do they want to break the first time the generator is executed? Or every time the generator is executed? Those are three different behaviors, and the third one has issues with displaying the line number.

Member

I would expect it to be the first time the generator is executed. The next time it will start executing after some yield, and that's not the first line of the function.
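One way to observe the distinction is with sys.settrace, which reports a 'call' event each time a frame is entered, including each resumption of a generator. The sketch below is only an illustration; gen, tracer, and entry_lines are names invented for this example and have nothing to do with how pdb stores breakpoints.

```python
import sys

def gen():
    yield 1
    yield 2

entry_lines = []

def tracer(frame, event, arg):
    # Record the current line each time gen's frame is entered or resumed.
    if event == 'call' and frame.f_code is gen.__code__:
        entry_lines.append(frame.f_lineno)
    return None  # no per-line tracing needed for this sketch

old_trace = sys.gettrace()
sys.settrace(tracer)
try:
    g = gen()                      # creating the generator enters no frame
    entries_at_creation = len(entry_lines)
    values = list(g)               # start, then one resumption per yield
finally:
    sys.settrace(old_trace)

print(entries_at_creation, values, entry_lines)
```

Creation produces no entry at all, while exhausting the generator produces three. The recorded lines typically differ from one entry to the next, because every entry after the first resumes at a yield rather than at the top of the function.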

Member Author

In that case, we can use the RESUME method I mentioned above (which I think is basically how _co_firsttraceable works). Or do you think we should expose that member to Python?

Member

I think you can re-implement it. As long as we have a test that will give us a heads up when it needs to change it should be ok.

@markshannon do you agree?

Member Author

I updated the line-search method to use the instruction after RESUME, and added a generator test case.

One thing I realized: if you run break func before func is defined (evaluated), you'll still get the line number of the function definition, because it uses re to find the function.

Is it better to have a consistent wrong answer, or a partially correct one?
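To show why a text search can only ever report the def line, here is a hedged sketch of a regex-based lookup. The function find_def_line, its pattern, and the sample source are assumptions made for this example, not pdb's actual code.

```python
import re

def find_def_line(funcname, source):
    """Illustrative helper: 1-based line of 'def funcname(' in source text."""
    pattern = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.match(line):
            return lineno
    return None

source = "import os\n\ndef func():\n    x = 1\n    return x\n"

found = find_def_line("func", source)
missing = find_def_line("other", source)
print(found, missing)
```

A search like this sees only text, so it cannot know which line holds the first executable instruction; it returns the def line (3 in this sample) where a code-object-based lookup would return line 4.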

@gaogaotiantian
Member Author

@markshannon @iritkatriel do you have any feedback on this PR? Thanks!
