Skip to content

Setting f_trace_opcodes to True can lead to f_lineno being removed in some cases (using breakpoint()/pdb.set_trace()) #127321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Viicos opened this issue Nov 27, 2024 · 15 comments
Labels
3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Viicos
Copy link
Contributor

Viicos commented Nov 27, 2024

Bug report

Bug description:

Since #118579 (introduced in 3.13), Bdb.set_trace will call Bdb.set_stepinstr instead of Bdb.set_step (on L406):

cpython/Lib/bdb.py

Lines 389 to 407 in 6d3b520

def set_trace(self, frame=None):
"""Start debugging from frame.
If frame is not specified, debugging starts from caller's frame.
"""
sys.settrace(None)
if frame is None:
frame = sys._getframe().f_back
self.reset()
self.enterframe = frame
while frame:
frame.f_trace = self.trace_dispatch
self.botframe = frame
self.frame_trace_lines_opcodes[frame] = (frame.f_trace_lines, frame.f_trace_opcodes)
# We need f_trace_lines == True for the debugger to work
frame.f_trace_lines = True
frame = frame.f_back
self.set_stepinstr()
sys.settrace(self.trace_dispatch)

This ends up setting f_trace_opcodes to True on all the frames of the stack.

This is fine for most use cases, but for some reason, this removes the f_lineno attribute of frames in exotic setups using breakpoint():

any(some_cond for el in it) and breakpoint()
# -> Warning: lineno is None

[1, 2] and breakpoint()
# -> Warning: lineno is None

True and breakpoint()
# Fine, lineno available

I'm using these inline conditions a lot to conditionally add a breakpoint, and not having access to the line number is annoying as many commands (such as list) will fail because they expect frame.f_lineno to not be None (and thus crashes and exits the debugger).

I'm not familiar with opcodes and how this interacts with frames. It is expected for the f_lineno to be lost here?

CPython versions tested on:

3.13, 3.14

Operating systems tested on:

Linux

Linked PRs

@Viicos Viicos added the type-bug An unexpected behavior, bug, or error label Nov 27, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Nov 27, 2024
@picnixz
Copy link
Member

picnixz commented Nov 27, 2024

cc @gaogaotiantian

@gaogaotiantian
Copy link
Member

I'm not able to reproduce this on either main or 3.13. Could you make a full example?

[1, 2] and breakpoint()
> /home/gaogaotiantian/programs/mycpython/scrabble.py(1)<module>()
-> [1, 2] and breakpoint()
(Pdb) list
  1  -> [1, 2] and breakpoint()
[EOF]
(Pdb) $_frame.f_lineno
1
(Pdb) 

@Viicos
Copy link
Contributor Author

Viicos commented Nov 27, 2024

Ok took some time to find how to reproduce, but turns out you need at least one extra statement after the breakpoint call. So something like:

[1,2] and breakpoint()

a = 1

should work.

@gaogaotiantian
Copy link
Member

Ah, interesting. Worked for me, thanks!

@gaogaotiantian
Copy link
Member

@iritkatriel the bytecode of

[1, 2] and breakpoint()
a = 2

is

   0           RESUME                   0

   1           LOAD_SMALL_INT           1
               LOAD_SMALL_INT           2
               BUILD_LIST               2
               COPY                     1
               TO_BOOL
               POP_JUMP_IF_FALSE        7 (to L1)
               POP_TOP
               LOAD_NAME                0 (breakpoint)
               PUSH_NULL
               CALL                     0

  --   L1:     POP_TOP

   2           LOAD_SMALL_INT           2
               STORE_NAME               1 (a)
               LOAD_CONST               0 (None)
               RETURN_VALUE

The label L1 is the short circuit label. Is it expected that the bytecode is not associated with any line number? I mean I can understand that the label itself is probably not generated directly by the code, just want to confirm if this is a desired behavior. If so, I'll need to work on some workaround for pdb.

@iritkatriel
Copy link
Member

The POP_TOP is there because we just compiled an expression (and nothing is consuming its result from the stack). If the POP_TOP was only reachable via fall through, the line number of the previous instruction would propagate to it. But here it's reachable via a jump as well (and it seems we don't check that the jump was on the same line as the fall through, but even if we did this would fail in the case where the boolean expression spans more than one line).

There are situations (such as return and raise instructions) where we force instructions to have a line number even if it requires making multiple copies of some instructions. But this is not one of those cases. Why is it a problem here that there is no line number?

@gaogaotiantian
Copy link
Member

Why is it a problem here that there is no line number?

Because breakpoint() will stop at the next instruction after CALL. In this case, it would be POP_TOP. pdb relies on the line number for multiple things, for example, show the current line. It will also use it for list command to show source code. It would be helpful to know the line number.

The workaround could be stop at opcode that has line number, but in this case, the next instruction belongs to the next line, which would break the behavior that breakpoint() will stop at the line of breakpoint().

@iritkatriel
Copy link
Member

Can the breakpoint builtin do something to help know where it is?

@gaogaotiantian
Copy link
Member

Well it also affects pdb.set_trace() so we can't only deal with it in breakpoint(). If assigning a line number to this opcode is a no-go, we probably have 3 ways to deal with this:

  1. Let it be, it's a corner case anyway.
  2. Make a proxy to "fake" the line number. We can intercept the access to f.f_lineno inside pdb and fake some number. The way I can think of is to get the previous opcode (hopefully CALL) and the line number associated. The downside of this way is that we might have plenty of places that we need to patch and the magic is not pretty. Also when the users access $_frame, they'll see that the lineno is -1, which contradicts what we show them.
  3. Like I said, for set_trace(), stop only when there is a line number. In this case, it will stop at the next line, which is a slight deviation from our existing behavior.

As this is a rare case, I don't think either of the options above is horrible (although I'm a bit hesitated about 2. as it impacts too much code in pdb). However, from the other point of view, none of the options is perfect either. My perfect solution was for the opcode to have a line number, but maybe that's too much a burden for the compiler.

@iritkatriel
Copy link
Member

My perfect solution was for the opcode to have a line number, but maybe that's too much a burden for the compiler.

It's not that it's a burden on the compiler. It's that line numbers are static, so if you can reach this NOP from two different paths, you need to make two copies of this NOP, and then add a jump over one of the copies to the next instruction (here it's the LOAD_SMALL_INT). So we emit more code, and add a jump, just for this.

@iritkatriel
Copy link
Member

It might be possible to improve the situation if the compiler does propagate the line number if it's the same for both predecessors (as it is in this example). It won't completely fix the problem, this will still not work:

>>> (1 or
...   2) and breakpoint()

@gaogaotiantian
Copy link
Member

Yeah I understand the problem behind. I actually have a question here:

If the compiler knows the return value is not used, why does it need to copy the return value? Can it do a simple POP_JUMP_IF_FALSE? We can save a COPY, TO_BOOL and the extra POP_TOP. (I'm not the expert in bytecodes so I might be completely wrong).

Also, even if we want a single POP_TOP at the end, do we leave the line number empty? Can we think of it as "pop the eval value from stack after the full expression" so the opcode should be associated to the end of the expression (last line)? I think we have this kind of idea in mind when we deal with loops or other code blocks? I mean in reality, the opcodes we generated are not directly corresponding to an exact line of code, is it better to just leave it blank? Comparing to assigning a reasonable but not completely accurate value? In my mind it gives more useful information to users, even not considering the usage of pdb.

I think line numbers are majorly used by debuggers, so it's debugging information anyway (correct me if I'm wrong). "I don't know where this opcode is generated" is correct by less useful information for debugging.

@iritkatriel
Copy link
Member

Yeah I understand the problem behind. I actually have a question here:

If the compiler knows the return value is not used, why does it need to copy the return value? Can it do a simple POP_JUMP_IF_FALSE? We can save a COPY, TO_BOOL and the extra POP_TOP. (I'm not the expert in bytecodes so I might be completely wrong).

It's a feature of the language that the result of a boolean expression is the value of the element that gave the truthy result:

>>> x = 0 or 1 or 2
>>> x
1

It's common to use this for None handling:

def f(arg = None):
    arg = arg or DEFAULT_VALUE_OF_ARG

Also, even if we want a single POP_TOP at the end, do we leave the line number empty? Can we think of it as "pop the eval value from stack after the full expression" so the opcode should be associated to the end of the expression (last line)?

The last line of the expression may not have executed. we want the POP_TOP to have the line number of the last instruction from the expression that executed, otherwise it would mess up the trace.

I think line numbers are majorly used by debuggers, so it's debugging information anyway (correct me if I'm wrong). "I don't know where this opcode is generated" is correct by less useful information for debugging.

They are also used for other things, like tracebacks and coverage. Coverage is particularly sensitive to inaccurate line numbers.

@gaogaotiantian
Copy link
Member

>>> x = 0 or 1 or 2
>>> x
1

These are cases where the expressions are assigned to a variable, and this is not causing issue if I understood correctly. I meant for the cases where the expressions are not assigned to anything, so the value is unused. We should be able to optimize that case right? It's faster and it generates less code (and it's rare).

otherwise it would mess up the trace.

Hmm okay it might even generate a new line event which is probably something we don't want. I guess I'll have to deal with this in pdb.

gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this issue Dec 1, 2024
…ated line number for breakpoint() (pythonGH-127457)

(cherry picked from commit 1bc4f07)

Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com>
@picnixz picnixz added 3.13 bugs and security fixes 3.14 bugs and security fixes labels Dec 1, 2024
gaogaotiantian added a commit that referenced this issue Dec 1, 2024
…ine number for breakpoint() (GH-127457) (#127487)

(cherry picked from commit 1bc4f07)
@gaogaotiantian
Copy link
Member

This is fixed in main and backported to 3.13.

picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 2, 2024
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants