
Remote PDB can't interrupt an infinite loop in an evaluated command #132975

godlygeek opened this issue Apr 25, 2025 · 9 comments
Labels
3.14 (new features, bugs and security fixes) · stdlib (Python modules in the Lib dir) · type-bug (An unexpected behavior, bug, or error)

Comments

@godlygeek
Contributor

godlygeek commented Apr 25, 2025

Bug report

Bug description:

As @gaogaotiantian pointed out in #132451 (comment), PDB's new remote attaching feature can enter an uninterruptible infinite loop if you type while True: pass at a PDB prompt. We're able to interrupt the script itself when you 'cont'inue past a breakpoint, but we aren't able to interrupt a statement evaluated interactively at a PDB prompt.

I have a few different ideas for how this could be fixed, though they all have different tradeoffs.

Options:

  1. the client can send a SIGINT signal to the remote.
    • Pro: it can interrupt IO, as long as PyErr_CheckSignals() is being called by the main thread
    • Con: it will not work on Windows
    • Con: it will kill the process if the SIGINT handler has been unset (and PDB does mess with the signal handler - I'm not sure this is just a theoretical problem)
  2. we can raise a KeyboardInterrupt from an injected script
    • Pro: It will work on Windows
    • Con: We need to change sys.remote_exec() to allow KeyboardInterrupt to escape
    • Pro: A looping injected script can be killed now, though!
    • Con: It cannot interrupt IO
    • Con: The KeyboardInterrupt will get raised even in places where signals were blocked, potentially leading to a crash where something that should never get an exception got one
  3. we can make PDB stop if set_trace() is called while it's evaluating a user-supplied statement
    • Pro: it will work on Windows
    • Con: It cannot interrupt IO
    • Con: it will lead to recursion of PDB's trace function if the script is interrupted, and I don't know how well PDB will handle that
  4. the PDB server can start a thread that listens to a socket and calls signal.raise_signal(signal.SIGINT) every time it gets a message on the socket
    • Pro: it can interrupt IO in the same cases as 1 can
    • Pro: it works in "true remote" scenarios where the client and server are on different hosts (which isn't currently a feature, but would be a nice future feature)
    • Pro: it works on Windows
    • Con: more complexity, more failure modes

I'm currently leaning towards (4) - it's the most complete fix: it works on the most platforms and in the most scenarios (interrupting both CPU-bound and IO-bound stuff). I plan to prototype this soon.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

@godlygeek godlygeek added the type-bug An unexpected behavior, bug, or error label Apr 25, 2025
@gaogaotiantian
Member

I'm leaning towards 2. I think remote_exec should be as close to exec as possible. exec does not shield exceptions, why should remote_exec? It introduces no new complexity in the system - it would be like hitting Ctrl+C in the host process.

As for the concern about interrupting other programs - pdb actually swaps the SIGINT handler when it resumes (unless c is used and no breakpoints are left, in which case it just stops tracing), so users can already use Ctrl+C to stop the program without breaking it. I think that part should just work. Worst case, we check whether we are in pdb in our interrupt script.

Can Ctrl+C interrupt IO in normal programs? I think if the process is stuck in C code it won't work either, right?

I would really love it if we could avoid multi-threading (at least for now). That's a whole other layer of complexity. pdb itself can't even deal with multi-threading now.

@picnixz picnixz added stdlib Python modules in the Lib dir 3.14 new features, bugs and security fixes labels Apr 25, 2025
@pablogsal
Member

I strongly advocate for any solution that allows interrupting IO-bound operations. While I'm leaning toward the thread-based approach (option 4), I'm not married to that specific implementation as long as we can reliably interrupt IO.

Being able to interrupt IO is absolutely critical for debugging real-world applications. Many of the most frustrating debugging scenarios involve programs that appear completely frozen while waiting on network requests, database operations, or file system access. Without IO interruption capability, users would be forced to kill the entire debugging session when encountering these common situations.

In a normal Python process this is possible due to how Python handles signals that are delivered while the process is blocked on long system calls - the syscall returns EINTR and allows Python to react to it. That is why you can still press Ctrl-C while calling sleep(), for example.

While threading introduces complexity, I believe a solution that can't handle IO interruption would be a serious functional limitation affecting many users. The thread has the advantage of working on all OSes and doing what we want, but I can understand if we want to explore other options.

@godlygeek
Contributor Author

godlygeek commented Apr 25, 2025

I think remote_exec should be as close to exec as possible. exec does not shield exceptions, why should remote_exec?

The user controls when exec runs, but not when a script injected by remote_exec runs. So there might not be any frames above it on the call stack that are prepared to handle an exception, and they're certainly not expecting that, for instance, x = 1 might raise a ValueError because a remote process injected a script that executed int("blah").

There's a good reason to swallow exceptions in general. It might be justifiable to allow KeyboardInterrupt through, though - that one is special in that, unlike a ValueError, programs already expect to handle KeyboardInterrupt almost anywhere. Unless they change the default signal handler for SIGINT, at least...

Can Ctrl+C interrupt IO on normal programs?

Yes - try:

(Pdb) import time; time.sleep(30)

Hit Ctrl-C and it gets interrupted and you get your PDB prompt back. That's only possible with a signal; options 2 and 3 cannot possibly interrupt IO (the injected script won't get run until after the sleep finishes).

I think if the process is stuck in C it won't work either right?

Not generally, but long-running C code is supposed to periodically poll PyErr_CheckSignals(). If the C code calls PyErr_CheckSignals() after a syscall returns EINTR, it will be possible to break out of it, and if it doesn't, it won't be. With the same caveats as above - this could only work for options (1) and (4), never for (2) or (3): neither of those will run the injected script until control returns to the Python eval loop.

@godlygeek
Contributor Author

I believe a solution that can't handle IO interruption would be a serious functional limitation that would affect many users.

And indeed, the current remote PDB implementation handles calling input() just as badly as it handles calling while True: pass. You can't recover from that input() call without sending the process that is being debugged a SIGINT, or writing some data to its stdin.

@gaogaotiantian
Member

Being able to interrupt IO is absolutely critical for debugging real-world applications.

You can't even attach pdb if the host process is stuck in an IO operation - it requires bytecode to run, so that's an issue anyway. Unless you change sys.remote_exec(), we won't be able to solve that for good.

So we are talking about a solution that can interrupt IO operation after pdb is attached, when the program itself is not in IO operation to begin with - that would be less useful for users.

I agree a functionally limited debugger is frustrating to users, but what's more frustrating is a debugger that can't keep its promise and fails frequently.

If we do want a separate thread, we should do all of our communication there - still keeping a single socket. We should be able to reuse the existing code to raise all kinds of signals to the main thread. However, we are really close to beta freeze; do we want to onboard an even more complicated feature with potential race conditions? pdb has never had anything to do with multi-threading until now.

@godlygeek
Contributor Author

While researching, I found that IDLE's REPL faces this same problem - #74112

I've done some experiments.

On Windows, signal.raise_signal(signal.SIGINT) in a background thread does interrupt at least some IO that's happening on the main thread. It doesn't cause a call to input() to fail with a KeyboardInterrupt, but a call to time.sleep() does raise a KeyboardInterrupt.

On Unix, however, signal.raise_signal(signal.SIGINT) in a background thread won't interrupt IO running in a different thread. In retrospect this makes sense - the signal just gets processed by the thread that raised it, and there's no need to interrupt IO in other threads when calling Python's C signal handler. I also checked if _thread.interrupt_main() would work, but it doesn't interrupt IO in the main thread either; it just directly calls Python's C signal handler without even raising a signal. We can interrupt IO on the main thread using a (Unix specific) call, though:

signal.pthread_kill(threading.main_thread().ident, signal.SIGINT)

So at this point, the best option I see is to have PDB run a thread that listens for messages telling it to interrupt the main thread, calling signal.raise_signal() on Windows and signal.pthread_kill() everywhere else. That should work as well as Ctrl-C on Unix, and will work in at least some cases on Windows. If anyone sees a better solution for interrupting IO on Windows, I'm all ears, though.

If we do want a separate thread, we should do all of our communication there - still keep a single socket.

If you're sure that's what you want, we can do that, but it's considerably more complex. It means that every message except interrupt signals needs to be read by the background thread, re-written to a threading.Queue, and re-read by the tracing thread before it can be processed, and it also means that every message needs to be pattern matched twice: once by the signal handling thread to decide whether it's responsible for handling it, and once by the tracing thread to decide how to handle it.

And if by "do all of our communication there" you mean socket writes as well as socket reads, then it's drastically more complicated: we would either need two background threads instead of one, or the tracing thread would need to communicate with the single background thread over a separate socket. That's necessary so the background thread can wake up both when there are messages originating from the tracing thread that need to be written to the client and when there are messages originating from the client that need to be handed to the tracing thread. At that point we're pulling in selectors or something like it. Or I guess we could run an asyncio event loop in that background thread...

Logically, interrupt signals are out-of-band events, and handling them out-of-band is simpler. Having one socket mimicking what a "normal" non-remote PDB would read on stdin and write to stdout and a separate socket mimicking asynchronous signals seems much simpler and cleaner to me. When PDB is running non-remotely in a terminal, it has two separate input streams for those two separate types of input. I really don't think we should entangle those two input streams in remote PDB's case.

@gaogaotiantian
Member

Okay. Let's do a prototype that has a minimal thread that only does interrupt signal handling and see how that goes. I guess that eliminates some code for the interrupt script. Hopefully the thread itself is small enough to be easily maintainable.

@godlygeek
Contributor Author

I've hit a complication with the thread approach: if you 'cont'inue the script from a remote PDB and it finishes, it falls off the end of main with the client still attached and the thread still running. If I make the thread a daemon thread, things work fine, but daemon threads are notoriously problematic and easily lead to crashes after main exits. If I don't make it a daemon thread, continuing past the end of main results in a deadlock: Py_Finalize is waiting for the background thread to stop, the background thread is waiting for someone to tell it to stop (either an EOF from the client or an explicit stop request from detach()), and the client is waiting for the server to give it a prompt or an EOF. Unless there's some convenient place to detach the PDB tracer once the main script finishes, this idea might be dead in the water.

@gaogaotiantian please let me know if there is some reasonable way to force detach() to be called once the main script ends that I haven't noticed.

If there's not, we'll either need to bite the bullet and use a daemon thread, or give up on the thread idea. If option (4) isn't workable, option (1) is definitely our second-best choice on Unix, but I don't think there's any good option on Windows. Maybe we could use CreateRemoteThread to inject a non-Python thread that never acquires the GIL or a thread state, and just calls raise(SIGINT) and then exits. That's reasonably clean, but very platform specific, and it requires us to have an extension module or builtin module that exposes the function we want to call, plus the call to CreateRemoteThread itself. All told, I'm not sure whether that's more or less safe than a daemon thread.

Another option is to use option (1) on Unix, but keep letting Windows do what it's doing today (which can interrupt a 'cont'inued script while it's running Python bytecode, but can't interrupt IO nor any statement executed from a PDB prompt).

@gaogaotiantian
Member

I don't know a lot about how we can detach the thread properly during the finalization phase. I always have concerns about introducing a new thread to handle things in pdb. I'm okay with a simple solution that works on Unix and leaves Windows for the future.

However, if we do swallow exceptions in sys.remote_exec, we should mention it in the docs - that's definitely a surprise to me. There might be design concerns behind it, but we should say something. I don't think the majority of people will expect that raise ValueError() does nothing in the remote exec.
