gh-132975: Improve Remote PDB interrupt handling #133223

godlygeek · 2025-04-30T20:35:26Z

Fix several problems in how Ctrl-C works in the Remote PDB client:

In some places we would capture a KeyboardInterrupt and proxy it to the remote, but in other cases the KeyboardInterrupt wound up killing the client (notably a Ctrl-C caught while it was printing a message would kill it, and you could easily get into that situation with a PDB command that spams messages, like for x in itertools.count(): x). Fix this by explicitly blocking SIGINT signals except for the times when we're prepared to handle them and ready to send them on to the remote.
On Windows, it's not possible to interrupt the sockfile.readline() call made while waiting for the next message from the remote. Work around this by using selectors to listen for updates on either of two streams - either a message from the remote, or a signal arriving and being written to a socketpair by our signal handler.
On Unix, send a SIGINT signal directly to the remote process when we want to interrupt it. This isn't applicable on Windows, where we can't send SIGINT to another process. Sending a SIGINT signal has all the disadvantages listed in https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions but it has the advantage of matching what happens for normal non-remote PDB, and the additional advantage of being able to interrupt IO.

Making this a draft PR for now to check whether the selectors and socketpair usage need us to skip tests on additional platforms.

Issue: Remote PDB can't interrupt an infinite loop in an evaluated command #132975

Because our goal is to proxy interrupt signals to the remote process, it's important that we only accept those signals at moments when we're ready to handle them and send them on. In particular, we're prepared to handle them when reading user input for a prompt, and when waiting for the server to send a new prompt after our last command. We do not want to accept them at other times, like while we're in the middle of printing output to the screen, as otherwise a while True: "Hello" executed at the PDB REPL can't reliably be Ctrl-C'd: it's more likely that the signal will arrive while the client is printing than while it's waiting for the next line from the server, and if so nothing will be prepared to handle the `KeyboardInterrupt` and the client will exit.

Previously we worked with a file object created with `socket.makefile`, but on Windows the `readline()` method of a socket file can't be interrupted with Ctrl-C, and neither can `recv()` on a socket. This refactor lays the ground work for a solution this this problem.

Since a socket read can't be directly interrupted on Windows, use the `selectors` module to watch for either new data to be read from that socket, or for a signal to arrive (leveraging `signal.set_wakeup_fd`). This allows us to watch for both types of events and handle whichever arrives first.

This unfortunately cannot be applied to Windows, which has no way to trigger a SIGINT signal in a remote process without running code inside of that process itself.

godlygeek · 2025-04-30T22:22:52Z

OK, this is ready for a review (and needs the skip-news label)

CC @gaogaotiantian @pablogsal

gaogaotiantian

Okay so _readline is to make it possible for the client to be interrupted while receiving data from the server. You have a _handle_sigint outside of each _readline call which enables SIGINT. What if the interrupt happens before your try ... finally ... block in _readline? old_wakeup_fd would not be restored I think? Not sure if other evil thing could happen.

As far as I can tell, if you want to set your signal handler in _readline() anyway, you should just restore the signal handling in readline() - you wouldn't need to wrap every call with _handle_sigint and less racing concerns.

I see that you put some effort to achieve a "delayed" interrupt compared "ignored" ones on Windows. Is that critical? We will have much less code if we treat them the same. I think interrupting at ignored region should be the rare case anyway right?

I'm not an expert in signal, if we have multiple SIGINT during the blackout region, will we trigger multiple handlers?

signal.default_int_handler is not documented, but it is used a few times in stdlib. If that's the only handler we are going to use for _handle_sigint, should we just put it into the function and call it _unblock_sigint so it's obvious that it has the opposite effect of _block_sigint?

Also, we need a news entry for this PR. This is a user observable change (even though it in alpha phase).

godlygeek · 2025-05-01T17:07:05Z

What if the interrupt happens before your try ... finally ... block in _readline? old_wakeup_fd would not be restored I think?

Ooh, yes, that's a bug. I'll take another look at that.

I think interrupting at ignored region should be the rare case anyway right?

It depends... it can be very common, depending on what's happening. If you do the while True: "x" loop I mentioned above, most of the time is spent printing, and comparatively little time is spent reading input.

I see that you put some effort to achieve a "delayed" interrupt compared "ignored" ones on Windows. Is that critical? We will have much less code if we treat them the same.

I think I've thought of an idea for how to treat them the same, and get the "delayed" interrupts even on Windows. I'll get back to you on this...

Also, we need a news entry for this PR. This is a user observable change (even though it in alpha phase).

I thought news entries were only for things that have made it into releases, not for bug fixes to stuff that's never been released. But OK, I can add one if it's needed.

gaogaotiantian · 2025-05-01T17:33:24Z

It depends... it can be very common, depending on what's happening. If you do the while True: "x" loop I mentioned above, most of the time is spent printing, and comparatively little time is spent reading input.

Okay so what if the server is somehow dumping a lot of data to the client? Your ideal solution is to be able to delay the interrupt signal then send it to the server right?

godlygeek · 2025-05-01T18:04:35Z

Okay so what if the server is somehow dumping a lot of data to the client? Your ideal solution is to be able to delay the interrupt signal then send it to the server right?

Yes, exactly. If the user hits Ctrl-C while we're printing data, we want to set that Ctrl-C aside for later, and handle it the next time we're prepared for it - either when we next call input() to show the user a prompt, or when we next read a message from the socket, whichever comes first. This is far easier than trying to handle the interrupt asynchronously at arbitrary points.

godlygeek · 2025-05-01T19:59:12Z

So we've got 3 different ways that we want to handle SIGINT, at different points in the client:

When we've called input(), we want a KeyboardInterrupt to be raised by the signal handler, breaking us out of the input() call.
When we've called socket.recv(), we want a flag to be set and something to be written to the wakeup fd, so that we can break out of the select() call and raise a KeyboardInterrupt ourselves.
In all other cases we want a flag to be set that we can check later, when we're about to request input() from the user or a recv() from the remote.

I think I can make this simpler and more robust by using one SIGINT handler that does all 3 of those things, based on some flags being set by the application to tell it which mode it should be in. I'm gonna try that out, and if it does work it ought to be much safer than setting different handlers all over the place.

It seems like this is raised explicitly in some cases.

godlygeek · 2025-05-01T22:38:51Z

OK, I switched this around to use a single signal handler and some flags to tell it what to do. It's about the same amount of code, but I do think it's easier to reason about now. Please take another look.

gaogaotiantian

I think this approach is easier to understand than the previous one. I have a few comments. Also do we still have any coverage for the new feature?

Lib/pdb.py

godlygeek · 2025-05-02T00:40:10Z

Also do we still have any coverage for the new feature?

No tests yet, mostly because I'm still trying to figure out if there's something better we can do for Windows or not. This PR solves #132975 pretty thoroughly for Unix, but Windows still has the problem that a while True: pass executed at the PDB prompt can't be interrupted, and I haven't yet figured out a solution I'm happy with.

godlygeek · 2025-05-02T20:54:32Z

OK - I've landed on something that works for Windows. I've gone with the minimal thread approach that we discussed in #132975 (comment) but only for Windows. We could use this approach for Unix too. The advantage of using it only for Windows is that we don't need to do the more complex thing with more failure modes on all the other platforms, but the disadvantage of using it only for Windows is that we now have two different code paths to maintain and test.

Take a look and let me know a) if you're OK with using this approach for Windows, and b) if you'd like to also use this approach for Unix for the sake of consistency or if you'd rather stick with the simpler os.kill() version on Unix.

gaogaotiantian · 2025-05-02T22:01:03Z

Did we create a thread that will be in the process forever every time we attach to the process?

godlygeek · 2025-05-02T22:05:50Z

No, the thread dies 0.25 seconds after atexit handlers run or when the client disconnects, whichever comes first.

godlygeek · 2025-05-02T22:12:36Z

Looks like I did break the graceful error handling for another PDB session already being attached (the server sends an error and disconnects, the client is still waiting for the server to make the second connection for interrupt signals). I'll fix that once you confirm you're OK with the overall approach.

gaogaotiantian

I'm okay with the overall approach. A few minor points:

Could you test that the current code works with commands? I was a bit worried about how you use commands there - will it work when you expand those and write as a file?
We have a few random functions that are only about remote pdb lying around now. It may not be a huge issue, but I was wondering if it makes more sense to put them into _PdbServer and PdbClient as classmethods - so it's clear that they belong to remote pdb and we know which end we should call it.

godlygeek · 2025-05-03T23:41:13Z

Could you test that the current code works with commands? I was a bit worried about how you use commands there

I'm just gonna change that. I was trying to be clever and avoid adding an extra parameter to _connect, but that's ultimately the cause of this problem:

Looks like I did break the graceful error handling for another PDB session already being attached (the server sends an error and disconnects, the client is still waiting for the server to make the second connection for interrupt signals).

Rather than trying to smuggle client setup logic in the commands, I'll just be explicit about it, and do it in _connect before we start processing the commands, and before we check if there's another session attached.

2. We have a few random functions that are only about remote pdb lying around now. It may not be a huge issue, but I was wondering if it makes more sense to put them into _PdbServer and PdbClient as classmethods

Sure, should be easy enough.

godlygeek · 2025-05-04T04:18:15Z

I'm okay with the overall approach.

Do you want to use the thread approach on both Unix and Windows (less code, greater consistency, less to test, more potential failure modes on Unix, works in truly remote scenarios) or do you want to have the client use os.kill() on Unix (more code, less consistency, more to test, fewer potential failure modes on Unix, not usable in truly remote scenarios)?

godlygeek · 2025-05-04T17:32:32Z

I'm leaning towards using the same approach on all platforms. It's overkill for what we actually need on Unix, but it'll be less code and fewer tests, and fewer platform-specific differences that maintainers need to remember. If we ever find a reason why this causes problems on Unix, we can always revert back to using os.kill() directly as a bug fix.

Let the caller of `_connect` tell us whether to start the thread.

This gives us greater flexibility to reorder things or add things in the future without breaking backwards compatibility.

gaogaotiantian · 2025-05-04T19:22:07Z

Yeah I think consistent code on different platforms is probably easier to maintain.

gaogaotiantian

Overall I think it's good for now. Need to resovle the conflict though.

Lib/pdb.py

…interrupt_handling

godlygeek · 2025-05-04T21:06:48Z

Overall I think it's good for now. Need to resovle the conflict though.

Resolved that, moved the contextlib imports onto one line, and added some tests now that we've settled on the approach. I think this is ready to merge if you're happy with it.

godlygeek · 2025-05-04T21:26:34Z

I'm leaning towards using the same approach on all platforms.

Yeah I think consistent code on different platforms is probably easier to maintain.

Ugh, no, there's a problem with that: it won't interrupt IO on Unix. I do think we need two different paths. Or alternatively, the signal listening thread can spawn a subprocess that sends a SIGINT to its parent process (more complex, more failure modes, but works in a truly remote scenario with the client and server on different hosts).

godlygeek · 2025-05-04T22:40:08Z

I'm back to thinking this is ready to merge. I've switched it back to sending the signal from the client on Unix, so that it interrupts IO on the main thread.

pablogsal · 2025-05-05T17:36:48Z

In plan to land this today so it makes it to beta 1 so people can try it unless @gaogaotiantian thinks there are still any big concerns pending that cannot be addressed later.

gaogaotiantian

Yeah let's land this today and polish it later if necessary.

pablogsal · 2025-05-05T18:13:39Z

@godlygeek could you resolve the merge conflicts? If you don't have time I can try to do it in a couple of hours

bedevere-app bot mentioned this pull request Apr 30, 2025

Remote PDB can't interrupt an infinite loop in an evaluated command #132975

Closed

godlygeek added 3 commits April 30, 2025 17:53

Use a SIGINT to interrupt the remote on Unix

ed664b8

This unfortunately cannot be applied to Windows, which has no way to trigger a SIGINT signal in a remote process without running code inside of that process itself.

godlygeek force-pushed the improve_remote_pdb_interrupt_handling branch from 098b752 to ed664b8 Compare April 30, 2025 21:53

godlygeek marked this pull request as ready for review April 30, 2025 22:21

godlygeek requested a review from gaogaotiantian as a code owner April 30, 2025 22:21

bedevere-app bot added the awaiting review label Apr 30, 2025

gaogaotiantian reviewed May 1, 2025

View reviewed changes

godlygeek added 3 commits May 1, 2025 18:26

Handle a ValueError for operations on a closed file

aafa48c

It seems like this is raised explicitly in some cases.

Use a single complex signal handler for the PDB client

42e7cec

Add a news entry

02da647

gaogaotiantian reviewed May 1, 2025

View reviewed changes

Lib/pdb.py Show resolved Hide resolved

Lib/pdb.py Show resolved Hide resolved

Lib/pdb.py Outdated Show resolved Hide resolved

Lib/pdb.py Outdated Show resolved Hide resolved

godlygeek and others added 5 commits May 1, 2025 20:43

Update the comment to explain why we catch two exception types

5abd63a

Swap signal_read/signal_write resetting to the outer finally block

c9c5bf2

Use os.kill() on every platform but Windows

8fa88a5

Use ExitStack to reduce nesting in attach

4c0b431

Use a thread to manage interrupts on Windows

5993e06

gaogaotiantian reviewed May 3, 2025

View reviewed changes

godlygeek added 2 commits May 4, 2025 14:36

Use the signal handling thread approach on all platforms

dd7c9b1

Let the caller of `_connect` tell us whether to start the thread.

Make all _connect arguments keyword-only

e2391b0

This gives us greater flexibility to reorder things or add things in the future without breaking backwards compatibility.

gaogaotiantian reviewed May 4, 2025

View reviewed changes

Lib/pdb.py Outdated Show resolved Hide resolved

Lib/pdb.py Show resolved Hide resolved

godlygeek added 3 commits May 4, 2025 16:29

Merge remote-tracking branch 'upstream/main' into improve_remote_pdb_…

edd7517

…interrupt_handling

One line for contextlib imports

14aa8e7

Add some tests for handling SIGINT in the PDB client

b26ed5c

godlygeek added 2 commits May 4, 2025 17:39

Switch back to os.kill() on Unix

a860087

Wrap input() calls in a function

e1bb1d3

gaogaotiantian approved these changes May 5, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels May 5, 2025

pablogsal enabled auto-merge (squash) May 5, 2025 18:12

Merge branch 'main' into improve_remote_pdb_interrupt_handling

2a7807c

pablogsal merged commit 9434709 into python:main May 5, 2025
39 checks passed

bedevere-app bot removed the awaiting merge label May 5, 2025

Uh oh!

gh-132975: Improve Remote PDB interrupt handling #133223

gh-132975: Improve Remote PDB interrupt handling #133223

Uh oh!

Conversation

godlygeek commented Apr 30, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godlygeek commented Apr 30, 2025

Uh oh!

gaogaotiantian left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

godlygeek commented May 1, 2025

Uh oh!

gaogaotiantian commented May 1, 2025

Uh oh!

godlygeek commented May 1, 2025

Uh oh!

godlygeek commented May 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godlygeek commented May 1, 2025

Uh oh!

gaogaotiantian left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

godlygeek commented May 2, 2025

Uh oh!

godlygeek commented May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gaogaotiantian commented May 2, 2025

Uh oh!

godlygeek commented May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godlygeek commented May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gaogaotiantian left a comment

Choose a reason for hiding this comment

Uh oh!

godlygeek commented May 3, 2025

Uh oh!

godlygeek commented May 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godlygeek commented May 4, 2025

Uh oh!

gaogaotiantian commented May 4, 2025

Uh oh!

gaogaotiantian left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

godlygeek commented May 4, 2025

Uh oh!

godlygeek commented May 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godlygeek commented May 4, 2025

Uh oh!

pablogsal commented May 5, 2025

Uh oh!

gaogaotiantian left a comment

Choose a reason for hiding this comment

Uh oh!

pablogsal commented May 5, 2025

Uh oh!

Uh oh!

Uh oh!

godlygeek commented Apr 30, 2025 •

edited by bedevere-app bot

Loading

gaogaotiantian left a comment •

edited

Loading

godlygeek commented May 1, 2025 •

edited

Loading

godlygeek commented May 2, 2025 •

edited

Loading

godlygeek commented May 2, 2025 •

edited

Loading

godlygeek commented May 2, 2025 •

edited

Loading

godlygeek commented May 4, 2025 •

edited

Loading

gaogaotiantian left a comment •

edited

Loading

godlygeek commented May 4, 2025 •

edited

Loading