Skip to content

gh-132975: Improve Remote PDB interrupt handling #133223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

godlygeek
Copy link
Contributor

@godlygeek godlygeek commented Apr 30, 2025

Fix several problems in how Ctrl-C works in the Remote PDB client:

  • In some places we would capture a KeyboardInterrupt and proxy it to the remote, but in other cases the KeyboardInterrupt wound up killing the client (notably a Ctrl-C caught while it was printing a message would kill it, and you could easily get into that situation with a PDB command that spams messages, like for x in itertools.count(): x). Fix this by explicitly blocking SIGINT signals except for the times when we're prepared to handle them and ready to send them on to the remote.
  • On Windows, it's not possible to interrupt the sockfile.readline() call made while waiting for the next message from the remote. Work around this by using selectors to listen for updates on either of two streams - either a message from the remote, or a signal arriving and being written to a socketpair by our signal handler.
  • On Unix, send a SIGINT signal directly to the remote process when we want to interrupt it. This isn't applicable on Windows, where we can't send SIGINT to another process. Sending a SIGINT signal has all the disadvantages listed in https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions but it has the advantage of matching what happens for normal non-remote PDB, and the additional advantage of being able to interrupt IO.

Making this a draft PR for now to check whether the selectors and socketpair usage need us to skip tests on additional platforms.

Because our goal is to proxy interrupt signals to the remote process,
it's important that we only accept those signals at moments when we're
ready to handle them and send them on. In particular, we're prepared to
handle them when reading user input for a prompt, and when waiting for
the server to send a new prompt after our last command. We do not want
to accept them at other times, like while we're in the middle of
printing output to the screen, as otherwise a

    while True: "Hello"

executed at the PDB REPL can't reliably be Ctrl-C'd: it's more likely
that the signal will arrive while the client is printing than while it's
waiting for the next line from the server, and if so nothing will be
prepared to handle the `KeyboardInterrupt` and the client will exit.
Previously we worked with a file object created with `socket.makefile`,
but on Windows the `readline()` method of a socket file can't be
interrupted with Ctrl-C, and neither can `recv()` on a socket. This
refactor lays the ground work for a solution this this problem.
Since a socket read can't be directly interrupted on Windows, use the
`selectors` module to watch for either new data to be read from that
socket, or for a signal to arrive (leveraging `signal.set_wakeup_fd`).
This allows us to watch for both types of events and handle whichever
arrives first.
This unfortunately cannot be applied to Windows, which has no way to
trigger a SIGINT signal in a remote process without running code inside
of that process itself.
@godlygeek godlygeek force-pushed the improve_remote_pdb_interrupt_handling branch from 098b752 to ed664b8 Compare April 30, 2025 21:53
@godlygeek godlygeek marked this pull request as ready for review April 30, 2025 22:21
@godlygeek
Copy link
Contributor Author

OK, this is ready for a review (and needs the skip-news label)

CC @gaogaotiantian @pablogsal

Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so _readline is to make it possible for the client to be interrupted while receiving data from the server. You have a _handle_sigint outside of each _readline call which enables SIGINT. What if the interrupt happens before your try ... finally ... block in _readline? old_wakeup_fd would not be restored I think? Not sure if other evil thing could happen.

As far as I can tell, if you want to set your signal handler in _readline() anyway, you should just restore the signal handling in readline() - you wouldn't need to wrap every call with _handle_sigint and less racing concerns.

I see that you put some effort to achieve a "delayed" interrupt compared "ignored" ones on Windows. Is that critical? We will have much less code if we treat them the same. I think interrupting at ignored region should be the rare case anyway right?

I'm not an expert in signal, if we have multiple SIGINT during the blackout region, will we trigger multiple handlers?

signal.default_int_handler is not documented, but it is used a few times in stdlib. If that's the only handler we are going to use for _handle_sigint, should we just put it into the function and call it _unblock_sigint so it's obvious that it has the opposite effect of _block_sigint?

Also, we need a news entry for this PR. This is a user observable change (even though it in alpha phase).

@godlygeek
Copy link
Contributor Author

What if the interrupt happens before your try ... finally ... block in _readline? old_wakeup_fd would not be restored I think?

Ooh, yes, that's a bug. I'll take another look at that.

I think interrupting at ignored region should be the rare case anyway right?

It depends... it can be very common, depending on what's happening. If you do the while True: "x" loop I mentioned above, most of the time is spent printing, and comparatively little time is spent reading input.

I see that you put some effort to achieve a "delayed" interrupt compared "ignored" ones on Windows. Is that critical? We will have much less code if we treat them the same.

I think I've thought of an idea for how to treat them the same, and get the "delayed" interrupts even on Windows. I'll get back to you on this...

Also, we need a news entry for this PR. This is a user observable change (even though it in alpha phase).

I thought news entries were only for things that have made it into releases, not for bug fixes to stuff that's never been released. But OK, I can add one if it's needed.

@gaogaotiantian
Copy link
Member

It depends... it can be very common, depending on what's happening. If you do the while True: "x" loop I mentioned above, most of the time is spent printing, and comparatively little time is spent reading input.

Okay so what if the server is somehow dumping a lot of data to the client? Your ideal solution is to be able to delay the interrupt signal then send it to the server right?

@godlygeek
Copy link
Contributor Author

Okay so what if the server is somehow dumping a lot of data to the client? Your ideal solution is to be able to delay the interrupt signal then send it to the server right?

Yes, exactly. If the user hits Ctrl-C while we're printing data, we want to set that Ctrl-C aside for later, and handle it the next time we're prepared for it - either when we next call input() to show the user a prompt, or when we next read a message from the socket, whichever comes first. This is far easier than trying to handle the interrupt asynchronously at arbitrary points.

@godlygeek
Copy link
Contributor Author

godlygeek commented May 1, 2025

So we've got 3 different ways that we want to handle SIGINT, at different points in the client:

  • When we've called input(), we want a KeyboardInterrupt to be raised by the signal handler, breaking us out of the input() call.
  • When we've called socket.recv(), we want a flag to be set and something to be written to the wakeup fd, so that we can break out of the select() call and raise a KeyboardInterrupt ourselves.
  • In all other cases we want a flag to be set that we can check later, when we're about to request input() from the user or a recv() from the remote.

I think I can make this simpler and more robust by using one SIGINT handler that does all 3 of those things, based on some flags being set by the application to tell it which mode it should be in. I'm gonna try that out, and if it does work it ought to be much safer than setting different handlers all over the place.

@godlygeek
Copy link
Contributor Author

OK, I switched this around to use a single signal handler and some flags to tell it what to do. It's about the same amount of code, but I do think it's easier to reason about now. Please take another look.

Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is easier to understand than the previous one. I have a few comments. Also do we still have any coverage for the new feature?

@@ -2677,7 +2678,7 @@ def _send(self, **kwargs):
try:
self._sockfile.write(json_payload.encode() + b"\n")
self._sockfile.flush()
except OSError:
except (OSError, ValueError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the new ValueError about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit it while testing something unrelated. Seems like we can either get an OSError if the socket has been disconnected but .close() hasn't been called on the sockfile yet, or a ValueError if .close() has been called.

I think that case happens if we try to send a message after detach() has been called, though I'm not sure how exactly I triggered it. Maybe sending a KeyboardInterrupt that arrived after a q was processed?

def read_command(self, prompt):
reply = input(prompt)
with self._sigint_raises_keyboard_interrupt():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to put this exactly around input()? We should be able to do just around read_command right? KeyboardInteruupt is handled outside of the function. Or this has something to do with the one-shot thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to put it around read_command, if you prefer that. I tried to scope it as narrowly as possible to make it as clear as possible why we're doing it, but I can't see any harm to doing it outside instead.

Lib/pdb.py Outdated
# Restore the old wakeup fd if we installed a new one
if old_wakeup_fd is not sentinel:
signal.set_wakeup_fd(old_wakeup_fd)
self.signal_read = self.signal_write = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be in the outmost finally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter much, and we could also just remove this line entirely - once we've restored the old wakeup fd, nothing will ever use either of those sockets again, whether we set them to None or not.

But I do prefer cleaning up after myself, so you're right, and I'll swap it to the outer finally.

Lib/pdb.py Outdated
" (Use 'cont' to resume)."
)
sys.remote_exec(self.pid, self.interrupt_script)
if hasattr(signal, "pthread_kill"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we don't even use pthread_kill, I think we could have better options to detect platform. I think the most commonly used is sys.platform == "win32".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we want here isn't really Windows vs not-Windows, it's Unix vs not-Unix.

For now sys.platform == "win32" is good enough, but if we ever got sys.remote_exec() and remote PDB ported to WASI or something weird like that, that guard might need updating.

I'll update it to sys.platform == "win32", though, and leave that problem for another day.

@godlygeek
Copy link
Contributor Author

Also do we still have any coverage for the new feature?

No tests yet, mostly because I'm still trying to figure out if there's something better we can do for Windows or not. This PR solves #132975 pretty thoroughly for Unix, but Windows still has the problem that a while True: pass executed at the PDB prompt can't be interrupted, and I haven't yet figured out a solution I'm happy with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants