Skip to content

Unable to cancel Server.serve_forever() if a reader is blocked while reading (3.12 regression) #123720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
paravoid opened this issue Sep 5, 2024 · 12 comments
Assignees
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes sprint stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@paravoid
Copy link
Contributor

paravoid commented Sep 5, 2024

Bug report

Bug description:

Consider this code, slightly simplifying the documentation's TCPServer example code:

import asyncio

async def handle_echo(reader, writer):
    print("Reading")
    data = await reader.read(100)
    message = data.decode()
    print(f"Received '{message!r}'")

    print("Closing the connection")
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 8888)
    print("Serving forever...")
    async with server:
        try:
            await server.serve_forever()  
        except asyncio.CancelledError:
            print("Cancelled by Ctrl+C")
            server.close()

asyncio.run(main())

(My code is closer to a while True: await reader.readline(); ..., but the above probably suffices as a demonstration)

Running this results in a Serving forever, and hitting Ctrl+C, results in a Cancelled by Ctrl+C, and a normal exit.

However, if in another window we nc 127.0.0.1 8888, and leave the connection open, Ctrl+C (SIGINT) does nothing, and a second Ctrl+C is required to terminate. (This however breaks out of the asyncio loop by raising KeyboardInterrupt() , as documented).

So basically clients can prevent the server from cleanly exiting by just keeping their connection open.

This is a regression: this fails with 3.12.5 and 3.13.0-rc1 but works with 3.11.9.

This is because (TTBOMK) of this code in base_events.py:

        try:
            await self._serving_forever_fut
        except exceptions.CancelledError:
            try:
                self.close()
                await self.wait_closed()
            finally:
                raise
        finally:
            self._serving_forever_fut = None

I believe this to be related to the wait_closed() changes, 5d09d11, 2655369 etc. (Cc @gvanrossum). Related issues #104344 and #113538.

Per @gvanrossum in #113538 (comment): "In 3.10 and before, server.wait_closed() was a no-op, unless you called it before server.close(), in a task (asyncio.create_task(server.wait_closed())). The unclosed connection was just getting abandoned."

CancelledError() is caught here, which spawns wait_closed() before re-raising the exception. In 3.12+, wait_closed()... actually waits for the connection to close, as intended. However, while this prevents the reader task from being abandoned, it does not allow neither the callers of reader.read() or serve_forever() to catch CancelledError() and clean up (such as actually closing the connection, potentially after e.g. a signaling to the client a server close through whatever protocol is implemented here).

Basically no user code is executed until the client across the network drops the connection.

As far as I know, it's currently impossible to handle SIGINTs cleanly with clients blocked in a read() without messing with deep asyncio/selector internals, which seems like a pretty serious limitation? Have I missed something?

CPython versions tested on:

3.11, 3.12, 3.13

Operating systems tested on:

Linux

Linked PRs

@paravoid paravoid added the type-bug An unexpected behavior, bug, or error label Sep 5, 2024
@github-project-automation github-project-automation bot moved this to Todo in asyncio Sep 5, 2024
@picnixz picnixz added 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir labels Sep 5, 2024
@gvanrossum gvanrossum self-assigned this Sep 7, 2024
@gvanrossum
Copy link
Member

I will try to look at this when I'm at the core sprint, Sept 23-27.

@gvanrossum
Copy link
Member

@paravoid The above code doesn't work; it crashes on line 6. I guess you didn't actually run it before filing the bug? Also I don't know what nc does with these parameters, could you please add a client that triggers the issue so I can understand, or at least explain what this nc invocation does?

Finally, your code differs (apart from the crash) from the original code in several ways:

  • The handler never writes back to the writer
  • It has a try/except around serve_forever()

Which of these is significant?

@paravoid
Copy link
Contributor Author

paravoid commented Sep 26, 2024

@gvanrossum Thanks for taking a look! I did run the code but tried simplifying it after the fact (it's all in the GitHub history). Fixed now, it was a s/line/data/. Sorry about that!

nc just connects and waits for input. There is no need to do anything on the socket. Instead of nc you can also do: python3 -c "import socket, time; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM); s.connect(('localhost', 8888)); time.sleep(100)" in another window.

Finally, your code differs (apart from the crash) from the original code in several ways:
* The handler never writes back to the writer
* It has a try/except around serve_forever()

Which of these is significant?

None of the two are significant. The write has been removed for simplicity (it doesn't matter). The try/except is there to make it easier to see that error handling does work when a client is not connected.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 26, 2024

My first reaction, after putting a bunch of print calls in the code you quoted in Server.close() in base_events.py, is that this was done intentionally (but without thinking through all the consequences).

The cure would then to be to put a timeout in each handler, e.g. async with asyncio.timeout(5) around the await reader.read(100). Then the behavior will be:

  • If a request comes in and the client hangs, and you don't interrupt the server, after 5 seconds the handler will terminate with a TimeoutError, which will be logged (and the server will close the connection), and the server will continue serving. (And the client's connection will be closed, although if you use a hard time.sleep(100) it will not realize this.)
  • If a request comes in, the client hangs, and you hit ^C before the handler times out, the handler will continue to wait until its 5 seconds are up, and then raise TimeoutError as before. At that point the server's wait_closed() detects that the last handler is done and exits, continuing the path taken by the ^C.

You could also catch the TimeoutError and continue to close the connection (so you don't get a traceback logged). However it looks that in this case you will hang in the handler's await writer.wait_closed() until the client cooperates (which might be never) so you still have the same problem. It seems that a secondary bug is that that await doesn't honor timeouts -- this is something we might be able to fix separately.

In any case, this gives a workaround for 3.12 and 3.13 (the latter isn't out yet but has been in feature freeze for months) but it's far from perfect. I'm not sure what to do to make the experience better without adding timeouts to every handler. Maybe the solution is just to send CancelledError into every handler that's still active when the server is closed? That's close to what I assumed was happening before 3.12 -- the server process responds immediately (?) to ^C, so the handlers are effectively abandoned.

What do you think, @paravoid ?

@gvanrossum
Copy link
Member

Here's a diff that appears to fix your issue and break no tests (though it clearly needs a new test that will demonstrate it changed something).

index ffcc0174e1e..a42c4ec0037 100644
--- a/Lib/asyncio/base_events.py
+++ b/Lib/asyncio/base_events.py
@@ -381,6 +381,7 @@ async def serve_forever(self):
         except exceptions.CancelledError:
             try:
                 self.close()
+                self.close_clients()
                 await self.wait_closed()
             finally:
                 raise

@paravoid
Copy link
Contributor Author

Here's a diff that appears to fix your issue and break no tests (though it clearly needs a new test that will demonstrate it changed something).

Ah! That's neat. I originally discovered this in 3.12 and only later reproduced it in 3.13, so I hadn't noticed the addition of close_clients(). This fix seems so simple in retrospect! I tried it out and it seems to work as intended with my limited testing.

In light of this fix, it may be entirely moot now, but with regards to your previous message: adding arbitrarily-valued timeouts doesn't strike me as a great solution. The value may be too short for legitimate use cases, forcing users to have to reimplement retries in a loop to read() again etc. It'll be even more of a nightmare with readline(). That's going to be a lot of tricky boilerplate code, to address pitfalls similar to the ones that PEP 475 (Retry system calls failing with EINTR) tried to address IMHO. At the same time, the timeout value may also be too long, and a noticeable lag for shutdowns.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 27, 2024

I will have to ponder whether it's okay to backport this to 3.12 and 3.13. I can't decide whether it's a new feature that might break code that worked before in 3.12 or 3.13, or just a bug fix (which would qualify for backporting). To deal with the cancellation, handlers that want to send a final goodbye to their clients will now have to catch CancelledError. Then again in 3.12/3.13 they currently have no way to tell that the server is being closed, so, yeah, I'm thinking of it more and more as a bugfix that more or less restores the old (3.11 and earlier) behavior.

I have to come up with a test case too. :-(

@paravoid
Copy link
Contributor Author

paravoid commented Oct 8, 2024

Thanks so much for pushing a fix! My (non-strong) opinion is that this deserves to be fixed in a minor release, both because it's a regression compared to 3.11, and because it's hard to workaround in downstream code, at least without reimplementing some of stdlib's internals. (For 3.12 it's even worse, because close_clients() doesn't even exist there.)

@gvanrossum
Copy link
Member

I will try to find time to complete the PR. I will have to ask the 3.12 and 3.13 release managers for their opinion on the backports.

@paravoid
Copy link
Contributor Author

paravoid commented Mar 7, 2025

It seems like we lost our momentum... I see there is an unmerged PR to fix this -- any way I can help to move this forward? Thanks again!

@ordinary-jamie
Copy link
Contributor

ordinary-jamie commented Mar 9, 2025

Hi @gvanrossum, hope you have been well!

I have to come up with a test case too. :-(

Here's a crude test that will raise a TimeoutError without Server.close_clients() and will pass with your fix:

Edit: Updated to include @pskopnik suggestion!

test_asyncio.test_server.TestServer2

class TestServer2(unittest.IsolatedAsyncioTestCase):

    # ... existing tests

    async def test_close_with_hanging_client(self):
        # Synchronize server cancellation only after the socket connection is
        # accepted and this event is set
        conn_event = asyncio.Event()
        class Proto(asyncio.Protocol):
            def connection_made(self, transport):
                conn_event.set()

        loop = asyncio.get_running_loop()
        srv = await loop.create_server(Proto, socket_helper.HOSTv4, 0)

        # Start the server
        serve_forever_task = asyncio.create_task(srv.serve_forever())
        await asyncio.sleep(0)

        # Create a connection to server
        addr = srv.sockets[0].getsockname()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(addr)
        self.addCleanup(sock.close)

        # Send a CancelledError into the server to emulate a Ctrl+C
        # KeyboardInterrupt whilst the server is handling a hanging client
        await conn_event.wait()
        serve_forever_task.cancel()

        # Ensure the client is closed within a timeout
        async with asyncio.timeout(0.5):
            await srv.wait_closed()

        self.assertFalse(srv.is_serving())

@pskopnik
Copy link

@ordinary-jamie, thanks for putting that together :)
Small change suggestion: Store the serve_forever() task in a variable and cancel the task instead of setting the exception on the internal future. Now you are purely testing the public API!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes sprint stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Status: Todo
Development

No branches or pull requests

5 participants