Cannot cleanly shut down an asyncio based server #113538
This might be a symptom of a more general issue: there is a need for cancelling everything that's dependent on the event loop we are shutting down. Currently, it is possible to cancel tasks, async generators, and executors. The issue here is (sort of) about shutting down pending file descriptors. But it might also be necessary to be able to cancel pending futures, and possibly more things?
I've always assumed that it is up to the application to keep track of connections (etc.) in a form that is suitable for shutting down. After all, in a typical application there's more to clean up than the client connections -- there may be database connections, and who knows what else to handle. I would definitely not mess with pending futures -- those are sufficiently low-level that you cannot know who is waiting for them and why, and in which order they should be cancelled. The key point of control you have is probably cancelling tasks -- tasks can do cleanup in response to a cancellation request, and if your tasks form a tree (or even a DAG), they will be cancelled recursively (if task t1 is waiting for task t2, cancelling t1 first cancels t2). So perhaps your application should keep track of the main task per connection?
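As a concrete illustration of tracking the main task per connection, here is a minimal sketch; the `ConnectionTracker` name and its methods are hypothetical, not part of asyncio:

```python
import asyncio

class ConnectionTracker:
    """Hypothetical helper: hold the main task of every live
    connection so they can all be cancelled at shutdown."""

    def __init__(self):
        self._tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.ensure_future(coro)
        self._tasks.add(task)
        # Drop the reference as soon as the connection's task finishes.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self):
        # Cancelling a per-connection task propagates to whatever it
        # is currently awaiting, so handler cleanup code gets to run.
        for task in list(self._tasks):
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
```

A server would call `spawn()` once per accepted connection and `shutdown()` from its termination path.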
Another thought is that this might be something that a higher-level framework or library could offer. E.g. gunicorn or aiohttp? (Note that I've never used either, so I may be way off base here.)
asyncio unfortunately makes this difficult, as it is the one in control of creating the connections. Clean-up should hopefully be possible using the normal signals from the transports, i.e. `connection_lost()`.
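For illustration, a protocol can register its transport in `connection_made()` and deregister it in `connection_lost()`; the class and its registry below are an assumption, not an asyncio facility:

```python
import asyncio

class TrackedProtocol(asyncio.Protocol):
    # Class-level registry of open transports (illustrative only).
    open_transports: set = set()

    def connection_made(self, transport):
        self.transport = transport
        TrackedProtocol.open_transports.add(transport)

    def connection_lost(self, exc):
        # The transport tells us when it goes away, so the registry
        # never holds stale entries.
        TrackedProtocol.open_transports.discard(self.transport)

# At shutdown the application can then force everything closed:
#   for t in list(TrackedProtocol.open_transports):
#       t.abort()
```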
You are probably right. But it does make me uneasy that there isn't a way to centrally force a clean-up on shutdown. If you have to build your own everywhere, then it's easy to make a mistake and overlook something.
Unfortunately, there isn't always a task involved. Sometimes there is just a protocol, or a construct using futures and callbacks.
Maybe. Those two specific projects are not sufficient, though, as they are just for HTTP.
Okay, let's dive into the original problem a bit more. I think I would like to see a "scaled-down" example of a realistic server that we might be able to use to demonstrate the problem, reason about it, and evaluate potential solutions. It should be small enough that I can understand the code in 5-10 minutes (my typical attention span :-). Would you be willing to code up a first attempt at such an example? We can iterate on it together. Regarding Futures, these are used all over the place (internally in asyncio) with specific semantics, sometimes involving cancellation. If we were to start cancelling Futures automatically when a loop is stopped that could have all sorts of unexpected consequences -- probably worse than leaving them alone. So let's declare this out of scope (or if it really bugs you, please open a separate issue, as it has nothing to do with closing servers).
This should be a rough mockup of the situation:

```python
#!/usr/bin/python3

import asyncio


class EchoServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))
        print('Send: {!r}'.format(message))
        self.transport.write(data)
        print('Close the client socket')
        self.transport.close()


def stop():
    loop = asyncio.get_running_loop()
    loop.stop()


def server_up(fut):
    global server
    server = fut.result()


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

fut = asyncio.ensure_future(loop.create_server(
    lambda: EchoServerProtocol(),
    '127.0.0.1', 8888))
fut.add_done_callback(server_up)

# Simulate termination
loop.call_later(2.5, stop)

try:
    loop.run_forever()
finally:
    server.close()
    # Can't run this as it will hang:
    #loop.run_until_complete(server.wait_closed())
    loop.close()
```

If you run this in dev mode and connect a client, you see the issue:

```
$ python3.12 -X dev echosrv.py
Connection from ('127.0.0.1', 56290)
sys:1: ResourceWarning: unclosed <socket.socket fd=7, family=2, type=1, proto=6, laddr=('127.0.0.1', 8888), raddr=('127.0.0.1', 56290)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib64/python3.12/asyncio/selector_events.py:875: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
```
This wasn't a problem until 3.11.5. Here's a simpler example of what I think is the same root issue:

```python
#!/usr/bin/env python3

import asyncio


async def main():
    async def on_connection(reader, writer):
        while True:
            buf = await reader.read(1024)
            print("got bytes", buf)

    server = await asyncio.start_unix_server(on_connection, path="/tmp/sock")
    cr, cw = await asyncio.open_unix_connection(path="/tmp/sock")
    print("sending hi")
    cw.write(b"hi")
    await cw.drain()
    await asyncio.sleep(0.2)
    server.close()
    await server.wait_closed()


asyncio.run(main())
```

Then, using pyenv to set the Python version and run it, we get the following:
I have yet to look at the underlying code, but I'm guessing the server implementation in <=3.11.4 is cancelling the `reader.read` coroutine and ignoring the cancellation error, as I'd expect.
Skipping 3.11.6; it's identical to 3.11.5.
Then 3.12.0 and 3.12.1 hang indefinitely on `await server.wait_closed()`.
Looks like this is related to #109538.
With @CendioOssman's example (connecting a client that doesn't send anything, just hangs) I get different results -- at first I thought I saw what they saw, but then I couldn't repro it again. Weird; let's say that's my fault. With @jsundahl's example I get exactly what they got; main behaves the same as 3.12. Let me do some spelunking.
A bit more...
Alas, the fix waits until all handlers complete, which in this example is never. Hence the hang. I guess the correct way to write the handler is to put a timeout on the `reader.read()` call. I have to think about this more, but I'm out of time for today. Sorry.
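For the toy handler above, that could look like the following sketch; the five-second timeout is an arbitrary choice:

```python
import asyncio

async def on_connection(reader, writer):
    try:
        while True:
            # Bound each read so a silent client cannot keep the
            # handler (and hence wait_closed()) waiting forever.
            buf = await asyncio.wait_for(reader.read(1024), timeout=5.0)
            if not buf:  # EOF: the peer closed its end
                break
            print("got bytes", buf)
    except asyncio.TimeoutError:
        pass
    finally:
        writer.close()
        await writer.wait_closed()
```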
I think the use case for `Server.close_clients()` still remains. E.g. a clean version of:

```python
for fd, transport in loop._transports.items():
    transport.abort()
loop.run_until_complete(server.wait_closed())
```
One workaround: if you're happy with the old (pre-3.11) behavior, just delete the `await server.wait_closed()` call. If you don't trust your clients, you can just exit the process without prejudice, right? Perhaps after a timeout. Any warnings or errors getting logged at that point will help you track down the unreliable bits. I am reluctant to try backporting any more fixes to 3.12 or 3.11, given how sketchy this has been already. I don't think we can afford to keep track of all transports -- those are owned by the application, not by the loop. However, you could (after a delay) enumerate all tasks and cancel them. That should clean things up; a sketch follows below.
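A rough sketch of that suggestion, assuming the grace period and helper name are free choices:

```python
import asyncio

async def shutdown_with_grace(server, grace=5.0):
    server.close()  # stop accepting new connections
    try:
        # Give well-behaved clients a chance to finish up.
        await asyncio.wait_for(server.wait_closed(), timeout=grace)
    except asyncio.TimeoutError:
        # Grace period expired: cancel every remaining task except
        # ourselves and let their cleanup handlers run.
        current = asyncio.current_task()
        for task in asyncio.all_tasks():
            if task is not current:
                task.cancel()
```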
@gvanrossum that's only kind of a solution for 3.12. For 3.11 there's still that "event loop is closed" issue. It seems to me like it should be feasible for the server to own its "on_connection" tasks and cancel them on close. |
Some way to forcibly cancel this task, defined here: cpython/Lib/asyncio/streams.py, line 258 in 7d01fb4.
Okay, now I see one improvement: when the handler task is cancelled, the transport should be closed:

```python
if task.cancelled():
    transport.close()
    return
```

and this can be backported to 3.12 and 3.11. I think you will find that with this change, even when that task lingers because the connection is never closed, it doesn't cause any problems. Shall I proceed with that? @kumaraditya303, I suppose this was an oversight in your original PR #111601?
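To make the shape of that change concrete, here is a self-contained paraphrase (not the exact CPython source) of a done callback that applies it:

```python
import asyncio

def attach_cleanup(task: asyncio.Task, transport: asyncio.BaseTransport) -> None:
    """Hypothetical helper mirroring the fix: close the transport
    when the per-connection handler task ends by cancellation."""
    def _handler_done(t: asyncio.Task) -> None:
        if t.cancelled():
            # Without this, a cancelled handler leaves the socket open
            # until interpreter shutdown (the ResourceWarnings above).
            transport.close()
    task.add_done_callback(_handler_done)
```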
Yeah, that's definitely an improvement that could be made. I don't think that fixes the hang in Python 3.12, though, but maybe that's not something to take on for 3.12. I'm going to try to find some time in the next couple of days to dig in a bit deeper and get a dev environment set up to play around with this issue.
I'm actually getting cold feet about that fix. The task exists because the `client_connected_cb` passed to `start_server()` may be a coroutine function; if it's a coroutine (i.e., the callback returns a coroutine object), it gets wrapped in a task. Not much is said in the docs about cancellation here. But what should happen if the task is cancelled? The current state of affairs is that the transport is simply left open.
FWIW the hang is because the callback never returns and the (toy) application has neither a timeout on its reads nor another way to make it exit its infinite loop. One way for the application to have more control would be to use a plain function as the `client_connected_cb`. The most likely reason the task is cancelled would seem to be when ^C is hit. In this case, if we don't close the transport, we get these messages (in addition to the traceback):
If we do close the transport, that's reduced to this:
Those remaining messages are about the client (the `open_unix_connection()` side of the example). So, concluding, I think that closing the transport when the task is cancelled is the right thing to do. If you agree, please approve the PR.
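To expand on the plain-function `client_connected_cb` idea above: if the callback is not a coroutine function, asyncio does not wrap it in a task, so the application can create and own the task itself. A sketch, with the module-level registry being an assumption:

```python
import asyncio

handler_tasks: set = set()

async def handle(reader, writer):
    try:
        data = await reader.read(1024)
        writer.write(data)
        await writer.drain()
    finally:
        writer.close()

def on_connection(reader, writer):
    # Plain (non-async) callback: the application creates and tracks
    # the handler task itself, so it can cancel them all at shutdown.
    task = asyncio.get_running_loop().create_task(handle(reader, writer))
    handler_tasks.add(task)
    task.add_done_callback(handler_tasks.discard)
```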
…task is cancelled (pythonGH-113690) (cherry picked from commit 4681a52) Co-authored-by: Guido van Rossum <guido@python.org>
@jsundahl @CendioOssman Is there anything else you think needs to be done here? You have my view on the hang in 3.12. |
…task is cancelled (python#113690)
These give applications the option of more forcefully terminating client connections for asyncio servers. Useful when terminating a service and there is limited time to wait for clients to finish up their work.
…hon#114432)" Reason: The new test doesn't always pass: python#116423 (comment) This reverts commit 1d0d49a.
Reopening the issue since the PR was reverted.
For posterity, this was the traceback seen in a test:
These give applications the option of more forcefully terminating client connections for asyncio servers. Useful when terminating a service and there is limited time to wait for clients to finish up their work. This is a do-over with a test fix for gh-114432, which was reverted.
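Once the do-over landed, the new methods (available in Python 3.13 and later) allow a shutdown sequence roughly like this sketch:

```python
import asyncio

async def main():
    async def on_connection(reader, writer):
        while True:
            buf = await reader.read(1024)
            if not buf:  # EOF once the server closes the transport
                break

    server = await asyncio.start_server(on_connection, "127.0.0.1", 8888)
    await asyncio.sleep(2.5)  # simulate running, then terminate

    server.close()          # stop accepting new connections
    server.close_clients()  # close existing client transports
    # server.abort_clients()  # harsher: drop without flushing buffers
    await server.wait_closed()

asyncio.run(main())
```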
Bug report

Bug description:

When writing an asyncio based service, you basically have this sequence:

```python
loop.run_forever()
loop.stop()
```

If there are any connections active at this point, then they don't get discarded until interpreter shutdown, with the result that you get a bunch of ResourceWarnings (and cleanup code might not run).

It would be very useful if there was a `Server.close_clients()` or something like that. Even a `Server.all_transports()` would be useful, as then you could do something similar as when doing a `Task.cancel()` on what you get from `loop.all_tasks()`.

We could poke at `Server._transports`, but that is something internal that might change in the future.

There is `Server.wait_closed()`, but that hangs until all clients have gracefully disconnected. It doesn't help us when we want to shut down the service now.

CPython versions tested on: 3.12

Operating systems tested on: Linux