Nested TaskGroup can silently swallow cancellation request from parent TaskGroup #116720
Comments
Thanks for the very careful bug report. I can repro this in 3.13 (main) as well.

Including metadata in the cancellation exception sounds problematic. The same problem makes it problematic to reverse the relationship (having the task group, rather than the exception, track which cancellation belongs to it). I'm also not excited about your other idea, keeping track of the nesting of task groups, but honestly I haven't thought much about it yet -- it just sounds tricky to keep track of.

I will have to mull this over for a bit. Meanwhile, if you want to throw out more ideas, please go ahead! I have found that for this kind of issue, putting a debugging breakpoint (or a well-placed print) in the suspect code is often the quickest way to see what is really happening. |
Thanks Guido. Not sure if relevant, but I've now checked and I was wrong about Trio including metadata in cancellation exceptions about which scope it belongs to. It used to do that, but it changed to the other idea (scopes decide, at the point of receiving a cancellation exception, whether they are the outermost active scope) in Trio 0.11.0 (2019-02-09), as described in its release notes.
I suspect that asyncio's edge-triggered cancellation means that metadata in the exception could never work.
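A minimal standalone sketch of the edge-triggered delivery being referred to here (the script and its print strings are illustrative, not from the original discussion): the request is turned into a CancelledError exactly once, at the next suspension point, so nothing attached to that one exception instance survives past the point where it is caught.

```python
import asyncio

async def main():
    task = asyncio.current_task()
    task.cancel()                # request cancellation of ourselves
    try:
        await asyncio.sleep(0)   # the request is delivered as an exception here...
    except asyncio.CancelledError:
        print("delivered once (and, against best practice, swallowed)")
    await asyncio.sleep(0)       # ...and only here; the "edge" has passed
    print("task keeps running")

asyncio.run(main())
```
|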
No worries, I tend to ignore comparisons with Trio anyway -- they chose a different model, and good for them, but asyncio has to steer its own course.

I haven't made much progress with understanding what exactly happens, but I think that, regardless of context, in this snippet:

```python
try:
    async with asyncio.TaskGroup() as inner_tg:
        inner_tg.create_task(raise_after(1, ExceptionInner))
        outer_tg.create_task(raise_after(1, ExceptionOuter))
except* ExceptionInner:
    print("Got inner exception")
print("should not get here")
```

that last print() is bound to be executed. So it seems that things go wrong right after that, here:

```python
await asyncio.sleep(0.2)
```

This is where the pending cancellation from the outer task group should be delivered. |
Hm, I may have to disagree with myself. The |
It turns out I don't know if there is any way out then without breaking other scenarios. Even putting the inner task group in a separate task doesn't help. Sorry. |
I see, that's a pity. Thanks for looking into it. |
Hm, there's hope. If I insert

```python
if self._parent_task.uncancel() > 0:
    self._parent_task.cancel()
```

at the right point in TaskGroup.__aexit__(), your example behaves much better. Note that the bit of sync code labelled "should not get here" still runs, but the cancellation is then propagated at the next await.

@arthur-tacca Are you interested in contributing a PR for this?
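For readers unfamiliar with the counter this relies on, here is a hedged standalone illustration of Task.cancelling()/uncancel() (child() and main() are made-up names; this is not TaskGroup code): after consuming one request with uncancel(), a positive return value means another request -- e.g. one from outside -- is still outstanding, which is what the two lines above test for.

```python
import asyncio

async def child():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        t = asyncio.current_task()
        print("requests recorded:", t.cancelling())  # 2
        print("after one uncancel:", t.uncancel())   # 1 -> one request still pending
        raise  # let the cancellation proceed

async def main():
    t = asyncio.create_task(child())
    await asyncio.sleep(0.1)
    t.cancel()  # e.g. an "internal" request (a task group waking its parent)
    t.cancel()  # e.g. an "external" request arriving at the same time
    try:
        await t
    except asyncio.CancelledError:
        pass

asyncio.run(main())
```
|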
@arthur-tacca Ping? "No" is a perfectly fine answer too. :-) |
Sorry I didn't reply earlier. Partly I wanted to look into a couple of other test cases first, and partly I wanted to think about making the pull request myself. Sadly it's a "no" to me doing the pull request - I'd like to but it's just not realistic at the moment.

Your comment with a suggested fix is really good news, it's very promising, and it's not a disaster that it runs that extra bit of sync code that I contentiously labelled "should not get here". I've found the fix is not quite complete, but I do have a suggestion based on it that I think could work.

(Side note: I could be totally wrong here... but I don't think TaskGroup is actually using cancellation counts at the moment. This sounds silly, but actually task groups can usually get away with the rule "if any task raises an exception, propagate that, otherwise if there's a cancellation error it must be external so raise that", as implemented in the relevant lines of TaskGroup.__aexit__().)

In addition to the test case in my original post, I wanted to look at a variant: instead of two nested task groups, both shut down at the same time, consider a single task group within a task that is cancelled in the usual old-school way of calling task.cancel().

Test code for cancel task containing task group:

```python
import asyncio
from asyncio import TaskGroup

async def raise_after(t, e):
    await asyncio.sleep(t)
    print(f"Raising {e}")
    raise e()

async def foo():
    print("foo(): starting")
    try:
        async with TaskGroup() as g:
            g.create_task(raise_after(0.5, RuntimeError))
            await asyncio.sleep(1)  # <- this needs to be canceled
                                    #    by the TaskGroup, e.g.
                                    #    foo() needs to be cancelled
    except Exception as e:
        # Ignore any exceptions raised in the TaskGroup
        print(f"foo() ex 1: {e!r}")
    except BaseException as be:
        print(f"foo() base ex 1: {be!r}")
        raise
    print("foo(): before second sleep")
    try:
        await asyncio.sleep(1)  # this line has to be called
                                # after TaskGroup is finished.
        print("foo(): before third sleep - FAIL")
        await asyncio.sleep(1)
        print("foo(): after third sleep - FAIL")
    except Exception as e:
        print(f"foo() ex 2: {e!r}")
    except BaseException as be:
        print(f"foo() base ex 2: {be!r}")
    print("foo(): finish")

async def call_foo():
    print("call_foo(): starting")
    t = asyncio.create_task(foo())
    await asyncio.sleep(0.5)
    print("call_foo(): cancelling")
    t.cancel()
    try:
        await t
    except BaseException as be:
        print(f"call_foo(): foo raised {be!r}")
    print("call_foo(): finished\n\n")

asyncio.run(call_foo())
```

I found that this was broken both before and after applying your fix. Looking more closely, the problem is that this:

```python
if self._parent_task.uncancel() > 0:
    self._parent_task.cancel()
```

is equivalent (I believe) to this:

```python
if self._parent_task.cancelling() > 1:
    self._parent_task.uncancel()
    self._parent_task.cancel()
```

But I think we want to re-raise the cancel even when there is only a single pending cancellation:

```python
if self._parent_task.cancelling() > 0:
    self._parent_task.uncancel()
    self._parent_task.cancel()
```

But when I tried this I got spurious cancels propagating out of task groups that ought to catch it. That's because that one re-cancel interacts with the uncancel() call that __aexit__() already makes. So I further modified the TaskGroup to remember whether it had called uncancel() at the start of __aexit__().

The revisions of this gist show the process I went through. In the end, this is a task group implementation that fixes the two currently-broken cases and didn't break any existing working cases that I tried. But I haven't tried the whole Python test suite.
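A hedged, self-contained comparison of the two recipes when exactly one external cancellation request is pending (victim() and main() are made-up names; the demo deliberately swallows the first delivery just to keep the coroutine alive, and it models only the counts, not TaskGroup internals):

```python
import asyncio

async def victim(variant):
    t = asyncio.current_task()
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        pass  # swallow the delivery so the demo can apply the recipe afterwards
    if variant == "A":        # if uncancel() > 0: cancel()
        if t.uncancel() > 0:
            t.cancel()
    else:                     # if cancelling() > 0: uncancel(); cancel()
        if t.cancelling() > 0:
            t.uncancel()
            t.cancel()

async def main():
    for variant in ("A", "B"):
        task = asyncio.create_task(victim(variant))
        await asyncio.sleep(0.1)
        task.cancel()  # exactly one cancellation request, from "outside"
        try:
            await task
            print(f"variant {variant}: the external cancellation was lost")
        except asyncio.CancelledError:
            print(f"variant {variant}: the task still ended up cancelled")

asyncio.run(main())
```

With a single pending request, recipe A drops it while recipe B re-arms it; with two or more pending requests the two recipes behave the same.
|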
By the way: another test case I tried is where we directly await a task that gets cancelled while the surrounding task group is shutting down.

Unsurprisingly broken example mixing task groups with cancellation propagation using await task:

```python
import asyncio

async def raise_after(t, e):
    await asyncio.sleep(t)
    print(f"Raising {e}")
    raise e()

async def safe_await(task: asyncio.Task):
    try:
        return await task
    except asyncio.CancelledError:
        if task.cancelled():
            asyncio.current_task().cancel()
        raise

async def upward_propagation_test():
    print("upward_propagation_test(): starting")
    try:
        async with asyncio.TaskGroup() as g:
            g.create_task(raise_after(0.5, RuntimeError))
            try:
                await asyncio.sleep(10)
            finally:
                t = asyncio.create_task(asyncio.sleep(10))
                await asyncio.sleep(0.1)
                t.cancel()
                await asyncio.sleep(0.1)
                print("upward_propagation_test(): waiting for cancelled task...")
                await t  # fixed by using this instead: await safe_await(t)
                print("upward_propagation_test(): should not get here (t was cancelled)")
    except Exception as e:
        # Ignore any exceptions raised in the TaskGroup
        print(f"upward_propagation_test() ex 1: {e!r}")
    except BaseException as be:
        print(f"upward_propagation_test() base ex 1: {be!r}")
        raise
    print("upward_propagation_test(): before second sleep")
    try:
        await asyncio.sleep(1)  # this line has to be called
                                # after TaskGroup is finished.
        print("upward_propagation_test(): before third sleep - FAIL")
        await asyncio.sleep(1)
        print("upward_propagation_test(): after third sleep - FAIL")
    except Exception as e:
        print(f"upward_propagation_test() ex 2: {e!r}")
    except BaseException as be:
        print(f"upward_propagation_test() base ex 2: {be!r}")
    print("upward_propagation_test(): finish")

async def upward_wrapper():
    try:
        await upward_propagation_test()
    except:
        pass

asyncio.run(upward_wrapper())
```

The reason for this is that the code:

```python
ret = await task
```

will raise CancelledError if the awaited task was cancelled, even though the awaiting task itself was not cancelled. Instead it works if you wait for a child task like this:

```python
try:
    ret = await task
except asyncio.CancelledError:
    if task.cancelled():
        asyncio.current_task().cancel()
    raise
```
|
Thanks, it will take me some time to process all this research. Much appreciated! |
Thanks 😊 Sorry to add even more to the wall of text. Here is one more test, which shows the parent task's cancellation count after each repeated task group:

```python
async def uncancel_test():
    for i in range(5):
        try:
            async with asyncio.TaskGroup() as tg:
                tg.create_task(raise_after(0.2, RuntimeError))
                print("task group about to wait")
        except* RuntimeError:
            print("Ignoring exception")
        print(f"After try/except*, have cancelling={asyncio.current_task().cancelling()}")
```
|
I finally found some time to think about this some more.

(Somewhat off-topic: I take it that you're not that comfortable with Git? I recommend that you spend some time learning about forks and branches -- what you put in a Gist really belongs on a branch in your own fork of the cpython repo. I promise it will pay off. You should also be able to clone the repo, build Python from source, and run the relevant parts of the test suite.)

It sounds like we're hitting a dead end by just considering individual examples and test cases, and instead it looks useful to try to reason about tasks and task groups as little state machines. We're also reaching the limits of "programming by random modification", something we're all likely to do when we're trying to put off the inevitable deep thinking. (It's how I came up with that two-line "fix" before. :-)

You noticed an important distinction about when a task actually receives a cancellation.
Now I have to really understand your examples, which will take me more time. |
I just found a really interesting thing that you have hinted at: the sequence

```python
t.cancel()
t.uncancel()
```

is not always a no-op. If the task was blocked on a future, cancel() has already cancelled that future (or otherwise flagged a pending cancellation), and uncancel() does not undo that. Worse, there is no unittest that checks for this behavior: if I change it, no existing test fails. Given that all this is kind of subtle regarding backwards compatibility, I wouldn't want to change this, but it may make it harder to fix task groups.

There is also a corner case where a task is cancelled but the cancellation is not delivered at the next await, because the awaited future is already done:

```python
import asyncio

async def main():
    f = asyncio.Future()
    f.set_result(None)
    this = asyncio.current_task()
    this.cancel()
    print("BEFORE", this.cancelling())
    await f  # Already done
    print("AFTER", this.cancelling())
    await asyncio.sleep(0)  # <-- CancelledError raised here

asyncio.run(main())
```

This prints "BEFORE 1" and "AFTER 1", with the CancelledError only surfacing at the final sleep(0).
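A hedged standalone demonstration of the first point (child() and main() are names made up for this sketch): when the task is blocked on a future, cancel() cancels that future, and a following uncancel() only resets the counter, so the CancelledError is delivered anyway.

```python
import asyncio

async def child():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        print("child: the cancellation was delivered anyway")
        raise

async def main():
    t = asyncio.create_task(child())
    await asyncio.sleep(0.1)
    t.cancel()    # cancels the future child() is currently waiting on
    t.uncancel()  # resets the counter, but does not restore that future
    try:
        await t
    except asyncio.CancelledError:
        print("main: child still ended up cancelled")

asyncio.run(main())
```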
|
However, the latter corner case is only an issue if the task cancels itself (which is rather uncommon, although it is in fact what I proposed to fix the original problem). If it is cancelled by something else (another task or a callback) while blocked in an await, the future it is waiting on is cancelled and the CancelledError is delivered right away. |
(This is just me brainstorming, sorry.) I have a feeling we need to change things in the Task class itself, and I fear we will end up rewriting a fair amount of the task group machinery. I also have one concern that I haven't fully worked out yet. |
Final thought of the night: all problems seem related to |
Case analysis at the point we enter __aexit__():

We can't even tell the difference between a cancellation that was delivered and then caught and ignored, and one that is still pending (to be delivered at the next await). One convention that could help is to have code that suppresses a CancelledError say so explicitly:

```python
async with asyncio.TaskGroup() as tg:
    ...
    try:
        await <something>
    except asyncio.CancelledError:
        asyncio.current_task().uncancel()  # Indicate we're suppressing the CancelledError
    ...
```

However, while this works if the code that catches the CancelledError cooperates, we can't rely on all user code doing that.

(I'm still just blabbering on to expose my thought process. Jump in any time, or you can wait until I start proposing some code. :-) |
Have a look at this: #117407. I think this addresses all your examples. It would be great if you could contribute tests. |
(And by "contributing tests" I meant to ask if you wanted to turn your example programs into unittests that can be added to test_taskgroups.py.)

FWIW, here's why I think the changes I proposed are the right ones. The first commit in the PR moves this block:

```python
if self._parent_cancel_requested:
    # If this flag is set we *must* call uncancel().
    if self._parent_task.uncancel() == 0:
        # If there are no pending cancellations left,
        # don't propagate CancelledError.
        propagate_cancellation_error = None
```

from its original location near the top of __aexit__() to later in the method.

The next commit inserts this code:

```python
# If the parent task is being cancelled from the outside,
# re-cancel it, while keeping the cancel count stable.
if self._parent_task.cancelling():
    self._parent_task.uncancel()
    self._parent_task.cancel()
```

in the block that constructs and raises the exception group. This is basically your version of my earlier partial fix.

Finally, to keep one test in test_timeouts.py working now that task groups properly maintain the cancellation count, I had to modify uncancel() so that it also clears the still-pending cancellation when the count reaches zero.

There's one more issue with task groups: if one of the child tasks ignores cancellation, the whole task group will wait until that task voluntarily exits, and re-cancelling the parent doesn't help (it doesn't get propagated to the children). I propose to leave this for another time -- nobody has complained about it yet (and there is a strong recommendation against doing this).
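A hedged way to see the strengthened uncancel() semantics in action (a tiny script written for this summary, not taken from the PR; on interpreters without the fix the final sleep(0) raises CancelledError instead):

```python
import asyncio

async def main():
    t = asyncio.current_task()
    t.cancel()    # schedules a self-cancellation; nothing is delivered yet
    t.uncancel()  # count back to zero; with the fix this also clears the scheduled cancellation
    await asyncio.sleep(0)  # with the fix: completes normally
    print("no cancellation was delivered")

asyncio.run(main())
```
|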
Ooph, interesting. I spent an hour playing with the OP's example to try to understand the problem. I tried playing out the example in my head to see where reality would diverge. On the face of it, the main issue is that the inner task group ends up swallowing the cancellation requested from the outer one. I agree we should fix that.
Here's a very simple test that fails:

```python
import asyncio
from asyncio import TaskGroup, current_task, sleep

async def main():
    async def raise_after(t, e):
        await sleep(t)
        raise e()

    try:
        async with TaskGroup() as tg:
            tg.create_task(raise_after(0.1, Exception))
    except ExceptionGroup:
        pass

    assert current_task().cancelling() == 0

asyncio.run(main())
```

(If you insert an explicit uncancel() before the assert it passes, of course.) I think doing the uncancel() bookkeeping inside the task group itself is the right approach. |
Thanks for confirming that. (And I may use that as a unit test. :-)

At least one of the OP's examples also requires the other thing I changed in taskgroups.py -- if the parent task is cancelled but we also have errors for the exception group, re-cancel the parent, so that it will be cancelled at the next await.

Finally, after all was said and done, one existing unittest failed, and resetting the pending cancellation in uncancel() when the count reaches zero fixed that. |
I'm trying to figure out if this is what we want. Without thinking about it too much, my gut feeling is that if the parent is cancelled, there should not be a CancelledError on the "next await"; the inner exception group should be enough. |
I want my cake and eat it too. :-)

The reason we want the parent cancellation to bubble up (if it came from the outside -- not from the task group itself) is that the code that requested the cancellation expects it to take effect. And indeed, if there are no errors from the child tasks, we re-raise it here:

```python
if propagate_cancellation_error is not None and not self._errors:
    raise propagate_cancellation_error
```

The problem occurs when there are also errors from the child task group. Certainly, without a pending "outside" cancellation, we want to raise those errors, grouped together in an exception group, so that an except* clause in the caller can handle them.

But when there are errors to be reported and a cancellation from the parent, what are we supposed to do? The hack I came up with is that in this case only, we cancel the parent, which mostly has the intended effect: a CancelledError will be raised at the parent's next await.

This isn't perfect, but it's better than before, IMO.
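Putting the pieces quoted in this thread together, the tail end of TaskGroup.__aexit__() ends up looking roughly like this (a condensed, hedged sketch for orientation only -- not the verbatim CPython code):

```python
if self._parent_cancel_requested:
    # The task group cancelled its own parent to wake it up; undo that,
    # and only keep propagating if an outside request remains.
    if self._parent_task.uncancel() == 0:
        propagate_cancellation_error = None

if propagate_cancellation_error is not None and not self._errors:
    # No child errors: let the (external) cancellation bubble up as-is.
    raise propagate_cancellation_error

if self._errors:
    # If the parent task is being cancelled from the outside,
    # re-cancel it, while keeping the cancel count stable.
    if self._parent_task.cancelling():
        self._parent_task.uncancel()
        self._parent_task.cancel()
    raise BaseExceptionGroup("unhandled errors in a TaskGroup", self._errors)
```
|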
Right. Just to be clear, I'm not arguing we should do nothing. I'm also good with your solution, and will try to mimic our ultimate approach in Quattro. I'm trying to collaboratively brainstorm out loud before we commit.

In short: I wrote a larger reply, but I think I'm good with your approach here. My initial gut feeling was that it was weird to allow an additional event loop step in this case, but this is a race condition so the impact should be minimal. It also helps us not let errors pass silently. |
In addition to what I wrote on the merge request, here is some broader discussion about the docs.

I think many users (it would be all of them from my point of view!) would deal with tasks exclusively by using task groups; they would never ever call cancel() or uncancel() themselves. Trio doesn't have any way to wait for a task to finish or get its result - I know that Trio isn't asyncio and vice versa, but I still think that's interesting. I actually wrote a very small library, aioresult, for Trio / AnyIO to allow getting the result of a task in a nursery (this is the "something else I planned to spend my time on" I mentioned in my other comment), and I used exactly that strategy there. |
Thanks for the deeper feedback! You are absolutely right that the docs around cancellation are still pretty vague and confusing if you're looking for a precise spec, and probably give incorrect guidance to beginners. (Even the simple "don't try to catch CancelledError" advice isn't the whole story.)

Unfortunately I think I am going to have to give the job of improving those docs to someone else (or possibly myself in the distant future) -- there just are too many other things to do. There are probably also still things that could be improved in the implementation of cancellation. But it is in the nature of Python's development process to iterate, rather than to attempt to come up with the perfect solution in one go.

In the meantime, I hope you come back and help us find some more subtle issues. You've been quite helpful. |
This prevents external cancellations of a task group's parent task from being dropped when an internal cancellation happens at the same time.

Also strengthen the semantics of uncancel() to clear self._must_cancel when the cancellation count reaches zero.

Co-Authored-By: Tin Tvrtković <tinchester@gmail.com>
Co-Authored-By: Arthur Tacca
@gvanrossum Thanks for the nice comment, I hope so too. And thanks for taking the time to fix this issue. |
Bug report
Bug description:
In the following code snippet, I start an asyncio.TaskGroup called outer_tg, then start another one within it called inner_tg. The inner task group is wrapped in an except* block to catch a specific type of exception. A background task in each task group raises a different exception at roughly the same time, so the inner one ends up (correctly) raising the inner exception in an ExceptionGroup, which is caught and discarded by the except* block. However, this means that there is now no longer any exception bubbling up through the call stack, and the main body of the outer task group just continues on, even allowing waiting on more coroutines (but not creating more tasks in the outer group, as it is still shutting down).
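The snippet itself did not survive in this extract; the following reconstruction is assembled from the fragments quoted later in the thread (raise_after(), the inner/outer create_task calls and the except* block) -- the exception class definitions, the trailing sleep duration and the exact print strings are plausible guesses rather than the original code:

```python
import asyncio

class ExceptionInner(Exception):
    pass

class ExceptionOuter(Exception):
    pass

async def raise_after(t, e):
    await asyncio.sleep(t)
    print(f"Raising {e}")
    raise e()

async def main():
    async with asyncio.TaskGroup() as outer_tg:
        try:
            async with asyncio.TaskGroup() as inner_tg:
                inner_tg.create_task(raise_after(1, ExceptionInner))
                outer_tg.create_task(raise_after(1, ExceptionOuter))
        except* ExceptionInner:
            print("Got inner exception")
        # On affected versions, execution continues here even though the
        # outer task group is shutting down because of ExceptionOuter.
        print("should not get here")
        await asyncio.sleep(0.2)
        print("still not cancelled")

asyncio.run(main())
```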
Expected vs observed behaviour:

Observed behaviour: the main body of the outer task group continues running past the except* block.
Expected behaviour: the rest of the main body of the outer task group is skipped.
Variations:

Making either of these changes (or both) still gives the same issue:

- Add a third line within the inner task group: await asyncio.sleep(10). This means that the main body of the inner task group finishes with asyncio.CancelledError rather than just finishing before any exceptions are raised by tasks.
- Replace inner_tg.create_task(raise_after(1, ExceptionInner)) with inner_tg.create_task(raise_in_cancel()); a sketch of raise_in_cancel() is given after this list. (It's fair to dismiss this second case because it violates the rule about not suppressing CancelledError.)
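The definition of raise_in_cancel() is not preserved in this extract; a plausible version consistent with the description (it converts its own cancellation into ExceptionInner, which is why the case is easy to dismiss) would be:

```python
async def raise_in_cancel():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Suppress the cancellation and raise the inner exception instead.
        raise ExceptionInner()
```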
Root cause:
I think the problem is that TaskGroup.__aexit__() never includes CancelledError in the ExceptionGroup it raises if there are any other types of exception, even if a parent task group is shutting down. That appears to be due to this code in TaskGroup.__aexit__() (specifically the "and not self._errors" part, because self._errors is the list of all non-cancellation exceptions):
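The block in question appears to be the check that is also quoted later in the discussion:

```python
if propagate_cancellation_error is not None and not self._errors:
    raise propagate_cancellation_error
```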
I'm not sure of the right solution. Here are a couple of possibilities that come to mind, but I'm not sure if either would work or what the wider implications would be. One of them: TaskGroup.__aexit__() needs to walk up the stack of parent task groups and see if any of those are shutting down, and if so include a CancelledError in the ExceptionGroup.

CPython versions tested on:
3.11, 3.12
Operating systems tested on:
Windows
Linked PRs