-
-
Notifications
You must be signed in to change notification settings - Fork 8.4k
extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399
Conversation
Could this change be used to allow additional states so that this implementation could be used to implement shield (e.g. when using an integer for the state instead of False,None,True)? #5928 |
05893cc
to
62d260d
Compare
Yes possibly. The big issue is whether the main dispatch loop needs to handle these special cases/states or not. One nice thing about the patch in this PR is that the main dispatch loop is unchanged, it's still either |
shield() only needs modifications in task.py, the main loop is not touched. It just needs a place to store the shielded state but upon taking a closer look, I realize that it won't work with this PR because
A shielded task would have other Tasks waiting on the completion, so it would need this state allocation and can't use it for the shielded state anyway. So we're back at where we started.. |
It could possibly use a tuple to indicate shielded: self.state = (TaskQueue(),) That uses memory but only if the code uses the shield feature. |
This commit fixes a problem with a race between cancellation of task A and completion of task B, when A waits on B. If task B completes just before task A is cancelled then the cancellation of A does not work. Instead, the CancelledError meant to cancel A gets passed through to B (that's expected behaviour) but B handles it as a "Task exception wasn't retrieved" scenario, printing out such a message (this is because finished tasks point their "coro" attribute to themselves to indicate they are done, and implement the throw() method, but that method inadvertently catches the CancelledError). The correct behaviour is for B to bounce that CancelledError back out. This bug is mainly seen when wait_for() is used, and in that context the symptoms are: - occurs when using wait_for(T, S), if the task T being waited on finishes at exactly the same time as the wait-for timeout S expires - task T will have run to completion - the "Task exception wasn't retrieved message" is printed with "<class 'CancelledError'>" as the error (ie no traceback) - the wait_for(T, S) call never returns (it's never put back on the uasyncio run queue) and all tasks waiting on this are blocked forever from running - uasyncio otherwise continues to function and other tasks continue to be scheduled as normal The fix here reworks the "waiting" attribute of Task to be called "state" and uses it to indicate whether a task is: running and not awaited on, running and awaited on, finished and not awaited on, or finished and awaited on. This means the task does not need to point "coro" to itself to indicate finished, and also allows removal of the throw() method. A benefit of this is that "Task exception wasn't retrieved" messages can go back to being able to print the name of the coroutine function. Fixes issue micropython#7386. Signed-off-by: Damien George <damien@micropython.org>
But wouldn't that require that the rest of the code always checks if that argument is a tuple or just a simple TaskQueue? That would require more code space and execution time even if it saves a bit of RAM. |
62d260d
to
514bf1a
Compare
Yes it would, but there could be only a few such checks. It would need investigation. The reason to be careful with RAM here is because if there are 100's of tasks then the memory usage can grow quite large if each task uses more memory. |
I can also confirm that this fix has solved the issue we've seen on our device. Thanks a lot for the quick fix! A note on the fix.. We've been using the "coro" attribute on the task's object to implement a tracer that measures the execution of each task automatically. For this we need write access to the |
Bascially it would boil down to this diff
|
I'd really prefer not to make coro writable, or even to expose it at all to user code. Can you please elaborate on how you use it? It could be that it does make sense to expose it. |
@hoihu actually, please open an issue to discuss the above point about making coro writable, so it doesn't get lost in this closed/merged PR. |
Added M5Stack Atom Matrix board
This commit fixes a problem with a race between cancellation of task A and
completion of task B, when A waits on B. If task B completes just before
task A is cancelled then the cancellation of A does not work. Instead,
the CancelledError meant to cancel A gets passed through to B (that's
expected behaviour) but B handles it as a "Task exception wasn't retrieved"
scenario, printing out such a message (this is because finished tasks point
their "coro" attribute to themselves to indicate they are done, and
implement the throw() method, but that method inadvertently catches the
CancelledError). The correct behaviour is for B to bounce that
CancelledError back out.
The fix here reworks the "waiting" attribute of Task to be called "state"
and uses it to indicate whether a task is: running and not awaited on,
running and awaited on, finished and not awaited on, or finished and
awaited on. This means the task does not need to point "coro" to itself to
indicate finished, and also allows removal of the throw() method.
A benefit of this is that "Task exception wasn't retrieved" messages can go
back to being able to print the name of the coroutine function.
Fixes issue #7386.