extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

dpgeorge · 2021-06-14T14:02:26Z

This commit fixes a problem with a race between cancellation of task A and
completion of task B, when A waits on B. If task B completes just before
task A is cancelled then the cancellation of A does not work. Instead,
the CancelledError meant to cancel A gets passed through to B (that's
expected behaviour) but B handles it as a "Task exception wasn't retrieved"
scenario, printing out such a message (this is because finished tasks point
their "coro" attribute to themselves to indicate they are done, and
implement the throw() method, but that method inadvertently catches the
CancelledError). The correct behaviour is for B to bounce that
CancelledError back out.

The fix here reworks the "waiting" attribute of Task to be called "state"
and uses it to indicate whether a task is: running and not awaited on,
running and awaited on, finished and not awaited on, or finished and
awaited on. This means the task does not need to point "coro" to itself to
indicate finished, and also allows removal of the throw() method.

A benefit of this is that "Task exception wasn't retrieved" messages can go
back to being able to print the name of the coroutine function.

Fixes issue #7386.

kevinkk525 · 2021-06-14T15:34:27Z

Could this change be used to allow additional states so that this implementation could be used to implement shield (e.g. when using an integer for the state instead of False,None,True)? #5928
(Just a quick thought without having gone through any possible implementations)

dpgeorge · 2021-06-15T03:04:54Z

Could this change be used to allow additional states so that this implementation could be used to implement shield (e.g. when using an integer for the state instead of False,None,True)?

Yes possibly. The big issue is whether the main dispatch loop needs to handle these special cases/states or not. One nice thing about the patch in this PR is that the main dispatch loop is unchanged, it's still either t.coro.send(None) or t.coro.throw(exc).

kevinkk525 · 2021-06-15T05:21:20Z

shield() only needs modifications in task.py, the main loop is not touched. It just needs a place to store the shielded state but upon taking a closer look, I realize that it won't work with this PR because

#  Allocated head of linked list of Tasks waiting on completion of this task.
self.state = TaskQueue()

A shielded task would have other Tasks waiting on the completion, so it would need this state allocation and can't use it for the shielded state anyway. So we're back at where we started..

dpgeorge · 2021-06-15T05:46:14Z

A shielded task would have other Tasks waiting on the completion, so it would need this state allocation and can't use it for the shielded state anyway.

It could possibly use a tuple to indicate shielded:

self.state = (TaskQueue(),)

That uses memory but only if the code uses the shield feature.

This commit fixes a problem with a race between cancellation of task A and completion of task B, when A waits on B. If task B completes just before task A is cancelled then the cancellation of A does not work. Instead, the CancelledError meant to cancel A gets passed through to B (that's expected behaviour) but B handles it as a "Task exception wasn't retrieved" scenario, printing out such a message (this is because finished tasks point their "coro" attribute to themselves to indicate they are done, and implement the throw() method, but that method inadvertently catches the CancelledError). The correct behaviour is for B to bounce that CancelledError back out. This bug is mainly seen when wait_for() is used, and in that context the symptoms are: - occurs when using wait_for(T, S), if the task T being waited on finishes at exactly the same time as the wait-for timeout S expires - task T will have run to completion - the "Task exception wasn't retrieved message" is printed with "<class 'CancelledError'>" as the error (ie no traceback) - the wait_for(T, S) call never returns (it's never put back on the uasyncio run queue) and all tasks waiting on this are blocked forever from running - uasyncio otherwise continues to function and other tasks continue to be scheduled as normal The fix here reworks the "waiting" attribute of Task to be called "state" and uses it to indicate whether a task is: running and not awaited on, running and awaited on, finished and not awaited on, or finished and awaited on. This means the task does not need to point "coro" to itself to indicate finished, and also allows removal of the throw() method. A benefit of this is that "Task exception wasn't retrieved" messages can go back to being able to print the name of the coroutine function. Fixes issue micropython#7386. Signed-off-by: Damien George <damien@micropython.org>

kevinkk525 · 2021-06-16T05:36:47Z

But wouldn't that require that the rest of the code always checks if that argument is a tuple or just a simple TaskQueue? That would require more code space and execution time even if it saves a bit of RAM.

dpgeorge · 2021-06-17T02:44:17Z

But wouldn't that require that the rest of the code always checks if that argument is a tuple or just a simple TaskQueue? That would require more code space and execution time even if it saves a bit of RAM.

Yes it would, but there could be only a few such checks. It would need investigation.

The reason to be careful with RAM here is because if there are 100's of tasks then the memory usage can grow quite large if each task uses more memory.

hoihu · 2021-06-23T08:45:47Z

I can also confirm that this fix has solved the issue we've seen on our device. Thanks a lot for the quick fix!

A note on the fix.. We've been using the "coro" attribute on the task's object to implement a tracer that measures the execution of each task automatically.

For this we need write access to the coro attribute. I saw that in the C-code of the asyncio's task, write access to this attribute has been deleted... Would you accept a PR that would add / allow writing the coro attribute again (as it was before)?

hoihu · 2021-06-23T08:48:10Z

Bascially it would boil down to this diff

diff --git a/extmod/moduasyncio.c b/extmod/moduasyncio.c
index 9717e3856..50a5e5855 100644
--- a/extmod/moduasyncio.c
+++ b/extmod/moduasyncio.c
@@ -243,7 +243,10 @@ STATIC void task_attr(mp_obj_t self_in, qstr attr, mp_obj_t *dest) {
         }
     } else if (dest[1] != MP_OBJ_NULL) {
         // Store
-        if (attr == MP_QSTR_data) {
+        if (attr == MP_QSTR_coro) {
+            self->coro = dest[1];
+            dest[0] = MP_OBJ_NULL;
+        } else if (attr == MP_QSTR_data) {
             self->data = dest[1];
             dest[0] = MP_OBJ_NULL;
         } else if (attr == MP_QSTR_state) {

dpgeorge · 2021-06-23T09:07:10Z

Bascially it would boil down to this diff

I'd really prefer not to make coro writable, or even to expose it at all to user code. Can you please elaborate on how you use it? It could be that it does make sense to expose it.

dpgeorge · 2021-06-23T09:08:04Z

@hoihu actually, please open an issue to discuss the above point about making coro writable, so it doesn't get lost in this closed/merged PR.

Added M5Stack Atom Matrix board

dpgeorge added the extmod Relates to extmod/ directory in source label Jun 14, 2021

dpgeorge mentioned this pull request Jun 14, 2021

Possible race condition failure in uasyncio wait_for/wait_for_ms #7386

Closed

dpgeorge force-pushed the extmod-uasyncio-fix-cancel-finish-race branch from 05893cc to 62d260d Compare June 15, 2021 03:02

dpgeorge force-pushed the extmod-uasyncio-fix-cancel-finish-race branch from 62d260d to 514bf1a Compare June 16, 2021 07:09

dpgeorge merged commit 514bf1a into micropython:master Jun 17, 2021

dpgeorge deleted the extmod-uasyncio-fix-cancel-finish-race branch June 17, 2021 02:48

hoihu mentioned this pull request Jun 24, 2021

Uasyncio task profiling #7456

Open

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Jan 4, 2023

Merge pull request micropython#7399 from CDarius/m5stack_atom_matrix

3cf947f

Added M5Stack Atom Matrix board

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

Uh oh!

dpgeorge commented Jun 14, 2021

Uh oh!

kevinkk525 commented Jun 14, 2021

Uh oh!

dpgeorge commented Jun 15, 2021

Uh oh!

kevinkk525 commented Jun 15, 2021

Uh oh!

dpgeorge commented Jun 15, 2021

Uh oh!

kevinkk525 commented Jun 16, 2021

Uh oh!

dpgeorge commented Jun 17, 2021

Uh oh!

hoihu commented Jun 23, 2021 •

edited

Loading

Uh oh!

hoihu commented Jun 23, 2021

Uh oh!

dpgeorge commented Jun 23, 2021

Uh oh!

dpgeorge commented Jun 23, 2021

Uh oh!

Uh oh!

Uh oh!

extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

Uh oh!

Conversation

dpgeorge commented Jun 14, 2021

Uh oh!

kevinkk525 commented Jun 14, 2021

Uh oh!

dpgeorge commented Jun 15, 2021

Uh oh!

kevinkk525 commented Jun 15, 2021

Uh oh!

dpgeorge commented Jun 15, 2021

Uh oh!

kevinkk525 commented Jun 16, 2021

Uh oh!

dpgeorge commented Jun 17, 2021

Uh oh!

hoihu commented Jun 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hoihu commented Jun 23, 2021

Uh oh!

dpgeorge commented Jun 23, 2021

Uh oh!

dpgeorge commented Jun 23, 2021

Uh oh!

Uh oh!

hoihu commented Jun 23, 2021 •

edited

Loading