Skip to content

extmod/uasyncio: Fix race with cancelled task waiting on finished task. #7399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dpgeorge
Copy link
Member

This commit fixes a problem with a race between cancellation of task A and
completion of task B, when A waits on B. If task B completes just before
task A is cancelled then the cancellation of A does not work. Instead,
the CancelledError meant to cancel A gets passed through to B (that's
expected behaviour) but B handles it as a "Task exception wasn't retrieved"
scenario, printing out such a message (this is because finished tasks point
their "coro" attribute to themselves to indicate they are done, and
implement the throw() method, but that method inadvertently catches the
CancelledError). The correct behaviour is for B to bounce that
CancelledError back out.

The fix here reworks the "waiting" attribute of Task to be called "state"
and uses it to indicate whether a task is: running and not awaited on,
running and awaited on, finished and not awaited on, or finished and
awaited on. This means the task does not need to point "coro" to itself to
indicate finished, and also allows removal of the throw() method.

A benefit of this is that "Task exception wasn't retrieved" messages can go
back to being able to print the name of the coroutine function.

Fixes issue #7386.

@dpgeorge dpgeorge added the extmod Relates to extmod/ directory in source label Jun 14, 2021
@kevinkk525
Copy link
Contributor

Could this change be used to allow additional states so that this implementation could be used to implement shield (e.g. when using an integer for the state instead of False,None,True)? #5928
(Just a quick thought without having gone through any possible implementations)

@dpgeorge dpgeorge force-pushed the extmod-uasyncio-fix-cancel-finish-race branch from 05893cc to 62d260d Compare June 15, 2021 03:02
@dpgeorge
Copy link
Member Author

Could this change be used to allow additional states so that this implementation could be used to implement shield (e.g. when using an integer for the state instead of False,None,True)?

Yes possibly. The big issue is whether the main dispatch loop needs to handle these special cases/states or not. One nice thing about the patch in this PR is that the main dispatch loop is unchanged, it's still either t.coro.send(None) or t.coro.throw(exc).

@kevinkk525
Copy link
Contributor

shield() only needs modifications in task.py, the main loop is not touched. It just needs a place to store the shielded state but upon taking a closer look, I realize that it won't work with this PR because

#  Allocated head of linked list of Tasks waiting on completion of this task.
self.state = TaskQueue()

A shielded task would have other Tasks waiting on the completion, so it would need this state allocation and can't use it for the shielded state anyway. So we're back at where we started..

@dpgeorge
Copy link
Member Author

A shielded task would have other Tasks waiting on the completion, so it would need this state allocation and can't use it for the shielded state anyway.

It could possibly use a tuple to indicate shielded:

self.state = (TaskQueue(),)

That uses memory but only if the code uses the shield feature.

This commit fixes a problem with a race between cancellation of task A and
completion of task B, when A waits on B.  If task B completes just before
task A is cancelled then the cancellation of A does not work.  Instead,
the CancelledError meant to cancel A gets passed through to B (that's
expected behaviour) but B handles it as a "Task exception wasn't retrieved"
scenario, printing out such a message (this is because finished tasks point
their "coro" attribute to themselves to indicate they are done, and
implement the throw() method, but that method inadvertently catches the
CancelledError).  The correct behaviour is for B to bounce that
CancelledError back out.

This bug is mainly seen when wait_for() is used, and in that context the
symptoms are:
- occurs when using wait_for(T, S), if the task T being waited on finishes
  at exactly the same time as the wait-for timeout S expires
- task T will have run to completion
- the "Task exception wasn't retrieved message" is printed with
  "<class 'CancelledError'>" as the error (ie no traceback)
- the wait_for(T, S) call never returns (it's never put back on the
  uasyncio run queue) and all tasks waiting on this are blocked forever
  from running
- uasyncio otherwise continues to function and other tasks continue to be
  scheduled as normal

The fix here reworks the "waiting" attribute of Task to be called "state"
and uses it to indicate whether a task is: running and not awaited on,
running and awaited on, finished and not awaited on, or finished and
awaited on.  This means the task does not need to point "coro" to itself to
indicate finished, and also allows removal of the throw() method.

A benefit of this is that "Task exception wasn't retrieved" messages can go
back to being able to print the name of the coroutine function.

Fixes issue micropython#7386.

Signed-off-by: Damien George <damien@micropython.org>
@kevinkk525
Copy link
Contributor

But wouldn't that require that the rest of the code always checks if that argument is a tuple or just a simple TaskQueue? That would require more code space and execution time even if it saves a bit of RAM.

@dpgeorge dpgeorge force-pushed the extmod-uasyncio-fix-cancel-finish-race branch from 62d260d to 514bf1a Compare June 16, 2021 07:09
@dpgeorge
Copy link
Member Author

But wouldn't that require that the rest of the code always checks if that argument is a tuple or just a simple TaskQueue? That would require more code space and execution time even if it saves a bit of RAM.

Yes it would, but there could be only a few such checks. It would need investigation.

The reason to be careful with RAM here is because if there are 100's of tasks then the memory usage can grow quite large if each task uses more memory.

@dpgeorge dpgeorge merged commit 514bf1a into micropython:master Jun 17, 2021
@dpgeorge dpgeorge deleted the extmod-uasyncio-fix-cancel-finish-race branch June 17, 2021 02:48
@hoihu
Copy link
Contributor

hoihu commented Jun 23, 2021

I can also confirm that this fix has solved the issue we've seen on our device. Thanks a lot for the quick fix!

A note on the fix.. We've been using the "coro" attribute on the task's object to implement a tracer that measures the execution of each task automatically.

For this we need write access to the coro attribute. I saw that in the C-code of the asyncio's task, write access to this attribute has been deleted... Would you accept a PR that would add / allow writing the coro attribute again (as it was before)?

@hoihu
Copy link
Contributor

hoihu commented Jun 23, 2021

Bascially it would boil down to this diff

diff --git a/extmod/moduasyncio.c b/extmod/moduasyncio.c
index 9717e3856..50a5e5855 100644
--- a/extmod/moduasyncio.c
+++ b/extmod/moduasyncio.c
@@ -243,7 +243,10 @@ STATIC void task_attr(mp_obj_t self_in, qstr attr, mp_obj_t *dest) {
         }
     } else if (dest[1] != MP_OBJ_NULL) {
         // Store
-        if (attr == MP_QSTR_data) {
+        if (attr == MP_QSTR_coro) {
+            self->coro = dest[1];
+            dest[0] = MP_OBJ_NULL;
+        } else if (attr == MP_QSTR_data) {
             self->data = dest[1];
             dest[0] = MP_OBJ_NULL;
         } else if (attr == MP_QSTR_state) {

@dpgeorge
Copy link
Member Author

Bascially it would boil down to this diff

I'd really prefer not to make coro writable, or even to expose it at all to user code. Can you please elaborate on how you use it? It could be that it does make sense to expose it.

@dpgeorge
Copy link
Member Author

@hoihu actually, please open an issue to discuss the above point about making coro writable, so it doesn't get lost in this closed/merged PR.

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extmod Relates to extmod/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants