Skip to content

py/scheduler,rp2: Avoid scheduler race conditions when GIL disabled, fix "TinyUSB callback can't recurse" error #15448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented Jul 11, 2024

Summary

Looks like there was a bug since v1.22 when using rp2 threads, where either CPU may poll CDC input and trigger the TinyUSB task. TinyUSB could run on both CPUs concurrently, which may have lead to some incorrect behaviour.

The race started triggering an exception when runtime USB support was added, and a check was added for the USB task recursing on itself from a Python handler function. This check can be incorrectly triggered by the race (as the flag that indicates the TinyUSB task is running is set by the other CPU).

The race is most commonly triggered when working from the interactive REPL, even a minimal running thread can trigger it. This commit adds a test case that triggers it in a different way (polling stdin from a thread).

The root cause is that the scheduler on threaded ports without GIL can run on any thread, and creates potential for race conditions that are hard to avoid. So the first fix is to run all scheduler callbacks on the main thread if GIL disabled.

The secondary fix is not to run the TinyUSB task when polled from another task, if the GIL is disabled. Instead we schedule the TinyUSB task to run on the main thread.

Closes #15390.

Trade-offs and Alternatives

Could make the scheduler always only run on the main thread, but with the GIL this has quite an impact on scheduler latency - and isn't necessary for correct behaviour. There is a branch linked from a comment below that mostly works around this, but it's a more complex change (and will still have higher latency than running the callback immediately in whatever thread is currently holding the GIL).

Testing

  • Ran all thread tests, including the new test case, on rp2 port with and without MICROPY_HW_ENABLE_USB_RUNTIME_DEVICE set, verified no longer raises an exception.
  • Ran all thread tests on esp32 port, verified no regression when GIL enabled.

This work was funded through GitHub Sponsors.

@projectgus projectgus added port-rp2 shared Relates to shared/ directory in source labels Jul 11, 2024
Copy link

codecov bot commented Jul 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (451ba1c) to head (5d8878b).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #15448   +/-   ##
=======================================
  Coverage   98.43%   98.44%           
=======================================
  Files         163      163           
  Lines       21294    21296    +2     
=======================================
+ Hits        20960    20964    +4     
+ Misses        334      332    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Jul 11, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:   +16 +0.002% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +40 +0.004% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@projectgus projectgus force-pushed the bugfix/rp2_tinyusb_recurse branch 3 times, most recently from 215791a to c902051 Compare July 11, 2024 04:41
@dpgeorge
Copy link
Member

Looks like there was a long standing underlying bug here when using rp2 threads, where either CPU may poll CDC input and trigger the TinyUSB task. TinyUSB could run on both CPUs concurrently, which may have lead to some incorrect behaviour.

I guess that was introduced in bcbdee2 ? Prior to that commit, tinyusb was guarded so it ran only on core 0.

@projectgus
Copy link
Contributor Author

I guess that was introduced in bcbdee2 ? Prior to that commit, tinyusb was guarded so it ran only on core 0.

I think that commit set up the conditions for the bug. TinyUSB still only runs on CPU0 as of this commit because that's where the scheduler runs it. However, that commit introduces a different bug - waiting for data in places where the scheduler doesn't run will hang. Commit a00c9d5 fixed that by re-introducing a TinyUSB task hook, but without the CPU check.

@projectgus
Copy link
Contributor Author

projectgus commented Jul 11, 2024

TinyUSB still only runs on CPU0 as of this commit because that's where the scheduler runs it.

Oh wait, I have this wrong don't I? mp_handle_pending() can run on either CPU... I didn't realise this...

(The bug still doesn't happen in that commit because the scheduler has a lock, but I think that this fix is incomplete as it doesn't handle the case where mp_usb_task is scheduled to run, the scheduler runs on CPU1, and simultaneously is polled from code running on CPU0...)

@dpgeorge
Copy link
Member

mp_handle_pending() can run on either CPU

Correct. Eg called from the VM itself.

The bug still doesn't happen in that commit because the scheduler has a lock, but I think that this fix is incomplete as it doesn't handle the case where mp_usb_task is scheduled to run, the scheduler runs on CPU1, and simultaneously is polled from code running on CPU0...)

Are you saying this PR is not correct ("this fix is incomplete"), or are you referring to the commit a00c9d5 not being fully correct?

@projectgus
Copy link
Contributor Author

Are you saying this PR is not correct ("this fix is incomplete")

This PR isn't a complete fix. I think there's still a race where:

  • CPU0 is calling poll_cdc_interfaces() which runs mp_usbd_task (allowed by the CPU0 check)
  • CPU1 runs the scheduler and this runs mp_usbd_task

The simple fix is to always guard the scheduled mp_usb_task to not run on CPU1, but because mp_usbd_task is run on-demand then there needs to be a way to re-schedule mp_usbd_task to run later, and keep retrying until it runs on CPU0. I have a fix for this here now, but I'm not sure about it (also requires a scheduler change).

Or, add proper locking around mp_usbd_task so it can run on either CPU.

@projectgus
Copy link
Contributor Author

projectgus commented Jul 11, 2024

I've pushed two commits which are a possible fix for this, but I don't like it much.

I think the smart fix here might be to revert bcbdee2 and re-introduce the dedicated TinyUSB task hook for CPU0, instead of calling TinyUSB via the scheduler. If the performance hit of that change is high then can keep the DCD event handler function and use it to set a flag that we check before calling into TinyUSB.

@dpgeorge
Copy link
Member

We need to think about this in depth, because all other bare-metal ports that use TinyUSB need to do the same thing.

@projectgus
Copy link
Contributor Author

We need to think about this in depth, because all other bare-metal ports that use TinyUSB need to do the same thing.

Agree about thinking it through. Currently we have rp2 and samd using the scheduler approach, and all of the other ports using a task hook.

The queue-via-scheduler approach should work fine on all the single core bare metal ports. It's only made complex in the dual core case where it's possible for both CPUs to be trying to call into the TinyUSB task at once.

Will think about it some more. Maybe adding another recursive mutex on rp2 is the way to go.

@dpgeorge
Copy link
Member

dpgeorge commented Aug 2, 2024

@projectgus and @jimmo and I discussed this and decided to fix it by only allowing scheduled functions to run on the main thread. See related #8838 and pybricks@b196c78

@projectgus projectgus added this to the release-1.24.0 milestone Sep 3, 2024
@projectgus
Copy link
Contributor Author

projectgus commented Sep 5, 2024

decided to fix it by only allowing scheduled functions to run on the main thread

I've implemented this "scheduler only on the main thread" change and it's working but I don't think it's a good solution after all. See this branch:
https://github.com/projectgus/micropython/commits/bugfix/rp2_tinyusb_recursive_mainthread/

It's a problem for ports with GIL and multiple threads (i.e. ESP32). The stress_schedule.py test really falls over, and I think real workloads will also suffer:

  • vm.c:1358 only bounces the GIL if the scheduler is idle, not pending, which is a problem if only main thread can run the scheduler and another thread is running. Changing this to also bounce the GIL if pending on a non-main thread (similar to the pybricks patch) helps but it's still quite slow because the GIL only bounces every MICROPY_PY_THREAD_GIL_VM_DIVISOR branches. Therefore scheduled function latency can be quite high if another thread is running.
  • Replaced that with a change in the linked branch, where scheduler.c bounces the GIL instead of running the scheduler on a non-main threads. This improves things a lot, but it's still higher latency because now the main thread has to take the GIL, then execute another iteration of the main VM loop, and only then it handles the scheduled operation. Whereas before the non-main thread would have run the scheduler immediately.

Going to take another look at this next week. @jimmo mentioned the idea of having a flag so some scheduler nodes only run on the main thread, this might be the best compromise. Otherwise maybe implementing a mutex for TinyUSB (with some caution around ensuring race conditions can't lose TinyUSB events despite the mutex).

@dpgeorge
Copy link
Member

dpgeorge commented Sep 6, 2024

I've implemented this "scheduler only on the main thread" change and it's working but I don't think it's a good solution after all.

Hmmm, OK, I'll also have a think about this. We do need to try and reduce the latency of scheduled callbacks, because they are being used more and more for background workloads.

@projectgus projectgus added the py-core Relates to py/ directory in source label Sep 10, 2024
@projectgus projectgus changed the title usb,rp2: Fix "TinyUSB callback can't recurse" error when using threads. py/scheduler,rp2: Fix scheduler race conditions when GIL disabled, fix "TinyUSB callback can't recurse" error Sep 10, 2024
@projectgus projectgus changed the title py/scheduler,rp2: Fix scheduler race conditions when GIL disabled, fix "TinyUSB callback can't recurse" error py/scheduler,rp2: Avoid scheduler race conditions when GIL disabled, fix "TinyUSB callback can't recurse" error Sep 10, 2024
@projectgus projectgus force-pushed the bugfix/rp2_tinyusb_recurse branch from 76c025b to a0e1d30 Compare September 10, 2024 08:04
@projectgus
Copy link
Contributor Author

Finally got a version of this fix I'm happy with, after @dpgeorge pointed out that this problem only exists where GIL is disabled.

(An aside, I also noticed that there is no user-facing documentation for the GIL at all. I made a note to submit something.)

@projectgus projectgus force-pushed the bugfix/rp2_tinyusb_recurse branch from a0e1d30 to 4732969 Compare September 10, 2024 08:08
@projectgus projectgus requested a review from dpgeorge September 10, 2024 08:09
@projectgus
Copy link
Contributor Author

@dlech This may be of interest, although I'm unsure if its equivalent to pybricks@b196c78 or not (specifically, I don't know if pybricks enables the GIL or not).

@dlech
Copy link
Contributor

dlech commented Sep 10, 2024

Yeah, we are using that patch on the Unix port with the GIL enabled. So it looks like this isn't quite the same. The problems were were trying to solve are A) Unix signal handlers should always run on the main thread, like in CPython and B) if there is an unhandled exception in any background task, we want it to stop the whole program by throwing in the main thread rather than throwing in a random thread.

@projectgus
Copy link
Contributor Author

@dlech That makes sense, thanks for the extra detail.

@dpgeorge
Copy link
Member

OK, so PyBrick's use case is more about functionality, rather than fixing a bug. And because it's on the unix port there's less concern about loss of performance / increase latency of scheduled code.


I have tested this latest PR on rp2 and it fixes the original report in #15390.

@projectgus projectgus force-pushed the bugfix/rp2_tinyusb_recurse branch from 4732969 to 54df502 Compare September 11, 2024 07:01
@projectgus projectgus requested a review from dpgeorge September 11, 2024 07:03
@projectgus projectgus force-pushed the bugfix/rp2_tinyusb_recurse branch from 54df502 to cf8d5f4 Compare September 17, 2024 01:09
Otherwise it's very difficult to reason about thread safety in a
scheduler callback, as it can run at any time on any thread - including
racing against any bytecode operation on any thread.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
If GIL is disabled then there's threat of a race condition if some other
code specifically requests USB processing (i.e. to unblock stdio), while
a scheduled TinyUSB callback is already running on another thread.

Relies on the change in the parent commit, where scheduler is restricted
to main thread if GIL is disabled.

Fixes micropython#15390 - "TinyUSB callback can't recurse" exceptions on rp2 when
using _thread module and USB serial I/O.

Adds a unit test for stdin functioning correctly in threads (fails on rp2
port without this fix).

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@dpgeorge dpgeorge force-pushed the bugfix/rp2_tinyusb_recurse branch from cf8d5f4 to 5d8878b Compare September 19, 2024 03:28
@dpgeorge dpgeorge merged commit 5d8878b into micropython:master Sep 19, 2024
63 checks passed
@dpgeorge
Copy link
Member

This turned out to be a relatively small patch, which is great. Thank you!

@projectgus projectgus deleted the bugfix/rp2_tinyusb_recurse branch September 25, 2024 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
port-rp2 py-core Relates to py/ directory in source shared Relates to shared/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raspberry Pi Pico _thread OSError: TinyUSB callback can't recurse
3 participants