Skip to content

py/scheduler: Allow C scheduler callbacks to re-queue themselves. #17347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 4, 2025

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented May 23, 2025

Summary

Currently if a scheduler callbacks queues a new C scheduled callback, that callback will run in the same schedule pass. This means if a scheduler callback re-queues itself, it can "infinite loop" inside the scheduler.

This change means that any call to mp_handle_pending() will only process the callbacks which were queued when mp_handle_pending() started running. Therefore any callback which re-queues itself should have that callback handled the next time mp_handle_pending() is called.

This is an alternative to #17248 and should remove the need for #17264. Fixes #17246.

This does create potential for some additional callback latency (i.e. if an interrupt fires while a different scheduler callback is running, the interrupt callback is now deferred until the next scheduler run). However the worst-case scheduler latency remains similar (interrupt fires at end of one scheduler pass, has to wait until the next one). I think most MicroPython systems only sparsely call C scheduler callbacks, so this shouldn't be noticeable in most cases.

There is also potential for a misbehaving callback that re-schedules itself continuously to degrade performance of MicroPython (as the callback runs continuously with some allowance for Python code to run), whereas before it would have locked up MicroPython entirely which is more obviously broken.

This work was funded through GitHub Sponsors.

Testing

  • I adapted the test case @andrewleech added in py/scheduler: warning about C callbacks scheduling new tasks. #17248 for the new behaviour and resubmitted here, so coverage test now includes C scheduler callbacks (testing the new logic).
  • Manually ran the rp2 unit tests on RP2_PICO board. As this is a tickless port I felt it had the most potential for an interrupt timing bug to surface due to this change. All passed.

Trade-offs and Alternatives

@projectgus projectgus added the py-core Relates to py/ directory in source label May 23, 2025
@projectgus projectgus requested a review from andrewleech May 23, 2025 05:07
@projectgus
Copy link
Contributor Author

@andrewleech Are you able to easily reproduce the issue that #17264 is fixing? Am interested to know if this fixes it OK, I noticed just now that PR links to your commit af4a8e9 which implies the problem is actually with recursive calls to mp_handle_pending() itself rather than callbacks re-adding themselves, so that might need a different fix...

Copy link

codecov bot commented May 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (b153484) to head (8204853).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17347      +/-   ##
==========================================
- Coverage   98.54%   98.54%   -0.01%     
==========================================
  Files         169      169              
  Lines       21910    21943      +33     
==========================================
+ Hits        21591    21623      +32     
- Misses        319      320       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

github-actions bot commented May 23, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +8 +0.002% PYBV10
     mimxrt:    +8 +0.002% TEENSY40
        rp2:   +16 +0.002% RPI_PICO_W
       samd:    +8 +0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@andrewleech
Copy link
Contributor

Thanks for this @projectgus yes it's easy to reproduce with a stm32wb55 that's had its rfcore firmware wiped, when trying to start BLE it gets stuck trying to init forever in a loop, the higher level timeout code never runs.
I'll get a unit set up to test the change.

@andrewleech
Copy link
Contributor

Current master:

MicroPython v1.26.0-preview.148.g49f81d5046 on 2025-05-24; NUCLEO-WB55 with STM32WB55RGV6
Type "help()" for more information.
>>> 
>>> 
>>> from bluetooth import BLE
>>> BLE().active()
False
>>> BLE().active(True)
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout

Note: Ctrl-C did not working here, I couldn't break out of the error loop other than hard reset.

With this PR:

MicroPython v1.26.0-preview.150.g8b14be1c02 on 2025-05-24; NUCLEO-WB55 with STM32WB55RGV6
Type "help()" for more information.
>>> from bluetooth import BLE
>>> BLE().active()
False
>>> BLE().active(True)
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
tl_ble_wait_resp: timeout
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 110] ETIMEDOUT
>>> 

@projectgus
Copy link
Contributor Author

Thanks @andrewleech for confirming this fixes the WB55 issue!

@projectgus projectgus force-pushed the feature/c-scheduler-recurse branch from 8b14be1 to 98e28ef Compare May 30, 2025 06:33
projectgus and others added 2 commits June 4, 2025 11:31
Without this change, a scheduler callback which itself queues a new
callback will have that callback executed as part of the same scheduler
run. Where a callback may re-queue itself, this can lead to an infinite
loop.

With this change, each call to mp_handle_pending() will only service the
callbacks which were queued when the scheduler pass started - any callbacks
added during the run are serviced on the next mp_handle_pending().

This does mean some interrupts may have higher latency (as callback is
deferred until next scheduler run), but the worst-case latency should stay
very similar.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Test modified to reschedule itself based on a flag setting. Without the
change in the parent commit, this test executes the callback indefinitely
and hangs but with the change it runs only once each time
mp_handle_pending() is called.

Modified-by: Angus Gratton <angus@redyak.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@dpgeorge dpgeorge force-pushed the feature/c-scheduler-recurse branch from 98e28ef to 8204853 Compare June 4, 2025 01:31
Copy link
Member

@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now!

@dpgeorge dpgeorge merged commit 8204853 into micropython:master Jun 4, 2025
66 checks passed
@projectgus projectgus deleted the feature/c-scheduler-recurse branch June 5, 2025 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BLE init failures can cause infinite loop in scheduler
4 participants