Skip to content

rp2: Fix soft timer expiry waking early from lightsleep. #16431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented Dec 17, 2024

Summary

Fixes #16181 and partial fix for #16180. Specifically: bug where PICO-W build won't lightsleep() for more than 64ms.

Details

This is a regression introduced in 74fb42a (#13329) when we switched away from the pico-sdk alarm pool for soft timers and accidentally made soft timer expiry a wakeup source.

Before 74fb42a (including V1.23), both the pico-sdk alarm pool and the lightsleep wakeup timer uses alarm 3. The lightsleep timer would quietly clobber alarm 3's timeout, meaning softtimer events wouldn't wake the chip from lightsleep.

After 74fb42a (including V1.24), soft timer wakeup happens on timer alarm 2 so this interrupt can wake the chip from light sleep. On PICO-W builds this happens every lwIP tick (64ms) which is unexpected.

The change here is to go back to using the same timer alarm for both lightsleep wakeup and soft timer, but now being explicit about lightsleep wakeup clobbering any soft timer wakeup. Also adds a "catch up" call to soft timer handler as it's not currently there.

This also reworks the changes added in 19844b4 to enable the timer IRQ on CPU1. The functionality is basically the same, just being a bit more explicit and by reworking this and the other code there's no more "magic IRQ number 3" in this code.

Thanks to @cpottle9 for investigating this issue and pinpointing 74fb42a as the regression commit. 🙏

This work was funded through GitHub Sponsors.

Testing

  • Basic test case on PICO-W.
  • Unit tests pass on PICO-W.
  • Test wakeup from pin IRQ as also reported in lightsleep() not working properly Pi Pico WH v1.24.0 #16180 (might need to be fixed separately). (Note: Tested this works provided timeout is provided, issue without timeout is tracked in rp2: Can't wake from lightsleep with Pin IRQ #7035)
  • Test on RP2350 (help wanted, have ordered hardware but probably won't get until next year).
  • Add unit test for light sleep running from thread on core1.
  • Add unit test verifying that soft timer callbacks resume after waking from lightsleep. EDIT: Can't do this actually, as machine.Timer() uses the alarm pool. However, lightsleep and Wi-Fi are compatible so soft timer is resuming OK.
  • Test Wi-Fi stays valid after lightsleep (with a timeout).

Trade-offs and Alternatives

  • We were hoping to go back to the pico-sdk alarm pool rather than using our own hardware timer alarm now that Issues with best_effort_wfe_or_timeout and WFE in SDK 2.0.0 raspberrypi/pico-sdk#1812 is closed, however @dpgeorge mentioned here that the new version of the alarm pool may have another bug - maybe the one fixed in ta_set_timeout can fail to set an alarm raspberrypi/pico-sdk#2127 ?
  • In general, it might be expected that soft timers added by Python code can wake the rp2 from lightsleep. In that case, we could fix this by disabling the 64ms lwIP tick wakeup timer before going to lightsleep. Lightsleep wakeup could then be just another soft timer entry.
  • More extreme analysis would be that lwIP tick should wake from lightsleep so that TCP connections can stay up, etc. I'm not sure I buy this one as the rp2 port is already tickless so "idle" is close to this behaviour, but you could make the case. If doing that instead then we should at least disable the lwIP tick while no network interface is up, though.

Copy link

github-actions bot commented Dec 17, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +32 +0.004% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@projectgus projectgus changed the title WIP: rp2: Use the soft timer hardware alarm to wake from lightsleep. WIP: rp2: Fix soft timer expiry waking early from lightsleep. Dec 17, 2024
@projectgus projectgus force-pushed the bugfix/rp2_lightsleep_timeout branch from 0c92842 to 9a34859 Compare December 17, 2024 00:39
@cpottle9
Copy link
Contributor

@projectgus, I built micropython for RPI_PICO_W with your changes and tested them.
Looks great.

As part of that, I ran the attached example running a periodic Timer once per second during the lightsleep.
lightsleep_with_timer.txt

I just happened to have this code lying around for testing sleep_ms and I thought to give it a try.
This specific case worked correctly.
lightsleep() returned after 1 second as expected and the Timer function ran.

I know interrupts from other sources (EG: Pin) don't work but this one does.

@projectgus
Copy link
Contributor Author

Thanks for taking a look and testing this, @cpottle9!

I just happened to have this code lying around for testing sleep_ms and I thought to give it a try. This specific case worked correctly. lightsleep() returned after 1 second as expected and the Timer function ran.

Ironically, machine.Timer can now wake the rp2 from lightsleep because it uses the alarm pool (timer interrupt 3). So I assume it doesn't work on either V1.23 or V1.24, and may stop working on future versions if we switch back to also using the alarm pool for the soft timer!

This is a bit hand-wavey as USB data wakes up lightsleep() pretty often,
but this modified test passes on PICO builds and fails consistently on
PICO_W builds where the 64ms LWIP tick timer always wakes up lightsleep.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Fixes issue where PICO-W build won't lightsleep() for more than 64ms. This
is a regression introduced in 74fb42a when we switched away from the
pico-sdk alarm pool for soft timers and accidentally made soft timer expiry
a wakeup source.

Before 74fb42a, both the pico-sdk alarm pool and the lightsleep wakeup
timer uses alarm 3. The lightsleep wakeup would quietly clobber alarm 3's
timeout, meaning softtimer wouldn't wake the chip from lightsleep.

After 74fb42a, soft timer wakeup happens on timer alarm 2 so this interrupt
wakes the chip from light sleep. On PICO-W builds this happens every lwIP
tick (64ms).

The change is to go back to using the same timer alarm for both lightsleep
wakeup and soft timer, but now being explicit about lightsleep wakeup
clobbering any soft timer wakeup. Also adds a "catch up" call to soft timer
handler as it's currently possible to miss soft timer events after waking
up.

This also reworks the changes added in 19844b to enable the timer IRQ on
CPU1. This is still needed, but have changed the implementation to be more
explicit about when it's needed.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@projectgus projectgus force-pushed the bugfix/rp2_lightsleep_timeout branch from 9a34859 to e9b8a36 Compare December 18, 2024 06:15
@projectgus projectgus changed the title WIP: rp2: Fix soft timer expiry waking early from lightsleep. rp2: Fix soft timer expiry waking early from lightsleep. Dec 18, 2024
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.57%. Comparing base (a9945fc) to head (e9b8a36).
Report is 265 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16431   +/-   ##
=======================================
  Coverage   98.57%   98.57%           
=======================================
  Files         164      164           
  Lines       21349    21349           
=======================================
  Hits        21045    21045           
  Misses        304      304           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@peterharperuk
Copy link
Contributor

Having a constant lwIP tick counter of 64ms is probably a waste when it will tell us when it next needs to wake up via sys_timeouts_sleeptime.

@cpottle9
Copy link
Contributor

@projectgus and other maintainers of the rp2 port, I've been following changes related to lightsleep and the shoe just dropped for me. I did not see what you are trying to do until now.

You are trying to solve world hunger for lightsleep and I don't think that is good use of your time.
By that, I mean you are trying to provide maximum power reduction and yet not break functionality.
I see the functionality you are trying to preserve includes USB, WIFI, and use of the second core.

If I'm running a PICO_W from battery to save power the first thing I need to do is disable the cyw43 (and WIFI) when I don't need it. That includes not using the system LED. I have an application where I do wlan.deinit() and then lightsleep.
Further, when my application is deployed I don't need USB (I know, you fixed that one).
Running on battery I probably don't want to make use of the second core either.

I propose you need make lightsleep simpler and empower developers(like me) to decide which hardware blocks need to enabled for a given application.
To that end, I created micropython code that allows a developer to modify the WAKE_ENx and SLEEP_ENx registers on the RP2040 and RP2350. If you have time, take a look at my github RP2-PowerControl.
My idea is users stop using machine.lightsleep and use time.sleep_ms instead ( make machined.lightsleep do the same thing as time.sleep_ms).
Whenever a __WFI or __WFE happens the contents of SLEEP_ENx registers will be used.
Further, if a user knows they do not need specific blocks the can update WAKE_ENx registers to power them down all the time.

@projectgus
Copy link
Contributor Author

Having a constant lwIP tick counter of 64ms is probably a waste when it will tell us when it next needs to wake up via sys_timeouts_sleeptime.

Ah, thanks @peterharperuk! The rp2 port is the only tickless port in MicroPython at the moment, so it's the only place this really counts. If we can easily make lwIP tickless as well then we have a lot more options. I'd assumed this wasn't simple but sounds like it may well be.

You are trying to solve world hunger for lightsleep and I don't think that is good use of your time.
By that, I mean you are trying to provide maximum power reduction and yet not break functionality.
I see the functionality you are trying to preserve includes USB, WIFI, and use of the second core.

This is a good point, @cpottle9, although it's less "not break functionality" and more "not crash or hang" which is the bottom line here. For example, the reason that USB gets special handling in rp2 lightsleep is because disabling (or suspending) on sleep and re-enabling on wake caused intermittent crashes when either the peripheral or TinyUSB gets in some permanent bad state. Similarly, the workarounds for the second core are to prevent execution stalling indefinitely if the other CPU gets permanently stuck in WFI. If the impact is limited to "feature stops on sleep and starts on wake" or "power consumption is higher when this feature is in use" then these are probably reasonable trade-offs. Permanent hangs or hard faults aren't reasonable trade-offs.

I propose you need make lightsleep simpler and empower developers(like me) to decide which hardware blocks need to enabled for a given application.
To that end, I created micropython code that allows a developer to modify the WAKE_ENx and SLEEP_ENx registers on the RP2040 and RP2350. If you have time, take a look at my github RP2-PowerControl.

That's a great project, and I think it's a good approach for people who want the absolute most control over the sleep modes.

Part of the goal of MicroPython, and the machine module in particular, is to provide some abstraction of the hardware. To date, low-power modes are not the best example of this in MicroPython as there's not much of a common design and things that are quite different between each port. However, long term that's the kind of direction we'd want to go. I would say that as a first goal, having a "lowest reasonable consumption" machine.lightsleep() and a "low power but max functionality" time.sleep() will meet a lot of people's needs.

That's not to say that something like RP2-PowerControl isn't a great option for folks who need that fine-grained register-level control, though.

@projectgus
Copy link
Contributor Author

If the impact is limited to "feature stops on sleep and starts on wake" or "power consumption is higher when this feature is in use" then these are probably reasonable trade-offs

I realise there is another trade-off we could choose for some of these states, which would be to have lightsleep degrade to a normal WFI or even raise a RuntimeException if we know it's not safe to disable clocks for lightsleep.

There are still footguns we don't want to leave lying around, though. For example, even if someone's battery powered application isn't designed to run with USB in the field then they might want to sometimes plug USB in and do some debugging. If the behaviour in that state is significantly different then it can become quite hard to debug.

@cpottle9
Copy link
Contributor

@projectgus, your USB issue is valid.
But dealing with can be the developers problem not yours.

Their application can determine if USB is present and have RP2-PowerCtrl disable USB only when it is not.

By the way, I initially coded RP2-PowerCtrl as part of modrp2.c.
If the owners of micropython agree I could finish that work and create a pull request.

@cpottle9
Copy link
Contributor

cpottle9 commented Dec 18, 2024

@projectgus one other thing. Prior to commit d1423ef on entry lightsleep saved the current value of sleep_en0 and sleep_en1.
Before exiting it restored the saved values.

Now before exiting lightsleep sets those registers to the power on defaults.

That means if someone uses RP2-PowerCtrl and then calls lightsleep their changes to sleep_en0 and sleep_en1 are lost.

Should I open an issue?

@projectgus
Copy link
Contributor Author

By the way, I initially coded RP2-PowerCtrl as part of modrp2.c.
If the owners of micropython agree I could finish that work and create a pull request.

That could be one way forward, although if everything works the same in pure Python code then it's less appealing to add binary size for it in modrp2. I also just tagged you in this discussion, where I vaguely outlined another possible approach:
#16183 (comment)

Now before exiting lightsleep sets those registers to the power on defaults.

That means if someone uses RP2-PowerCtrl and then calls lightsleep their changes to sleep_en0 and sleep_en1 are lost.

Should I open an issue?

Yes, we can definitely change this back. If possible then suggest you open a Pull Request restoring behaviour that works for RP2-PowerCtrl, and then we can go from there.

@projectgus
Copy link
Contributor Author

projectgus commented Dec 20, 2024

Couple of things have become apparent about this approach:

  1. It seems much more sensible to merge Alarm pool sleep changes #16454 and go back to using the Pico alarm pool. That PR currently switches to use a dedicated timer alarm for lightsleep wakeup, which seems like a good choice but it will mean that the lwIP tick timer still wakes us up from lightsleep.
  2. Switching to "tickless" lwIP doesn't immediately solve the problem either. LWIP runs all its "cyclic" timers all of the time, the fastest of which run at 100ms for IGMP and MLD6. MicroPython will need to either calculate when no interface is enabled and disable timer callbacks (which means lightsleep+wifi can work seamlessly), or we disable either the lwIP timer callbacks while we're in lightsleep or the entire alarm pool (in which case we may as well keep the 64ms lwIP tick).

@projectgus
Copy link
Contributor Author

projectgus commented Jan 14, 2025

Quick update after discussing with @dpgeorge : Strong preference for #16454 to be the long-term fix. Hopeful that there will be a pico-sdk release with the final alarm pool fix included and we can go direct to that one.

So I'm not planning to pick this approach up again for now. If this is a major inconvenience for anyone, please leave a comment.

EDIT/note-to-self: If this is eventually closed then should open a new PR with just the test improvements.

@cpottle9
Copy link
Contributor

cpottle9 commented Jan 14, 2025

@projectgus, thanks for the update. You prompted me to look at #16454.
It explains the symptoms I described in #16562.

If I understand correctly it will be a while before the issues in pico-sdk are addressed.
So, this problem will not get resolved quickly. Correct?

@projectgus
Copy link
Contributor Author

If I understand correctly it will be a while before the issues in pico-sdk are addressed. So, this problem will not get resolved quickly. Correct?

The issue in pico-sdk is addressed now, but there hasn't yet been a pico-sdk release that includes the fix. Once a release comes out, and assuming no other issues crop up when we update pico-sdk, then we'll be able to proceed with the changes in #16454.

@projectgus
Copy link
Contributor Author

Closing in favour of the approach from #16454

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lightsleep on Pico W regression for 1.24.0
3 participants