ports/rp2: Mark gc_heap NOLOAD for faster boot. #9017
Conversation
It would definitely be good to improve rp2 start up speed, and this looks like a nice and simple thing to do to begin with.
ports/rp2/memmap_mp.ld
Outdated
@@ -180,6 +180,10 @@ SECTIONS
        *(.uninitialized_data*)
    } > RAM

    .noinit (NOLOAD) : {
Can't you just use the `uninitialized_data` section? It should do the same thing, no?
That's what I thought, but in my tests it did not seem to make a difference. I might have to test again lest my assumptions have failed.
The generated .map file should be enough to tell you if it works or not.
If it does require a new section, I'd suggest calling it `uninitialized_bss`.
And it should probably go after the `bss` section, and be 4-aligned.
I think `uninitialized_bss` is a good shout: since it isn't strictly data (it's not initialized with any known value), the name makes the intent clearer.
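Putting the review suggestions above together (a dedicated section, placed after `bss`, 4-aligned), the linker script addition might look roughly like the sketch below. The section name follows the suggestion in this thread. The exact placement and the C-side attribute mentioned in the comment are assumptions for illustration, not the merged change.

```ld
/* Sketch only: assumes this sits inside SECTIONS, after the .bss output
 * section, and that a RAM memory region is defined as in the diff above. */
.uninitialized_bss (NOLOAD) : {
    . = ALIGN(4);
    /* Collects objects tagged on the C side with, for example (hypothetical):
     *   static char gc_heap[HEAP_SIZE]
     *       __attribute__((section(".uninitialized_bss")));
     */
    *(.uninitialized_bss*)
    . = ALIGN(4);
} > RAM
```

With `NOLOAD`, the section gets an address in RAM but no load image. As long as it lies outside the range the startup code clears for `bss`, it should be neither copied nor zero-filled at boot. The generated .map file is the place to confirm where `gc_heap` actually ended up.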
This has made me wonder what I've raised an issue against Pico SDK since there's either something amiss or I'm very confused - |
Just to go a little further with my Normal build:
With
With
The linker script In terms of the final result- you were correct, simply marking |
Create a new linker section .unitialized_bss for bss that does not need zero-initialising.
Move gc_heap to .unitialized_bss section.
Saves ~30ms from rising edge of RESET to setting a pin HIGH in MicroPython.
Zero fill happens in Pico SDK crt0.S before ROSC is configured. It's very, very slow.
Signed-off-by: Phil Howard <phil@gadgetoid.com>
4600547 to eedc9f9
Rebased and merged in 71f6eb5
Parts of the VM/runtime are relocated to RAM at start up, and this could probably also benefit from improvements. Eg only copy this data after switching to a higher clock frequency.
This is the main thrust of my speedup hacks which are applied as a patch to The Pico SDK maintainers have shown some interest in making this easier, but I don't know how imminent that will be. You can currently use There's some discussion and links out to relevant issues here - raspberrypi/pico-sdk#959
As I mentioned above, this could potentially bring the stock Pico MicroPython startup to ~30-50ms. A dramatic improvement over even this change, since all copies to RAM (VM, runtime and initialized data) are sped up.
Definitely something to keep in mind, since I think early clock setup - via whatever solution Pico SDK canonicalizes upon - would likely be the easiest way to accomplish what you mention.
Let's keep an eye on that pico-sdk ticket. There are definitely gains to be made here in terms of improving startup time.
I've been doing a fair bit of prodding and poking with MicroPython's startup, and there are a few speed gains to be had.
Notably the zero fill for BSS takes a long time, and the SRAM copy for initialized data even longer, since these both happen before ROSC is configured to any appreciable speed.
Editing Pico SDK's `crt0.S` entry point and configuring ROSC to ~48MHz dramatically reduces the time these copies take, reducing a MicroPython Pico W cold boot (including our libraries) from ~200ms to ~50ms. This is not something that's easy to do for MicroPython, and for our own builds I'm hot patching `crt0.S` with some extra features...
Instead we can skip zeroing of `gc_heap`.
This change doesn't get us as far as the clock config, but saves ~30ms by marking `gc_heap` as `.noinit` which, in turn, ensures it is not zero-filled at startup.
This takes startup from a measured ~156ms down to ~120ms.
Before:

After:

This change was suggested by @jimmo on raspberrypi/pico-sdk#959