-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/qstr: Disable qstr hashing on low-flash/mem boards. #12835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code size report:
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #12835 +/- ##
=======================================
Coverage 98.36% 98.36%
=======================================
Files 159 159
Lines 21088 21090 +2
=======================================
+ Hits 20743 20745 +2
Misses 345 345 ☔ View full report in Codecov by Sentry. |
@@ -35,7 +35,6 @@ | |||
#define MICROPY_GC_STACK_ENTRY_TYPE uint16_t | |||
#define MICROPY_GC_ALLOC_THRESHOLD (0) | |||
#define MICROPY_ALLOC_PATH_MAX (256) | |||
#define MICROPY_QSTR_BYTES_IN_HASH (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should stay, to keep SAMD size down? Or maybe move it to samd/mcu/samd21/mpconfigmcu.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAMD doesn't set the feature level, so this is the default anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the samd port has increased in size, so this change is not a no-op. We need to make a conscious decision about whether the current config for samd should be changed, or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I missed that it's set in the samd/mcu/*/mpconfigmcu.h
...
I think it makes sense then for samd21 to be 1 and samd51 to be 2. i.e. d21 is unchanged, but d51 is now moving to 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the commit message to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimmo I made the cross-check with the SAMD ItsyBitsy M4 build. Setting #define MICROPY_QSTR_BYTES_IN_HASH (0)
in mpconfigport,h reduced the flash size by 2064 bytes compared to the setting of this PR. Compared to the initial setting, the reduction is 1008 bytes. For the ItsyBitsy M0, the size decrease is 684 bytes.
Or in absolute numbers for the M4
QSTR_BYTES Flash size
2 260020
1 259012 (Previous setting)
0 257956
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the increase of 1020 bytes for SAMD51 was caused by the change of the hash size from 1 to 2 bytes. That should not be a problem. The only device & configuration short of flash and RAM is the SAMD21 without external flash. For these, setting the QSTR hash to 0 is an advantage. I could add that to the next service PR.
Edit: A single test using pystone_lowmem.py
shows a ~20% performance penalty. Not sure if that's worth the memory saving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robert-hh I think all those results sound expected. Thanks for checking. Just to confirm, I expect from this PR:
- D21 boards are unchanged and should have no firmware size change.
- D51 boards should grow by roughly 1kiB (and get a small performance boost).
For these, setting the QSTR hash to 0 is an advantage.
I don't think we should set bytes-in-hash to zero by default for anything except the "true" minimal ports/variants (where we're really trying to highlight the minimal possible configuration), or unless we're at the size limit (for example some of those stm32x0 boards). As you've noted, the performance impact is non-trivial.
20b8267
to
72512e7
Compare
py/qstr.c
Outdated
|
||
// search pools for the data | ||
for (const qstr_pool_t *pool = MP_STATE_VM(last_pool); pool != NULL; pool = pool->prev) { | ||
size_t low = 0; | ||
size_t low = pool->prev ? 0 : 1; // skip MP_QSTRnull at the start of the first pool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's worth adding this logic. It's not needed (you'll never get to this point with str_len == 0
so the check agains the length in the search will never match MP_QSTRnull
or MP_QSTR_
) and adds an extra if/jump in a somewhat critical loop. Skipping just 1 entry in a binary search also won't make it noticeably faster. It also increases code size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think I solved this in two different ways and didn't catch that in the rebase.
tools/mpy-tool.py
Outdated
qstr_content += ( | ||
config.MICROPY_QSTR_BYTES_IN_LEN + config.MICROPY_QSTR_BYTES_IN_HASH + len(qbytes) + 1 | ||
) | ||
qstr_content = qstr_size["metadata"] + qstr_size["data"] + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The + 1
should probably be + len(new)
, because it's counting the terminating null byte on each string data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked this to make it clearer.
This disables using qstr hashes altogether, which saves RAM and flash (two bytes per interned string on a typical build) as well as code size. On PYBV11 this is worth over 3k flash. qstr comparison will now be done just by length then data. This affects qstr_find_strn although this has a negligible performance impact as, for a given comparison, the length and first character will ~usually be different anyway. String hashing (e.g. builtin `hash()` and map.c) now need to compute the hash dynamically, and for the map case this does come at a performance cost. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Sets MICROPY_QSTR_BYTES_IN_HASH==0 on stm32x0 boards. This saves e.g. 2kiB on NUCLEO_F091. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This will apply to bare-arm and minimal, as well as the minimal unix variant. Change the default to MICROPY_QSTR_BYTES_IN_HASH=1 for the CORE,BASIC levels, 2 for >=EXTRA. Removes explicit setting of MICROPY_QSTR_BYTES_IN_HASH==1 in ports that don't set the feature level (because 1 is implied by the default level, CORE). Applies to cc3200, pic16bt, powerpc. Removes explicit setting for nRF (which sets feature level). Also for samd, which sets CORE for d21 and FULL for d51. This means that d21 is unchanged with MICROPY_QSTR_BYTES_IN_HASH==1, but d51 now moves from 1 to 2 (roughly adds 1kiB). The only remaining port which explicitly set bytes-in-hash is rp2 because it's high-flash (hence CORE level) but lowish-SRAM, so it's worthwhile saving the RAM for runtime qstrs. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
72512e7
to
d419081
Compare
Rebased and updated. |
Similar perf results as before when running on pybv11 (with qstr-hash explicitly disabled, otherwise this PR is a no-op on pybv11).
|
(And it's a -3312 byte saving on pybv11) |
Thanks for updating, this is a good new option to reduce code size. |
This is the remainder of #10758 (after the qstr sorting was split into #12678).
Allow setting
MICROPY_QSTR_BYTES_IN_HASH
to zero, which has a significant code size saving (~3.5kiB on PYBv11) due to removing two bytes per qstr. This also translates to RAM saving at runtime for any strings interned at runtime. It also frees up a size_t field in mp_obj_str_t, which we could use in the future as the buffer for short string data (in particular for slices of bytes this might be a win).However, it comes at a performance cost, so I've only enabled it for minimal/bare-arm and very small boards.
Here's the perf diff on PYBV11 for reference: