py/qstr: Disable qstr hashing on low-flash/mem boards. #12835

jimmo · 2023-10-30T23:56:25Z

This is the remainder of #10758 (after the qstr sorting was split into #12678).

Allow setting MICROPY_QSTR_BYTES_IN_HASH to zero, which has a significant code size saving (~3.5kiB on PYBv11) due to removing two bytes per qstr. This also translates to RAM saving at runtime for any strings interned at runtime. It also frees up a size_t field in mp_obj_str_t, which we could use in the future as the buffer for short string data (in particular for slices of bytes this might be a win).

However, it comes at a performance cost, so I've only enabled it for minimal/bare-arm and very small boards.

Here's the perf diff on PYBV11 for reference:

$ ./run-perfbench.py -s ~/qstr-bytes-in-hash-baseline ~/qstr-bytes-in-hash-disabled 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/qstr-bytes-in-hash-baseline -> /home/jimmo/qstr-bytes-in-hash-disabled         diff      diff% (error%)
bm_chaos.py                    347.77 ->     341.83 :      -5.94 =  -1.708% (+/-0.02%)
bm_fannkuch.py                  74.05 ->      73.83 :      -0.22 =  -0.297% (+/-0.01%)
bm_fft.py                     2339.34 ->    2335.32 :      -4.02 =  -0.172% (+/-0.00%)
bm_float.py                   5627.21 ->    5588.31 :     -38.90 =  -0.691% (+/-0.02%)
bm_hexiom.py                    46.90 ->      45.98 :      -0.92 =  -1.962% (+/-0.01%)
bm_nqueens.py                 4212.22 ->    4166.87 :     -45.35 =  -1.077% (+/-0.00%)
bm_pidigits.py                 648.46 ->     647.17 :      -1.29 =  -0.199% (+/-0.32%)
bm_wordcount.py                 46.53 ->      45.84 :      -0.69 =  -1.483% (+/-0.01%)
core_import_mpy_multi.py       633.19 ->     615.11 :     -18.08 =  -2.855% (+/-0.01%)
core_import_mpy_single.py      102.01 ->      98.43 :      -3.58 =  -3.509% (+/-0.07%)
core_locals.py                  38.94 ->      36.86 :      -2.08 =  -5.342% (+/-0.00%)
core_qstr.py                   207.64 ->     216.67 :      +9.03 =  +4.349% (+/-0.00%)
core_str.py                     28.31 ->      23.93 :      -4.38 = -15.472% (+/-0.00%)
core_yield_from.py             352.53 ->     356.25 :      +3.72 =  +1.055% (+/-0.02%)
misc_aes.py                    404.50 ->     405.32 :      +0.82 =  +0.203% (+/-0.00%)
misc_mandel.py                2989.24 ->    2984.08 :      -5.16 =  -0.173% (+/-0.01%)
misc_pystone.py               2286.84 ->    2224.38 :     -62.46 =  -2.731% (+/-0.00%)
misc_raytrace.py               359.96 ->     353.00 :      -6.96 =  -1.934% (+/-0.00%)

github-actions · 2023-10-31T00:06:37Z

Code size report:

   bare-arm:  -244 -0.429% 
minimal x86:  -371 -0.198% [incl -8(data)]
   unix x64:    -8 -0.001% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO
       samd: +1020 +0.389% ADAFRUIT_ITSYBITSY_M4_EXPRESS

codecov · 2023-10-31T00:20:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (307ecc5) 98.36% compared to head (d419081) 98.36%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #12835   +/-   ##
=======================================
  Coverage   98.36%   98.36%           
=======================================
  Files         159      159           
  Lines       21088    21090    +2     
=======================================
+ Hits        20743    20745    +2     
  Misses        345      345

dpgeorge · 2023-10-31T02:17:43Z

ports/samd/mpconfigport.h

@@ -35,7 +35,6 @@
 #define MICROPY_GC_STACK_ENTRY_TYPE         uint16_t
 #define MICROPY_GC_ALLOC_THRESHOLD          (0)
 #define MICROPY_ALLOC_PATH_MAX              (256)
-#define MICROPY_QSTR_BYTES_IN_HASH          (1)


Maybe this should stay, to keep SAMD size down? Or maybe move it to samd/mcu/samd21/mpconfigmcu.h?

SAMD doesn't set the feature level, so this is the default anyway.

But the samd port has increased in size, so this change is not a no-op. We need to make a conscious decision about whether the current config for samd should be changed, or not.

Right. I missed that it's set in the samd/mcu/*/mpconfigmcu.h...

I think it makes sense then for samd21 to be 1 and samd51 to be 2. i.e. d21 is unchanged, but d51 is now moving to 2.

Updated the commit message to match.

@jimmo I made the cross-check with the SAMD ItsyBitsy M4 build. Setting #define MICROPY_QSTR_BYTES_IN_HASH (0) in mpconfigport.h reduced the flash size by 2064 bytes compared to the setting of this PR. Compared to the initial setting, the reduction is 1008 bytes. For the ItsyBitsy M0, the size decrease is 684 bytes.

Or in absolute numbers for the M4

QSTR_BYTES   Flash size
2            260020
1            259012 (Previous setting)
0            257956

I see, the increase of 1020 bytes for SAMD51 was caused by the change of the hash size from 1 to 2 bytes. That should not be a problem. The only device & configuration short of flash and RAM is the SAMD21 without external flash. For these, setting the QSTR hash to 0 is an advantage. I could add that to the next service PR.
Edit: A single test using pystone_lowmem.py shows a ~20% performance penalty. Not sure if that's worth the memory saving.

@robert-hh I think all those results sound expected. Thanks for checking. Just to confirm, I expect from this PR:

D21 boards are unchanged and should have no firmware size change.

D51 boards should grow by roughly 1kiB (and get a small performance boost).

> For these, setting the QSTR hash to 0 is an advantage.

I don't think we should set bytes-in-hash to zero by default for anything except the "true" minimal ports/variants (where we're really trying to highlight the minimal possible configuration), or unless we're at the size limit (for example some of those stm32x0 boards). As you've noted, the performance impact is non-trivial.

dpgeorge · 2024-01-16T23:32:08Z

py/qstr.c


    // search pools for the data
    for (const qstr_pool_t *pool = MP_STATE_VM(last_pool); pool != NULL; pool = pool->prev) {
-        size_t low = 0;
+        size_t low = pool->prev ? 0 : 1; // skip MP_QSTRnull at the start of the first pool.


I'm not sure it's worth adding this logic. It's not needed (you'll never get to this point with str_len == 0, so the check against the length in the search will never match MP_QSTRnull or MP_QSTR_) and adds an extra if/jump in a somewhat critical loop. Skipping just 1 entry in a binary search also won't make it noticeably faster. It also increases code size.

Yes I think I solved this in two different ways and didn't catch that in the rebase.

dpgeorge · 2024-01-16T23:44:17Z

tools/mpy-tool.py

-        qstr_content += (
-            config.MICROPY_QSTR_BYTES_IN_LEN + config.MICROPY_QSTR_BYTES_IN_HASH + len(qbytes) + 1
-        )
+    qstr_content = qstr_size["metadata"] + qstr_size["data"] + 1


The + 1 should probably be + len(new), because it's counting the terminating null byte on each string data.

Reworked this to make it clearer.
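To make the accounting in the removed lines concrete, here is a minimal sketch of the per-string size calculation, following the expression `MICROPY_QSTR_BYTES_IN_LEN + MICROPY_QSTR_BYTES_IN_HASH + len(qbytes) + 1` from the diff above. The function name `qstr_pool_bytes`, the default field widths, and the sample strings are hypothetical and chosen only for illustration; this is not the code in tools/mpy-tool.py.

```python
def qstr_pool_bytes(strings, bytes_in_len=1, bytes_in_hash=2):
    """Estimate the bytes used by a list of interned strings.

    Per string: length field + hash field + UTF-8 data + one terminating
    null byte (so the null bytes add up to len(strings) in total).
    """
    metadata = 0
    data = 0
    for s in strings:
        qbytes = s.encode("utf-8")
        metadata += bytes_in_len + bytes_in_hash
        data += len(qbytes) + 1  # +1 for this string's terminating null byte
    return metadata + data


# Hypothetical sample strings, not taken from any real build.
names = ["__init__", "value", "spam", "pin"]
with_hash = qstr_pool_bytes(names, bytes_in_hash=2)
without_hash = qstr_pool_bytes(names, bytes_in_hash=0)
print(with_hash - without_hash)  # prints 8: two bytes saved per string
```

The difference scales with the number of strings, which is why the terminating null needs to be counted once per string rather than once overall.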

This disables using qstr hashes altogether, which saves RAM and flash (two bytes per interned string on a typical build) as well as code size. On PYBV11 this is worth over 3k flash.

qstr comparison will now be done just by length then data. This affects qstr_find_strn although this has a negligible performance impact as, for a given comparison, the length and first character will ~usually be different anyway.

String hashing (e.g. builtin `hash()` and map.c) now needs to compute the hash dynamically, and for the map case this does come at a performance cost.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
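The difference between the two lookup strategies described in this commit message can be sketched in Python. This is a simplified linear-scan model, not the C code in py/qstr.c (which searches pools, as the review snippet above shows). The djb2-style `compute_hash` is an assumption about the hash function, and all names here (`compute_hash`, `find_with_hash`, `find_without_hash`, the sample pool) are hypothetical.

```python
def compute_hash(data: bytes, bytes_in_hash: int = 2) -> int:
    """djb2-style hash truncated to bytes_in_hash bytes (assumed algorithm)."""
    h = 5381
    for b in data:
        h = ((h * 33) ^ b) & 0xFFFFFFFF  # keep the value bounded like a C integer
    h &= (1 << (8 * bytes_in_hash)) - 1
    return h or 1  # assumption: zero is reserved, so map it to 1


def find_with_hash(pool, data: bytes):
    """pool holds (stored_hash, bytes) pairs; compare hash, then length, then data."""
    target = compute_hash(data)
    for i, (stored_hash, s) in enumerate(pool):
        if stored_hash == target and len(s) == len(data) and s == data:
            return i
    return None


def find_without_hash(pool, data: bytes):
    """pool holds plain bytes; compare length first, then data."""
    for i, s in enumerate(pool):
        if len(s) == len(data) and s == data:
            return i
    return None


# Hypothetical pool contents for illustration.
strings = [b"__init__", b"value", b"spam", b"pin"]
hashed_pool = [(compute_hash(s), s) for s in strings]

print(find_with_hash(hashed_pool, b"value"), find_without_hash(strings, b"value"))
```

Both functions find the same entry. What the hash-free variant gives up is a stored value to reuse: anything that needs a string's hash, such as a map lookup, has to call something like `compute_hash` each time, which is the performance cost the commit message mentions.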

Sets MICROPY_QSTR_BYTES_IN_HASH==0 on stm32x0 boards. This saves e.g. 2kiB on NUCLEO_F091.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

This will apply to bare-arm and minimal, as well as the minimal unix variant.

Change the default to MICROPY_QSTR_BYTES_IN_HASH=1 for the CORE and BASIC levels, 2 for >=EXTRA.

Removes explicit setting of MICROPY_QSTR_BYTES_IN_HASH==1 in ports that don't set the feature level (because 1 is implied by the default level, CORE). Applies to cc3200, pic16bit, powerpc.

Removes explicit setting for nRF (which sets feature level). Also for samd, which sets CORE for d21 and FULL for d51. This means that d21 is unchanged with MICROPY_QSTR_BYTES_IN_HASH==1, but d51 now moves from 1 to 2 (roughly adds 1kiB).

The only remaining port which explicitly sets bytes-in-hash is rp2 because it's high-flash (hence CORE level) but lowish-SRAM, so it's worthwhile saving the RAM for runtime qstrs.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
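The default selection described in this commit message can be modelled as a small lookup. This is only a sketch of the rule as stated (1 for CORE and BASIC, 2 for EXTRA and above), reading "This will apply to bare-arm and minimal" as meaning 0 at the minimum level. The numeric level values, the `LEVELS` table, and the function `default_qstr_bytes_in_hash` are assumptions for illustration, not the preprocessor logic in py/mpconfig.h.

```python
# Assumed ordering of feature levels, smallest feature set first.
# The numeric values are illustrative; only their order matters here.
LEVELS = {
    "MINIMUM": 0,
    "CORE": 10,
    "BASIC": 20,
    "EXTRA": 30,
    "FULL": 40,
    "EVERYTHING": 50,
}


def default_qstr_bytes_in_hash(level="CORE", override=None):
    """Return the bytes-in-hash setting for a feature level.

    An explicit port or board setting (override) wins over the default.
    """
    if override is not None:
        return override
    value = LEVELS[level]
    if value >= LEVELS["EXTRA"]:
        return 2
    if value >= LEVELS["CORE"]:
        return 1
    return 0  # assumed: hashing disabled at the minimum level


print(default_qstr_bytes_in_hash("CORE"))  # samd21 case from the text: stays at 1
print(default_qstr_bytes_in_hash("FULL"))  # samd51 case from the text: moves to 2
```

A board that is at its size limit would pass an explicit value instead, for example `default_qstr_bytes_in_hash("CORE", override=0)`, which mirrors what the stm32x0 boards do by setting the macro to zero.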

jimmo · 2024-01-25T05:40:40Z

Rebased and updated.

jimmo · 2024-01-25T05:53:26Z

Similar perf results as before when running on pybv11 (with qstr-hash explicitly disabled, otherwise this PR is a no-op on pybv11).

$ ./run-perfbench.py -s ~/mpy/perf/qstr-hash/*
diff of scores (higher is better)
N=100 M=100                /home/jimmo/mpy/perf/qstr-hash/baseline-pybv11 -> /home/jimmo/mpy/perf/qstr-hash/nohash-pybv11         diff      diff% (error%)
bm_chaos.py                    352.34 ->     346.21 :      -6.13 =  -1.740% (+/-0.00%)
bm_fannkuch.py                  75.08 ->      74.28 :      -0.80 =  -1.066% (+/-0.01%)
bm_fft.py                     2346.26 ->    2343.39 :      -2.87 =  -0.122% (+/-0.00%)
bm_float.py                   5724.38 ->    5695.75 :     -28.63 =  -0.500% (+/-0.03%)
bm_hexiom.py                    46.65 ->      45.70 :      -0.95 =  -2.036% (+/-0.00%)
bm_nqueens.py                 4224.78 ->    4222.52 :      -2.26 =  -0.053% (+/-0.00%)
bm_pidigits.py                 648.53 ->     649.82 :      +1.29 =  +0.199% (+/-0.32%)
bm_wordcount.py                 47.61 ->      46.57 :      -1.04 =  -2.184% (+/-0.01%)
core_import_mpy_multi.py       604.12 ->     587.47 :     -16.65 =  -2.756% (+/-0.00%)
core_import_mpy_single.py       99.28 ->      95.94 :      -3.34 =  -3.364% (+/-0.01%)
core_locals.py                  40.87 ->      38.34 :      -2.53 =  -6.190% (+/-0.00%)
core_qstr.py                   205.10 ->     214.22 :      +9.12 =  +4.447% (+/-0.00%)
core_str.py                     27.20 ->      23.34 :      -3.86 = -14.191% (+/-0.00%)
core_yield_from.py             354.68 ->     357.16 :      +2.48 =  +0.699% (+/-0.00%)
misc_aes.py                    406.64 ->     408.54 :      +1.90 =  +0.467% (+/-0.00%)
misc_mandel.py                3018.46 ->    3012.86 :      -5.60 =  -0.186% (+/-0.00%)
misc_pystone.py               2355.08 ->    2282.43 :     -72.65 =  -3.085% (+/-0.01%)
misc_raytrace.py               368.25 ->     361.76 :      -6.49 =  -1.762% (+/-0.00%)

jimmo · 2024-01-25T05:54:41Z

(And it's a -3312 byte saving on pybv11)

dpgeorge · 2024-01-26T02:44:00Z

Thanks for updating, this is a good new option to reduce code size.

dpgeorge reviewed Oct 31, 2023

jimmo force-pushed the qstr-disable-hash branch 3 times, most recently from 20b8267 to 72512e7 Compare October 31, 2023 05:57

dpgeorge added the py-core label (Relates to py/ directory in source) and removed the micropython-lib label Nov 3, 2023

dpgeorge reviewed Jan 16, 2024

jimmo added 3 commits January 25, 2024 16:38

stm32: Disable qstr hashing on small boards.

8486e28

jimmo force-pushed the qstr-disable-hash branch from 72512e7 to d419081 Compare January 25, 2024 05:40

dpgeorge merged commit d419081 into micropython:master Jan 26, 2024

dhalbert mentioned this pull request Mar 19, 2025

MICROPY_QSTR_BYTES_IN_HASH performance/size tradeoff adafruit/circuitpython#10151

Closed
