py/persistentcode: Store constants more efficiently #12008

arbrauns · 2023-07-14T10:25:37Z

Instead of being stored as strings:

Floats are stored as little-endian bytes
Integers are stored as a length byte and the minimum number of little-endian bytes required.

This yields .mpy file space savings of 0.8% to 17% for the scripts I tested with.

Generated using: make -C mpy-cross clean && make -C mpy-cross cd tests/frozen/ ../../mpy-cross/build/mpy-cross frozentest.py For some reason, ordering of interned strings has changed, and the file still used the old "feature flags" in the header instead of the native arch and subversion combination. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

arbrauns · 2023-07-17T08:24:43Z

Some CI runs seem to get stuck, any ideas?

jimmo · 2023-07-17T09:23:00Z

This looks really interesting, thanks @arbrauns !

Some CI runs seem to get stuck, any ideas?

mpy-cross is getting stuck in an infinite loop while compiling extmod/asyncio/__init__.py

The infinite loop is the for (mp_int_t mask = (mp_int_t)0xff << 8 * (sizeof(mp_int_t) - 1); mask != 0; mask >>= 8) { in save_code (persistentcode.c:656). Seems to be when val==0.

arbrauns · 2023-07-17T09:30:56Z

Ah! If you know where to look... That was supposed to be an mp_uint_t; mp_int_t uses arithmetic right shifts and can never reach zero, of course.

jimmo · 2023-07-17T09:30:57Z

@arbrauns I think you just need to change for (mp_int_t mask to for (mp_uint_t mask.

I'm curious though why this didn't come up in your local testing...? What compiler are you using?

arbrauns · 2023-07-17T09:34:01Z

I'm curious though why this didn't come up in your local testing...? What compiler are you using?

It seems like none of the files I tested against had a 0 integer constant.

github-actions · 2023-07-17T09:41:55Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +432 +0.054% standard
      stm32:  +272 +0.069% PYBV10
     mimxrt:  +264 +0.074% TEENSY40
        rp2:  +280 +0.086% PICO
       samd:  +200 +0.077% ADAFRUIT_ITSYBITSY_M4_EXPRESS

jimmo · 2023-07-17T09:49:10Z

So this is +224 bytes on PYBV11 (for example). This is small, but not trivial.

One consideration is that freezing undoes all of of the encoding and turns them into the exact structures that will be loaded. So the Python code frozen into the firmware gets no benefit.

It also has no change on the RAM usage once the .mpy is loaded (because the loaded mp_float_t or integer value is the same).

But, making .mpy files smaller is in general a good goal, so this is definitely worth spending some time thinking about! Is your use case dominated primarily by any one type (float, small int, or big int?)

arbrauns · 2023-07-17T09:50:33Z

My use case is mostly floats, which are always exactly 4 bytes when binary-encoded, but potentially huge (10+ bytes) as strings.

This allows users of mp_obj_int_to_bytes_impl() to allocate a buffer of the correct size. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

Previously, the bytes would always be interpreted as unsigned. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

Instead of being stored as strings: - Floats are stored as little-endian bytes - Integers are stored as a length byte and the minimum number of little-endian bytes required. This also necessitates a bump of the mpy version. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

The mpy version has changed. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

codecov · 2023-07-17T15:08:18Z

Codecov Report

Merging #12008 (578256a) into master (14c2b64) will decrease coverage by 0.04%.
The diff coverage is 87.93%.

@@            Coverage Diff             @@
##           master   #12008      +/-   ##
==========================================
- Coverage   98.41%   98.38%   -0.04%     
==========================================
  Files         155      155              
  Lines       20564    20600      +36     
==========================================
+ Hits        20238    20267      +29     
- Misses        326      333       +7

Impacted Files	Coverage Δ
py/builtinimport.c	`100.00% <ø> (ø)`
py/mpz.h	`100.00% <ø> (ø)`
py/objint_mpz.c	`98.12% <33.33%> (-1.88%)`	⬇️
py/persistentcode.c	`95.97% <91.17%> (-1.13%)`	⬇️
py/compile.c	`99.87% <100.00%> (-0.01%)`	⬇️
py/mpz.c	`100.00% <100.00%> (ø)`
py/objint.c	`100.00% <100.00%> (ø)`

jimmo · 2023-07-21T00:57:15Z

@arbrauns I was thinking about this some more...

The current behavior seems inefficient, but because it's inherently variable-length, it's actually quite good for values that have a short decimal representation. i.e. 1.3 is 3 bytes, but a value with many decimal places is potentially e.g. 15+ bytes.
With the proposed change, because mpy-cross always runs with MP_FLOAT_IMPL=double we will now always emit 8 bytes for a floating point value. So we bound the worst case, but make the best (and I suspect average) case worse.

For integers, the proposed change improves the efficiency of the current variable-length scheme. The current implementation is already variable-length and works well for small values.

So I wondered maybe a small modification to the existing scheme would be worthwhile -- encode as BCD (i.e. nibble per digit, use 0xff as .). For doubles with 15-16 digits, you'll now be at ~8 bytes worst case. Although you do still need the length byte.

It's a bit tricky to implement this efficiently for encoding (because the current code gets a nice benefit from just using mp_obj_print_helper), but this is only enabled in mpy-cross so adding code size isn't so important. On the decode side it might not be too bad.

arbrauns · 2023-07-21T07:15:41Z

That would be another good chunk of work that I don't currently have time to implement and test. As an alternative idea, how about supporting both float encodings in mpy (binary and via mp_obj_print_helper), and encoding each float constant as whichever variant is shorter? It would strictly make float decoding more complex than it was before, of course, but as far as I can tell, the binary float decoding should be fairly trivial and not have much code size impact.

projectgus · 2024-03-07T23:50:04Z

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

dpgeorge added the py-core Relates to py/ directory in source label Jul 14, 2023

arbrauns force-pushed the persistentcode-const-shrink branch from 845946e to 913ac6a Compare July 17, 2023 08:03

arbrauns force-pushed the persistentcode-const-shrink branch from 913ac6a to 6120f4c Compare July 17, 2023 09:30

py/objint: Add mp_obj_int_max_bytes_needed_impl().

83db674

This allows users of mp_obj_int_to_bytes_impl() to allocate a buffer of the correct size. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

arbrauns force-pushed the persistentcode-const-shrink branch from 6120f4c to c4e9ac4 Compare July 17, 2023 13:21

py/mpz: Allow reading signed bigint from bytes.

4e53e49

Previously, the bytes would always be interpreted as unsigned. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

arbrauns force-pushed the persistentcode-const-shrink branch 3 times, most recently from eeda2e1 to 09505ad Compare July 17, 2023 14:13

arbrauns added 2 commits July 17, 2023 14:33

tests: Regenerate frozentest.mpy.

578256a

The mpy version has changed. Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>

arbrauns force-pushed the persistentcode-const-shrink branch from 09505ad to 578256a Compare July 17, 2023 14:34

Gadgetoid mentioned this pull request Feb 29, 2024

global: Remove the STATIC macro. #13763

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py/persistentcode: Store constants more efficiently #12008

py/persistentcode: Store constants more efficiently #12008

arbrauns commented Jul 14, 2023

arbrauns commented Jul 17, 2023

jimmo commented Jul 17, 2023 •

edited

Loading

arbrauns commented Jul 17, 2023

jimmo commented Jul 17, 2023

arbrauns commented Jul 17, 2023

github-actions bot commented Jul 17, 2023 •

edited

Loading

jimmo commented Jul 17, 2023

arbrauns commented Jul 17, 2023

codecov bot commented Jul 17, 2023 •

edited

Loading

jimmo commented Jul 21, 2023

arbrauns commented Jul 21, 2023

projectgus commented Mar 7, 2024

py/persistentcode: Store constants more efficiently #12008

Are you sure you want to change the base?

py/persistentcode: Store constants more efficiently #12008

Conversation

arbrauns commented Jul 14, 2023

arbrauns commented Jul 17, 2023

jimmo commented Jul 17, 2023 • edited Loading

arbrauns commented Jul 17, 2023

jimmo commented Jul 17, 2023

arbrauns commented Jul 17, 2023

github-actions bot commented Jul 17, 2023 • edited Loading

jimmo commented Jul 17, 2023

arbrauns commented Jul 17, 2023

codecov bot commented Jul 17, 2023 • edited Loading

Codecov Report

jimmo commented Jul 21, 2023

arbrauns commented Jul 21, 2023

projectgus commented Mar 7, 2024

jimmo commented Jul 17, 2023 •

edited

Loading

github-actions bot commented Jul 17, 2023 •

edited

Loading

codecov bot commented Jul 17, 2023 •

edited

Loading