Skip to content

py/persistentcode: Store constants more efficiently #12008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

arbrauns
Copy link
Contributor

Instead of being stored as strings:

  • Floats are stored as little-endian bytes
  • Integers are stored as a length byte and the minimum number of little-endian bytes required.

This yields .mpy file space savings of 0.8% to 17% for the scripts I tested with.

Generated using:

make -C mpy-cross clean && make -C mpy-cross
cd tests/frozen/
../../mpy-cross/build/mpy-cross frozentest.py

For some reason, ordering of interned strings has changed, and the file
still used the old "feature flags" in the header instead of the native
arch and subversion combination.

Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>
@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jul 14, 2023
@arbrauns arbrauns force-pushed the persistentcode-const-shrink branch from 845946e to 913ac6a Compare July 17, 2023 08:03
@arbrauns
Copy link
Contributor Author

Some CI runs seem to get stuck, any ideas?

@jimmo
Copy link
Member

jimmo commented Jul 17, 2023

This looks really interesting, thanks @arbrauns !

Some CI runs seem to get stuck, any ideas?

mpy-cross is getting stuck in an infinite loop while compiling extmod/asyncio/__init__.py

The infinite loop is the for (mp_int_t mask = (mp_int_t)0xff << 8 * (sizeof(mp_int_t) - 1); mask != 0; mask >>= 8) { in save_code (persistentcode.c:656). Seems to be when val==0.

@arbrauns
Copy link
Contributor Author

Ah! If you know where to look... That was supposed to be an mp_uint_t; mp_int_t uses arithmetic right shifts and can never reach zero, of course.

@jimmo
Copy link
Member

jimmo commented Jul 17, 2023

@arbrauns I think you just need to change for (mp_int_t mask to for (mp_uint_t mask.

I'm curious though why this didn't come up in your local testing...? What compiler are you using?

@arbrauns arbrauns force-pushed the persistentcode-const-shrink branch from 913ac6a to 6120f4c Compare July 17, 2023 09:30
@arbrauns
Copy link
Contributor Author

I'm curious though why this didn't come up in your local testing...? What compiler are you using?

It seems like none of the files I tested against had a 0 integer constant.

@github-actions
Copy link

github-actions bot commented Jul 17, 2023

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +432 +0.054% standard
      stm32:  +272 +0.069% PYBV10
     mimxrt:  +264 +0.074% TEENSY40
        rp2:  +280 +0.086% PICO
       samd:  +200 +0.077% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@jimmo
Copy link
Member

jimmo commented Jul 17, 2023

So this is +224 bytes on PYBV11 (for example). This is small, but not trivial.

One consideration is that freezing undoes all of of the encoding and turns them into the exact structures that will be loaded. So the Python code frozen into the firmware gets no benefit.

It also has no change on the RAM usage once the .mpy is loaded (because the loaded mp_float_t or integer value is the same).

But, making .mpy files smaller is in general a good goal, so this is definitely worth spending some time thinking about! Is your use case dominated primarily by any one type (float, small int, or big int?)

@arbrauns
Copy link
Contributor Author

My use case is mostly floats, which are always exactly 4 bytes when binary-encoded, but potentially huge (10+ bytes) as strings.

This allows users of mp_obj_int_to_bytes_impl() to allocate a buffer of
the correct size.

Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>
@arbrauns arbrauns force-pushed the persistentcode-const-shrink branch from 6120f4c to c4e9ac4 Compare July 17, 2023 13:21
Previously, the bytes would always be interpreted as unsigned.

Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>
@arbrauns arbrauns force-pushed the persistentcode-const-shrink branch 3 times, most recently from eeda2e1 to 09505ad Compare July 17, 2023 14:13
arbrauns added 2 commits July 17, 2023 14:33
Instead of being stored as strings:
- Floats are stored as little-endian bytes
- Integers are stored as a length byte and the minimum number of
  little-endian bytes required.

This also necessitates a bump of the mpy version.

Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>
The mpy version has changed.

Signed-off-by: Armin Brauns <armin.brauns@embedded-solutions.at>
@arbrauns arbrauns force-pushed the persistentcode-const-shrink branch from 09505ad to 578256a Compare July 17, 2023 14:34
@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Merging #12008 (578256a) into master (14c2b64) will decrease coverage by 0.04%.
The diff coverage is 87.93%.

@@            Coverage Diff             @@
##           master   #12008      +/-   ##
==========================================
- Coverage   98.41%   98.38%   -0.04%     
==========================================
  Files         155      155              
  Lines       20564    20600      +36     
==========================================
+ Hits        20238    20267      +29     
- Misses        326      333       +7     
Impacted Files Coverage Δ
py/builtinimport.c 100.00% <ø> (ø)
py/mpz.h 100.00% <ø> (ø)
py/objint_mpz.c 98.12% <33.33%> (-1.88%) ⬇️
py/persistentcode.c 95.97% <91.17%> (-1.13%) ⬇️
py/compile.c 99.87% <100.00%> (-0.01%) ⬇️
py/mpz.c 100.00% <100.00%> (ø)
py/objint.c 100.00% <100.00%> (ø)

@jimmo
Copy link
Member

jimmo commented Jul 21, 2023

@arbrauns I was thinking about this some more...

The current behavior seems inefficient, but because it's inherently variable-length, it's actually quite good for values that have a short decimal representation. i.e. 1.3 is 3 bytes, but a value with many decimal places is potentially e.g. 15+ bytes.
With the proposed change, because mpy-cross always runs with MP_FLOAT_IMPL=double we will now always emit 8 bytes for a floating point value. So we bound the worst case, but make the best (and I suspect average) case worse.

For integers, the proposed change improves the efficiency of the current variable-length scheme. The current implementation is already variable-length and works well for small values.

So I wondered maybe a small modification to the existing scheme would be worthwhile -- encode as BCD (i.e. nibble per digit, use 0xff as .). For doubles with 15-16 digits, you'll now be at ~8 bytes worst case. Although you do still need the length byte.

It's a bit tricky to implement this efficiently for encoding (because the current code gets a nice benefit from just using mp_obj_print_helper), but this is only enabled in mpy-cross so adding code size isn't so important. On the decode side it might not be too bad.

@arbrauns
Copy link
Contributor Author

That would be another good chunk of work that I don't currently have time to implement and test. As an alternative idea, how about supporting both float encodings in mpy (binary and via mp_obj_print_helper), and encoding each float constant as whichever variant is shorter? It would strictly make float decoding more complex than it was before, of course, but as far as I can tell, the binary float decoding should be fairly trivial and not have much code size impact.

@projectgus
Copy link
Contributor

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants