Skip to content

py: Define a set of static qstrs to reduce size of mpy files. #4473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

dpgeorge
Copy link
Member

@dpgeorge dpgeorge commented Feb 7, 2019

The idea here is to define a small set (<200 entries) of static qstrs which are "forever" fixed and are the same on all ports, all archs, all builds (or at least those with persistent code loading/saving enabled). This essentially defines a "qstr ABI".

It allows to easily optimise .mpy files, because if a qstr is a static qstr (easy to test, just has id <= QSTR_LAST_STATIC) then its string representation doesn't need to be stored. Basically the bytecode can be fixed (immutable) when these static qstrs are used.

Maybe it also allows other optimisations in the future.

The main thing would be to find a nice set of static qstrs that are common. Provided here is a starting point for that set. It is allowed to change over time, but if it does change then the .mpy version number must be increased.

This reduces .mpy size by about 10%, as measured by esp32/modules: 32469 -> 29038 bytes. No extra RAM is needed to load/save .mpy files, and loading/saving should also be more efficient with this change.

in bytecode: if qstr < QSTR_LAST_STATIC: use value verbatim
in arg list: if qstr < QSTR_LAST_STATIC: store 0, qstr_id (2 bytes total)

reduces .mpy by about 10%, as measured in esp32/modules: 32469 -> 29038 bytes
@dpgeorge
Copy link
Member Author

dpgeorge commented Feb 7, 2019

This is complementary to #4472. For background see #3054.

@pfalcon
Copy link
Contributor

pfalcon commented Feb 7, 2019

This sounds like a hack. Maybe it would be good "last mile"optimization when everything else was done, but doesn't sound like ingenuity on itself.

Why don't do real engineering on this stuff? Per #4472 (comment) , allow for "qstr table definition" to appear inbetween code objects stored. It would be stored just as a number of entries, followed by literal strings. All code objects following that table would refer to qstr's by index of the entry in the last table. The table would need to be allocated during load, yeah, but can be freed afterwards. Essentially, it's 2-level scheme allowing to refer to the same qstr without mentioning it literally all the time.

Of course, that will require more advanced post-processing of persistent .mpy, and that's the whole point - mpy-cross is unmaintainable joke. It all should be rewritten in Python!

@dpgeorge
Copy link
Member Author

dpgeorge commented Feb 8, 2019

Maybe it would be good "last mile"optimization when everything else was done,

It would help other optimisations as well. Even with a global qstr table, that table wouldn't need to store any of the static qstrs, thus reducing its size. And as mentioned above, this static qstr optimisation saves about 10% of code size in a .mpy file. Qstrs take roughly 50% of a .mpy, so that means 20% of qstrs (roughly) in a .mpy are from this small set of static qstrs. That's a lot of qstrs that can be removed.

... The table would need to be allocated during load, yeah, but can be freed afterwards

That's why I wanted to first concentrate on optimisations that don't need any extra RAM in the loader.

It all should be rewritten in Python!

The majority of this patch is written in Python.

@pfalcon
Copy link
Contributor

pfalcon commented Feb 8, 2019

That's a lot of qstrs that can be removed.

Yes, but it doesn't do anything smart, and instead introduces ABI versioning issues...

The majority of this patch is written in Python.

But it doesn't do anything smart... I'm talking about doing even more of bytecode manipulation in Python (more than mpy-tool already does). Btw, I added initial code to do that (based on reworking mpy-tool), exposed as much as possible thru CPython-compatible opcode, dis modules to micropython-lib: https://forum.micropython.org/viewtopic.php?f=15&t=5744

Anyway, I assume the topic of this patch is to take qstrs which already exist in the C codebase, and assigning them fixed id's. Not adding some random qstrs (so that would be next tempting step). Sounds good then, +1.

@dpgeorge
Copy link
Member Author

introduces ABI versioning issues

It will be the same version as the .mpy file version.

I assume the topic of this patch is to take qstrs which already exist in the C codebase, and assigning them fixed id's

Yes, the initial list here was taken from bare-arm and minimal builds, with a few stream qstrs added like write and read, and also self which is used a lot.

@peterhinch
Copy link
Contributor

Is there a reason why some keywords are absent? Three that come to mind are set, yield and async but doubtless there are others.

@dpgeorge
Copy link
Member Author

Is there a reason why some keywords are absent? Three that come to mind are set, yield and async but doubtless there are others.

Language keywords like yield and async won't be needed because they'll never be stored in a .mpy file (you can't have a function called "yield", for example). "set" would be one to add, it's just not there right now because sets are an optional (compile-time) feature in uPy so aren't in all builds.

@dpgeorge
Copy link
Member Author

dpgeorge commented Mar 5, 2019

This feature was merged in 4f0931b

@dpgeorge dpgeorge closed this Mar 5, 2019
@dpgeorge dpgeorge deleted the py-static-qstr-list branch March 5, 2019 05:49
dlech referenced this pull request in pybricks/pybricks-micropython Nov 11, 2019
This is needed (hopefully temporarily) so that Pybricks MicroPython
(v1.11.x) can coexist with the existing MicroPython (v1.9.x) on
ev3dev-stretch. There was a major .mpy file format change, so the two
are no longer compatible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants