Skip to content

Conversation

wnienhaus
Copy link
Contributor

This aims to fix how frozen modules are handled/identified/distinguished.

It prefixes all frozen files with a "virtual directory", which is then added to sys.path (as opposed to before, where frozen files were found via the '' current directory). This is essentially option #2 suggested in issue #2322.

The virtual directory is called {frozen}, a name chosen to be highly unlikely to ever be a real directory name. (The :frozen: name suggested in issue #2322 was not suitable, because the colon (:) character is used as the path separator in the string representation of sys.path)

While the initial fix for the unix port was quite easy (prefix all frozen filenames with {frozen}/ and add {frozen} as the last entry in the default sys.path), it ended up being a bit of a rabbit hole, requiring more changes/fixes in the test suite and other ports, even revealing a potential bug which I will report/fix separately.

One issue I am not 100% happy about: some platforms use pyexec_frozen_module to run frozen py files on boot up. This function does not search paths defined in sys.path, and arguably shouldn't need to, since it uses the arrays with frozen files/strings and essentially "knows" where to find frozen files/modules. But due to the prefix of filenames in the array, it currently needs to be passed the prefix too. Changing this, so the function (and those it calls) internally skip over the prefix while matching file names is another rabbit hole, which I haven't yet gone into. So for now, pyexec_frozen_module must be given filenames with the prefix. I am happy for thoughts around this - maybe there is a nicer way to solve this.

@dhalbert
Copy link
Contributor

In CircuitPython, we use .frozen for this virtual directory name. We normally put libraries in lib/, and put .frozen before lib in sys.path. We use frozen modules primarily to avoid using up RAM import on boards with limited RAM, for libraries that work with on-board peripherals (like, say, an on-board accelerometer).

@peterhinch
Copy link
Contributor

While the initial fix for the unix port was quite easy (prefix all frozen filenames with {frozen}/ and add {frozen} as the last entry in the default sys.path),

To retain existing behaviour {frozen} (or .frozen) should be the first entry in sys.path. By default frozen modules take precedence over matching files on the filesystem. The user can modify this by altering sys.path.

@wnienhaus
Copy link
Contributor Author

To retain existing behaviour {frozen} (or .frozen) should be the first entry in sys.path. By default frozen modules take precedence over matching files on the filesystem. The user can modify this by altering sys.path.

You are right, on devices (I tested only esp8266/esp32) frozen modules take precedence because sys.path always has '' as the first entry and frozen modules are searched first, however, on the unix port this is not always true: when executing a script from a file, the first entry in sys.path is replaced with the directory of the that contains the called script, and in that case frozen modules are never searched (or found). This is what this PR is mainly trying to fix. (see issue #3509 for more discussion on this).

I did not consider backwards compatibility up until now (not intentionally) but wonder if it's good that frozen modules take precedence. If frozen modules come after the current directory in search order, it would allow overriding frozen code (e.g. for testing/hot-fixing) without having to rebuild/reflash the firmware, while the other way around this is not possible. On the unix port specifically, it feels a bit more aligned with CPython (see my tests documented in #3509) and .

Would it make sense to have a different default search order on different ports? e.g. have the unix port search frozen modules last (it would still be before builtins), while bare-metal ports search them first? (Would not be my choice, but maybe it's an option?)

(Btw, I like .frozen so there's alignment between MicroPython forks)

@dhalbert
Copy link
Contributor

dhalbert commented Sep 1, 2019

...If frozen modules come after the current directory in search order, it would allow overriding frozen code (e.g. for testing/hot-fixing) without having to rebuild/reflash the firmware, while the other way around this is not possible.

We use that idea to be able to override the modules we've frozen for RAM-space-saving.

CircuitPython's default sys.path is ['', '/', '.frozen', '/lib']. Normally what we freeze are versions of what might be in lib. When we first added an explicit .frozen, we added it at the end of sys.path, but this did not work out so well. When we had a smaller set of libraries we actually encouraged our users to download the latest library bundle (a collection of all our libraries) and unzip the whole thing into lib. But that meant there were duplicate copies of libraries in lib that overrode the frozen ones, which negated the RAM saving-behavior of using a frozen library. This was a significant support issue for us (discussed in adafruit#1310). Now there are far too many libraries to have them all in lib, so this is less important, but putting .frozen before lib still reduces the support queries.

To emphasize, our primary use for frozen code has been to save RAM on low-RAM boards. We don't bother on boards with more RAM. We haven't had the need yet to freeze mandatory initialization code. If we needed to do that we might consider adding another virtual directory that came earlier in sys.path

@peterhinch
Copy link
Contributor

If frozen modules come after the current directory in search order, it would allow overriding frozen code (e.g. for testing/hot-fixing) without having to rebuild/reflash the firmware, while the other way around this is not possible.

It can be done by having the following in main.py:

import sys
sys.path.append(sys.path.pop(0))

@dpgeorge
Copy link
Member

dpgeorge commented Jul 6, 2021

@dhalbert do you still use .frozen for the frozen path? Any further updates/insights on this topic since Sept 2019?

@wnienhaus
Copy link
Contributor Author

@dhalbert any update on your experience with the .frozen path? I looked over CircuitPython code and the history of the code around .frozen and it looks like my most recent approach is pretty much identical.

Are you aware of any limitations/issues that could be nice to address, or does it work as expected? If so and with your permission, I would actually propose using the code as it is in CircuitPython (except perhaps removing the double check as mentioned here adafruit#235 (review)).

I could update this PR accordingly.

tannewt added a commit to tannewt/circuitpython that referenced this pull request Aug 11, 2021
This prevents timer leakage on builds without pwmio.

Fixes micropython#5057
@wnienhaus
Copy link
Contributor Author

wnienhaus commented Sep 5, 2021

I have now rebased this branch on the latest master and improved the change:

  • I renamed the virtual directory from {frozen} to .frozen - to match what CircuitPython has done (and it feels easier to type in a REPL)
  • I have changed to the approach where methods in builtinimport.c are aware of the .frozen/ prefix, so the prefix is no longer needed in the filename encoded in the mp_frozen_mpy_names array.
  • I have added the .frozen directory to sys.path on more ports - I tried my best to establish which ports have frozen module support enabled, but might have missed some.
  • I have fixed the "extra coverage" tests, to pass with with the new .frozen directory behaviour

I also tested this:

  • with the unix port using make test
  • with the unix port using make VARIANT=coverage test
  • interactively on the ESP8266, which has a number of frozen modules by default (e.g. dht)

Update: I fixed some build failures:

  • Fixed the commit message and code style (squashed into main commit)
  • Fixed the rp2 build (separate commit 2918097, because I would like some input - MICROPY_MODULE_FROZEN is defined during the compile stage, but not when the QSTRs are generated (the preprocessing that results in qstr.i.last). Defining this explicitly in CMakeLists.txt feels a bit overkill - or likely duplicated because it must already come from somewhere.)
  • Fixed the qemu-arm build (separate commit)
  • Did not fix the "code size" check - I assume this needs some manual accepting?
  • I will squash the separate "fix" commits once the rp2 build fix has been reviewed.

Wilko Nienhaus and others added 2 commits September 5, 2021 12:31
This change "moves" frozen modules into a virtual directory
called ".frozen". This allows controlling the search order for
frozen modules, relative to other directories.

Before this change, frozen modules were found via the current
directory (i.e. the "" entry in sys.path). However, when a file
is executed from the command line the first entry in sys.path
(which is normally the empty string ""), is replaced by the base
path of the executing file (correct behaviour as per CPython).
This made loading frozen modules fail.

With this change, frozen modules have their own entry in
sys.path, which also allows the user to control the order of
search when looking for frozen modules.

By default, the virtual ".frozen" directory is the last entry
in sys.path (i.e. it gets searched last).

Fixes issue micropython#2322.
The fix ensures main.c is included when searching for qstrs like
MP_QSTR_Foo. This requires including tinyusb header files during
the qstr preprocessing step and also requires setting the
MICROPY_MODULE_FROZEN compile definition (this is for the step
that generates qstr.i.last - during the later compile stage
the define is correctly set).
@wnienhaus
Copy link
Contributor Author

Any update on this?

@dhalbert
Copy link
Contributor

@dhalbert do you still use .frozen for the frozen path? Any further updates/insights on this topic since Sept 2019?

I apologize for not replying to this earlier. I subscribe to all MicroPython repo notifications, and this slipped by.

We still use the same path: ['', '/', '.frozen', '/lib']. It has continued to work out well. We don't have many support issues regarding the frozen module path. There are more issues about multiple copies of library files, with some versions being out of date, such as an older .py file overriding a new .mpy. That is not a frozen module issue

@jimmo
Copy link
Member

jimmo commented Nov 17, 2021

I think .frozen makes more sense than |frozen, especially for CircuitPython compatibility.

A possible alternative is to use None or even ... as the sentinel value? i.e. sys.path = ['foo', ..., '/lib'] would search frozen modules after foo, but before /lib.
This is still valid in CPython too (it will ignore non-string entries). It's neat, but it would mean that you could no longer add a prefix path for frozen modules to sys.path.

For example if I'd frozen foo/bar/baz.py, the current behaviour is that I can do

>>> import baz
ImportError: no module named 'baz'
>>> sys.path.append('foo/bar')
>>> import baz
>>> baz
<module 'baz' from 'foo/bar/baz.py'>

With this PR, I'd now append .frozen/foo/bar instead, and TBH I think that probably makes sense. I have no idea if anyone depends on this behaviour, but I can see it being useful when freezing the entire codebase. OTOH, if we didn't want to support this, then it could simplify the import code to make it such that frozen imports are always top-level.

@dpgeorge
Copy link
Member

Superseded by #8079, which was merged, see 86394f7

Thanks @wnienhaus for this initial implementation!

@dpgeorge dpgeorge closed this Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants