Skip to content

py/builtinimport: Allow built-in modules to be packages (v3) #11456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 1, 2023

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented May 10, 2023

See #5701 (v2), #4731 (v1), and adafruit#2657 (inspiration for v2 & v3).

There are two parts. See the commit messages for more details.

  • Rework the import mechanism slightly to make sub-package importing generally a bit easier to follow and not do so many stats. This also gives a small code size improvement.
  • Simple addition to allow a built-in module to add a sub-module to its globals dict.

One area from the first commit is worth digging into... We've discussed many times the future of the feature-formerly-known-as-weak-links. I still strongly feel that we should remove weak links and u-prefixed modules. See recent discussion in #9018 (and PR in #9069). One of the sticking points was that u-modules provide a mechanism to extend built-in modules. I proposed a few alternative options in #9018, including a generic mechanism that also allows extending filesystem modules, however this PR gives a new option thanks to fixing the CPython incompatibility with sys.path. If you do, e.g. in os.py

saved_path = sys.path[:]
sys.path.clear()
try:
  from os import *
finally:
  sys.path.extend(saved_path)
  del saved_path

then this will force the import to be from the built-in.

Edit: See #11456 (comment) for a summary of changes of behavior relative to v1.20.

@github-actions
Copy link

github-actions bot commented May 10, 2023

Code size report:

   bare-arm:    +8 +0.014% 
minimal x86:   +64 +0.034% 
   unix x64:   -40 -0.005% standard
      stm32:   -40 -0.010% PYBV10
        rp2:   -24 -0.007% PICO

@dlech
Copy link
Contributor

dlech commented May 10, 2023

If a user wants a fast (i.e. no filesystem) built-in import to improve
startup time, or to bypass the filesystem to force a built-in import,
they can temporarily clear sys.path.

We have been relying on the previous behavior for improved startup time. Would it be possible to have a special entry in sys.path to control when builtins are checked similar to how we have .frozen already? This way, we don't have to modify all of our files or try to explain this to users.

@dpgeorge
Copy link
Member

We have been relying on the previous behavior for improved startup time

I don't quite follow: what exactly is the previous behaviour and why does it have improved startup time?

@dlech
Copy link
Contributor

dlech commented May 10, 2023

On LEGO MINDSTORMS EV3 (300MHz Arm + Linux + micropython-lib installed) there is a very noticeable difference between the time it takes to import os which loads os.mpy from micropython-lib vs import uos which loads the builtin uos module.

If we drop support for u prefixes and hard-code the import search order, import os will always default importing from micropython-lib but we would rather users not have to know about that and get the fast builtin import by default and only opt in to use micropython-lib if they really need it.

Dropping micropython-lib from the default sys.path and requiring users to manually add it to opt-in seems like a decent workaround though in this case. But in our frozen modules where we really just want the builtin imports and never micropython-lib, we would have to add the sys.path swapping workaround too.

@jimmo
Copy link
Member Author

jimmo commented May 10, 2023

We have been relying on the previous behavior for improved startup time.

It is possible to maintain the previous behavior but keep the rest of the changes here by changing this PR to do

if level_mod_name.startswith("u") and module_obj := mp_module_get_builtin(full_mod_name):
  return module_obj

immediately before the call to stat_top_level.

Would it be possible to have a special entry in sys.path to control when builtins are checked similar to how we have .frozen already?

I considered this, but while I was thinking about how to implement this I noticed the incorrect behavior of the sys.path walking (with respect to an empty sys.path).

The main reason I'm hesitant to take this approach is that if the magic e.g. ".builtin" entry is missing then the behavior is very confusing, and this will break anyone's code that modifies sys.path. (Similar concern when we added .frozen, but the result isn't quite as weird).

This way, we don't have to modify all of our files or try to explain this to users.

I understand. However, if we go ahead with #9069 then this is likely going to have to happen anyway.

That said, even if we do merge #9069 we'll still need to provide a mechanism for e.g. import utime to work, so as not to break all the code out there that does this. My plan for that was to just invert the weak-link behavior in this PR as it stands now (i.e. if builtin fails, then try without the leading "u"). But we could also do it as:

if level_mod_name.startswith("u") and module_obj := mp_module_get_builtin(full_mod_name[1:]):
  return module_obj

before the filesystem check to maintain import ufoo as a "skip filesystem import".

@jimmo
Copy link
Member Author

jimmo commented May 11, 2023

I have added another commit to the branch that adds the "bypass filesystem for u-prefixed builtin modules" logic. It only costs about 20 bytes, so we're still ahead.
If we merge #9069 then this needs a simple change to remove the "u" from level_mod_name before calling mp_module_get_builtin.

Personally I would prefer not to do this, and just have a standard CPython-compatible way (e.g. sys.path manipulation) of explicitly choosing to use the built-in module, but in the interest of backwards compatibility it seems reasonable. Perhaps we could guard this behind a config option?

Note that it's still not exactly the same behavior as before this PR, because non-u-prefixed modules will now search the filesystem first. e.g. it wasn't previously possible to extend, say, pyb from Python, and now it is (using the sys.path method).

@jimmo
Copy link
Member Author

jimmo commented May 11, 2023

The test failures will be fixed by #11468. This is due to when --via-mpy is enabled, the test is now handling import micropython from the tests/micropython directory.

Note that it's still not exactly the same behavior as before this PR, because non-u-prefixed modules will now search the filesystem first. e.g. it wasn't previously possible to extend, say, pyb from Python, and now it is (using the sys.path method).

Just to follow up on this (prompted by the above test issue), we may decide that (for performance reasons) we never want e.g. import micropython to hit the filesystem (and possibly other non-u-prefixed builtins, but I can't think of any right now). But I think we should handle this explicitly in builtinimport.c, i.e.

if (level_mod_name_str[0] == 'u' || level_mod_name == MP_QSTR_micropython) {

We have been relying on the previous behavior for improved startup time.

@dlech @dpgeorge FWIW, on the topic of performance concerns, the current behavior does actually do the full sys.path walk for all modules (loaded, builtin, filesystem). If you look at https://github.com/micropython/micropython/blob/master/py/builtinimport.c#L362 even if we successfully match mp_module_get_loaded_or_builtin, it still proceeds to stat_top_level_dir_or_file.

So the change from this PR should be a no-op on performance, and if we include the extra commit then it's actually a performance gain relative to the current implementation for importing u-prefix builtins.

(I introduced this regression in the last attempt to refactor the import mechanism... the previous implementation was very complicated and I wasn't aware that this was the expected behaviour).

@jimmo
Copy link
Member Author

jimmo commented May 11, 2023

So the change from this PR should be a no-op on performance, and if we include the extra commit then it's actually a performance gain relative to the current implementation for importing u-prefix builtins.

Actually this PR is a gain regardless for the case of importing any already-loaded module because we now just return immediately. (Something that the previous code didn't do, just incase the import was actually for a sub package).

py/mpconfig.h Outdated
@@ -850,6 +850,13 @@ typedef double mp_float_t;
#define MICROPY_MODULE_BUILTIN_INIT (MICROPY_CONFIG_ROM_LEVEL_AT_LEAST_EXTRA_FEATURES)
#endif

// Whether to allow built-in modules to have sub-packages (by making the
// sub-package a member of their locals dict). The sub-package __name__ must
// be foo.bar, and only top-level modules may have __init__.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does __init__ here refer to a module __init__() method or an __init__.py file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former (i.e. MICROPY_MODULE_BUILTIN_INIT).

The reason is that the module is a member of its parents globals dict and can be accessed without being imported. i.e. if baz is a submodule:

import foo
foo.bar.baz()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bit unfortunate. We make occasional use of that and would have to come up with a workaround. Probably just have to init all modules at mp_init() whether they are used or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to have a C-level __init__ on the top-level module (eg foo, or in your case pybricks), which initialises that module and all submodules of it. That would run when foo is imported, or any submodule like foo.bar.

Probably the comment above should be improved to point this out clearly, that importing any submodule will initialise its top-level module, ie import foo.bar will call foo's __init__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label May 12, 2023
@jimmo jimmo force-pushed the import-subpackage-v3 branch 2 times, most recently from 3fe16a1 to 8b2e268 Compare May 12, 2023 07:18
@jimmo
Copy link
Member Author

jimmo commented May 12, 2023

I've updated this with tests and an example user c module. This test might need to move to a different directory (or have special handling in run-tests.py).

The behavior now compared v1.20:

  • Setting sys.path=[] will force a built-in import, even if a filesystem module with that name is already imported.
  • Importing ufoo, sys or micropython will never search the filesystem, regardless of sys.path.
  • Importing an already loaded module will also not search the filesystem (unless sys.path is empty as above).
  • Any built-in (except sys and micropython) can now be extended from the filesystem (not just u-prefix ones). e.g. pyb.
  • Built-in modules may now be packages with sub-modules (or sub-packages).
  • Built-in modules with __init__ must now do their own tracking of whether init has already been called. (urandom did this already anyway) (Described in more detail in the corresponding commit).

@jimmo jimmo force-pushed the import-subpackage-v3 branch 2 times, most recently from 431dcae to 8c95285 Compare May 12, 2023 13:47
static bool seeded = false;
if (!seeded) {
seeded = true;
if (!MP_STATE_VM(mod_urandom_seeded)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this will be seeded after each soft reset, so the comment could probably use updating too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is exactly the bug that this change is fixing. It seems like soft-reset should cause a re-seed, so this is now the expected behavior.

Is there a comment somewhere else that says the wrong thing (w.r.t. soft reset) or are you saying I should just add a note here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line above says // that it's only ever seeded once. It's not wrong, just ambiguous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, thanks. Fixed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good, but shouldn't it be in its own commit? To me it doesn't look particularly related to 3cf95f6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it could go in a separate commit.

But I don't see how it will re-seed on soft reset, because soft reset doesn't clear out the root pointers (also it's an int so shouldn't really go in the root pointer section, that will slow down the GC by unnecessarily tracing it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewleech It's because the logic that used to prevent duplicate initialization is no longer available. There used to be this kind of weird feature were specifically built-in modules that had an __init__ get stored in sys.modules (but regular built-ins do not).

That said, modrandom was already working around another quirk of that old behavior (urandom vs random), so I figured that showing a "correct" way that a module-with-init can address this was related to the same change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the state should not be stored in the root pointer section (see above), and so this change to modurandom.c could be entirely dropped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I've removed this.

Will re-visit in the future with an MP_REGISTER_STATE() macro.

}
STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_foo_bar_f_obj, example_package_foo_bar_f);

// Define all properties of the second-level sub-package (example_package.foo.bar).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Define all properties of the second-level sub-package (example_package.foo.bar).
// Define all attributes of the second-level sub-package (example_package.foo.bar).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is copied from examples/usercmodule/cexample/examplemodule.c. I agree though, "attributes" is more correct. I'll update both.

}
STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_foo_f_obj, example_package_foo_f);

// Define all properties of the first-level sub-package (example_package.foo).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth mentioning here that __init__ doesn't work and all init for subpackages/modules needs to be done in the top-level package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not a well though out idea, but maybe a way to avoid this limitation is to add an "ensure initialized" check to the module_attr() function, e.g. if elem->value is a module instance, call __init__ on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK will do.

Yeah that could work! This would mean that __init__ would need to be looked up and called every time the sub-module was accessed though which will come at a performance cost. (Also the type check isn't free, and would happen on all module attribute access).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More brainstorming ideas: instead of adding submodules to the globals dict of a module, could we introduce a new mp_type_package type that is the same a mp_type_module but contains an extra pointer to a table of submodles? Or instead of creating a new type, use a dunder like __package_dict__ in the globals table so that the submodules can't be accessed directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introduce a new mp_type_package type that is the same a mp_type_module

We were just talking about doing exactly this in the context of #11466... by introducing a module type "module with delegation" we can solve the weirdness around the delegation entry. This does lead to a slightly weird type(sys) != type(os) for example. But there are more direct ways to solve that particular issue (e.g. wrap the pointer in a dummy object).

@dpgeorge For module delegation did we consider having a ROM table of (pointer-to-module-obj, delegation function) that could be searched? (Could use a variant of MP_REGISTER_MODULE, i.e. MP_REGISTER_MODULE_WITH_DELEGATION). We could do something similar for subpackages I guess.

use a dunder like package_dict in the globals table so that the submodules can't be accessed directly?

I guess what you're saying is that module_attr would know about package_dict. That could work! Has a RAM cost of course, so that should be considered.

I guess the question is -- is this a theoretical or a practical issue for pybricks? i.e. is it really important to delay initialisation of submodules until they're explicitly imported? Can you solve this in other ways, e.g. do on-demand initialisation when methods of the submodule are used? That's extra code size of course to call check_init() at the start of every function in the submodule (and make_new on each type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a theoretical or a practical issue for pybricks?

No, not particularly. I'm just throwing these out as brainstorming ideas since there isn't an obvious ideal solution. There are certainly plenty of ways to get what we need regardless of what lands upstream here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this discussion/issue regarding more sophisticated __init__ of submodules for a later time. It's a change/improvement that can be made independently.

For module delegation did we consider having a ROM table of (pointer-to-module-obj, delegation function) that could be searched?

No I didn't consider that. If there were a lot of modules in that table it would be slow to search (but might be a good solution if only a few modules needed delegation).

STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_f_obj, example_package_f);

STATIC mp_obj_t example_package___init__(void) {
mp_printf(&mp_plat_print, "example_package.__init__\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only-run-__init__-once must be here now, it would be helpful to include that in the example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point! Added. (And also updated the comment in mpconfig.h).

@jimmo jimmo force-pushed the import-subpackage-v3 branch from 8c95285 to 951539e Compare May 12, 2023 15:42
@jimmo
Copy link
Member Author

jimmo commented May 12, 2023

A note on the force-builtin-via-sys.path from discussion with @dpgeorge ... Currently sys.path cannot be assigned to, however since the module delegation support was added (and conveniently, already in use on sys), it would be possible to allow assignement to sys.path, which would allow this variation:

_path = sys.path
sys.path = ()
try:
  import time
finally:
  sys.path = _path
  del _path

() is a const tuple, so the only allocation is the globals slot for _path which might involve resizing the map. However, it's very likely that this resize won't go to waste because the module is likely to subsequently define at least something else.

Note that usages of mp_obj_list_get in builtinimport.c would need to change to mp_obj_get_array (which supports both lists and tuples).

@dlech
Copy link
Contributor

dlech commented May 12, 2023

Shouldn't it be...

_path = sys.path
sys.path = ()
try:
    import time
finally:
    sys.path = _path
    del _path

Or are we assuming that importing time will never fail since it is builtin? In some configs it may not exist at all though.

@jimmo
Copy link
Member Author

jimmo commented May 14, 2023

Or are we assuming that importing time will never fail since it is builtin? In some configs it may not exist at all though.

Yes I think you're right. It shouldn't happen but if it does, it would be bad to leave sys.path empty. I'll update the examples above. Thanks

#if MICROPY_PY_SYS
// Never allow sys to be overridden from the filesystem. If weak links
// are disabled, then this also provides a default weak link so that
// `import sys` works.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that CPython definitely handles sys differently because it doesn't have sys.__file__.

@jimmo jimmo force-pushed the import-subpackage-v3 branch from 951539e to 7efda84 Compare May 15, 2023 03:46
@stinos
Copy link
Contributor

stinos commented May 16, 2023

This does lead to a slightly weird type(sys) != type(os) for example

Is this already in one of the commits? If so please mention that explicitly, we happen to implement 'is_module' exactly by comparing with type(sys).

Importing ufoo, sys or micropython will never search the filesystem, regardless of sys.path.
Any built-in (except sys and micropython) can now be extended from the filesystem

That would break https://github.com/micropython/micropython-lib/pull/582/files right? Any way around that which doesn't involve changes to the micropython codebase itself?

@dpgeorge
Copy link
Member

This does lead to a slightly weird type(sys) != type(os) for example

Is this already in one of the commits?

No, we won't do that, it was just an idea. As you point out, it would be strange if different modules have different types.

Importing ufoo, sys or micropython will never search the filesystem, regardless of sys.path.
Any built-in (except sys and micropython) can now be extended from the filesystem

That would break https://github.com/micropython/micropython-lib/pull/582/files right? Any way around that which doesn't involve changes to the micropython codebase itself?

Yes this current PR will break that micropython-lib PR, because sys is treated as special.

Note that CPython also treats sys as special, it's always available regardless of isolation or path manipulation (think: how would you adjust sys.path to change where sys.py is found?).

IMO sys should be a module that is fully implemented in C and not extendable in Python. Pretty much everything it implements is tightly coupled to the core runtime.

Well, it doesn't have to be this way. We could continue to provide a way for sys.py to override the built-in sys, by continuing to provide an alias like usys which always retrieves the built-in module. But I think it's cleaner to just say that sys is fixed.

@dpgeorge
Copy link
Member

Any way around that which doesn't involve changes to the micropython codebase itself?

We could allow storing new attributes to sys from Python, eg sys.intern = .... But that would require it being done by some start up code (eg boot.py/site.py), rather than automatically when the code does import sys.

@jimmo
Copy link
Member Author

jimmo commented May 16, 2023

This does lead to a slightly weird type(sys) != type(os) for example

Is this already in one of the commits? If so please mention that explicitly, we happen to implement 'is_module' exactly by comparing with type(sys).

@stinos no sorry that wasn't clear -- I was saying that if we did do it this way (either subpackage support or module delegation) it would lead to this surprising behaviour. We don't plan to do this (for either).

Importing ufoo, sys or micropython will never search the filesystem, regardless of sys.path.
Any built-in (except sys and micropython) can now be extended from the filesystem

That would break https://github.com/micropython/micropython-lib/pull/582/files right? Any way around that which doesn't involve changes to the micropython codebase itself?

It would. However I would argue perhaps that we should just add sys.intern to the core.

@stinos
Copy link
Contributor

stinos commented May 16, 2023

I don't mind sys not being extensible in Python, but then it would be nice if one could define additional attributes like with MICROPY_PORT_BUILTINS, which would actually be fairly simple to do.

However I would argue perhaps that we should just add sys.intern to the core.

Perhaps, but seems like I'm the only one who encoutered it yet..

@jimmo
Copy link
Member Author

jimmo commented May 23, 2023

This PR also fixes #11594. Added a test to verify this.

print(sys, hasattr(sys, "__file__"))

sys.path.clear()
sys.path.append("import/ext") # TODO: Update after #11468 to "ext"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11468 has been merged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed & rebased.

@jimmo jimmo force-pushed the import-subpackage-v3 branch from 2963f49 to 662e4f4 Compare May 24, 2023 06:38
@codecov-commenter
Copy link

codecov-commenter commented May 24, 2023

Codecov Report

Merging #11456 (5159304) into master (69dd013) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #11456   +/-   ##
=======================================
  Coverage   98.39%   98.40%           
=======================================
  Files         156      156           
  Lines       20615    20609    -6     
=======================================
- Hits        20285    20281    -4     
+ Misses        330      328    -2     
Impacted Files Coverage Δ
py/obj.h 100.00% <ø> (ø)
py/builtinimport.c 100.00% <100.00%> (+1.05%) ⬆️
py/objmodule.c 100.00% <100.00%> (ø)
py/runtime.c 98.89% <100.00%> (+<0.01%) ⬆️

@dpgeorge
Copy link
Member

Doing my own testing of this PR, here are the differences in the stat'ing (checking the filesystem for a file/dir) that is done when doing imports:

  • Doing import sys or import micropython: prior to this PR it would look for the module on the filesystem, then fallback to the built-in. With this PR it will always go straight to the built-in.
  • Importing a module prefixed with u, eg import utime, import uctypes: prior to this PR it would check the filesystem, then fallback to the built-in. With this PR it will go straight to the built-in.
  • Importing a single-file .py module from the filesystem multiple times, eg import iperf3 or import urequests multiple times: prior to this PR, each import would stat the filesystem until it found the .py file, and then if it was already imported it would return. With this PR if the file is already imported then it will not do any stat'ing.

In all the above cases there is less stat'ing done by this PR, so it will be faster on these imports. All other imports do the same amount of stat'ing as before.

Note: because of that last point (all other imports do the same amount of stat'ing as before), and the fact that prior to this PR it was not possible to override a built-in module like pyb with a file on the filesystem, it means that prior to this PR if a pyb.py file existing it would be stat'd and found but never used. With this PR it will be stat'd and found and used instead of the built-in.

@andrewleech
Copy link
Contributor

andrewleech commented May 31, 2023

Doing import sys ... with this PR it will always go straight to the built-in.

prior to this PR a pyb.py file would never used. With this PR it will be stat'd and found and used instead of the built-in.

I must admit I'm not quite following the logic of this PR... are there some builtins that are special and can't be overridden from a python file, and others that can be overridden? How would I find which is which?

@dpgeorge
Copy link
Member

are there some builtins that are special and can't be overridden from a python file, and others that can be overridden?

Yes: sys and micropython are special and can never be overridden.

Also, importing a module starting with u-, it will search for built-ins first, and if that fails then look in the filesystem.

@jimmo jimmo force-pushed the import-subpackage-v3 branch 3 times, most recently from b1ce9bc to 9042525 Compare May 31, 2023 05:07
@dpgeorge
Copy link
Member

This has been updated so that no built-in modules can be overridden, which is the current behaviour on master (ie nothing has changed in this respect, except that sys is now considered a built-in name).

But now doing an import of a built-in won't stat anything on the filesystem (making built-in imports faster). This was the behaviour long ago, prior to the refactoring done in a7fa18c

@jimmo jimmo force-pushed the import-subpackage-v3 branch from 9042525 to 51b2a21 Compare May 31, 2023 06:08
@jimmo
Copy link
Member Author

jimmo commented May 31, 2023

Added an update the documentation section about built-in overrides.

@jimmo jimmo force-pushed the import-subpackage-v3 branch from 51b2a21 to 3d3f285 Compare June 1, 2023 04:13
jimmo added 9 commits June 1, 2023 16:03
If sys.path is enabled, but empty, this will now no longer search the
filesystem. Previously an empty sys.path was equivalent to having
`sys.path=[""]`. This is a breaking change, but this behavior now matches
CPython.

This also provides an alternative mechanism to the u-prefix to force an
import of a builtin module:

```
import sys
_path = sys.path[:]
sys.path.clear()
import foo  # Forces the built-in foo.
sys.path.extend(_path)
del _path
```

Code size diff is -32 bytes on PYBV11.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This makes it so that sub-packages are resolved relative to their parent's
`__path__`, rather than re-resolving each parent's filesystem path.

The previous behavior was that `import foo.bar` would first re-search
`sys.path` for `foo`, then use the resulting path to find `bar`.

For already-loaded and u-prefixed modules, because we no longer need to
build the path from level to level, we no longer unnecessarily search
the filesystem. This should improve startup time.

Explicitly makes the resolving process clear:
 - Loaded modules are returned immediately without touching the filesystem.
 - Exact-match of builtins are also returned immediately.
 - Then the filesystem search happens.
 - If that fails, then the weak-link handling is applied.

This maintains the existing behavior: if a user writes `import time` they
will get time.py if it exits, otherwise the built-in utime. Whereas `import
utime` will always return the built-in.

This also fixes a regression from a7fa18c
where we search the filesystem for built-ins. It is now only possible to
override u-prefixed builtins. This will remove a lot of filesystem stats
at startup, as micropython-specific modules (e.g. `pyb`) will no longer
attempt to look at the filesystem.

Added several improvements to the comments and some minor renaming and
refactoring to make it clearer how the import mechanism works. Overall
code size diff is +56 bytes on STM32.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
To use this:
 - Create a built-in module, and add the module object as a member of the
   parent module's globals dict.
 - The submodule can set its `__name__` to either `QSTR_foo_dot_bar` or
   `QSTR_bar`. The former requires using qstrdefs(port).h to make the qstr.

Because `bar` is a member of `foo`'s globals, it is possible to write
`import foo` and then immediately use `foo.bar` without importing it
explicitly. This means that if `bar` has an `__init__`, it will not be
called in this situation, and for that reason, sub-modules should not have
`__init__` methods. If this is required, then all initalisation for
sub-modules should be done by the top-level module's (i.e. `foo`'s)
`__init__` method.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This can lead to duplicate initialisations if a module can be imported
via multiple names, so the module must track this itself anyway.

This reduces code size (diff is -40 bytes), and avoids special treatment of
builtin-modules-with-init with respect to sys.modules. No other builtin
modules get put into sys.modules.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This demonstrates how to add a sub-package in a user c module, as well
as how to define the necessary qstrs and enable the feature in the build.

This is used by the unix coverage build to test this feature.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This verifies the behavior:
 - Exact matches of built-ins bypass filesystem.
 - u-prefix modules can be overridden from the filesystem.
 - Builtin import can be forced using either u-prefix or sys.path=[].

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
When foo.bar is imported, bar is added as an attribute to foo. Previously
this happened on every import, but should only happen on first import.

This verifies the behavior for relative imports and overriding.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
If the module has a user-defined getattr, this could raise.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
- Make the docs match the new behavior which only allows certain modules
  to be extended.
- List the modules that currently have the u-prefix.
- Add a note about the sys.path method for forcing a built-in import.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@dpgeorge dpgeorge force-pushed the import-subpackage-v3 branch from 3d3f285 to 5159304 Compare June 1, 2023 06:23
@dpgeorge dpgeorge merged commit 5159304 into micropython:master Jun 1, 2023
@dpgeorge
Copy link
Member

dpgeorge commented Jun 1, 2023

Thanks for updating. Now merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants