-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/builtinimport: Allow built-in modules to be packages (v3) #11456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code size report:
|
We have been relying on the previous behavior for improved startup time. Would it be possible to have a special entry in |
I don't quite follow: what exactly is the previous behaviour and why does it have improved startup time? |
On LEGO MINDSTORMS EV3 (300MHz Arm + Linux + micropython-lib installed) there is a very noticeable difference between the time it takes to If we drop support for Dropping |
It is possible to maintain the previous behavior but keep the rest of the changes here by changing this PR to do if level_mod_name.startswith("u") and module_obj := mp_module_get_builtin(full_mod_name):
return module_obj immediately before the call to
I considered this, but while I was thinking about how to implement this I noticed the incorrect behavior of the sys.path walking (with respect to an empty sys.path). The main reason I'm hesitant to take this approach is that if the magic e.g. ".builtin" entry is missing then the behavior is very confusing, and this will break anyone's code that modifies sys.path. (Similar concern when we added .frozen, but the result isn't quite as weird).
I understand. However, if we go ahead with #9069 then this is likely going to have to happen anyway. That said, even if we do merge #9069 we'll still need to provide a mechanism for e.g. if level_mod_name.startswith("u") and module_obj := mp_module_get_builtin(full_mod_name[1:]):
return module_obj before the filesystem check to maintain |
I have added another commit to the branch that adds the "bypass filesystem for u-prefixed builtin modules" logic. It only costs about 20 bytes, so we're still ahead. Personally I would prefer not to do this, and just have a standard CPython-compatible way (e.g. sys.path manipulation) of explicitly choosing to use the built-in module, but in the interest of backwards compatibility it seems reasonable. Perhaps we could guard this behind a config option? Note that it's still not exactly the same behavior as before this PR, because non-u-prefixed modules will now search the filesystem first. e.g. it wasn't previously possible to extend, say, |
32a4260
to
d91abfb
Compare
The test failures will be fixed by #11468. This is due to when
Just to follow up on this (prompted by the above test issue), we may decide that (for performance reasons) we never want e.g. if (level_mod_name_str[0] == 'u' || level_mod_name == MP_QSTR_micropython) {
@dlech @dpgeorge FWIW, on the topic of performance concerns, the current behavior does actually do the full sys.path walk for all modules (loaded, builtin, filesystem). If you look at https://github.com/micropython/micropython/blob/master/py/builtinimport.c#L362 even if we successfully match So the change from this PR should be a no-op on performance, and if we include the extra commit then it's actually a performance gain relative to the current implementation for importing u-prefix builtins. (I introduced this regression in the last attempt to refactor the import mechanism... the previous implementation was very complicated and I wasn't aware that this was the expected behaviour). |
Actually this PR is a gain regardless for the case of importing any already-loaded module because we now just return immediately. (Something that the previous code didn't do, just incase the import was actually for a sub package). |
py/mpconfig.h
Outdated
@@ -850,6 +850,13 @@ typedef double mp_float_t; | |||
#define MICROPY_MODULE_BUILTIN_INIT (MICROPY_CONFIG_ROM_LEVEL_AT_LEAST_EXTRA_FEATURES) | |||
#endif | |||
|
|||
// Whether to allow built-in modules to have sub-packages (by making the | |||
// sub-package a member of their locals dict). The sub-package __name__ must | |||
// be foo.bar, and only top-level modules may have __init__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does __init__
here refer to a module __init__()
method or an __init__.py
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former (i.e. MICROPY_MODULE_BUILTIN_INIT
).
The reason is that the module is a member of its parents globals dict and can be accessed without being imported. i.e. if baz is a submodule:
import foo
foo.bar.baz()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a bit unfortunate. We make occasional use of that and would have to come up with a workaround. Probably just have to init all modules at mp_init()
whether they are used or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to have a C-level __init__
on the top-level module (eg foo
, or in your case pybricks
), which initialises that module and all submodules of it. That would run when foo
is imported, or any submodule like foo.bar
.
Probably the comment above should be improved to point this out clearly, that importing any submodule will initialise its top-level module, ie import foo.bar
will call foo
's __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment.
3fe16a1
to
8b2e268
Compare
I've updated this with tests and an example user c module. This test might need to move to a different directory (or have special handling in run-tests.py). The behavior now compared v1.20:
|
431dcae
to
8c95285
Compare
extmod/modurandom.c
Outdated
static bool seeded = false; | ||
if (!seeded) { | ||
seeded = true; | ||
if (!MP_STATE_VM(mod_urandom_seeded)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this will be seeded after each soft reset, so the comment could probably use updating too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is exactly the bug that this change is fixing. It seems like soft-reset should cause a re-seed, so this is now the expected behavior.
Is there a comment somewhere else that says the wrong thing (w.r.t. soft reset) or are you saying I should just add a note here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line above says // that it's only ever seeded once.
It's not wrong, just ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yep, thanks. Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good, but shouldn't it be in its own commit? To me it doesn't look particularly related to 3cf95f6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it could go in a separate commit.
But I don't see how it will re-seed on soft reset, because soft reset doesn't clear out the root pointers (also it's an int so shouldn't really go in the root pointer section, that will slow down the GC by unnecessarily tracing it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewleech It's because the logic that used to prevent duplicate initialization is no longer available. There used to be this kind of weird feature were specifically built-in modules that had an __init__
get stored in sys.modules
(but regular built-ins do not).
That said, modrandom was already working around another quirk of that old behavior (urandom
vs random
), so I figured that showing a "correct" way that a module-with-init can address this was related to the same change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the state should not be stored in the root pointer section (see above), and so this change to modurandom.c
could be entirely dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I've removed this.
Will re-visit in the future with an MP_REGISTER_STATE()
macro.
} | ||
STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_foo_bar_f_obj, example_package_foo_bar_f); | ||
|
||
// Define all properties of the second-level sub-package (example_package.foo.bar). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Define all properties of the second-level sub-package (example_package.foo.bar). | |
// Define all attributes of the second-level sub-package (example_package.foo.bar). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording is copied from examples/usercmodule/cexample/examplemodule.c
. I agree though, "attributes" is more correct. I'll update both.
} | ||
STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_foo_f_obj, example_package_foo_f); | ||
|
||
// Define all properties of the first-level sub-package (example_package.foo). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth mentioning here that __init__
doesn't work and all init for subpackages/modules needs to be done in the top-level package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not a well though out idea, but maybe a way to avoid this limitation is to add an "ensure initialized" check to the module_attr()
function, e.g. if elem->value
is a module instance, call __init__
on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK will do.
Yeah that could work! This would mean that __init__
would need to be looked up and called every time the sub-module was accessed though which will come at a performance cost. (Also the type check isn't free, and would happen on all module attribute access).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More brainstorming ideas: instead of adding submodules to the globals dict of a module, could we introduce a new mp_type_package
type that is the same a mp_type_module
but contains an extra pointer to a table of submodles? Or instead of creating a new type, use a dunder like __package_dict__
in the globals table so that the submodules can't be accessed directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
introduce a new mp_type_package type that is the same a mp_type_module
We were just talking about doing exactly this in the context of #11466... by introducing a module type "module with delegation" we can solve the weirdness around the delegation entry. This does lead to a slightly weird type(sys) != type(os)
for example. But there are more direct ways to solve that particular issue (e.g. wrap the pointer in a dummy object).
@dpgeorge For module delegation did we consider having a ROM table of (pointer-to-module-obj, delegation function) that could be searched? (Could use a variant of MP_REGISTER_MODULE
, i.e. MP_REGISTER_MODULE_WITH_DELEGATION
). We could do something similar for subpackages I guess.
use a dunder like package_dict in the globals table so that the submodules can't be accessed directly?
I guess what you're saying is that module_attr would know about package_dict. That could work! Has a RAM cost of course, so that should be considered.
I guess the question is -- is this a theoretical or a practical issue for pybricks? i.e. is it really important to delay initialisation of submodules until they're explicitly imported? Can you solve this in other ways, e.g. do on-demand initialisation when methods of the submodule are used? That's extra code size of course to call check_init()
at the start of every function in the submodule (and make_new on each type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a theoretical or a practical issue for pybricks?
No, not particularly. I'm just throwing these out as brainstorming ideas since there isn't an obvious ideal solution. There are certainly plenty of ways to get what we need regardless of what lands upstream here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this discussion/issue regarding more sophisticated __init__
of submodules for a later time. It's a change/improvement that can be made independently.
For module delegation did we consider having a ROM table of (pointer-to-module-obj, delegation function) that could be searched?
No I didn't consider that. If there were a lot of modules in that table it would be slow to search (but might be a good solution if only a few modules needed delegation).
STATIC MP_DEFINE_CONST_FUN_OBJ_0(example_package_f_obj, example_package_f); | ||
|
||
STATIC mp_obj_t example_package___init__(void) { | ||
mp_printf(&mp_plat_print, "example_package.__init__\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only-run-__init__
-once must be here now, it would be helpful to include that in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point! Added. (And also updated the comment in mpconfig.h).
8c95285
to
951539e
Compare
A note on the force-builtin-via-sys.path from discussion with @dpgeorge ... Currently sys.path cannot be assigned to, however since the module delegation support was added (and conveniently, already in use on _path = sys.path
sys.path = ()
try:
import time
finally:
sys.path = _path
del _path
Note that usages of |
Shouldn't it be... _path = sys.path
sys.path = ()
try:
import time
finally:
sys.path = _path
del _path Or are we assuming that importing |
Yes I think you're right. It shouldn't happen but if it does, it would be bad to leave sys.path empty. I'll update the examples above. Thanks |
py/builtinimport.c
Outdated
#if MICROPY_PY_SYS | ||
// Never allow sys to be overridden from the filesystem. If weak links | ||
// are disabled, then this also provides a default weak link so that | ||
// `import sys` works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that CPython definitely handles sys
differently because it doesn't have sys.__file__
.
951539e
to
7efda84
Compare
Is this already in one of the commits? If so please mention that explicitly, we happen to implement 'is_module' exactly by comparing with
That would break https://github.com/micropython/micropython-lib/pull/582/files right? Any way around that which doesn't involve changes to the micropython codebase itself? |
No, we won't do that, it was just an idea. As you point out, it would be strange if different modules have different types.
Yes this current PR will break that micropython-lib PR, because Note that CPython also treats IMO Well, it doesn't have to be this way. We could continue to provide a way for |
We could allow storing new attributes to |
@stinos no sorry that wasn't clear -- I was saying that if we did do it this way (either subpackage support or module delegation) it would lead to this surprising behaviour. We don't plan to do this (for either).
It would. However I would argue perhaps that we should just add |
I don't mind
Perhaps, but seems like I'm the only one who encoutered it yet.. |
This PR also fixes #11594. Added a test to verify this. |
tests/import/builtin_ext.py
Outdated
print(sys, hasattr(sys, "__file__")) | ||
|
||
sys.path.clear() | ||
sys.path.append("import/ext") # TODO: Update after #11468 to "ext" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11468 has been merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed & rebased.
2963f49
to
662e4f4
Compare
Codecov Report
@@ Coverage Diff @@
## master #11456 +/- ##
=======================================
Coverage 98.39% 98.40%
=======================================
Files 156 156
Lines 20615 20609 -6
=======================================
- Hits 20285 20281 -4
+ Misses 330 328 -2
|
Doing my own testing of this PR, here are the differences in the stat'ing (checking the filesystem for a file/dir) that is done when doing imports:
In all the above cases there is less stat'ing done by this PR, so it will be faster on these imports. All other imports do the same amount of stat'ing as before. Note: because of that last point (all other imports do the same amount of stat'ing as before), and the fact that prior to this PR it was not possible to override a built-in module like |
I must admit I'm not quite following the logic of this PR... are there some builtins that are special and can't be overridden from a python file, and others that can be overridden? How would I find which is which? |
Yes: Also, importing a module starting with u-, it will search for built-ins first, and if that fails then look in the filesystem. |
b1ce9bc
to
9042525
Compare
This has been updated so that no built-in modules can be overridden, which is the current behaviour on master (ie nothing has changed in this respect, except that But now doing an import of a built-in won't stat anything on the filesystem (making built-in imports faster). This was the behaviour long ago, prior to the refactoring done in a7fa18c |
9042525
to
51b2a21
Compare
Added an update the documentation section about built-in overrides. |
51b2a21
to
3d3f285
Compare
If sys.path is enabled, but empty, this will now no longer search the filesystem. Previously an empty sys.path was equivalent to having `sys.path=[""]`. This is a breaking change, but this behavior now matches CPython. This also provides an alternative mechanism to the u-prefix to force an import of a builtin module: ``` import sys _path = sys.path[:] sys.path.clear() import foo # Forces the built-in foo. sys.path.extend(_path) del _path ``` Code size diff is -32 bytes on PYBV11. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This makes it so that sub-packages are resolved relative to their parent's `__path__`, rather than re-resolving each parent's filesystem path. The previous behavior was that `import foo.bar` would first re-search `sys.path` for `foo`, then use the resulting path to find `bar`. For already-loaded and u-prefixed modules, because we no longer need to build the path from level to level, we no longer unnecessarily search the filesystem. This should improve startup time. Explicitly makes the resolving process clear: - Loaded modules are returned immediately without touching the filesystem. - Exact-match of builtins are also returned immediately. - Then the filesystem search happens. - If that fails, then the weak-link handling is applied. This maintains the existing behavior: if a user writes `import time` they will get time.py if it exits, otherwise the built-in utime. Whereas `import utime` will always return the built-in. This also fixes a regression from a7fa18c where we search the filesystem for built-ins. It is now only possible to override u-prefixed builtins. This will remove a lot of filesystem stats at startup, as micropython-specific modules (e.g. `pyb`) will no longer attempt to look at the filesystem. Added several improvements to the comments and some minor renaming and refactoring to make it clearer how the import mechanism works. Overall code size diff is +56 bytes on STM32. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
To use this: - Create a built-in module, and add the module object as a member of the parent module's globals dict. - The submodule can set its `__name__` to either `QSTR_foo_dot_bar` or `QSTR_bar`. The former requires using qstrdefs(port).h to make the qstr. Because `bar` is a member of `foo`'s globals, it is possible to write `import foo` and then immediately use `foo.bar` without importing it explicitly. This means that if `bar` has an `__init__`, it will not be called in this situation, and for that reason, sub-modules should not have `__init__` methods. If this is required, then all initalisation for sub-modules should be done by the top-level module's (i.e. `foo`'s) `__init__` method. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This can lead to duplicate initialisations if a module can be imported via multiple names, so the module must track this itself anyway. This reduces code size (diff is -40 bytes), and avoids special treatment of builtin-modules-with-init with respect to sys.modules. No other builtin modules get put into sys.modules. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This demonstrates how to add a sub-package in a user c module, as well as how to define the necessary qstrs and enable the feature in the build. This is used by the unix coverage build to test this feature. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This verifies the behavior: - Exact matches of built-ins bypass filesystem. - u-prefix modules can be overridden from the filesystem. - Builtin import can be forced using either u-prefix or sys.path=[]. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
When foo.bar is imported, bar is added as an attribute to foo. Previously this happened on every import, but should only happen on first import. This verifies the behavior for relative imports and overriding. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
If the module has a user-defined getattr, this could raise. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
- Make the docs match the new behavior which only allows certain modules to be extended. - List the modules that currently have the u-prefix. - Add a note about the sys.path method for forcing a built-in import. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
3d3f285
to
5159304
Compare
Thanks for updating. Now merged! |
See #5701 (v2), #4731 (v1), and adafruit#2657 (inspiration for v2 & v3).
There are two parts. See the commit messages for more details.
One area from the first commit is worth digging into... We've discussed many times the future of the feature-formerly-known-as-weak-links. I still strongly feel that we should remove weak links and u-prefixed modules. See recent discussion in #9018 (and PR in #9069). One of the sticking points was that u-modules provide a mechanism to extend built-in modules. I proposed a few alternative options in #9018, including a generic mechanism that also allows extending filesystem modules, however this PR gives a new option thanks to fixing the CPython incompatibility with sys.path. If you do, e.g. in
os.py
then this will force the import to be from the built-in.
Edit: See #11456 (comment) for a summary of changes of behavior relative to v1.20.