py/builtinimport.c: Allow built-in modules to be packages. #4731

jimmo · 2019-04-30T12:43:37Z

The goal here is to make a built-in unumpy, and along with it unumpy.fft and unumpy.linalg.

Two main changes:

Make built-in module importing happen as part of the dotted path walking as for regular modules.
Rather than making the submodule an attribute on the package (which can't be done on a built-in module), find them in mp_loaded_modules_dict.

Note: to make this work, the built-in submodule sets its __name__ to (e.g.) QSTR_unumpy_dot_fft, with corresponding Q(unumpy.fft) in qstrdefs.

jimmo · 2019-04-30T12:46:49Z

Note: I imagine that if unumpy were a native frozen module instead, then this change would not be necessary. Do you imagine that more "optional" built-in functionality would be moving to frozen modules anyway?

pfalcon · 2019-04-30T20:09:10Z

if unumpy were a native frozen module instead, then this change would not be necessary.

It's not necessary in either case. Do it like CPython does: have C modules _unumpy, _unumpy_fft, _unumpy_linalg. Then have .py re-exporting them in whatever hierarchy you need.

dpgeorge · 2019-05-01T05:58:34Z

Note that there are quite a few test failures here, which would all need to be addressed.

Do you imagine that more "optional" built-in functionality would be moving to frozen modules anyway?

Not for the core stuff because there's a RAM and execution overhead with frozen modules (creating and populating the outer dicts, at least).

Then have .py re-exporting them in whatever hierarchy you need.

This is possible but has overhead (as mentioned above). And if the goal is efficiency then it's sensible to provide the functionality in C.

pfalcon · 2019-05-01T09:13:04Z

This is possible but has overhead (as mentioned above). And if the goal is efficiency then it's sensible to provide the functionality in C.

It's not sensible it implement more and more stuff in C, https://github.com/micropython/micropython/wiki/ContributorGuidelines has explicit warning against that. And for any feature implemented in C (== burdening the codebase and maintainers), there should be very good usecases. Example of one: to be able to compile arbitrary Python projects to native modules (which apparently would need to be done per-module), there should be support for packages in native modules. Sounds perfect, but support for native package imports would be the last and least important thing for that, first someone would need to present a compiler able to compile arbitrary Python code. But exactly as was mentioned, "native .mpy" is the closest thing en route to that aim, which supports those imports.

As for "efficiency", this patch clearly adds overheads itself. And as everything implemented in C, it burdens everyone (and nobody needed packaged native imports for years).

stinos · 2019-05-01T10:07:27Z

I'm also not sure the overhead of having C modules and .py package wrappers warrants this change? Isn't that a rather small overhead (in terms of performance I mean, or are we talking about some other overhead here?). And indeed this patch also introduces it's own overhead. And requiring a module to know it's own package name feels a bit too much 'the other way around' and not very flexible though on the other hand in practice this might never be an actual problem.

jimmo · 2019-05-01T14:46:38Z

Sorry about the test failures. I forgot that run-tests doesn't build ports/unix/micropython and had recently switched branches. They should be addressed now (the one failure is the size check).
The code size change for ports/minimal is +24 bytes when I build locally (master vs this branch). (Note: It's +96 bytes according to the Travis CI results. I haven't investigated the discrepancy yet.) I'm pretty sure there's some room for improvement in this patch to get that down to zero.

So the .py approach works (though it does have overheads of its own), I'm unsure where to put the .py files?

For now I have it in extmod/modunumpy.c and gated by MICROPY_PY_UNUMPY. So I could add extmod/modules, and extend the build to add additional paths for mpy-cross. (It would also need to be aware of MICROPY_PY_UNUMPY so as to not include these modules unless necessary).

But more than likely, this unumpy shouldn't need to live in the main repo, in which case I could do this as an external C module (as per #3871) but I still have the same problem with making it find the .py files.

Finally, as I referenced above, the other option is to use dynamically loadable C code with the .mpy loaded from the filesystem (as per 81dee3e). I'm fairly sure I can make this work with the Fortran dependencies, but unless I'm mistaken, this isn't really an option for unumpy because all the native code from the .mpy file will be loaded into RAM?

pfalcon · 2019-05-01T18:08:26Z

But more than likely, this unumpy shouldn't need to live in the main repo, in which case I could do this as an external C module (as per #3871)

+1

dpgeorge · 2019-05-02T08:07:32Z

Isn't that a rather small overhead (in terms of performance I mean, or are we talking about some other overhead here?).

In terms of performance, it'd just be at import time, requiring an extra level of import, so not really much. But there is additional RAM usage holding the module dict of the .py wrapper module.

And requiring a module to know it's own package name feels a bit too much 'the other way around'

Yes that does look like it could be improved.

So the .py approach works (though it does have overheads of its own), I'm unsure where to put the .py files?

They could just live alongside the C code, and the easiest way to integrate them into a build is just for a port to include them in its list of frozen modules. Not ideal though because it's a manual process to include/exclude it depending if the C side of things exists or not.

dpgeorge · 2019-05-02T08:10:12Z

py/builtinimport.c

-            // go there.
-            chop_component(this_name, &p);
+            // take off another level.
+            level++;


This is a neat (albeit minor) code-space optimisation, which could be taken independent of this PR.

dpgeorge · 2019-05-02T08:10:36Z

py/builtinimport.c

                    // https://docs.python.org/3/reference/import.html
                    // "Specifically, any module that contains a __path__ attribute is considered a package."
                    mp_store_attr(module_obj, MP_QSTR___path__, mp_obj_new_str(vstr_str(&path), vstr_len(&path)));
                    size_t orig_path_len = path.len;
-                    vstr_add_char(&path, PATH_SEP_CHAR);
-                    vstr_add_str(&path, "__init__.py");
+                    vstr_add_str(&path, PATH_SEP_CHAR_STR "__init__.py");


Another good code-space optimisation which can be taken separately.

jimmo · 2019-05-02T08:38:41Z

And requiring a module to know it's own package name feels a bit too much 'the other way around'

Yes that does look like it could be improved.

I'm unsure what you mean by "requiring a module to know it's own package name". Modules already knew their own name (via the __name__ global that all modules have). That doesn't change. Similarly, all modules were already always registered in the mp_loaded_modules_map. This commit actually makes non-builtin sub-modules no longer have to add an entry to the parent module's globals dict, so I think it overall uses less RAM?

stinos · 2019-05-02T08:59:21Z

I'm unsure what you mean by "requiring a module to know it's own package name".

As you say: the built-in submodule sets its __name__ to (e.g.) QSTR_unumpy_dot_fft.
So normally if you have a package with a directory structure like

unumpy/
unumpy/fft.py

then suppose you add unumpy to the Python search path, you can import fft. In which case you'd expect fft.__name__ to be fft. However now that would be to unumpy.fft (I think, not sure). Or suppose you're debugging things and make a copy of unumpy and call it unumpy2, then import unumpy2.fft would be a module with name unumpy.fft. More generally speaking, you don't want to hardcode how a modules should be used in the module itself.

The goal here is to make a built-in `unumpy`, and along with it `unumpy.fft` and `unumpy.linalg`. Two main changes: - Make built-in module importing happen as part of the dotted path walking as for regular modules. - Rather than making the submodule an attribute on the package (which can't be done on a built-in module), find them in `mp_loaded_modules_dict`. Note: to make this work, the built-in submodule sets its `__name__` to (e.g.) QSTR_unumpy_dot_fft, with corresponding Q(unumpy.fft) in qstrdefs. Note: This is gated by a new option MICROPY_MODULE_BUILTIN_PACKAGES.

jimmo · 2019-05-02T11:22:18Z

then suppose you add unumpy to the Python search path, you can import fft. In which case you'd expect fft.__name__ to be fft. However now that would be to unumpy.fft (I think, not sure). Or suppose you're debugging things and make a copy of unumpy and call it unumpy2, then import unumpy2.fft would be a module with name unumpy.fft. More generally speaking, you don't want to hardcode how a modules should be used in the module itself.

Thanks for the explanation. That situation won't be handled -- it will always need to be imported as unumpy.fft, regardless of sys.path. I can imagine people sometimes do crazy stuff like this (OK I might have done this in the past) but if that's the weirdest thing they notice about unumpy then I'll be very happy!

So I've pushed an update that hopefully is a reasonable compromise, depending on how you feel about config flags. With MICROPY_MODULE_BUILTIN_PACKAGES disabled, the code size is smaller on the three ports I tested:

   bare-arm:   -72 -0.109% 
minimal x86:  -168 -0.109% [incl +4(data)]
   unix x64:   -16 -0.003%

With the flag enabled:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +160 +0.032%

(Note: the 160 bytes is roughly equivalent to the cost of the frozen .py code anyway. I forget the exact number and I imagine it will grow as more of numpy is implemented)

v923z · 2020-02-24T09:00:01Z

(Note: the 160 bytes is roughly equivalent to the cost of the frozen .py code anyway. I forget the exact number and I imagine it will grow as more of numpy is implemented)

@jimmo I think this should be resolved as it was proposed in the original PR. This issue is no longer academic: we are facing the exact same problem with ulab, and I find the extra python file for the proper definition of the imports a bit hackish. I would really like to have just a bunch of C files implementing the numpy functions, without python files, DIY with the make file, frozen modules etc.

We already implemented ulab in a modular fashion, v923z/micropython-ulab#41, and Jim's approach would be extremely useful, when customising (excluding sub-modules) the ulab module itself.

jimmo · 2021-11-17T03:12:51Z

Closing in favour of #5701

jimmo force-pushed the builtin-module-package branch 4 times, most recently from cc1aa48 to 94fa8c6 Compare May 1, 2019 14:29

jimmo force-pushed the builtin-module-package branch from 94fa8c6 to aa26c70 Compare May 1, 2019 14:49

dpgeorge reviewed May 2, 2019

View reviewed changes

jimmo force-pushed the builtin-module-package branch from aa26c70 to c77f22d Compare May 2, 2019 11:11

dpgeorge mentioned this pull request Feb 26, 2020

Support importing native modules in native packages. adafruit/circuitpython#2657

Merged

jimmo mentioned this pull request Feb 26, 2020

py/builtinimport.c: Allow built-in modules to be packages (v2) #5701

Closed

jimmo closed this Nov 17, 2021

jimmo mentioned this pull request May 10, 2023

py/builtinimport: Allow built-in modules to be packages (v3) #11456

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py/builtinimport.c: Allow built-in modules to be packages. #4731

py/builtinimport.c: Allow built-in modules to be packages. #4731

jimmo commented Apr 30, 2019

jimmo commented Apr 30, 2019 •

edited

Loading

pfalcon commented Apr 30, 2019

dpgeorge commented May 1, 2019

pfalcon commented May 1, 2019

stinos commented May 1, 2019

jimmo commented May 1, 2019

pfalcon commented May 1, 2019

dpgeorge commented May 2, 2019

dpgeorge May 2, 2019

dpgeorge May 2, 2019

jimmo commented May 2, 2019 •

edited

Loading

stinos commented May 2, 2019

jimmo commented May 2, 2019 •

edited

Loading

v923z commented Feb 24, 2020

jimmo commented Nov 17, 2021

py/builtinimport.c: Allow built-in modules to be packages. #4731

py/builtinimport.c: Allow built-in modules to be packages. #4731

Conversation

jimmo commented Apr 30, 2019

jimmo commented Apr 30, 2019 • edited Loading

pfalcon commented Apr 30, 2019

dpgeorge commented May 1, 2019

pfalcon commented May 1, 2019

stinos commented May 1, 2019

jimmo commented May 1, 2019

pfalcon commented May 1, 2019

dpgeorge commented May 2, 2019

dpgeorge May 2, 2019

Choose a reason for hiding this comment

dpgeorge May 2, 2019

Choose a reason for hiding this comment

jimmo commented May 2, 2019 • edited Loading

stinos commented May 2, 2019

jimmo commented May 2, 2019 • edited Loading

v923z commented Feb 24, 2020

jimmo commented Nov 17, 2021

jimmo commented Apr 30, 2019 •

edited

Loading

jimmo commented May 2, 2019 •

edited

Loading

jimmo commented May 2, 2019 •

edited

Loading