Skip to content

py/builtinimport.c: Allow built-in modules to be packages. #4731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Apr 30, 2019

The goal here is to make a built-in unumpy, and along with it unumpy.fft and unumpy.linalg.

Two main changes:

  • Make built-in module importing happen as part of the dotted path walking as for regular modules.
  • Rather than making the submodule an attribute on the package (which can't be done on a built-in module), find them in mp_loaded_modules_dict.

Note: to make this work, the built-in submodule sets its __name__ to (e.g.) QSTR_unumpy_dot_fft, with corresponding Q(unumpy.fft) in qstrdefs.

@jimmo
Copy link
Member Author

jimmo commented Apr 30, 2019

Note: I imagine that if unumpy were a native frozen module instead, then this change would not be necessary. Do you imagine that more "optional" built-in functionality would be moving to frozen modules anyway?

@pfalcon
Copy link
Contributor

pfalcon commented Apr 30, 2019

if unumpy were a native frozen module instead, then this change would not be necessary.

It's not necessary in either case. Do it like CPython does: have C modules _unumpy, _unumpy_fft, _unumpy_linalg. Then have .py re-exporting them in whatever hierarchy you need.

@dpgeorge
Copy link
Member

dpgeorge commented May 1, 2019

Note that there are quite a few test failures here, which would all need to be addressed.

Do you imagine that more "optional" built-in functionality would be moving to frozen modules anyway?

Not for the core stuff because there's a RAM and execution overhead with frozen modules (creating and populating the outer dicts, at least).

Then have .py re-exporting them in whatever hierarchy you need.

This is possible but has overhead (as mentioned above). And if the goal is efficiency then it's sensible to provide the functionality in C.

@pfalcon
Copy link
Contributor

pfalcon commented May 1, 2019

This is possible but has overhead (as mentioned above). And if the goal is efficiency then it's sensible to provide the functionality in C.

It's not sensible it implement more and more stuff in C, https://github.com/micropython/micropython/wiki/ContributorGuidelines has explicit warning against that. And for any feature implemented in C (== burdening the codebase and maintainers), there should be very good usecases. Example of one: to be able to compile arbitrary Python projects to native modules (which apparently would need to be done per-module), there should be support for packages in native modules. Sounds perfect, but support for native package imports would be the last and least important thing for that, first someone would need to present a compiler able to compile arbitrary Python code. But exactly as was mentioned, "native .mpy" is the closest thing en route to that aim, which supports those imports.

As for "efficiency", this patch clearly adds overheads itself. And as everything implemented in C, it burdens everyone (and nobody needed packaged native imports for years).

@stinos
Copy link
Contributor

stinos commented May 1, 2019

I'm also not sure the overhead of having C modules and .py package wrappers warrants this change? Isn't that a rather small overhead (in terms of performance I mean, or are we talking about some other overhead here?). And indeed this patch also introduces it's own overhead. And requiring a module to know it's own package name feels a bit too much 'the other way around' and not very flexible though on the other hand in practice this might never be an actual problem.

@jimmo jimmo force-pushed the builtin-module-package branch 4 times, most recently from cc1aa48 to 94fa8c6 Compare May 1, 2019 14:29
@jimmo
Copy link
Member Author

jimmo commented May 1, 2019

Sorry about the test failures. I forgot that run-tests doesn't build ports/unix/micropython and had recently switched branches. They should be addressed now (the one failure is the size check).
The code size change for ports/minimal is +24 bytes when I build locally (master vs this branch). (Note: It's +96 bytes according to the Travis CI results. I haven't investigated the discrepancy yet.) I'm pretty sure there's some room for improvement in this patch to get that down to zero.

So the .py approach works (though it does have overheads of its own), I'm unsure where to put the .py files?

For now I have it in extmod/modunumpy.c and gated by MICROPY_PY_UNUMPY. So I could add extmod/modules, and extend the build to add additional paths for mpy-cross. (It would also need to be aware of MICROPY_PY_UNUMPY so as to not include these modules unless necessary).

But more than likely, this unumpy shouldn't need to live in the main repo, in which case I could do this as an external C module (as per #3871) but I still have the same problem with making it find the .py files.

Finally, as I referenced above, the other option is to use dynamically loadable C code with the .mpy loaded from the filesystem (as per 81dee3e). I'm fairly sure I can make this work with the Fortran dependencies, but unless I'm mistaken, this isn't really an option for unumpy because all the native code from the .mpy file will be loaded into RAM?

@jimmo jimmo force-pushed the builtin-module-package branch from 94fa8c6 to aa26c70 Compare May 1, 2019 14:49
@pfalcon
Copy link
Contributor

pfalcon commented May 1, 2019

But more than likely, this unumpy shouldn't need to live in the main repo, in which case I could do this as an external C module (as per #3871)

+1

@dpgeorge
Copy link
Member

dpgeorge commented May 2, 2019

Isn't that a rather small overhead (in terms of performance I mean, or are we talking about some other overhead here?).

In terms of performance, it'd just be at import time, requiring an extra level of import, so not really much. But there is additional RAM usage holding the module dict of the .py wrapper module.

And requiring a module to know it's own package name feels a bit too much 'the other way around'

Yes that does look like it could be improved.

So the .py approach works (though it does have overheads of its own), I'm unsure where to put the .py files?

They could just live alongside the C code, and the easiest way to integrate them into a build is just for a port to include them in its list of frozen modules. Not ideal though because it's a manual process to include/exclude it depending if the C side of things exists or not.

// go there.
chop_component(this_name, &p);
// take off another level.
level++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat (albeit minor) code-space optimisation, which could be taken independent of this PR.

// https://docs.python.org/3/reference/import.html
// "Specifically, any module that contains a __path__ attribute is considered a package."
mp_store_attr(module_obj, MP_QSTR___path__, mp_obj_new_str(vstr_str(&path), vstr_len(&path)));
size_t orig_path_len = path.len;
vstr_add_char(&path, PATH_SEP_CHAR);
vstr_add_str(&path, "__init__.py");
vstr_add_str(&path, PATH_SEP_CHAR_STR "__init__.py");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good code-space optimisation which can be taken separately.

@jimmo
Copy link
Member Author

jimmo commented May 2, 2019

And requiring a module to know it's own package name feels a bit too much 'the other way around'

Yes that does look like it could be improved.

I'm unsure what you mean by "requiring a module to know it's own package name". Modules already knew their own name (via the __name__ global that all modules have). That doesn't change. Similarly, all modules were already always registered in the mp_loaded_modules_map. This commit actually makes non-builtin sub-modules no longer have to add an entry to the parent module's globals dict, so I think it overall uses less RAM?

@stinos
Copy link
Contributor

stinos commented May 2, 2019

I'm unsure what you mean by "requiring a module to know it's own package name".

As you say: the built-in submodule sets its __name__ to (e.g.) QSTR_unumpy_dot_fft.
So normally if you have a package with a directory structure like

unumpy/
unumpy/fft.py

then suppose you add unumpy to the Python search path, you can import fft. In which case you'd expect fft.__name__ to be fft. However now that would be to unumpy.fft (I think, not sure). Or suppose you're debugging things and make a copy of unumpy and call it unumpy2, then import unumpy2.fft would be a module with name unumpy.fft. More generally speaking, you don't want to hardcode how a modules should be used in the module itself.

The goal here is to make a built-in `unumpy`, and along with it `unumpy.fft` and `unumpy.linalg`.

Two main changes:
 - Make built-in module importing happen as part of the dotted path walking as for regular modules.
 - Rather than making the submodule an attribute on the package (which can't be done on a built-in module), find them in `mp_loaded_modules_dict`.

 Note: to make this work, the built-in submodule sets its `__name__` to (e.g.) QSTR_unumpy_dot_fft, with corresponding Q(unumpy.fft) in qstrdefs.

 Note: This is gated by a new option MICROPY_MODULE_BUILTIN_PACKAGES.
@jimmo jimmo force-pushed the builtin-module-package branch from aa26c70 to c77f22d Compare May 2, 2019 11:11
@jimmo
Copy link
Member Author

jimmo commented May 2, 2019

then suppose you add unumpy to the Python search path, you can import fft. In which case you'd expect fft.__name__ to be fft. However now that would be to unumpy.fft (I think, not sure). Or suppose you're debugging things and make a copy of unumpy and call it unumpy2, then import unumpy2.fft would be a module with name unumpy.fft. More generally speaking, you don't want to hardcode how a modules should be used in the module itself.

Thanks for the explanation. That situation won't be handled -- it will always need to be imported as unumpy.fft, regardless of sys.path. I can imagine people sometimes do crazy stuff like this (OK I might have done this in the past) but if that's the weirdest thing they notice about unumpy then I'll be very happy!

So I've pushed an update that hopefully is a reasonable compromise, depending on how you feel about config flags. With MICROPY_MODULE_BUILTIN_PACKAGES disabled, the code size is smaller on the three ports I tested:

   bare-arm:   -72 -0.109% 
minimal x86:  -168 -0.109% [incl +4(data)]
   unix x64:   -16 -0.003% 

With the flag enabled:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +160 +0.032%

(Note: the 160 bytes is roughly equivalent to the cost of the frozen .py code anyway. I forget the exact number and I imagine it will grow as more of numpy is implemented)

@v923z
Copy link
Contributor

v923z commented Feb 24, 2020

(Note: the 160 bytes is roughly equivalent to the cost of the frozen .py code anyway. I forget the exact number and I imagine it will grow as more of numpy is implemented)

@jimmo I think this should be resolved as it was proposed in the original PR. This issue is no longer academic: we are facing the exact same problem with ulab, and I find the extra python file for the proper definition of the imports a bit hackish. I would really like to have just a bunch of C files implementing the numpy functions, without python files, DIY with the make file, frozen modules etc.

We already implemented ulab in a modular fashion, v923z/micropython-ulab#41, and Jim's approach would be extremely useful, when customising (excluding sub-modules) the ulab module itself.

@jimmo
Copy link
Member Author

jimmo commented Nov 17, 2021

Closing in favour of #5701

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants