Skip to content

py/modsys: Add sys.builtin_module_names #4468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

Jongy
Copy link
Contributor

@Jongy Jongy commented Feb 5, 2019

This has already been mentioned around, as we can see in:

https://forum.micropython.org/viewtopic.php?f=2&t=3298&sid=cbee627a8772945c10fc91b416489ec1#p23035

#1354

with funny tricks due to the lack of this feature: https://forum.micropython.org/viewtopic.php?f=2&t=3298&start=10#p23087

I wouldn't say the current solution is perfect, but that's the best I could hack and it's good enough IMHO.

@@ -98,6 +98,7 @@
#define MICROPY_PY_SYS_MAXSIZE (1)
#define MICROPY_PY_SYS_STDFILES (1)
#define MICROPY_PY_SYS_EXC_INFO (1)
#define MICROPY_PY_SYS_BUILTIN_MODULE_NAMES (1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 0, i.e. shouldn't be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, okay. will remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead should be in mpconfigport_coverage.h for tests. There's also #4309 for "let's enable everything and more!".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpconfigport_coverage.h is what's used in the CI later on for tests/run-tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpconfigport_coverage.h's primary purpose is to produce "coverage: 98%" figure you can see at https://github.com/micropython/micropython (in README). Yes, it does that by running all the usual tests, plus some artificial, to exercise code paths which can't be reached otherwise. So in all, mpconfigport_coverage.h must enable as much as possible (ideally, all) features implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will update

@Jongy Jongy force-pushed the sys-builtin-module-names branch 3 times, most recently from 989eb0e to 266c3b8 Compare February 5, 2019 22:37
@dpgeorge
Copy link
Member

dpgeorge commented Feb 7, 2019

Thanks for the contribution, but I don't think this is the right approach. Because it statically allocates quite a large array in RAM for the tuple to hold all the entries. If possible it should be put in ROM instead, or allocated from RAM only if the user needs this info.

Some ideas how to do it:

  1. create the tuple on the fly only when requested, by adding a special case in the load section of objmodule.c:module_attr, like if (self == &mp_module_sys && attr == MP_QSTR_builtin_module_names) ...
  2. create a ROM version of the tuple with the names prepopulated at compile time, by separating out the built-in modules list to a header, and including that once for mp_builtin_module_table and once for the builtin_module_names tuple
  3. split mp_builtin_module_table into 2 parts: the keys and the values as separate tuples; the keys become mp_builtin_module_table as well as a way to search for a module to import (instead of using dict lookup would just need to search the key tuple); this would also require a new header to automatically separate the key from the value

@Jongy
Copy link
Contributor Author

Jongy commented Feb 7, 2019

Thanks for your comment!

I get you. I admit I didn't have the RAM/ROM compromise in mind.

Here's what I think:

  1. I find that particular solution (modification of objmodule.c) quite patchy. Generally, lazy initializers would be nice, but that's still avoiding the problem (because the tuple is indeed a constant)
  2. This is better, but this still clones the list of pointers to the name qstrs.
  3. This is the best solution IMO, especially now that I've descended through mp_module_get to realize that a mere mp_obj_tuple_get followed by a for loop would do just fine.

The only set-aside I see for both options 2,3 is that I'll have to deal around with splitting the builtin modules map to 2 lists, and that involves either modifying all ports, or creating a script that'll read table after preprocessing and will split out the module keys (which is more patchy IMO, and it will anyway end up in cloning the list of pointers of name qstrs).

So, I'll give the "splitting the builtin modules map" a shot and see if it's possible without creating too much unnecessary diff.

Meanwhile, do you have any better idea on implementing this?

@Jongy
Copy link
Contributor Author

Jongy commented Feb 7, 2019

So, I'll give the "splitting the builtin modules map" a shot and see if it's possible without creating too
much unnecessary diff.

What I have in mind : Remove MICROPY_PORT_BUILTIN_MODULES and instead require modules to define a list of names, in such a way:

#define MICROPY_PORT_BUILTIN_MODULES \
       MP_BUILTIN_MODULE(usocket), \
       MP_BUILTIN_MODULE(machine) \

then I'll have 2 lists, for the keys I'll have a #define MP_BUILTIN_MODULE(name) MP_ROM_QSTR(MP_QSTR_ ##name) and for the module I'll have MP_ROM_PTR(&mp_module_##name).

Problem is, many modules have inconsistent naming (even in objmodule.c itself! e.g uio has mp_module_io...)

I'm up for this change, if you'd go with it (Fixing all names to be consistent, then using that pair of macros I've mentioned)

@dpgeorge
Copy link
Member

dpgeorge commented Feb 8, 2019

I find that particular solution (modification of objmodule.c) quite patchy. Generally, lazy initializers would be nice, but that's still avoiding the problem (because the tuple is indeed a constant)

I don't think it's that bad. Only if you want the tuple does it use any RAM (and it can then be reclaimed). Also, it'd be possible to add a generic hook for any module to define its own special attributes via a callback that is part of the module's data structure. That would be less of a hack.

... to realize that a mere mp_obj_tuple_get followed by a for loop would do just fine.

It might even be possible to sort the tuple so that binary search can be used.

Problem is, many modules have inconsistent naming

And some even have things like mapping usocket to mp_module_lwip, which is kind of difficult to solve.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 8, 2019

I don't think it's that bad. Only if you want the tuple does it use any RAM (and it can then be reclaimed). Also, it'd be possible to add a generic hook for any module to define its own special attributes via a callback that is part of the module's data structure. That would be less of a hack.

Yes, that sounds like something we'd like. But still, the tuple is a constant and I wanna try doing it "the right way".

Problem is, many modules have inconsistent naming

It's a hassle, but beginning to change this is the right thing to do. I found it easier to get started with MicroPython development because of the strict naming conventions (e.g, module.attr would be mp_module_attr_obj, and so on...). I believe we should put the right effort into keeping it this way. Module names (and module object names, mp_module_*) are one of the things that are not consistent enough IMO (yet... :)

The work we can do here goes in line with stuff like #4449.

And some even have things like mapping usocket to mp_module_lwip, which is kind of difficult to solve.

I'm still working on the patch, but I can already assure you I have nice solution for such cases. I'll test it around on unix and esp32, then post it here.

It might even be possible to sort the tuple so that binary search can be used.

I think it really doesn't matter that much. The "search through the builtin modules" code doesn't get many opportunities to run, and the array is quite short anyway.

@pfalcon
Copy link
Contributor

pfalcon commented Feb 8, 2019

"unix: Normalize module object names." says that it's change for unix port, but contains changes beyond it.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 8, 2019

Okay, current patch has been tested on unix/x64 and esp32.

@dpgeorge What do you think of this change? It's gonna be big if we do it for all ports, but as mentioned in the commit message of py/objmodule: Split ... and as I said in previous comments here, I believe it's the right thing for the project.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 8, 2019

"unix: Normalize module object names." says that it's change for unix port, but contains changes beyond it.

You're right, basically that commit was "whatever changes it takes so all unix modules are normalized".

Once we decide where this patch is going and do all remaining work, I can re-split them / squash all module normalization into one commit.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 8, 2019

I guess we'll just squash all Normalize module object names commits when done.

Anyway, I'll be waiting for your opinions before continuing.

@pfalcon
Copy link
Contributor

pfalcon commented Feb 8, 2019

So, cleaning up module structure naming seems a good idea.

But as a whole, this PR reminds discussion in #4364 . Like, there's an issue, and anything goes to "fix" it. But why? Suppose there's sys.builtin_module_names, what happens next? Nothing special. Like, it's nice curiosity to know which modules are "builtin". That's why for curiosity-oriented ports, there's a help() command which provides that info. And it ends there, it's absolutely useless for any software, because no software should be concerned whether a module is "builtin" or not. Perhaps, one can make up far-fetched usecase for package managers, but they will be just that - far-fetched.

So, it's totally unclear why spend all this effort on this negligible feature. There's idea to cleanup structure naming? Sounds good. There's idea to implement add optional support sys.builtin_module_names? Do that in the simplest way with least changes to anything else. There's much more useful work to do. For example, allowing override sys.stdout and friends is long in todo, and by now it's clear that it should be possible to add/override anything in sys, just as already in builtins.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 8, 2019

I see the things a bit different.

First, about #4364. The reason I looked into it is that I want to improve ussl/mbedtls and if the current ssl tests are broken then I'd say that's the place to start. Now, the ussl.backend thing was perhaps an overkill but I still think we'd need a compatible way to run ussl tests for both impls, and that requirement is not an overkill.

Like, it's nice curiosity to know which modules are "builtin".

I found it quite helpful when first meddling around with MicroPython, and when switching to a different port (e.g when I began using an ESP32). More than "sheer curiosity".

That's why for curiosity-oriented ports, there's a help() command which provides that info.

The sys.builtin_module_names solution is much cheaper and so I thought it could be enabled more widely, since help('modules') is not so wide.

And it ends there, it's absolutely useless for any software, because no software should be concerned whether a module is "builtin" or not.

You're right about "absolutely useless for any software", but on the contrary I'd say that MicroPython focuses on development ease and not on "what's directly useful for software" and I find it very helpful to have the list of available modules ready at hand whenever I'm looking on how to implement something new.

Balance must be kept between adding small patches and cleaning up project areas according to the project conventions/roadmap/..., otherwise the project will end up too heavy and too complex to be handled by multiple developers. So, when an opportunity arises to make a small change and "on the way" fix a few areas in a way that aligns with what's expected from MicroPython, I think we should pick it, instead of picking

Do that in the simplest way with least changes to anything else.

^^ that option.

The only question I have is whether this change is considered too big / too breaking (e.g because of the requirement to modify all ports in a non backwards compatible way). If we're not afraid of this, then I don't see a reason to avoid a diff we believe takes the project ahead in the right direction (talking about the module normalization and the MP_BUILTIN_MODULE thing).

But all the above is right as long as you guys agree with me :)

@pfalcon
Copy link
Contributor

pfalcon commented Feb 8, 2019

I see the things a bit different.

Good. There should be different opinions, or it'd become too boring a place. It's still responsibility of the interested parties to survey as different views as possible.

First, about #4364. The reason I looked into it is that I want to improve ussl/mbedtls and if the current ssl tests are broken then I'd say that's the place to start.

Absolutely. And I submitted it because I do improvements on both modules, and that test is a culprit. As I'm more skeptical/realistic, I submitted that ticket as a prelude for arguing to pluck that test.

I still think we'd need a compatible way to run ussl tests for both impls, and that requirement is not an overkill.

Well, we need a real test for TLS functionality, which would run on any modussl implementation, as I outlined there.

on the contrary I'd say that MicroPython focuses on development ease and not on "what's directly useful for software"

Really? That's first time I hear about that ;-). You perhaps mix up particular ports intended for "harsh" environments like no-OS, where some conveniences are indeed provided. But e.g. unix port was always intended to stay barebone. And as some conveniences are indeed useful sometimes, #4309 was proposed to address that.

The only question I have is whether this change is considered too big / too breaking

Definitely, any change beyond 5 lines is big, and complexity grows exponentially with each line beyond that.

(e.g because of the requirement to modify all ports in a non backwards compatible way).

Changes in backward incompatible way are generally not allowed, or at least there should be very good reasons and consideration.

to avoid a diff we believe takes the project ahead in the right direction

Unfortunately, everyone has own idea of a right direction...

(talking about the module normalization and the MP_BUILTIN_MODULE thing).

Fixing inconsistencies is a good thing. Making breaking changes is a no go, especially if there're concerns that they come from not complete understanding how things are set up, work, and can keep working in the future (see e.g. #4449).

Between these two, whether to apply pretty massive changes like these is up to you and @dpgeorge, but these changes should adhere to the above, e.g. not rely on oversimplifications like "Python-level name of a module matches C-level module structure definition."

@pfalcon
Copy link
Contributor

pfalcon commented Feb 9, 2019

Anyway, @Jongy, didn't want to break fun for you, letting you arrive at that yourself, but you're looking to replace

-{ MP_ROM_QSTR(MP_QSTR_umachine), MP_ROM_PTR(&mp_module_machine) }, \
+     MP_BUILTIN_MODULE(umachine, mp_module_machine) \

That shouldn't be a reason to not make naming clean up (so here it'll be MP_BUILTIN_MODULE(umachine, mp_module_umachine)), the point that in a general case, there's no 1-to-1 correspondence.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 9, 2019

Really? That's first time I hear about that ;-). You perhaps mix up particular ports intended for "harsh" environments like no-OS, where some conveniences are indeed provided. But e.g. unix port was always intended to stay barebone. And as some conveniences are indeed useful sometimes, #4309 was proposed to address that.

Haha look, if I'm wiriting code to my ESP32 using MicroPython and not direct C, it's safe to say I value development ease and not code size / performance. So if sys.builtin_module_names eases my work, of course I'd pay those extra 200 bytes (given that my application is already 1MB+).

Your idea in #4309 is awesome but if my code is heavily based on port-specific modules, it's less of a help... :(

Making breaking changes is a no go, especially if there're concerns that they come from not complete understanding how things are set up, work, and can keep working in the future (see e.g. #4449).

If that's the concern then rest assured - I know these changes require deeper understanding, and as #4468 proves, I still lack some of it. I will investigate further and make sure I didn't miss anything with this change.

Anyway we'll see what @dpgeorge thinks; After all, it was partially his idea, as seen above:

3. split mp_builtin_module_table into 2 parts ...

If we don't like it, I'll go with the lazy attribute solution.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 18, 2019

@dpgeorge Ping :)

@Jongy Jongy force-pushed the sys-builtin-module-names branch from 11a4f60 to a6ba2db Compare February 28, 2019 23:15
@Jongy
Copy link
Contributor Author

Jongy commented Feb 28, 2019

Okay, @pfalcon encouraged me to go on with this. Seems complete now.

A few statistics on sizes: (old vs new)

esp32 (application.elf): 1100136 vs 1100160
unix: 492446 vs 492438
nrf (build-pca10040/firmware.elf): 143796 vs 143816
stm32 (build-PYBV10/firmware.elf): 359564 vs 359564

@Jongy Jongy force-pushed the sys-builtin-module-names branch from a6ba2db to f56a7ae Compare February 28, 2019 23:19
@Jongy Jongy force-pushed the sys-builtin-module-names branch 3 times, most recently from c9c30d0 to 4755cf9 Compare March 3, 2019 22:02
@Jongy
Copy link
Contributor Author

Jongy commented Mar 3, 2019

@pfalcon I think it's ready, you can take a look.

The trigger for this is the work I've been doing on `sys.builtin_module_names`,
but I think it's a nice change nonetheless:
1. It's easier to maintain relations between module names and modules.
2. It will help to enforce consistency between module names and `mp_module_*` objects.
@Jongy Jongy force-pushed the sys-builtin-module-names branch from 4755cf9 to f14514a Compare March 31, 2019 21:33
@Jongy
Copy link
Contributor Author

Jongy commented Mar 31, 2019

Rebased to resolve conflicts; Now it's updated past the MP_REGISTER_MODULE thing done at cf22f47.

@Jongy Jongy force-pushed the sys-builtin-module-names branch from f14514a to 9709075 Compare March 31, 2019 21:46
@Jongy
Copy link
Contributor Author

Jongy commented Mar 31, 2019

@pfalcon Ping :)

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Mar 31, 2021
@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jun 22, 2021
@dpgeorge
Copy link
Member

dpgeorge commented May 6, 2022

I'm going to close this PR.

@Jongy thanks a lot for your effort, and sorry the conversation went dead. I think the reason this went silent is because it is a large change for a minor feature addition, and getting a large, repo-wide change through is not easy.

Moving forward, the ability to override attributes in the sys module is now supported see bc18155, so that could make supporting builtin_module_names easier. Also we plan to consolidate all module declarations with MP_REGISTER_MODULE, see #8566. That will effectively replace the bulk of this PR and make it much simpler to add sys.builtin_module_names.

So let's revisit this once #8566 is merged.

@dpgeorge dpgeorge closed this May 6, 2022
@Jongy
Copy link
Contributor Author

Jongy commented May 7, 2022

Makes sense @dpgeorge . This was indeed a too-large change but yes, after #8566 it will be much easier to apply should anyone require this feature. Thanks for the notice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants