RFC: Allow registration of modules at their definition. #4289

andrewleech · 2018-11-03T07:05:44Z

The current method of adding a new builtin/global modules is to have the module object defined in the c file for said module, but the python module name and matching #define enable guard defined in the central objmodule.c file.

This PR provides a mechanism for moving this name and guard definition to the actual module file, removing the need to edit the objmodule.c file whenever you want to add a new module.

There is sort of fake macro provided MP_REGISTER_MODULE(name, obj, enabled) for use where modules are being defined.

These are parsed by the included python script during make (not the c the preprocessor) to generate the header of required entries in the builtins modules list in objmodule.c

This was originally written to replace my json based method of declaring modules in #4195 but I think it could have wider appeal across the micropython codebase.

If you're interested in this system, I'm happy to replace the other existing module definitions with the "macro", similar to the array one I've used as reference. I'll also update my cmodule branch to suit.

I'd also like to see a similar system used for module init (early and late) and de-init functions, so that modules can themselves declare the functions needed for their set up or tear down in main.c rather than having to maintain such a list of functions manually in all the separate ports main files. There could be some added complexity there with regard to dependencies and ordering, but that's a separate issue to this PR.

pfalcon · 2018-11-03T08:26:48Z

py/obj.h

@@ -816,6 +816,9 @@ static inline mp_obj_dict_t *mp_obj_module_get_globals(mp_obj_t module) {
 // check if given module object is a package
 bool mp_obj_is_package(mp_obj_t module);

+// declare a module as a builtin, handled by makemoduledefs.py
+#define MP_REGISTER_MODULE(name, ref, guard)


I'd try to look for better self-describing name than "guard". Likewise, for other params. And maybe even then comment should describe what are they.

pfalcon · 2018-11-03T08:27:22Z

py/objmodule.c

@@ -226,6 +225,11 @@ STATIC const mp_rom_map_elem_t mp_builtin_module_table[] = {

    // extra builtin modules as defined by a port
    MICROPY_PORT_BUILTIN_MODULES
+
+#ifdef MICROPY_REGISTERED_MODULES


Please use indented #ifdef, per the latest code conventions.

pfalcon · 2018-11-03T08:28:11Z

py/makemoduledefs.py

+
+
+if __name__ == '__main__':
+    main()


As github shows, EOL is missing.

pfalcon · 2018-11-03T08:31:43Z

+1, this is definitely move in the right direction. I'm not sure if it could be done simpler (e.g. without yet another preprocessor script), but I assume it's not.

I personally wouldn't go further until this patch is merged, or explicitly acked by the maintainer (like, "this goes in, convert other modules in this same PR").

andrewleech · 2018-11-03T10:32:30Z

I also feel like adding more python pre-processors seems a bit excessive, but I haven't been able to find a better solution either.

gcc preprocessor does now have the ability to push/pop defines, so you can build a structure that appends to existing defines and build it up that way, however it still requires all the definitions to be appended to be in the include path (headers) with the destination (objmodule.c) being at the end and be the only thing to expand the final macro. It was looking harder to achieve cleanly.

I did consider using the preprocessor on the c files first to know what should or shouldn't be included rather than needing the #if statement to be provided, but that would definitely be a lot slower than this pure python/regex method.

Other than all that, this script does seem to run faster than I can notice it, and is hopefully fairly immune to extra whitespaces / line breaks in the macro usage.

Most importantly, this will not result in any extra ram/code/runtime needed, unlike any kind of dynamic registration of enabled modules.

I definitely agree the python script needs cleaning up, comments and/or documenting etc. I also don't like the arg names, I just haven't spent any time trying to structure it better (it grew out of the script used in cmodules branch). The current implementation is intended as a proof of concept to raise this PR as a discussion start.
I also like the suggestion of splitting out any further refactoring of the existing codebase into a separate commit/PR, if this one is finished and merged then the rest are basically cosmetic changes!

dhylands · 2018-11-03T19:25:49Z

Another approach that can be used is to use linker sections. Each module would put the mp_obj_module_t into a named section. The linker winds up concatenating these together. At startup some code would walk through each of the module objects in the named section and register them.

pfalcon · 2018-11-03T19:31:38Z

At startup some code would walk

That's too expensive. However, statically creating the whole of STATIC const mp_rom_map_elem_t mp_builtin_module_table[] from objmodule.c would indeed be possible. Except if we'd want later to e.g. switch to binary search of such namespace tables, we'd need to go with preprocessor again.

dhylands · 2018-11-03T19:40:47Z

Actually, I think it could be done with zero startup cost by simply having the qstr/pointers to the modules added to the existing mp_builtin_module_table.

andrewleech · 2018-11-03T21:29:20Z

I like the idea, the thought of bending the linker into giving us a registration array never occurred to me.

If each row of the STATIC const mp_rom_map_elem_t mp_builtin_module_table[] (struct mp_rom_map_elem_t objects) were assigned to a linker section that had them and only them in it, with a linker marker at the top and bottom, then yes the mp_builtin_module_table could be just a pointer to the top of the section. The array would be built for us with the end marker giving the number of entries. Other than sizeof not working, the rest of the code wouldn't know any difference, it could use it exactly the same.

pfalcon · 2018-11-03T22:02:23Z

Yes, would give troubles to people with adhoc toolchain, but actually would give issues even for unix and windows ports. So, I'd say skip it for now, given that preprocessor is already written.

andrewleech · 2018-11-03T23:25:49Z

Yeah, true, and it would require duplicating a suitable linker definition into each ports' linker files, with changes for each different kind of compiler. T
he python preprocessor is one single common task that works for all ports.

andrewleech · 2018-11-04T03:31:27Z

I've updated the branch to be a bit cleaner, the script more descriptive.

For references, the generated genhdr/moduledefs.h looks like:

// Automatically generated by makemoduledefs.py.

#if (MICROPY_PY_ARRAY)
    extern const struct _mp_obj_module_t mp_module_array;
    #define MODULE_DEF_MP_QSTR_ARRAY { MP_ROM_QSTR(MP_QSTR_array), MP_ROM_PTR(&mp_module_array) },
#else
    #define MODULE_DEF_MP_QSTR_ARRAY
#endif


#define MICROPY_REGISTERED_MODULES \
    MODULE_DEF_MP_QSTR_ARRAY \
// MICROPY_REGISTERED_MODULES

pfalcon · 2018-11-05T09:31:46Z

py/makemoduledefs.py

+#!/usr/bin/env python
+
+# This pre-processor parses provided objects' c files for 
+# `MP_REGISTER_MODULE(module_name, obj_module, enabled_define)`


I'm not sure that (markdown?) markup here makes sense. At least, we don't use it anywhere else.

Sure, yeah, I do find myself doing that a bit these days...

pfalcon · 2018-11-05T09:33:14Z

py/makemoduledefs.py

+import argparse
+
+
+def find_c_file(vpath, obj_file):


That can be a nitpick, by first impression I have in my head looking at this line is question "why args in that order?" ;-)

I hadn't really thought about it.
On one hand, vpath had a list of root paths, with the obj_file being appending on the end of one of them, so the two are joined together in that visual order.
On the other hand, obj_file is kind of the point of the function, with vpath being the list of paths to search in.

pfalcon · 2018-11-05T09:38:28Z

Thanks for the updates. Well, now a one-line commit message stands out as a bare pole, can you add something there (for the lack of real documentation, commit message serves as documentation).

And to clarify - you don't scan each byte of the entire source with this preprocessing script, do you?

stinos · 2018-11-05T10:02:07Z

py/makemoduledefs.py

+
+def find_c_file(vpath, obj_file):
+    """
+    Search the provided vpaths for the c file that matches the provided object_file


Shouldn't docstring style be something more standard like

""" Short description which ends with period. Rest of description """

or

""" Short description which ends with period. """

Just checking, don't know if this is set in stone somewhere for uPy.

I've never loved the first line starting on the same line as the opening """, but it does seem to be the more standard convention in PEP 257.

stinos · 2018-11-05T10:03:38Z

py/makemoduledefs.py

+def find_c_file(vpath, obj_file):
+    """
+    Search the provided vpaths for the c file that matches the provided object_file
+    :param List[str} vpath: List of base paths, similar to gcc vpath


List[str} is that correct? Also uses different captialization than the line below it

That one is a typo, thanks. I'm not completely sure about the later ones, being a list of tuples.

stinos · 2018-11-05T10:05:18Z

py/makemoduledefs.py

+    """
+    c_file = None
+    relative_c_file = os.path.splitext(obj_file)[0] + ".c"
+    relative_c_file = relative_c_file.lstrip('/\\')


Since relative_c_file this is passed to os.path.join this should not be needed I think?

This is definitely needed, as the files being passed in from makefile with the build folder stripped off come in with a "/" at the start.

Passing this in as the right hand arg of os.path.join, it gets treated as an absolute path and the left hand arg (from vpath) is ignored.

it gets treated as an absolute path

Well, that's new to me. Probably because the implementation is inconsistent between windows/linux and the type of slash used :(

stinos · 2018-11-05T10:06:47Z

py/makemoduledefs.py

+    :param c_file: path to c file to check
+    :return: List((module_name, obj_module, enabled_define))
+    """
+    pattern = re.compile(


Should be closer to where it's used, or maybe don't use seperate pattern at all

Ah in previous structure this pattern compilation happened outside the for loop iterating over files, so should be a decent efficiency improvement. It's now essentially moved inside the loop though so the benefit has been lost.

I'm not sure just how much is gained by only compiling this once up front, rather than once per file... so not too sure if it's worth moving it to global scope to keep it separate.

Sorry should have been clearer, I meant: perhaps it's clearer and more concies to simply not compile to a pattern at all, but just pass the regex string as first argument to re.findall.

Oh no I agree it's somewhat clearer to just include the pattern in the findall, however it's faster to compile the pattern once and reuse for all the files looped over, so a tradeoff between readable and speed. How much speed though, I'm not sure.

perhaps it's clearer and more concies to simply not compile to a pattern at all, but just pass the regex string as first argument to re.findall.

Well, nope. Compiling patterns is the best practice, shortcuts like that exist only for once-off scripts.

Yeah I prefer the once off compilation, my last push moved the pattern compilation to global scope to ensure it's a once off.

How much speed though, I'm not sure.

Yes that's what struck me first time. De-clarifying code just for the sake of 'might be faster' in some specific Python version in a program which is likely going to be limited by disk I/O, don't know. Well, it won't hurt either.

stinos · 2018-11-05T10:08:38Z

py/makemoduledefs.py

+        return []
+
+    with open(c_file) as c_file_handle:
+        c_text = c_file_handle.read()


why not just return list(re.findall(pattern, c_file_handle.read())) here?
Minor nitpick: open() doesn't really return a 'handle' like there's handles in C, but it's called a 'file object'

Splitting it up was useful while debugging to get the pattern right, less important now!

stinos · 2018-11-05T10:10:36Z

py/makemoduledefs.py

+    for mod_def in mod_defs:
+        print("    {mod_def} \\".format(mod_def=mod_def))
+
+    print("// MICROPY_REGISTERED_MODULES")


Seems spurious

It is somewhat, however without this the last line of the file ends with \ so it kind of looks like the file is unfinished or has been truncated.

andrewleech · 2018-11-06T05:57:34Z

And to clarify - you don't scan each byte of the entire source with this preprocessing script, do you?

Well... it does run a regex over the text of every c file in the current make pattern (files being built for the port, not every c file).
I was worried it'd be quite slow, however it seems to run faster than I notice it happening, certainly a lot faster than running the preprocessor over the *.h codebase for strings at the start of the build.

During make, makemoduledefs.py parses the compiled ports' c files for MP_REGISTER_MODULE(module_name, obj_module, enabled_define) These are used to generate a header with the required entries for "mp_rom_map_elem_t mp_builtin_module_table[]" in py/objmodule.c

dpgeorge · 2019-03-08T12:05:36Z

These was merged in cf22f47 as part of #4195

pfalcon reviewed Nov 3, 2018

View reviewed changes

py/makemoduledefs.py Outdated

if __name__ == '__main__':

main()

Copy link

Contributor

pfalcon Nov 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As github shows, EOL is missing.

andrewleech force-pushed the module_registration branch from 3244181 to cae619a Compare November 4, 2018 03:29

andrewleech force-pushed the module_registration branch from cae619a to 5e87dca Compare November 4, 2018 03:33

pfalcon reviewed Nov 5, 2018

View reviewed changes

stinos reviewed Nov 5, 2018

View reviewed changes

andrewleech force-pushed the module_registration branch from 5e87dca to 11e7507 Compare November 6, 2018 06:03

dpgeorge closed this Mar 8, 2019

RFC: Allow registration of modules at their definition. #4289

RFC: Allow registration of modules at their definition. #4289

Conversation

andrewleech commented Nov 3, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pfalcon commented Nov 3, 2018

andrewleech commented Nov 3, 2018

dhylands commented Nov 3, 2018

pfalcon commented Nov 3, 2018

dhylands commented Nov 3, 2018

andrewleech commented Nov 3, 2018 • edited Loading

pfalcon commented Nov 3, 2018

andrewleech commented Nov 3, 2018

andrewleech commented Nov 4, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pfalcon commented Nov 5, 2018

stinos Nov 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewleech Nov 6, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewleech commented Nov 6, 2018

dpgeorge commented Mar 8, 2019

andrewleech commented Nov 3, 2018 •

edited

Loading

andrewleech commented Nov 4, 2018 •

edited

Loading

stinos Nov 5, 2018 •

edited

Loading

andrewleech Nov 6, 2018 •

edited

Loading