Skip to content

RFC: Allow registration of modules at their definition. #4289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

andrewleech
Copy link
Contributor

The current method of adding a new builtin/global modules is to have the module object defined in the c file for said module, but the python module name and matching #define enable guard defined in the central objmodule.c file.

This PR provides a mechanism for moving this name and guard definition to the actual module file, removing the need to edit the objmodule.c file whenever you want to add a new module.

There is sort of fake macro provided MP_REGISTER_MODULE(name, obj, enabled) for use where modules are being defined.

These are parsed by the included python script during make (not the c the preprocessor) to generate the header of required entries in the builtins modules list in objmodule.c

This was originally written to replace my json based method of declaring modules in #4195 but I think it could have wider appeal across the micropython codebase.

If you're interested in this system, I'm happy to replace the other existing module definitions with the "macro", similar to the array one I've used as reference. I'll also update my cmodule branch to suit.

I'd also like to see a similar system used for module init (early and late) and de-init functions, so that modules can themselves declare the functions needed for their set up or tear down in main.c rather than having to maintain such a list of functions manually in all the separate ports main files. There could be some added complexity there with regard to dependencies and ordering, but that's a separate issue to this PR.

py/obj.h Outdated
@@ -816,6 +816,9 @@ static inline mp_obj_dict_t *mp_obj_module_get_globals(mp_obj_t module) {
// check if given module object is a package
bool mp_obj_is_package(mp_obj_t module);

// declare a module as a builtin, handled by makemoduledefs.py
#define MP_REGISTER_MODULE(name, ref, guard)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try to look for better self-describing name than "guard". Likewise, for other params. And maybe even then comment should describe what are they.

py/objmodule.c Outdated
@@ -226,6 +225,11 @@ STATIC const mp_rom_map_elem_t mp_builtin_module_table[] = {

// extra builtin modules as defined by a port
MICROPY_PORT_BUILTIN_MODULES

#ifdef MICROPY_REGISTERED_MODULES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use indented #ifdef, per the latest code conventions.



if __name__ == '__main__':
main()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As github shows, EOL is missing.

@pfalcon
Copy link
Contributor

pfalcon commented Nov 3, 2018

+1, this is definitely move in the right direction. I'm not sure if it could be done simpler (e.g. without yet another preprocessor script), but I assume it's not.

I personally wouldn't go further until this patch is merged, or explicitly acked by the maintainer (like, "this goes in, convert other modules in this same PR").

@andrewleech
Copy link
Contributor Author

I also feel like adding more python pre-processors seems a bit excessive, but I haven't been able to find a better solution either.

gcc preprocessor does now have the ability to push/pop defines, so you can build a structure that appends to existing defines and build it up that way, however it still requires all the definitions to be appended to be in the include path (headers) with the destination (objmodule.c) being at the end and be the only thing to expand the final macro. It was looking harder to achieve cleanly.

I did consider using the preprocessor on the c files first to know what should or shouldn't be included rather than needing the #if statement to be provided, but that would definitely be a lot slower than this pure python/regex method.

Other than all that, this script does seem to run faster than I can notice it, and is hopefully fairly immune to extra whitespaces / line breaks in the macro usage.

Most importantly, this will not result in any extra ram/code/runtime needed, unlike any kind of dynamic registration of enabled modules.

I definitely agree the python script needs cleaning up, comments and/or documenting etc. I also don't like the arg names, I just haven't spent any time trying to structure it better (it grew out of the script used in cmodules branch). The current implementation is intended as a proof of concept to raise this PR as a discussion start.
I also like the suggestion of splitting out any further refactoring of the existing codebase into a separate commit/PR, if this one is finished and merged then the rest are basically cosmetic changes!

@dhylands
Copy link
Contributor

dhylands commented Nov 3, 2018

Another approach that can be used is to use linker sections. Each module would put the mp_obj_module_t into a named section. The linker winds up concatenating these together. At startup some code would walk through each of the module objects in the named section and register them.

@pfalcon
Copy link
Contributor

pfalcon commented Nov 3, 2018

At startup some code would walk

That's too expensive. However, statically creating the whole of STATIC const mp_rom_map_elem_t mp_builtin_module_table[] from objmodule.c would indeed be possible. Except if we'd want later to e.g. switch to binary search of such namespace tables, we'd need to go with preprocessor again.

@dhylands
Copy link
Contributor

dhylands commented Nov 3, 2018

Actually, I think it could be done with zero startup cost by simply having the qstr/pointers to the modules added to the existing mp_builtin_module_table.

@andrewleech
Copy link
Contributor Author

andrewleech commented Nov 3, 2018

I like the idea, the thought of bending the linker into giving us a registration array never occurred to me.

If each row of the STATIC const mp_rom_map_elem_t mp_builtin_module_table[] (struct mp_rom_map_elem_t objects) were assigned to a linker section that had them and only them in it, with a linker marker at the top and bottom, then yes the mp_builtin_module_table could be just a pointer to the top of the section. The array would be built for us with the end marker giving the number of entries. Other than sizeof not working, the rest of the code wouldn't know any difference, it could use it exactly the same.

@pfalcon
Copy link
Contributor

pfalcon commented Nov 3, 2018

Yes, would give troubles to people with adhoc toolchain, but actually would give issues even for unix and windows ports. So, I'd say skip it for now, given that preprocessor is already written.

@andrewleech
Copy link
Contributor Author

Yeah, true, and it would require duplicating a suitable linker definition into each ports' linker files, with changes for each different kind of compiler. T
he python preprocessor is one single common task that works for all ports.

@andrewleech
Copy link
Contributor Author

andrewleech commented Nov 4, 2018

I've updated the branch to be a bit cleaner, the script more descriptive.

For references, the generated genhdr/moduledefs.h looks like:

// Automatically generated by makemoduledefs.py.

#if (MICROPY_PY_ARRAY)
    extern const struct _mp_obj_module_t mp_module_array;
    #define MODULE_DEF_MP_QSTR_ARRAY { MP_ROM_QSTR(MP_QSTR_array), MP_ROM_PTR(&mp_module_array) },
#else
    #define MODULE_DEF_MP_QSTR_ARRAY
#endif


#define MICROPY_REGISTERED_MODULES \
    MODULE_DEF_MP_QSTR_ARRAY \
// MICROPY_REGISTERED_MODULES

#!/usr/bin/env python

# This pre-processor parses provided objects' c files for
# `MP_REGISTER_MODULE(module_name, obj_module, enabled_define)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that (markdown?) markup here makes sense. At least, we don't use it anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, yeah, I do find myself doing that a bit these days...

import argparse


def find_c_file(vpath, obj_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a nitpick, by first impression I have in my head looking at this line is question "why args in that order?" ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't really thought about it.
On one hand, vpath had a list of root paths, with the obj_file being appending on the end of one of them, so the two are joined together in that visual order.
On the other hand, obj_file is kind of the point of the function, with vpath being the list of paths to search in.

@pfalcon
Copy link
Contributor

pfalcon commented Nov 5, 2018

Thanks for the updates. Well, now a one-line commit message stands out as a bare pole, can you add something there (for the lack of real documentation, commit message serves as documentation).

And to clarify - you don't scan each byte of the entire source with this preprocessing script, do you?


def find_c_file(vpath, obj_file):
"""
Search the provided vpaths for the c file that matches the provided object_file
Copy link
Contributor

@stinos stinos Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't docstring style be something more standard like

""" Short description which ends with period.

Rest of description
"""

or

"""  Short description which ends with period. """

Just checking, don't know if this is set in stone somewhere for uPy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never loved the first line starting on the same line as the opening """, but it does seem to be the more standard convention in PEP 257.

def find_c_file(vpath, obj_file):
"""
Search the provided vpaths for the c file that matches the provided object_file
:param List[str} vpath: List of base paths, similar to gcc vpath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List[str} is that correct? Also uses different captialization than the line below it

Copy link
Contributor Author

@andrewleech andrewleech Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is a typo, thanks. I'm not completely sure about the later ones, being a list of tuples.

"""
c_file = None
relative_c_file = os.path.splitext(obj_file)[0] + ".c"
relative_c_file = relative_c_file.lstrip('/\\')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since relative_c_file this is passed to os.path.join this should not be needed I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely needed, as the files being passed in from makefile with the build folder stripped off come in with a "/" at the start.

Passing this in as the right hand arg of os.path.join, it gets treated as an absolute path and the left hand arg (from vpath) is ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it gets treated as an absolute path

Well, that's new to me. Probably because the implementation is inconsistent between windows/linux and the type of slash used :(

:param c_file: path to c file to check
:return: List((module_name, obj_module, enabled_define))
"""
pattern = re.compile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be closer to where it's used, or maybe don't use seperate pattern at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah in previous structure this pattern compilation happened outside the for loop iterating over files, so should be a decent efficiency improvement. It's now essentially moved inside the loop though so the benefit has been lost.

I'm not sure just how much is gained by only compiling this once up front, rather than once per file... so not too sure if it's worth moving it to global scope to keep it separate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry should have been clearer, I meant: perhaps it's clearer and more concies to simply not compile to a pattern at all, but just pass the regex string as first argument to re.findall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no I agree it's somewhat clearer to just include the pattern in the findall, however it's faster to compile the pattern once and reuse for all the files looped over, so a tradeoff between readable and speed. How much speed though, I'm not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it's clearer and more concies to simply not compile to a pattern at all, but just pass the regex string as first argument to re.findall.

Well, nope. Compiling patterns is the best practice, shortcuts like that exist only for once-off scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I prefer the once off compilation, my last push moved the pattern compilation to global scope to ensure it's a once off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much speed though, I'm not sure.

Yes that's what struck me first time. De-clarifying code just for the sake of 'might be faster' in some specific Python version in a program which is likely going to be limited by disk I/O, don't know. Well, it won't hurt either.

return []

with open(c_file) as c_file_handle:
c_text = c_file_handle.read()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just return list(re.findall(pattern, c_file_handle.read())) here?
Minor nitpick: open() doesn't really return a 'handle' like there's handles in C, but it's called a 'file object'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting it up was useful while debugging to get the pattern right, less important now!

for mod_def in mod_defs:
print(" {mod_def} \\".format(mod_def=mod_def))

print("// MICROPY_REGISTERED_MODULES")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems spurious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat, however without this the last line of the file ends with \ so it kind of looks like the file is unfinished or has been truncated.

@andrewleech
Copy link
Contributor Author

And to clarify - you don't scan each byte of the entire source with this preprocessing script, do you?

Well... it does run a regex over the text of every c file in the current make pattern (files being built for the port, not every c file).
I was worried it'd be quite slow, however it seems to run faster than I notice it happening, certainly a lot faster than running the preprocessor over the *.h codebase for strings at the start of the build.

During make, makemoduledefs.py parses the compiled ports' c files for
MP_REGISTER_MODULE(module_name, obj_module, enabled_define)

These are used to generate a header with the required entries for
"mp_rom_map_elem_t mp_builtin_module_table[]" in py/objmodule.c
@dpgeorge
Copy link
Member

dpgeorge commented Mar 8, 2019

These was merged in cf22f47 as part of #4195

@dpgeorge dpgeorge closed this Mar 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants