Skip to content

ports/embed: segfault in mp_compile() #11781

Closed
@cwalther

Description

@cwalther

There is a bug in the embed port that can cause segfaults while compiling Python code in memory-constrained situations. I have not managed to construct a simple reproducer yet, since it depends on memory allocation patterns, but the root cause is easy to verify, so I’m reporting it already.

mp_stack_ctrl_init(), called two levels deep in main()mp_embed_init()mp_stack_ctrl_init(), captures a too low stack_top, so that the local variable parse_tree of mp_embed_exec_str(), called only one level deep in main()mp_embed_exec_str(), ends up outside of the stack range looked at by the garbage collector in gc_helper_collect_regs_and_stack(). When garbage collection is invoked at just the wrong moment during compilation, the blocks pointed at by the contents of parse_tree are therefore freed prematurely, and when they are later reused and overwritten, that corrupts the pointers inside them, which is likely to lead to memory access failures in mp_parse_tree_clear() at the latest.

This is easily verified with

diff --git a/ports/embed/port/embed_util.c b/ports/embed/port/embed_util.c
index 14f508977..816e28a36 100644
--- a/ports/embed/port/embed_util.c
+++ b/ports/embed/port/embed_util.c
@@ -49,6 +49,7 @@ void mp_embed_exec_str(const char *src) {
         mp_lexer_t *lex = mp_lexer_new_from_str_len(MP_QSTR__lt_stdin_gt_, src, strlen(src), 0);
         qstr source_name = lex->source_name;
         mp_parse_tree_t parse_tree = mp_parse(lex, MP_PARSE_FILE_INPUT);
+        assert((char*)&parse_tree <= MP_STATE_THREAD(stack_top));
         mp_obj_t module_fun = mp_compile(&parse_tree, source_name, true);
         mp_call_function_0(module_fun);
         nlr_pop();

This assert fails for me on macOS 13 x86_64, compiling with clang 14.

I am not sure how to best fix this. Ideas so far:

  • Remove mp_stack_ctrl_init() from mp_embed_init() and require the application itself to call it, at the same or higher level than it will later call mp_embed_exec_str().
  • Have mp_stack_ctrl_init() apply some offset to the detected stack top to account for its own call depth. I have no idea how to generically and portably determine what that offset should be, though. (Underestimating will fail to fix the problem, overestimating might run into unmapped memory above the stack.)
  • mp_embed_exec_str() could check whether parse_tree is above stack_top and if so adjust the latter.

Any opinions?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions