tools/mpy-tool.py: Allow dumping MPY segments into their own files. #17306

agatti · 2025-05-15T01:45:29Z

Summary

This PR lets tools/mpy-tool.py extract MPY segments into their own files, one file per segment.

This is something I wrote some time ago, but I guess it cannot hurt to upstream it. When debugging issues related to compiled code generated by @micropython.viper or @micropython.native, it helps a great deal to be able to get hold of the generated code segments and pass them to objdump or ghidra/idapro/cutter/etc., without having to dump memory from gdb or write custom file/hex dumpers.

A pair of new command line arguments were added, namely "-e"/"--extract" that takes a filename prefix to use as a base for the generated files' name, and "--extract-only" that - combined with "--extract" - allows selecting which kind of segments should be dumped to the filesystem.

So, for example, assuming there's a file called "module.mpy", running "./mpy-tool.py --extract segments module.mpy" would yield a series of files with names like "segments_0_module.py_QSTR_module.py.bin", "segments_1_module.py_META__module_.bin",
"segments_2_module.py_QSTR_function.bin", etc. In short the file name format is <base>_<count>_<sourcefile>_<segmentkind>_<segmentname>.bin, with <segmentkind> being META, QSTR, OBJ, or CODE. Source file names and segment names will only contain characters in the range "a-zA-Z0-9_-." to avoid having output file names with unexpected characters.
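The naming scheme above can be sketched roughly as follows. This is only an illustration, not the code in the PR: the helper names are hypothetical, and replacing disallowed characters with an underscore is an assumption inferred from the "segments_1_module.py_META__module_.bin" example.

```python
import re

# Anything outside "a-zA-Z0-9_-." is considered unsafe for a file name.
# Assumption: unsafe characters are replaced with "_" (inferred from the
# example file names, not confirmed against the actual patch).
_UNSAFE = re.compile(r"[^a-zA-Z0-9_.-]")


def sanitise(name):
    """Hypothetical helper: make a source or segment name file-system friendly."""
    return _UNSAFE.sub("_", name)


def segment_file_name(base, count, source_file, kind, segment_name):
    """Hypothetical helper: <base>_<count>_<sourcefile>_<segmentkind>_<segmentname>.bin"""
    return "{}_{}_{}_{}_{}.bin".format(
        base, count, sanitise(source_file), kind, sanitise(segment_name)
    )


print(segment_file_name("segments", 1, "module.py", "META", "<module>"))
# segments_1_module.py_META__module_.bin
```

Note that the prefix itself is left untouched in this sketch, since it is supplied by the user and may legitimately contain a directory path.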

The "--extract-only" option can accept one or more kinds, separated by commas and treated as case insensitive strings. The supported kinds match what is currently handled by the "MPYSegment" class in "tools/mpy-tool.py": "META", "QSTR", "OBJ", and "CODE". The absence of this command line option implies dumping every segment found.
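As a rough sketch of the behaviour described above, the kinds list could be parsed and validated like this. The mapping and the function name are stand-ins made up for the example (the numeric values are placeholders, not the real "MPYSegment" constants), so treat it as an illustration rather than the actual implementation.

```python
# Stand-in for the real name-to-kind mapping; the numbers are placeholders.
SEGMENT_KINDS = {"META": 0, "QSTR": 1, "OBJ": 2, "CODE": 3}


def parse_extract_only(arg):
    """Return the set of requested kinds; an empty set means "dump everything"."""
    kinds = set()
    if arg is None:
        # Option not given on the command line: no filtering at all.
        return kinds
    for name in arg.split(","):
        key = name.strip().upper()  # kinds are case insensitive
        if key not in SEGMENT_KINDS:
            # Covers typos such as "cod" and an empty "--extract-only=" value.
            raise ValueError("unknown segment kind: {!r}".format(name))
        kinds.add(SEGMENT_KINDS[key])
    return kinds


print(sorted(parse_extract_only("code,QSTR")))
# [1, 3]
```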

If "--extract" is passed along with "--merge", dumping is performed after the merge process takes place, in order to dump all possible segments that match the requested segment kinds.

Testing

Besides my own usage, I've attached a zipfile containing the compiled version of tests/micropython/native_try_deep.py for x64 and its dumped output. To reproduce those files the commands to run are:

mpy-cross -X emit=native -march=x64 tests/micropython/native_try_deep.py -o native_try_deep.mpy
mpy-tool.py --extract native_try_deep native_try_deep.mpy

To check that the CODE segments actually contain executable code, run:

objdump -b binary -M x86-64 -m i386:x86-64 --adjust-vma=0x1000 -z --start-address=0x1008 -D native_try_deep_7_native_try_deep.py_CODE_f.bin

This should dump valid x64 code to STDOUT, as generated by mpy-cross (it skips the first two header words).
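For anyone scripting this check, here is a small sketch of how the objdump arguments relate to each other. The function name and its parameters are hypothetical; the 8-byte header size is simply the difference between the two addresses in the command above, and the 0x1000 load address is the same arbitrary value used there.

```python
def objdump_command(path, load_address=0x1000, header_bytes=8):
    """Hypothetical helper: build the objdump argument list for a CODE segment.

    Disassembly starts at load_address + header_bytes, so the header at the
    start of the dumped segment is not decoded as instructions.
    """
    return [
        "objdump",
        "-b", "binary",
        "-M", "x86-64",
        "-m", "i386:x86-64",
        "--adjust-vma={:#x}".format(load_address),
        "-z",
        "--start-address={:#x}".format(load_address + header_bytes),
        "-D",
        path,
    ]


print(" ".join(objdump_command("native_try_deep_7_native_try_deep.py_CODE_f.bin")))
```

The resulting list can be passed to subprocess.run() as is, provided objdump is available on the PATH.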

native_try_deep.zip

Trade-offs and Alternatives

Given that this code only runs when explicitly requested, and only for a niche scenario, its only drawbacks are a tiny increase in overall code complexity and potential security issues if the output file prefix is used maliciously.

As far as alternatives go, I used to run mpy-tool.py -x -d <mpyfile> to figure out the binary code start offset by looking at the hex pairs on screen (and good luck if somebody remapped their terminal colour scheme :) no idea if the output is colourblind safe though). After a while I wrote my own cut-down mpy-tool.py equivalent to run as a ghidra plugin, but then it would require keeping up with MPY format changes and whatnot, and I wasn't sure it would work in all possible cases.

Having mpy-tool.py dump the segments itself is probably the best compromise for the time being: it is tool-agnostic and doesn't require anything special to get working.

codecov · 2025-05-15T01:56:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (8c47e44) to head (ce834b8).

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #17306   +/-   ##
=======================================
  Coverage   98.38%   98.38%           
=======================================
  Files         171      171           
  Lines       22296    22296           
=======================================
  Hits        21937    21937           
  Misses        359      359

github-actions · 2025-05-15T01:59:10Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

Josverl · 2025-06-28T21:00:34Z

I think it is worth having this, and for your explanation above to be added to the tools documentation.

dpgeorge

Thanks, this looks good. I would have used this more than once myself if it existed already :)

Please rebase on latest master, that will pick up changed ruff rules (namely double quotes).

dpgeorge · 2025-08-26T05:16:12Z

tools/mpy-tool.py

+def parse_extract_segments_arg(arg):
+    kinds = set()
+    if arg is not None:
+        for kind in arg.lower().split(','):


If you use arg.upper().split(","): then it could be simply:

try:
    kinds.add(getattr(MPYSegment, kind))
except AttributeError:
    raise Exception("unknown kind")

Then, this function could be written inline in the one below. (A little simpler to have everything self contained in one function, IMO.)

Oh, I'm used to using lower to work around internationalisation issues in input (e.g. the Turkish I problem).

I agree this is probably not really needed here, but old habits die hard, don't they?

dpgeorge · 2025-08-26T05:17:43Z

tools/mpy-tool.py

+    segments = []
+    for module in compiled_modules:
+        for segment in module.mpy_segments:
+            if not kinds or segment.kind in kinds:


If you didn't want to bother validating kinds_arg, this could simply be if not kinds or segment.kind in kinds_arg.

Validation is probably the easier option here, as otherwise the extract operation would still proceed with mpy-tool.py -e module --extract-only= file.mpy. This can be interpreted either as "don't extract any segment" or maybe "extract all segments", and in both cases it is still the wrong set of arguments to pass.

Same thing if you accidentally type cod instead of code. Since there's no output during the extraction process, as the end user I'd rather get an error telling me I messed up than falsely assume the mpy file had no raw code segments in it, for example.

The alternative would be to add a custom argparse argument type that performs its own validation, which would probably be more code overall.
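For comparison, a custom argparse type along those lines might look roughly like the sketch below. It is not what the PR does; the function name is hypothetical, and the parser only declares the two options discussed here.

```python
import argparse

KIND_NAMES = ("META", "QSTR", "OBJ", "CODE")


def segment_kinds(value):
    """Hypothetical argparse type: turn "code,qstr" into {"CODE", "QSTR"}."""
    names = [item.strip().upper() for item in value.split(",")]
    bad = [name for name in names if name not in KIND_NAMES]
    if bad:
        # argparse turns this into a usage error and exits with status 2.
        raise argparse.ArgumentTypeError(
            "unknown segment kind(s): " + ", ".join(repr(name) for name in bad)
        )
    return set(names)


parser = argparse.ArgumentParser()
parser.add_argument("-e", "--extract", metavar="PREFIX")
parser.add_argument("--extract-only", type=segment_kinds, default=None)

args = parser.parse_args(["-e", "module", "--extract-only", "code,qstr"])
print(sorted(args.extract_only))
# ['CODE', 'QSTR']
```

With this approach both "--extract-only=cod" and an empty "--extract-only=" are rejected by argparse itself before any extraction starts, at the cost of a separate function just for the option.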

dpgeorge · 2025-08-26T05:18:49Z

tools/mpy-tool.py

@@ -1795,6 +1844,14 @@ def main(args=None):
        default=16,
        help="mpz digit size used by target (default 16)",
    )
+    cmd_parser.add_argument(


I suggest moving this up to just after the --merge option, so they appear together in the help output (the action commands will then come before the tweaking options).

Makes sense, thanks! This will be addressed in the next PR iteration.

This commit lets "tools/mpy-tool.py" extract MPY segments into their own files, one file per segment.

A pair of new command line arguments were added, namely "-e"/"--extract" that takes a filename prefix to use as a base for the generated files' name, and "--extract-only" that - combined with "--extract" - allows selecting which kinds of segment should be dumped to the filesystem.

So, for example, assuming there's a file called "module.mpy", running "./mpy-tool.py --extract segments module.mpy" would yield a series of files with names like "segments_0_module.py_QSTR_module.py.bin", "segments_1_module.py_META__module_.bin", "segments_2_module.py_QSTR_function.bin", etc. In short the file name format is "<base>_<count>_<sourcefile>_<segmentkind>_<segmentname>.bin", with <segmentkind> being META, QSTR, OBJ, or CODE. Source file names and segment names will only contain characters in the range "a-zA-Z0-9_-." to avoid having output file names with unexpected characters.

The "--extract-only" option can accept one or more kinds, separated by commas and treated as case insensitive strings. The supported kinds match what is currently handled by the "MPYSegment" class in "tools/mpy-tool.py": "META", "QSTR", "OBJ", and "CODE". The absence of this command line option implies dumping every segment found.

If "--extract" is passed along with "--merge", dumping is performed after the merge process takes place, in order to dump all possible segments that match the requested segment kinds.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

agatti · 2025-08-26T19:19:49Z

Turns out I already had the segment kind strings mapped somewhere else in the new code, so I just reused those to simplify the validation.

I believe the patch is now shorter overall, and it should also be compliant with the new ruff formatting rules too.

dpgeorge added the tools Relates to tools/ directory in source, or other tooling label May 15, 2025

agatti force-pushed the mpy-tool-dump-segments branch from 1a8b588 to ad32c30 Compare June 28, 2025 19:50

dpgeorge reviewed Aug 26, 2025

agatti force-pushed the mpy-tool-dump-segments branch from ad32c30 to ce834b8 Compare August 26, 2025 19:17
