Skip to content

natmod: Add native modules support for RV32 code. #15603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 22, 2024

Conversation

agatti
Copy link
Contributor

@agatti agatti commented Aug 5, 2024

Summary

This commit adds support for RV32IMC native modules, which is one of the few remaining bits left to bring RV32 support to parity with Thumb or x86 (the other is inline assembler).

Testing

All tests were performed on an ESP32C3 board, with cross-compilation done in an environment similar to the CI machine used to build the ESP32 and Qemu-RISCV ports. The natmods being built are the ones from the examples/natmod/ directory, using the tests in tests/extmod/ to exercise the code in a known way whenever possible.

Trade-offs and Alternatives

There isn't any alternative to this in order to get natmod support on RV32, and there are no real trade-offs in execution speed or in code size. A full-fledged linker would be able to convert some two-opcodes relocations into a single opcode or perform similar optimisations (the infamous R_RISCV_RELAX relocation marker), but this isn't it. The RV32-specific relocations handling code had to be moved to its own function to please the linter when it measures function complexity.

One thing worth mentioning is the insane amount of possible static relocation types that must be handled (around 35 in total). Right now I've implemented all the ones I've seen from building the sample natmods, for the ones I didn't see yet, I'll have to add them in blindly and hope the documentation is accurate.

The makefile changes to add native compilation are pretty much the same as what was originally written for the Qemu-RISCV port. It's quite convoluted but I couldn't find any better way to simplify that and get the same out-of-the-box compilation output under various Linux environments.

Copy link

github-actions bot commented Aug 5, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@agatti
Copy link
Contributor Author

agatti commented Aug 6, 2024

For features2 and random I have no idea at the moment on how to tweak the build process to get the soft-float code work on Picolibc in the CI environment. MICROPY_FLOAT_IMPL is set to the appropriate value for RV32 natmods I think, and it works when building with Newlib on a different toolchain (a custom-built one in my case), so it should work for Picolibc too but it doesn't - unless the RISC-V toolchain shipped by Ubuntu 22.04 not only doesn't provide __builtin_alloca but has some issues with soft-float too?

Suggestions are welcome :)

@agatti agatti changed the title tools/mpy_ld.py: Add native modules support for RV32 code. natmod: Add native modules support for RV32 code. Aug 6, 2024
@agatti agatti force-pushed the natmod-riscv branch 2 times, most recently from 8235c7c to ef82899 Compare August 8, 2024 14:13
@agatti
Copy link
Contributor Author

agatti commented Aug 8, 2024

@dpgeorge I found an "interesting" bug, which I'm not sure if it is caused by my relocation handling.

This happens when you build the examples/deflate natmod for rv32imc and then push it to an ESP32C3. If you take a look at https://gist.github.com/agatti/84476520c808095575321769dfe64138 you'll see it's tests/extmod/deflate_decompress.py with a modified module import to pick up the natmod and with most of the test code commented out except for the last dozen or so calls to the deflate module.

Now, if you run that file via mpremote run as-is, the device crashes when executing the code at line 194 or 195. However, if you remove the chunk of code between lines 53 and 170 - which is already commented out - and run the test again, there's no crash at all.

More interestingly, if you leave the commented block in, it's only the sequence of having the last three marked blocks together that make the device crash. With any other two blocks combination and the commented block it works.

This doesn't make much sense unless the patched relocations are incorrect and by accident something overwrites some structures in RAM, but I'm not entirely sure that's what happening, because as it's the same functions called over and over I suppose it would create issues right away. I'm starting to doubt it's my device misbehaving but until the new C3s I've ordered arrive I cannot rule that out (all the boards I have are busy doing other things).

@agatti
Copy link
Contributor Author

agatti commented Aug 11, 2024

The framebuf natmod crashes because pyelftools may have a bug in handling R_RISCV_SUB32 relocations.

The switch statement in framebuf_make_new is compiled as a table whose entries contains two relocations each, one positive (ADD32) and one negative (SUB32). In this case the positive relocation contains the offset between the table entry and the parameters calculation code (calculated from the beginning of the segment), and then the negative relocation contains the offset of the table start, to make sure the offset is relative to the auipc/jalr pair of instruction that performs the table jump.

According to the official relocation descriptions, in both cases one must add or subtract the value of the relocation plus the addend to the original word present in the binary (in this case 0x00000000). When performing the relocations for that switch table, ADD32 relocations have the correct s["st_value"]/r_addend values pair, however SUB32 relocations have 0 in s["st_value"] instead of the table offset:

# ADD32 relocation, st_value contains the offset of ".L250" (the table entry offset) as it is expected
R: <Relocation (RELA): Container({'r_offset': 0, 'r_info': 63523, 'r_info_sym': 248,
                                  'r_info_type': 35, 'r_addend': 0})> 
S: Container({'st_name': 1705, 'st_value': 4644, 'st_size': 0,
              'st_info': Container({'bind': 'STB_LOCAL', 'type': 'STT_NOTYPE'}),
              'st_other': Container({'visibility': 'STV_DEFAULT'}), 'st_shndx': 1}), '.L250'
  00001580 .L250
# SUB32 relocation, st_value is 0 instead of containing the offset of ".L245" (the table start offset)
R: <Relocation (RELA): Container({'r_offset': 0, 'r_info': 23591, 'r_info_sym': 92,
                                  'r_info_type': 39, 'r_addend': 0})> 
S: Container({'st_name': 951, 'st_value': 0, 'st_size': 0,
              'st_info': Container({'bind': 'STB_LOCAL', 'type': 'STT_NOTYPE'}),
              'st_other': Container({'visibility': 'STV_DEFAULT'}), 'st_shndx': 6}), '.L245'
  00001580 .L245

Once the relocations are applied and there's an attempt to create a framebuffer, the code tries to set up the parameters using a code fragment outside the natmod code space, and things crash...

@agatti agatti force-pushed the natmod-riscv branch 2 times, most recently from be48fd7 to 2506be1 Compare August 15, 2024 18:42
@agatti agatti marked this pull request as ready for review August 15, 2024 19:20
@agatti
Copy link
Contributor Author

agatti commented Aug 15, 2024

@dpgeorge here's a version that makes all tests pass for the natmods that build (features0, features1, features3, features4, framebuf, heapq, re) except for the ones in deflate and framebuf_ellipse and framebuf_polygon. re_debug, re_sub, re_sub_unmatched need more build flags to include debug and substitute functionalities, and those were not in the re natmod makefile.

That said, for those failures I'm not sure they are due to my changes, as for example, in framebuf_ellipse it crashes when attempting to display the first clipped ellipse - however if you remove the rest of the test operations, those ellipses draw just fine. Same goes for deflate, functions fail after being called in the middle of a test case, but if moved around or isolated they work fine.

I made all test verifications by hand (editing the tests' import lines to pick up the mpy version), hopefully when Qemu ports will run mpy files this can be added to CI as well.

@agatti
Copy link
Contributor Author

agatti commented Aug 17, 2024

Can reproduce the same crashes on xtensawin and an ESP32-D0WD-V3.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Aug 26, 2024
@dpgeorge
Copy link
Member

Thanks for this, it's a great addition!

I haven't had time to look at it properly, but it will definitely need some CI tests:

I made all test verifications by hand (editing the tests' import lines to pick up the mpy version), hopefully when Qemu ports will run mpy files this can be added to CI as well.

Yes, we should try to get the qemu ports to test this. See #15624 for the first step in that direction.

Copy link

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.57%. Comparing base (1360584) to head (7ca6e5e).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #15603   +/-   ##
=======================================
  Coverage   98.57%   98.57%           
=======================================
  Files         164      164           
  Lines       21349    21349           
=======================================
  Hits        21045    21045           
  Misses        304      304           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vshymanskyy
Copy link
Contributor

vshymanskyy commented Oct 3, 2024

I can confirm that this PR works on the actual hardware with wasm2mpy.
We were even able to run the CoreMark tests:

ESP32-C3 160MHz: 179.791      <---
ESP32 240MHz: 228.363
ESP32-S3 240MHz: 271.573
BL616 320MHz: 344.293         <--- on Zephyr

}


def process_riscv32_relocation(env, text_addr, r):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to split this function into "compute" and "rewrite" parts? That might follow better the existing logic above (where this function is called) used by other archs.

Alternatively, maybe it's better to have a single function like this for all the other archs? Although I think in their case the different archs share a lot of the rewrite logic.

Copy link
Contributor Author

@agatti agatti Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while, but the only reason was to keep the method length/complexity below the threshold set in the precommit checks running on python code.

As you point out, other archs more or less share the same primitives when it comes to relocation handling with RISC-V being the outlier here (and MIPS and probably AArch64 too). I'm OK with doing the split myself (either with a set of functions or classes with just enough inheritance to reuse even more code), whether that is submitted as its own PR or as an entry in this set of commits is fine - whichever way takes less effort for you to review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to leave this refactoring for another day (ie keep it as you have it).

This one-file program is getting quite large so probably needs some more serious factoring, eventually.

@dpgeorge dpgeorge added this to the release-1.25.0 milestone Dec 10, 2024
@dpgeorge
Copy link
Member

@agatti this PR is looking really good now, and I've tested it with the natmod examples. Everything passes locally for me, except compiling features2 (because it needs float) and running extmod/random_extra_float.py (because it needs float).

As the remaining piece of the puzzle, I think we should get CI to run these tests. Here's a minimal patch that works locally for me and hopefully works for the CI. If you could add it to this PR, that would be great (feel free to modify to get it working and/or improve it as you see fit):

diff --git a/ports/qemu/Makefile b/ports/qemu/Makefile
index 237e1e951..f60e8f58b 100644
--- a/ports/qemu/Makefile
+++ b/ports/qemu/Makefile
@@ -168,6 +168,11 @@ test: $(BUILD)/firmware.elf
        $(eval DIRNAME=ports/$(notdir $(CURDIR)))
        cd $(TOP)/tests && ./run-tests.py -t execpty:"$(QEMU_SYSTEM) $(QEMU_ARGS) -serial pty -kernel ../$(DIRNAME)/$<" $(RUN_TESTS_ARGS) $(RUN_TESTS_EXTRA)
 
+.PHONY: test_natmod
+test_natmod: $(BUILD)/firmware.elf
+       $(eval DIRNAME=ports/$(notdir $(CURDIR)))
+       cd $(TOP)/tests && ./run-natmodtests.py -p -d execpty:"$(QEMU_SYSTEM) $(QEMU_ARGS) -serial pty -kernel ../$(DIRNAME)/$<" $(RUN_NATMODTESTS_ARGS) extmod/{btree,deflate,framebuf,heapq,random_basic,re}*.py
+
 $(BUILD)/firmware.elf: $(LDSCRIPT) $(OBJ)
        $(Q)$(CC) $(LDFLAGS) -o $@ $(OBJ) $(LIBS)
        $(Q)$(SIZE) $@
diff --git a/ports/qemu/boards/VIRT_RV32.mk b/ports/qemu/boards/VIRT_RV32.mk
index a61b659fa..355a09c3d 100644
--- a/ports/qemu/boards/VIRT_RV32.mk
+++ b/ports/qemu/boards/VIRT_RV32.mk
@@ -12,3 +12,5 @@ MPY_CROSS_FLAGS += -march=rv32imc
 
 # These Thumb tests don't run on RV32, so exclude them.
 RUN_TESTS_ARGS = --exclude 'inlineasm|qemu/asm_test'
+
+RUN_NATMODTESTS_ARGS = --arch rv32imc
diff --git a/tools/ci.sh b/tools/ci.sh
index c67aeed0c..63ffe58b8 100755
--- a/tools/ci.sh
+++ b/tools/ci.sh
@@ -292,6 +292,10 @@ function ci_qemu_build_rv32 {
     make ${MAKEOPTS} -C mpy-cross
     make ${MAKEOPTS} -C ports/qemu BOARD=VIRT_RV32 submodules
     make ${MAKEOPTS} -C ports/qemu BOARD=VIRT_RV32 test
+
+    # Test building and running native .mpy with rv32imc architecture.
+    ci_native_mpy_modules_build rv32imc
+    make ${MAKEOPTS} -C ports/qemu BOARD=VIRT_RV32 test_natmod
 }
 
 ########################################################################################
@@ -476,7 +480,9 @@ function ci_native_mpy_modules_build {
         arch=$1
     fi
     make -C examples/natmod/features1 ARCH=$arch
-    make -C examples/natmod/features2 ARCH=$arch
+    if [ $arch != rv32imc ]; then
+        make -C examples/natmod/features2 ARCH=$arch
+    fi
     make -C examples/natmod/features3 ARCH=$arch
     make -C examples/natmod/features4 ARCH=$arch
     make -C examples/natmod/btree ARCH=$arch

@agatti
Copy link
Contributor Author

agatti commented Dec 20, 2024

Everything passes locally for me, except compiling features2 (because it needs float) and running extmod/random_extra_float.py (because it needs float).

If you merge #15838 first then RV32 natmods requiring floating point support should compile and run. I tried that back in September, but I don't recall if I had to change makefiles or any other file.

As the remaining piece of the puzzle, I think we should get CI to run these tests. Here's a minimal patch that works locally for me and hopefully works for the CI. If you could add it to this PR, that would be great (feel free to modify to get it working and/or improve it as you see fit):

Thanks, I'll look at it shortly and update the PR accordingly!

@agatti
Copy link
Contributor Author

agatti commented Dec 20, 2024

@dpgeorge I guess you've tested building modules using newlib outside the CI? The btree natmod was never able to be built with picolibc due to errno-related incompatibilities. I believe the same error should occur when targetting any other arch if Picolibc is used.

Basically, to let it compile you'll have to apply the patch listed below, but then it won't link because the errno backing variable is put in the .BSS segment, throwing an exception from mpy_ld.py as it cannot find the symbol.

I haven't looked at it much further from when I originally encountered this issue during this PR's development, but if that's OK with you I'll let the CI skip the btree natmod when testing RV32 natmod builds. I can try to come up with a PR later for MicroPython's berkeley-db fork to work around this if possible.

diff --git a/examples/natmod/btree/btree_c.c b/examples/natmod/btree/btree_c.c
index 4f494817e..d57a7f631 100644
--- a/examples/natmod/btree/btree_c.c
+++ b/examples/natmod/btree/btree_c.c
@@ -43,6 +43,7 @@ int puts(const char *s) {
     return mp_printf(&mp_plat_print, "%s\n", s);
 }
 
+#ifndef _PICOLIBC__
 int native_errno;
 #if defined(__linux__)
 int *__errno_location (void)
@@ -52,11 +53,16 @@ int *__errno (void)
 {
     return &native_errno;
 }
+#define ERRNO native_errno
+#else
+NEWLIB_THREAD_LOCAL_ERRNO int errno;
+#define ERRNO errno
+#endif
 
 ssize_t mp_stream_posix_write(void *stream, const void *buf, size_t len) {
     mp_obj_base_t* o = stream;
     const mp_stream_p_t *stream_p = MP_OBJ_TYPE_GET_SLOT(o->type, protocol);
-    mp_uint_t out_sz = stream_p->write(MP_OBJ_FROM_PTR(stream), buf, len, &native_errno);
+    mp_uint_t out_sz = stream_p->write(MP_OBJ_FROM_PTR(stream), buf, len, &ERRNO);
     if (out_sz == MP_STREAM_ERROR) {
         return -1;
     } else {
@@ -67,7 +73,7 @@ ssize_t mp_stream_posix_write(void *stream, const void *buf, size_t len) {
 ssize_t mp_stream_posix_read(void *stream, void *buf, size_t len) {
     mp_obj_base_t* o = stream;
     const mp_stream_p_t *stream_p = MP_OBJ_TYPE_GET_SLOT(o->type, protocol);
-    mp_uint_t out_sz = stream_p->read(MP_OBJ_FROM_PTR(stream), buf, len, &native_errno);
+    mp_uint_t out_sz = stream_p->read(MP_OBJ_FROM_PTR(stream), buf, len, &ERRNO);
     if (out_sz == MP_STREAM_ERROR) {
         return -1;
     } else {
@@ -81,7 +87,7 @@ off_t mp_stream_posix_lseek(void *stream, off_t offset, int whence) {
     struct mp_stream_seek_t seek_s;
     seek_s.offset = offset;
     seek_s.whence = whence;
-    mp_uint_t res = stream_p->ioctl(MP_OBJ_FROM_PTR(stream), MP_STREAM_SEEK, (mp_uint_t)(uintptr_t)&seek_s, &native_errno);
+    mp_uint_t res = stream_p->ioctl(MP_OBJ_FROM_PTR(stream), MP_STREAM_SEEK, (mp_uint_t)(uintptr_t)&seek_s, &ERRNO);
     if (res == MP_STREAM_ERROR) {
         return -1;
     }
@@ -91,7 +97,7 @@ off_t mp_stream_posix_lseek(void *stream, off_t offset, int whence) {
 int mp_stream_posix_fsync(void *stream) {
     mp_obj_base_t* o = stream;
     const mp_stream_p_t *stream_p = MP_OBJ_TYPE_GET_SLOT(o->type, protocol);
-    mp_uint_t res = stream_p->ioctl(MP_OBJ_FROM_PTR(stream), MP_STREAM_FLUSH, 0, &native_errno);
+    mp_uint_t res = stream_p->ioctl(MP_OBJ_FROM_PTR(stream), MP_STREAM_FLUSH, 0, &ERRNO);
     if (res == MP_STREAM_ERROR) {
         return -1;
     }
@@ -133,7 +139,7 @@ static mp_obj_t btree_open(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw
     openinfo.minkeypage = args[ARG_minkeypage].u_int;
     DB *db = __bt_open(MP_OBJ_TO_PTR(pos_args[0]), &btree_stream_fvtable, &openinfo, 0);
     if (db == NULL) {
-        mp_raise_OSError(native_errno);
+        mp_raise_OSError(ERRNO);
     }
 
     return MP_OBJ_FROM_PTR(btree_new(db, pos_args[0]));

@dpgeorge
Copy link
Member

I guess you've tested building modules using newlib outside the CI?

I'm using the riscv32-embecosm-ubuntu2204-gcc13.2.0 toolchain (used for the Pico 2) and I guess that uses newlib? btree builds and runs for me using that toolchain so I guess it must be newlib.

if that's OK with you I'll let the CI skip the btree natmod when testing RV32 natmod builds

Yes that's totally fine.

I can try to come up with a PR later for MicroPython's berkeley-db fork to work around this if possible.

I'm not sure if that's worth the time/effort. We should instead improve mpy_ld.py so that it can find the errno variable (but that's not a priority right now).


My main aim here is to just get some form of CI working that tests the functionality of this PR. Compiling and running just a few of the examples/natmod/ modules is enough for that.

@agatti
Copy link
Contributor Author

agatti commented Dec 21, 2024

I'm using the riscv32-embecosm-ubuntu2204-gcc13.2.0 toolchain (used for the Pico 2) and I guess that uses newlib?

So far the only picolibc-only RV32/RV64 toolchain I've found is the one shipping with Ubuntu 22.04 (and possibly 24.04, haven't yet looked at it) - which made adding QEMU RISC-V support so much of a prolonged effort...

I'm not sure if that's worth the time/effort. We should instead improve mpy_ld.py so that it can find the errno variable (but that's not a priority right now).

The thing is, the CI toolchain uses a picolibc version that's explicitly set up with global errno variable support disabled. That is, errno is instead a thread-local variable and that is thus compiled as a static uninitialised variable.

I've had a go at adding support for static initialised data to mpy_ld.py and it sort of worked except for ArmV6 bringing absolute relocations into the mix (I still have no idea why they're there). I believe this is the same issue but for a different segment (.DATA vs .BSS) - if GCC for Arm isn't sneaking in absolute relocations for this kind of variables then I may probably be able to add a patch for that, otherwise things may require a more coordinated effort I guess.

Still, thanks for looking at this PR :)

@dpgeorge
Copy link
Member

OK, sounds like there are a few things to improve here. But let's just take it a step at a time 😄 That makes it much easier to review and get things merged.

Adding a bit of CI testing to this PR, it should then be good to go.

@agatti agatti force-pushed the natmod-riscv branch 2 times, most recently from fe15324 to 6bb5eca Compare December 21, 2024 23:51
@agatti
Copy link
Contributor Author

agatti commented Dec 22, 2024

Now the CI should work for RV32. In theory this could work for Arm too but my Arm dev VM needs to be brought back up and I'm on a metered connection right now, so I limited my changes to make sure RV32 works.

Regarding the mpy_ld.py changes, I guess TLS support for RV32 can be added when doing the per-arch processing split. It also requires having init jump code sequences that may change their length after the first evaluation (and changes to QEMU's RV32 platform code), so it's a whole new PR indeed.

Glad all of this works now!

agatti and others added 4 commits December 23, 2024 10:02
This commit adds support for RV32IMC native modules, as in embedding native
code into a self-contained MPY module and and make its exported functions
available to the MicroPython environment.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This fixes compilation of the `re` natmod example when built with Picolibc
in the CI environment.  Ubuntu 22.04's combination of its bare metal RISC-V
toolchain and its version of Picolibc makes the `alloca` symbol more
elusive than it should be.

This commit makes the `re` natmod try harder to get an `alloca`
implementation.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit adds the compiled native module file to the list of files to
remove when `make clean` is issued in a native module source directory.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit brings the natmod tests in the CI build process for the RV32
platform.  Not all example natmods are tested at the moment, as
`features` requires soft-float support, and `btree` needs thread-local
storage support in `mpy_ld.py` when built with the CI's toolchain.

Co-authored-by: Damien George <damien@micropython.org>
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
@dpgeorge dpgeorge merged commit 7ca6e5e into micropython:master Dec 22, 2024
66 of 67 checks passed
@dpgeorge
Copy link
Member

Thanks for updating! Now merged.

@agatti agatti deleted the natmod-riscv branch December 23, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants