
alif/alif.mk: Add MPY_CROSS_FLAGS setting. #17908


Merged: 1 commit into micropython:master on Aug 15, 2025

Conversation

dpgeorge
Member

Summary

The HP and HE CPUs have double-precision hardware floating point, so can use the armv7emdp architecture.

This allows frozen code to use native/viper/asm_thumb decorators.

Fixes issue #17896.
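As background, a frozen module using the native emitter looks roughly like this (a minimal sketch; the `dot()` helper is hypothetical and not from the PR, and the `ImportError` fallback is only there so the sketch also runs under CPython):

```python
# Sketch of a function in a frozen module using the native emitter.
try:
    import micropython
    native = micropython.native  # compiles the function to machine code
except ImportError:
    native = lambda f: f  # fallback so the sketch runs under CPython too

@native
def dot(a, b):
    # A simple float kernel that benefits from native code generation.
    s = 0.0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s

print(dot([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

With this PR, such a decorated function can live in a frozen module on the HP/HE CPUs because mpy-cross now targets armv7emdp.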

Testing

Tested on OPENMV_AE3, putting native/viper/asm_thumb code in a frozen module. It works.

@kwagyeman
Contributor

@dpgeorge - Awesome!

@@ -22,6 +22,8 @@ include $(TOP)/extmod/extmod.mk
################################################################################
# Project specific settings and compiler/linker flags

MPY_CROSS_FLAGS += -march=armv7emdp
Contributor

@iabdalkader iabdalkader Aug 13, 2025

Should this be set for float implementation or is it unrelated?

Suggested change
MPY_CROSS_FLAGS += -march=armv7emdp
ifeq ($(MICROPY_FLOAT_IMPL),float)
MPY_CROSS_FLAGS += -march=armv7emdp
else
MPY_CROSS_FLAGS += -march=armv7emsp # ?
endif

Note that we use MICROPY_FLOAT_IMPL=float when building. Would that affect the loaded frozen code?

Member Author

This setting is unrelated to the MICROPY_FLOAT_IMPL setting.

This setting is about the hardware capabilities, not any API/ABI (at least for frozen code, which is what matters here). It means you can use floating-point assembly instructions in @micropython.asm_thumb functions.

Note that we use MICROPY_FLOAT_IMPL=float when building. Would that affect the loaded frozen code?

No, it won't matter.

(I'm curious why you don't use double though? Float objects still take the same amount of heap as double (16 bytes), and you'd get more precision with doubles.)

Contributor

@kwagyeman kwagyeman Aug 14, 2025

We use the object representation that has floats as 4 bytes.

That matters a lot for large arrays. You'd run out of memory quickly.

Member Author

You can still make 32-bit float arrays (in C and Python) while still using double for MicroPython's float object.
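For illustration, Python's `array` module already gives 32-bit float storage independent of the float object size (a minimal sketch; this works the same way in MicroPython and CPython):

```python
from array import array

# Typecode 'f' stores IEEE-754 single-precision values, 4 bytes each,
# regardless of whether the port's float objects are float or double.
a = array('f', [1.0, 2.5, 3.25])
print(a.itemsize)   # 4
print(a[1] + a[2])  # 5.75 (elements come back as ordinary float objects)
```

So large numeric buffers can stay at 4 bytes per element even on a build whose float objects are double precision.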

Contributor

@iabdalkader iabdalkader Aug 14, 2025

It means you can use floating-point assembly instructions in @micropython.asm_thumb functions.

But wouldn't armv7emdp emit double-precision floating point instructions?

I'm curious why you don't use double though?

Historically we only had single-precision FPUs, so we hard-coded float in many places (not mp_float_t); switching would break a lot of code, but that's easily fixable. However, I still don't think we should use double, because it uses more bandwidth and more cycles. I couldn't easily find a reference for this, but I think double precision has to be slower; at the very least it's wider, so more memory bandwidth on loads/stores. The CM55 has a performance monitoring unit (PMU); it would be interesting to use it to benchmark float vs double.

You can still make 32-bit float arrays (in C and Python) while still using double for MicroPython's float object.

You mean by casting back and forth? We don't have control over all modules, for example ulab arrays would double in size, and its object files would use double-precision instructions. Also, I think the same goes for MicroPython for any float operations performed in Python.

EDIT: Not sure if I'm using it correctly:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include "pmu_armv8.h"  // CMSIS Armv8-M PMU helpers

uint32_t test_float(size_t iterations) {
    ARM_PMU_CYCCNT_Reset();
    // volatile keeps the loop body from being optimized away
    volatile float x = 1.234f, y = 5.678f, z = 0.0f;
    for (size_t i = 0; i < iterations; i++) {
        z += x * y;   // FP32 multiply-add
    }
    return ARM_PMU_Get_CCNTR();
}

uint32_t test_double(size_t iterations) {
    ARM_PMU_CYCCNT_Reset();
    volatile double x = 1.234, y = 5.678, z = 0.0;
    for (size_t i = 0; i < iterations; i++) {
        z += x * y;   // FP64 multiply-add
    }
    return ARM_PMU_Get_CCNTR();
}

int benchmark_float(void) {
    const size_t N = 1000000;

    __disable_irq();
    ARM_PMU_Enable();
    ARM_PMU_CNTR_Enable(PMU_CNTENSET_CCNTR_ENABLE_Msk);

    uint32_t flt_cycles = test_float(N);
    uint32_t dbl_cycles = test_double(N);

    __enable_irq();
    printf("Float cycles: %lu\n", (unsigned long)flt_cycles);
    printf("Double cycles: %lu\n", (unsigned long)dbl_cycles);

    while (1) {
        // Spin so the results stay visible on the console.
    }
}

Output with -Og:

Float cycles: 1000358
Double cycles: 4300843

Output with -O2:

Float cycles: 800347
Double cycles: 3000302

Contributor

Yeah, I think I ran the test the first time with fewer iterations. I changed the test a bit so it doesn't get optimized out, and ran it again with 1_000_000 iterations and -O2:

With volatile:

Float cycles: 9000756
Double cycles: 31001274

Without volatile:

Float cycles: 6000008
Double cycles: 18000003

In either case you see it takes about 3-4x more cycles.

Member Author

In either case you see it takes about 3-4x more cycles.

OK, that's good information to have.

But my point still stands: for Python code this won't make much of a difference. It will however make a big difference for things like ulab and if you have custom C code that is floating-point heavy and uses mp_float_t instead of explicitly using float.

Contributor

It will however make a big difference for things like ulab and if you have custom C code that is floating-point heavy and uses mp_float_t instead of explicitly using float.

ulab uses mp_float_t, while our code still uses hard-coded float. However, it still won't build with IMPL=double, because double would need to be cast back to float. I'd rather refactor the code to use mp_float_t instead of adding the casts; that way, if a port/board uses double, it builds. For now, all we need is single precision.

Member Author

our code still uses hard-coded float. However, it still won't build with IMPL=double because double will need to be cast back to float. I'd rather refactor the code to use mp_float_t instead of adding the casts, this way if a port/board uses double it builds

I'd actually suggest sticking with hard-coded float everywhere, because that's what you've designed your algorithms around and optimised for. Then use the following provided functions to interoperate with MicroPython objects: mp_obj_get_float_to_f, mp_obj_get_float_to_d, mp_obj_new_float_from_f, mp_obj_new_float_from_d. They adjust themselves based on the single/double setting.

(You could also define your own float type, eg omv_float_t, and use that in all your code, at least then it's configurable.)
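The single-precision rounding those conversion helpers imply (when the port's float object is double but the C code wants float) can be illustrated in plain Python with struct (a sketch; `to_f32` is a hypothetical name, not a MicroPython API):

```python
import struct

def to_f32(x):
    # Round a Python float (a C double) to the nearest IEEE-754 single
    # by packing it as 32 bits and unpacking it again.
    return struct.unpack('f', struct.pack('f', x))[0]

print(to_f32(0.5) == 0.5)  # True: 0.5 is exactly representable in 32 bits
print(to_f32(0.1) == 0.1)  # False: 0.1 gets rounded, so the round trip differs
```

This is why code that hard-codes float keeps working on a double build: values just lose precision at the boundary rather than failing to convert.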

Contributor

interoperate with MicroPython objects: mp_obj_get_float_to_f, mp_obj_get_float_to_d, mp_obj_new_float_from_f

Didn't know about those, thanks! Yes, I'll do that instead.

The HP and HE CPUs have double-precision hardware floating point, so can
use the armv7emdp architecture.

This allows frozen code to use native/viper/asm_thumb decorators.

Fixes issue micropython#17896.

Signed-off-by: Damien George <damien@micropython.org>
@dpgeorge dpgeorge force-pushed the alif-add-mpy-cross-flags branch from 3076d19 to b7cfafc on August 15, 2025 02:45
@dpgeorge dpgeorge merged commit b7cfafc into micropython:master Aug 15, 2025
7 checks passed
@dpgeorge dpgeorge deleted the alif-add-mpy-cross-flags branch August 15, 2025 02:46