Skip to content

ports/stm32: Add memory map with attributes for different MCU families. #16644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
iabdalkader opened this issue Jan 25, 2025 · 13 comments
Open
Labels
enhancement Feature requests, new feature implementations port-stm32

Comments

@iabdalkader
Copy link
Contributor

iabdalkader commented Jan 25, 2025

Description

There's a general DMA/Cache-management issue with the MicroPython's stm32 port; most drivers make assumptions about the memory (cachable or not DAM-acessible or not) which probably works fine for all (most?) of the default linker scripts, which happen to place everything in one memory region, but when buffers are moved around, drivers starts failing.

A few examples to demonstrate the issue (there are many more in sdcard, I2C etc..):

  1. DAC should flush buffer before DMA call if memory is cacheable and fail if it's not DMA-accesible:

    dma_nohal_deinit(dma_descr);

  2. SPI regular transfer (fallback) path should be taken if the memory is not accessible by DMA:

    if (len == 1 || query_irq() == IRQ_STATE_DISABLED) {

    For example: if (len == 1 || query_irq() == IRQ_STATE_DISABLED || !dma_is_accessible(src)) {

  3. SDCARD check here is over-simplistic: On H7 SDMMC2's DMA has access to all memory while SDMMC1's DMA can only access D1 memory:

    if (query_irq() == IRQ_STATE_ENABLED) {

  4. Something like this issue ports/stm32: Make ETH DMA buffer attributes configurable. #16633 could've been avoided, or at least easily detected, if there was a runtime assert that memory is "dmable".

  5. All of those drivers, and others, should Not do any cache-management if memory is Not cacheable.

The problem will only get worse as we add more MCUs/ports with more complex memory and will not scale well in the future. Note I've been maintaining patches to all of those drivers, to fix these issues, but it starting to get out of hand.

Proposed fix:

To fix this issue, I propose we implement some sort of memory map per MCU family, override-able per-board (if needed), with attributes per region: cacheable, DAM-accessible, TCM, external etc.. And a new generic API to query those attributes. It would be great if this mechanism is generic somehow, so it can be reused by future ports with more complex memory.

Other aspects

Having this memory map/attributes query could also help in other areas. For example, it could help gc allocate memory based on attributes (gc_alloc(size, MP_MEMORY_REGION_ATTR_TCM/EXTERNAL). Then gc could loop through gc blocks and try to fulfill such request with something like: if mp_memory_region_get_attributes(gc_block_start) & attr...

Toy API:

typdef enum {
    MP_MEMORY_REGION_ATTR_CACHEABLE = (1 << 0),
    MP_MEMORY_REGION_ATTR_TCM       = (1 << 1),
    MP_MEMORY_REGION_ATTR_SRAM      = (1 << 2),
    MP_MEMORY_REGION_ATTR_EXTERNAL  = (1 << 3),
    MP_MEMORY_REGION_ATTR_DMA       = (1 << 4),
    // Use to add port-specific/custom attributes.
    MP_MEMORY_REGION_ATTR_LAST      = (1 << 15)
} mp_memory_region_attr_t;

typedef struct {
    uintptr_t base;
    uintptr_t limit;
    uint32_t attrs;
} mp_memory_region_t;

// Or add regions dynamically maybe?
uint32_t mp_memory_region_add(uintptr_t base, size_t limit);
uint32_t mp_memory_region_get_attributes(uintptr_t ptr);

// Some helper macros
#define mp_memory_region_is_cacheable(ptr) \
    mp_memory_region_get_attributes(ptr) & MP_MEMORY_REGION_ATTR_CACHEABLE
#define mp_memory_region_is_dmable(ptr) \
    mp_memory_region_get_attributes(ptr) & MP_MEMORY_REGION_ATTR_DMA

extern const mp_memory_region_t mp_memory_regions[];

// ======================
// MCU/board/port memory map with port-specific/custom attributes.

// Real-life example: H7 SDMMC1 DMA can only access D1 memory/devices.
// `sdcard.c` should check if `SDMMC1` && memory is Not D1, then allocate
// a D1 buffer possibly via: `gc_alloc(size, MP_MEMORY_REGION_ATTR_D1)`,
// or use a pre-allocated static D1 buffer.

#define MP_MEMORY_REGION_ATTR_D1 (MP_MEMORY_REGION_ATTR_LAST << 1)

const mp_memory_region_t mp_memory_regions[] = {
    { 0x90000000, 0xD0000000, MP_MEMORY_REGION_ATTR_DMA |
        MP_MEMORY_REGION_ATTR_CACHEABLE | MP_MEMORY_REGION_ATTR_D1 },

    { 0x24000000, 0x24080000, MP_MEMORY_REGION_ATTR_DMA |
        MP_MEMORY_REGION_ATTR_CACHEABLE | MP_MEMORY_REGION_ATTR_D1 },

    { 0, 0, 0 } // end
};

Implementation

I'd be very happy to work on this, but I need feedback on the API first before.

Code of Conduct

Yes, I agree

@iabdalkader iabdalkader added the enhancement Feature requests, new feature implementations label Jan 25, 2025
@kwagyeman
Copy link
Contributor

This is cool and would be very useful.

@iabdalkader
Copy link
Contributor Author

@dpgeorge Any thoughts?

@Bortania
Copy link

Bortania commented Mar 1, 2025

@iabdalkader can you point out on the "patches to all of those driver" that you mentioned in your post.
Here we have an issue with DAC buffer on nucleo F7 board. Maybe the reason is memory cache.

@iabdalkader
Copy link
Contributor Author

can you point out on the "patches to all of those driver" that you mentioned in your post.

Those patches are not applicable for all cases, otherwise I would have sent them as fixes. For example DAC uses DMA, in timed mode, so the memory needs to be cleaned/flushed before DMA starts. But the memory could also be TCM, in this case it should Not be flushed.
The point is, the memory view in MicroPython is overly simplistic and assumes that any provided memory will just work. Most of the chips/boards just happen to use memory that works in most cases, but once you start customizing the linker script and moving buffers around, drivers start failing.

@Bortania
Copy link

Bortania commented Mar 1, 2025

Those patches are not applicable for all cases, otherwise I would have sent them as fixes. For example DAC uses DMA, in timed mode, so the memory needs to be cleaned/flushed before DMA starts. But the memory could also be TCM, in this case it should Not be flushed. The point is, the memory view in MicroPython is overly simplistic and assumes that any provided memory will just work. Most of the chips/boards just happen to use memory that works in most cases, but once you start customizing the linker script and moving buffers around, drivers start failing.

Exactly, here we have the issue with dac.write_timed(dacBuf, dacTimer, mode=dac.CIRCULAR) for the Nucleo STM32F767ZI board.
Is this issue solvable in micropython?

@iabdalkader
Copy link
Contributor Author

Is this issue solvable in micropython?

Yes, it is. This patch may fix it for you, but it's not a good long-term solution. The proper fix is this feature proposed here.

diff --git a/ports/stm32/dac.c b/ports/stm32/dac.c
index 54d5acc6c..dcbd93516 100644
--- a/ports/stm32/dac.c
+++ b/ports/stm32/dac.c
@@ -198,6 +198,10 @@ static void dac_start_dma(uint32_t dac_channel, const dma_descr_t *dma_descr, ui
     #endif
     }
 
+    #ifdef __DCACHE_PRESENT
+    MP_HAL_CLEAN_DCACHE(buf, len);
+    #endif
+
     dma_nohal_deinit(dma_descr);
     dma_nohal_init(dma_descr, DMA_MEMORY_TO_PERIPH | dma_mode | dma_align);
     dma_nohal_start(dma_descr, (uint32_t)buf, base + dac_align, len);

@Bortania
Copy link

Bortania commented Mar 3, 2025

Is this issue solvable in micropython?

Yes, it is. This patch may fix it for you, but it's not a good long-term solution. The proper fix is this feature proposed here.

diff --git a/ports/stm32/dac.c b/ports/stm32/dac.c
index 54d5acc6c..dcbd93516 100644
--- a/ports/stm32/dac.c
+++ b/ports/stm32/dac.c
@@ -198,6 +198,10 @@ static void dac_start_dma(uint32_t dac_channel, const dma_descr_t *dma_descr, ui
#endif
}

  • #ifdef __DCACHE_PRESENT
  • MP_HAL_CLEAN_DCACHE(buf, len);
  • #endif
  • dma_nohal_deinit(dma_descr);
    dma_nohal_init(dma_descr, DMA_MEMORY_TO_PERIPH | dma_mode | dma_align);
    dma_nohal_start(dma_descr, (uint32_t)buf, base + dac_align, len);

@iabdalkader thank you for reply.
Do you know if issue with cache can influence SPI? I see that there was commit to fix for cache and SPI in MicroPython source code. But when I establis communicate of DUT with nucleo F7 board through SPI, SPI reading is 0x0.
Did you have some similar experience with SPI?

@iabdalkader
Copy link
Contributor Author

Do you know if issue with cache can influence SPI?

Every driver here can be affected by cache or other memory related issues. That's what I'm trying to fix with this new feature.

@Gadgetoid
Copy link
Contributor

I think the crucial aspect of this, and perhaps the most glossed over portion, is the proposal for the ability to alloc memory from a pool with the given attributes.

Since an SRAM/PSRAM hybrid (we're working toward this on RP2) is, from the perspective of userspace, a total lottery (caveat that if you alloc early you'll probably be in SRAM if there's enough and if it's up-front) then there's nothing the end-user can really do if a driver requires DMA-accessible RAM, fast RAM, page-aligned RAM etc but expects a bytearray created and supplied by the user.

Can we get away with adding new syntax for this?

Allocation decorators? 😆

@memory.SRAM(align=1024)
buffer = bytearray(1024)

@memory.any(align=1024, dma=True)
buffer = bytearray(1024)

Or a context manager?

with memory.any(align=1024, dma=True):
    buffer1 = bytearray(1024)
    buffer2 = bytearray(1024)

@iabdalkader
Copy link
Contributor Author

I haven't thought about exposing this functionality to Python code, I was mainly concerned with C code/drivers, but I can see how this might be very useful for Python drivers, or even regular code that requires a small amount of fast memory. As for syntax, I personally prefer the decorators over the context, but I guess we could have both. Either way, the memory map support has to exist first in C.

@andrewleech
Copy link
Contributor

andrewleech commented Mar 29, 2025

I've thought about context managers as a clean syntax for times when python needs ram from a particular source, or aligned etc, that has my vote!
In python syntax, are decorators allowed on anything other than a function declaration?

There's slight complications in that it would probably need to be thread aware to ensure the hint passed to the gc by the context manager only affects the current thread, but otherwise should be fairly straightforward.

@Gadgetoid
Copy link
Contributor

In python syntax, are decorators allowed on anything other than a function declaration?

Currently no, the conceit is that it could be so on MicroPython but then we never know if that might cause some future turmoil if CPython allows them on statements. A context manager is probably the safest bet.

Either way, the memory map support has to exist first in C.

And, indeed, step one is to mark up RAM with the appropriate attributes!

@agatti
Copy link
Contributor

agatti commented Mar 30, 2025

Passing buffers to ESP32S3's SIMD unit could also benefit from this - my DSP code was spending an awful amount of time to handle unaligned loads and stores, and it was faster (in every sense of the word) to move buffer management to the C side so I could have a properly aligned buffer to pass to SIMD code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests, new feature implementations port-stm32
Projects
None yet
Development

No branches or pull requests

7 participants