
py/gc: Support multiple heaps (version 2). #3580


Closed
wants to merge 1 commit into from


aykevl
Contributor

@aykevl aykevl commented Jan 24, 2018

Enable the addition of heap space at runtime. Advantages:

  • The ESP32 has a fragmented heap so to use all of it the heap must be split.
  • Support a dynamic heap while running on an OS, adding more heap when necessary.

Rewritten PR of #3533. The biggest difference is that multi-heap support can now be disabled (and is disabled by default) to reduce code size. I hope it is also more stable, as I made the changes after looking at how the memory manager actually works.

With this code, I managed to extend the MicroPython heap to ~200kB on the ESP32:

MicroPython v1.9.3-241-gfbc575cd1-dirty on 2018-01-24; ESP32 module with ESP32
Type "help()" for more information.
>>> import micropython
>>> micropython.mem_info(True)
stack: 752 out of 15360
GC: total: 206976, used: 5200, free: 201776
 No. of 1-blocks: 30, 2-blocks: 7, max blk sz: 264, max free sz: 6936
GC memory layout; from 3ffb30a0:
00000: h=AhhBMh=DhhhDBBBBAhh===h===Ahh==h==============================
00400: ================================================================
00800: ================================================================
00c00: ================================================================
01000: =========================================h==Bh=ShShhThAh=h=Bh==B
01400: ..h.h=......h=..................................................
       (87 lines all free)
17400: ................................................
GC memory layout; from 3ffe4de0:
       (108 lines all free)
1b000: ........................
>>> 
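
As a cross-check on the dump above, the reported total can be recovered from the two layouts. The sketch below assumes one character per GC block and 16-byte blocks (the typical block size, not something the dump states), and reads the final partial rows as 48 and 24 blocks. The function names are hypothetical helpers, not part of MicroPython.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed GC block size in bytes; one character in the dump is one block. */
#define BYTES_PER_BLOCK 16u
/* Each dump row advances the offset by 0x400 bytes, i.e. 64 blocks. */
#define BLOCKS_PER_ROW (0x400u / BYTES_PER_BLOCK)

/* Size of one heap area, given the offset printed on the last row of its
 * layout and the number of block characters on that row. */
size_t area_bytes(size_t last_row_offset, size_t blocks_on_last_row) {
    assert(blocks_on_last_row <= BLOCKS_PER_ROW);
    return last_row_offset + blocks_on_last_row * BYTES_PER_BLOCK;
}

/* Sum of both areas in the dump: row 17400 (48 blocks) and row 1b000 (24 blocks). */
size_t dump_total_bytes(void) {
    return area_bytes(0x17400, 48) + area_bytes(0x1b000, 24);
}
```

Under those assumptions the two areas come to 96000 and 110976 bytes, which add up to the 206976 reported on the `GC: total` line.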

This is necessary because by default, the esp32 does not have a contiguous memory area:

I (343) cpu_start: Pro cpu up.
I (344) cpu_start: Single core mode
I (344) heap_init: Initializing. RAM available for dynamic allocation:
I (347) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (353) heap_init: At 3FFDCE60 len 000031A0 (12 KiB): DRAM
I (360) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (366) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (372) heap_init: At 4008FC7C len 00010384 (64 KiB): IRAM
I (379) cpu_start: Pro cpu start user code
I (172) cpu_start: Starting scheduler on PRO CPU.

I have tested using tests/run-tests and haven't seen a regression (both on unix and esp32).

Image size changes:

port       change
unix       +14
bare-arm   0 (unchanged)
minimal    +20
stm32      -116
esp8266    +136
esp32      +80 (without patch), +432 (with patch adding multiheap support)

I have tried to keep the image sizes unchanged, but some changed anyway for some reason. Maybe the optimizer is less effective on some ports than on others. The sizes for most of these ports (with the exception of bare-arm) jumped around a lot during development, so I suspect it's mostly just an inconsistent optimizer.

@aykevl
Contributor Author

aykevl commented Jan 24, 2018

I see that even this introduces some extra code when multiheap is disabled (in the minimal port with CROSS=1). I'll look further into how this can be avoided.

@dpgeorge
Member

dpgeorge commented Feb 1, 2018

Really nice work, thank you! I did have this feature on my to-do list but it was low priority so I didn't make any progress on it.

From a brief look it looks ok. It would be nice if it didn't make such big changes to gc.c but that's probably unavoidable given the feature that it's adding.

I have tried to keep the image sizes unchanged, but some changed anyway for some reason.

It could be because some #defines were changed to static inline functions, which can make the compiler emit quite different code. Maybe try going back to macros?
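
To illustrate the kind of change being discussed, here is a hypothetical helper written both ways. The names and the 16-byte block size are assumptions for illustration and are not taken from gc.c. Both forms compute the same value, but the optimizer decides per call site whether to inline the function, which is one way code size can shift between ports.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_BLOCK 16u /* assumed block size, for illustration only */

/* Macro form: expanded textually at every use site. */
#define BLOCK_INDEX(offset) ((offset) / BYTES_PER_BLOCK)

/* Function form: type-checked and evaluates its argument exactly once,
 * but whether it is inlined is left to the optimizer. */
static inline size_t block_index(size_t offset) {
    return offset / BYTES_PER_BLOCK;
}

/* Aborts if the two forms ever disagree for the given byte offset. */
void check_forms_agree(size_t offset) {
    assert(BLOCK_INDEX(offset) == block_index(offset));
}
```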

@andynd

andynd commented Feb 4, 2018

would you mind adding a snippet explaining how you got the esp32 to 200kb of heap?

edit: added snippet

From 99e9f5ece72858f0ebfb655bb198ec3ac1f9be96 Mon Sep 17 00:00:00 2001
From: Andreas Valder <nd@serioese.gmbh>
Date: Sun, 4 Feb 2018 15:53:56 +0100
Subject: [PATCH] feat(memory): use multiple heaps for micropython to handle
 the esp32 fragmented memory

---
 ports/esp32/main.c         | 10 ++++++++++
 ports/esp32/mpconfigport.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/ports/esp32/main.c b/ports/esp32/main.c
index 93423e1c..971b6a99 100644
--- a/ports/esp32/main.c
+++ b/ports/esp32/main.c
@@ -60,6 +60,9 @@
 #   define MP_TASK_HEAP_SIZE       (96 * 1024)
 #endif

+#define HEAP_CHUNK_SIZE                (8*1024)
+#define HEAP_CHUNK_COUNT       (8)
+
 STATIC StaticTask_t mp_task_tcb;
 STATIC StackType_t mp_task_stack[MP_TASK_STACK_LEN] __attribute__((aligned (8)));
 STATIC uint8_t mp_task_heap[MP_TASK_HEAP_SIZE];
@@ -81,6 +84,13 @@ soft_reset:
     mp_stack_set_top((void *)sp);
     mp_stack_set_limit(MP_TASK_STACK_SIZE - 1024);
     gc_init(mp_task_heap, mp_task_heap + sizeof(mp_task_heap));
+    uint8_t *p;
+    for (int i = 0; i < HEAP_CHUNK_COUNT; i++) {
+        p = malloc(HEAP_CHUNK_SIZE);
+        if (p == NULL)
+            break;
+        gc_add(p, p + HEAP_CHUNK_SIZE);
+    }
     mp_init();
     mp_obj_list_init(mp_sys_path, 0);
     mp_obj_list_append(mp_sys_path, MP_OBJ_NEW_QSTR(MP_QSTR_));
diff --git a/ports/esp32/mpconfigport.h b/ports/esp32/mpconfigport.h
index 4b4e08df..6e4d8842 100644
--- a/ports/esp32/mpconfigport.h
+++ b/ports/esp32/mpconfigport.h
@@ -54,6 +54,7 @@
 #define MICROPY_SCHEDULER_DEPTH             (8)
 #define MICROPY_VFS                         (1)
 #define MICROPY_VFS_FAT                     (1)
+#define MICROPY_GC_MULTIHEAP                (1)

 // control over Python builtins
 #define MICROPY_PY_FUNCTION_ATTRS           (1)
--
2.16.1

@aykevl
Contributor Author

aykevl commented Feb 4, 2018

would you mind adding a snippet explaining how you got the esp32 to 200kb of heap?

edit: added snippet

A better way would be this patch, which only adds one extra heap area instead of many like in your example. At the moment, every heap operation has to walk the list of heap areas, so its cost grows as O(n) in the number of areas and you'll want as few as possible.
Additionally, your patch doesn't support soft resets (Ctrl+D): the ESP32 SDK heap isn't reinitialized on a soft reset so no memory can be allocated after a soft reset and all new malloc() calls will fail.
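
One way to address both points is to take a single extra region from the system allocator once and reuse it across soft resets. This is a minimal sketch, assuming `gc_add(start, end)` as used in the patch above. `extra_heap_get` and `EXTRA_HEAP_SIZE` are hypothetical names, and this is not the patch linked in this comment.

```c
#include <stdint.h>
#include <stdlib.h>

#define EXTRA_HEAP_SIZE (64 * 1024) /* hypothetical size */

/* Kept in a static so the pointer survives a soft reset; never freed. */
static uint8_t *extra_heap = NULL;

/* Returns the extra region, allocating it on the first call only.
 * Later calls (for example after a soft reset) return the same pointer,
 * so no further malloc() is needed. Returns NULL if the first
 * allocation failed. */
uint8_t *extra_heap_get(void) {
    if (extra_heap == NULL) {
        extra_heap = malloc(EXTRA_HEAP_SIZE);
    }
    return extra_heap;
}

/* In the port's startup path, after gc_init(), one might then write:
 *
 *     uint8_t *p = extra_heap_get();
 *     if (p != NULL) {
 *         gc_add(p, p + EXTRA_HEAP_SIZE);
 *     }
 */
```

Because only one area is ever added, the per-operation walk over heap areas stays as short as possible.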

@adritium

@dpgeorge

Can an allocation span multiple heap blocks?

@dhylands
Contributor

Can an allocation span multiple heap blocks?

Heap blocks are fixed in size (typically 16 bytes) so any allocation which exceeds 16 bytes will span multiple blocks.

When you look at a heap dump (i.e. micropython.mem_info(1)) then you'll see something like this:

GC memory layout; from 20003000:
00000: h=hhhhhBh==hh=Bhh==hh==h=h=BhB..h...h=....h=....................
00400: .............h=======h==========================================
00800: ===================================.............................
       (97 lines all free)

Each character corresponds to a heap block. The ones with a letter followed by ='s are single objects that span multiple blocks. For example (immediately after the above):

>>> x1 = bytearray(4000)
>>> mp.mem_info(1)
stack: 508 out of 15360
GC: total: 102400, used: 6144, free: 96256
 No. of 1-blocks: 20, 2-blocks: 8, max blk sz: 250, max free sz: 5987
GC memory layout; from 20003000:
00000: h=hhhhhBh==hhhBhh==hh==h=h=BhBh=hBABh=hh==h=....h=......h=......
00400: .............h=======h==========================================
00800: ===================================h============================
00c00: ================================================================
01000: ================================================================
01400: ================================================================
01800: =============================...................................
       (93 lines all free)

You can see the 4000-byte allocation spanning many heap blocks.

Any single allocation is contiguous, so it needs contiguous heap blocks.
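
The block count for an allocation is just its size rounded up to whole blocks. A minimal sketch, assuming the typical 16-byte block mentioned above (`blocks_needed` is a hypothetical helper, not a function in gc.c):

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_BLOCK 16u /* assumed; "typically 16 bytes" */

/* Number of contiguous heap blocks required to hold n_bytes. */
size_t blocks_needed(size_t n_bytes) {
    assert(n_bytes > 0);
    return (n_bytes + BYTES_PER_BLOCK - 1) / BYTES_PER_BLOCK;
}
```

For the 4000-byte buffer above this gives 250 blocks, which matches the `max blk sz: 250` line in the second dump.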

@adritium

adritium commented May 29, 2018

Not what I meant (but thanks for that detailed explanation).

This new multi-heap support increases the heap size. If the current heap has X bytes left but I want to allocate Y > X bytes, it would be nice if I could take X bytes from the current heap and allocate only the remaining (Y-X) bytes from an adjacent heap.

This would be really useful if Y is really big and X is almost as big as Y.

Basically I’m asking if you could treat the individual heaps as one contiguous region for the purpose of allocation.

@dhylands
Contributor

Since individual allocations need to be contiguous, it wouldn't be possible to have a single object span multiple heap areas.
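
A simplified model makes the constraint concrete. This is a sketch under stated assumptions, not the allocator in gc.c: `heap_area_t`, `longest_free_run` and `find_area` are hypothetical, and each area is reduced to one used/free flag per block. Free runs are only ever measured inside one area, because separate areas are generally not adjacent in the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a heap area: one flag per block. */
typedef struct {
    const bool *used;  /* used[i] is true if block i is allocated */
    size_t n_blocks;   /* number of blocks in this area */
} heap_area_t;

/* Length of the longest run of free blocks inside a single area. */
size_t longest_free_run(const heap_area_t *area) {
    size_t best = 0, run = 0;
    for (size_t i = 0; i < area->n_blocks; i++) {
        run = area->used[i] ? 0 : run + 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}

/* Index of the first area that can hold n_blocks contiguously, or -1.
 * Runs are never joined across areas. The scan is linear in the number
 * of areas. */
int find_area(const heap_area_t *areas, size_t n_areas, size_t n_blocks) {
    assert(n_blocks > 0);
    for (size_t a = 0; a < n_areas; a++) {
        if (longest_free_run(&areas[a]) >= n_blocks) {
            return (int)a;
        }
    }
    return -1;
}
```

With two free blocks at the end of one area and three free blocks in the next, a request for four blocks fails in this model even though five blocks are free in total.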

@peterhinch
Contributor

In particular allocations for objects supporting the Python buffer protocol.

@tjclement

I was looking into implementing exactly this, and was super happy to find it has already been done!

The discussion seems to have slumbered a bit, but this feature is tremendously useful on microcontrollers with non-contiguous memory regions. Can this change be mainlined? It would be an amazing addition.

@andynd

andynd commented Feb 22, 2020

Any plans to merge this or something similar? It would make the available memory on something like the esp32 much larger.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Feb 24, 2020
@jimmo
Member

jimmo commented Feb 24, 2020

See #5543 (comment) for some more context, but at the moment the areas of memory not used for the MicroPython heap are still used by the IDF for things like SSL buffers. So in order to make this change work, we'd also have to make the IDF and libraries (e.g. mbedtls) use the MicroPython heap instead of malloc directly. Which is starting to get a bit messy.

@tjclement

On our uPy fork, I can add at least 14kB from the DRAM block at 0x3FFE0440 without any issues. (The TLS outbound buffer was increased from 4 to 8k as suggested in #5543, to fix TLS with large keys.)

We get this block specifically by first taking the 111kB free block at 0x3FFE4350, then requesting the largest free block (which allocates 0x3FFE0440), and finally freeing the 111kB again for WiFi.

This results in a ~17.5% larger uPy heap, which is helpful for our purpose.

    // Temporarily reserve the largest free block (the 111KB one at 0x3FFE4350)
    // so that the next allocation is served from a different region.
    uint8_t *large_dram = NULL;
    size_t large_heap_size = 0;
    for (large_heap_size = 520 * 1024; large_heap_size >= 8 * 1024; large_heap_size -= 1024) {
        // Find the maximally allocatable space
        large_dram = malloc(large_heap_size);
        if (large_dram != NULL) {
            break;
        }
    }

    // Now take the 14KB block at 0x3FFE0440
    uint8_t *dram_14k = NULL;
    for (size_t heap_size = 16 * 1024; heap_size >= 8 * 1024; heap_size -= 1024) {
        // Find the maximally allocatable space
        dram_14k = malloc(heap_size);
        if (dram_14k != NULL) {
            gc_add(dram_14k, dram_14k + heap_size);
            break;
        }
    }

    // There's also ~9KB remaining at 0x3FFBB190, but taking it
    // seems to break TLS on WiFi.

    // Finally we free the reserved 111KB block, leaving it in full for WiFi. (needed)
    free(large_dram);

@tjclement

FWIW, if you're not planning on using WiFi or Bluetooth, the same greedy allocation strategy as above can be used to allocate all free DRAM blocks, which nets you a ~200kB uPy heap.

@formigarafa

@tjclement,
Would you mind telling me where I should put the code you suggested above?

@tjclement

@formigarafa of course, you can see here where I added this in our codebase: badgeteam/ESP32-platform-firmware@3ed58b4#diff-325ac9f54774c3d5d2604108c9759715

@aykevl
Contributor Author

aykevl commented Sep 22, 2020

@tjclement have you seen e600810? It may be more efficient to use that instead as it only works with a single block (and thus the GC can work faster).

@rknegjens
Contributor

I've rebased and optimised this PR in #8526.

@dpgeorge
Member

Closing because development continues in #8526.

@dpgeorge dpgeorge closed this Apr 14, 2022