ESP32 without psram. Socket and Memory Split or ... . #14421

straga opened this issue May 3, 2024 · 30 comments

Comments

@straga

straga commented May 3, 2024

Checks

  • I agree to follow the MicroPython Code of Conduct to ensure a safe and respectful space for everyone.

  • I've searched for existing issues matching this bug, and didn't find any.

Port, board and/or hardware

ESP32 without PSRAM / ESP32-C3

MicroPython version

MicroPython 23.0-preview.346.g64f28dc1e on 2024-05-03

Reproduction

>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71008, free: 40992, max new split: 21504
 No. of 1-blocks: 979, 2-blocks: 263, max blk sz: 142, max free sz: 133
>>> s1=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71040, free: 40960, max new split: 20480
 No. of 1-blocks: 977, 2-blocks: 265, max blk sz: 142, max free sz: 133
>>> s2=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71072, free: 40928, max new split: 20480
 No. of 1-blocks: 977, 2-blocks: 266, max blk sz: 142, max free sz: 133
>>> s3=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 131968, used: 75584, free: 56384, max new split: 248
 No. of 1-blocks: 976, 2-blocks: 268, max blk sz: 282, max free sz: 966
>>> s4=socket.socket()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 105] ENOBUFS

Expected behaviour

No response

Observed behaviour

  • Connected to Wi-Fi.

  • Three sockets are already open (FTP, Telnet, MQTT).
    All of them work.

However, when I attempt to create an additional socket for testing purposes, I encounter an error: OSError: [Errno 105] ENOBUFS. Following this error, the WiFi functionality ceases to work.

Additional Information

Any guidance on how to resolve this issue would be greatly appreciated.

@straga straga added the bug label May 3, 2024
@straga
Author

straga commented May 5, 2024

It appears that the issue arises when an asynchronous function is invoked in a synchronous manner. This is particularly noticeable when there is a significant delay in code execution (for example, a time.sleep call) or when a piece of synchronous code takes a long time to complete. It seems that asyncio has an impact on these scenarios.

@straga
Author

straga commented May 5, 2024

Also

Traceback (most recent call last):
  File "asyncio/core.py", line 1, in run_forever
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in wait_io_event
OSError: [Errno 5] EIO

CleanShot 2024-05-05 at 09 32 32@2x

@straga
Author

straga commented May 7, 2024

I tried without a thread and got the same result. Both gc and micropython report that there is enough memory.
CleanShot 2024-05-07 at 17 05 19@2x

@straga
Author

straga commented May 7, 2024

When a call is executed within a thread, an error like the one shown occurs. When the call is executed outside of a thread, there is no such error, but the Wi-Fi functionality is compromised. Despite showing a connected status, no data is sent or received, and ping also fails.

CleanShot 2024-05-07 at 17 21 04@2x

CleanShot 2024-05-07 at 17 29 36@2x

@projectgus
Contributor

projectgus commented May 8, 2024

MicroPython is consuming all of the available memory in the ESP32 for its heap, and ESP-IDF is running out of memory for allocating new sockets. After memory has run out, it's likely the Wi-Fi will also stop working as it regularly allocates and frees buffers.

You can see this happening at the moment the "GC: total" value goes up in mem_info output, and "max new split" drops to a very low number. "max new split" is the largest free memory block that ESP-IDF can use to allocate buffers for sockets and Wi-Fi.

You can also call esp32.idf_heap_info() to confirm this.

MicroPython only grows its heap when it needs to avoid a MemoryError. It looks like your application's total memory usage is still pretty low, but maybe at some point your code allocates a large single buffer and fragmentation means it has to grow the heap for this. If you can find the places in your MicroPython code that do these allocations and either remove them, or allocate the large buffer early in your program and then reuse it, then probably you can prevent MicroPython from growing the heap and then the other errors will go away.
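As a rough illustration of the "allocate the large buffer early and then reuse it" advice, here is a minimal sketch. The buffer size, the `BUF`/`VIEW` names and the `copy_stream` helper are assumptions made up for this example, and `io.BytesIO` stands in for a socket or file; any stream object with `readinto()` and `write()` should behave the same way.

```python
import io

# One buffer, allocated once at start-up and reused for every transfer.
# 1024 bytes is an arbitrary size chosen for this sketch.
BUF = bytearray(1024)
VIEW = memoryview(BUF)

def copy_stream(src, dst, view=VIEW):
    """Copy src to dst through the preallocated buffer; return bytes copied."""
    total = 0
    while True:
        n = src.readinto(view)   # fills the existing buffer instead of creating new bytes
        if not n:                # 0 or None means no more data
            break
        dst.write(view[:n])      # slicing a memoryview does not copy the data
        total += n
    return total

# Stand-in streams for the demo; on the board these would be sockets or files.
src = io.BytesIO(b"x" * 3000)
dst = io.BytesIO()
copied = copy_stream(src, dst)
print(copied)
```

The point of the pattern is that no allocation proportional to the payload happens inside the loop, so a large transfer is less likely to force the heap to grow.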

If you're not sure what is causing MicroPython to grow the heap, add some more micropython.mem_info() calls in your code and look for whatever makes the "GC: total" number go up.
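One possible way to spot the growth is to capture the REPL output on a PC and scan it for the point where "GC: total" increases. The sketch below is meant to run on the PC, not on the board; the `find_heap_growth` name is made up for this example, and the sample log is copied from the reproduction earlier in this issue.

```python
import re

# Matches the "GC:" line printed by micropython.mem_info().
GC_LINE = re.compile(
    r"GC: total: (\d+), used: (\d+), free: (\d+), max new split: (\d+)"
)

def find_heap_growth(log_text):
    """Return (previous_total, new_total, max_new_split) for each growth step."""
    growth = []
    previous = None
    for match in GC_LINE.finditer(log_text):
        total, used, free, split = (int(g) for g in match.groups())
        if previous is not None and total > previous:
            growth.append((previous, total, split))
        previous = total
    return growth

# The four "GC:" lines from the reproduction in this issue.
LOG = """
GC: total: 112000, used: 71008, free: 40992, max new split: 21504
GC: total: 112000, used: 71040, free: 40960, max new split: 20480
GC: total: 112000, used: 71072, free: 40928, max new split: 20480
GC: total: 131968, used: 75584, free: 56384, max new split: 248
"""

for before, after, split in find_heap_growth(LOG):
    print("heap grew from", before, "to", after, "- max new split now", split)
```

Whatever ran between the last unchanged reading and the first larger one is the place to look for a big allocation.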

@straga
Author

straga commented May 10, 2024

It turns out that ESP-IDF does not have enough memory for wifi operation. Wifi stops working, but sta.isconnected() == True. Ping timeout from PC to ESP32. Everyone thinks everything is fine. Asyncio stream keeps writing as there are no errors. Only if you try to create new sockets, asyncio crashes with Error 5 EIO.

Pressing Ctrl+C stops asyncio.

micropython.mem_info()
stack: 704 out of 15360
GC: total: 143936, used: 106624, free: 37312, max new split: 448
 No. of 1-blocks: 1550, 2-blocks: 407, max blk sz: 282, max free sz: 1111

gc.collect()

micropython.mem_info()
stack: 704 out of 15360
GC: total: 143936, used: 106752, free: 37184, max new split: 448
 No. of 1-blocks: 1556, 2-blocks: 408, max blk sz: 282, max free sz: 1111

This looks the same as #12819.

I tried a build with CONFIG_LWIP_TCP_MSL=6000.

Result:
CleanShot 2024-05-11 at 09 45 14@2x

Then I built with these LWIP options:

CONFIG_LWIP_TCP_MSL=6000
CONFIG_LWIP_SO_LINGER=y

and used this awrite helper:

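async def awrite(writer, data, b=False):
    # Collect garbage, then print heap statistics.
    # mem_info() prints to the console and returns None,
    # so its result is not passed to log.info().
    gc.collect()
    micropython.mem_info()

    try:
        if isinstance(data, str):
            data = data.encode('utf-8')
        # Give up if the write does not complete within 1 second.
        await asyncio.wait_for(writer.awrite(data), timeout=1)
    except Exception as e:
        log.debug("Error: write: {}".format(e))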

Stop frame: result -> ping timeout and ... .

CleanShot 2024-05-11 at 11 49 51@2x

CleanShot 2024-05-12 at 15 06 53@2x

@projectgus
Contributor

projectgus commented May 14, 2024

@straga It might be related to the linked issue, but the root cause of ESP-IDF running out of memory is that MicroPython has already moved all the memory into the "Python heap".

There are two separate heaps, Python heap and ESP-IDF heap. Even if you have free memory in the Python heap, ESP-IDF can't use it. So to prevent this issue, you need to stop the "GC: total: ..." number from ever increasing. Once this memory is added to the Python heap, it's no longer available for ESP-IDF even if it's free for Python...

@straga
Author

straga commented May 14, 2024 via email

@projectgus
Contributor

@straga MicroPython is growing the heap automatically as your code is running, to prevent a MemoryError. From your logs:

GC: total: 112000,

This is early, there is enough other RAM for ESP-IDF.

GC: total: 143936,

This is later, not enough other RAM for ESP-IDF.

If you find the function in your code that causes this number to increase and refactor it, then ESP-IDF will start working again.

We might be able to add an option to MicroPython to make this simpler, as it's hard for MicroPython to know if it should choose to throw a MemoryError or to grow the heap.

@straga
Author

straga commented May 16, 2024

@projectgus Thanks for the information. I have rewritten my code to use as little RAM as possible, and it works better now. As long as there is enough free RAM for both Python and ESP-IDF, everything is fine. MicroPython takes RAM first, or more aggressively. If MicroPython does not report that it is short of RAM, then everything is fine from MicroPython's side. But when ESP-IDF needs more RAM and none is free, ESP-IDF prints nothing and something can just stop at random (in my case the Wi-Fi stack). Right?

@projectgus
Contributor

projectgus commented May 22, 2024

@straga Yes, that's how it is at the moment. Glad that you got everything working.

@straga
Author

straga commented Oct 11, 2024

Using the ESP32-C3, after applying a patch and opening a new socket, the board freezes completely. The only way to recover is by pressing the hardware reset button or enabling the watchdog.

This issue occurs on different boards as well, suggesting it’s not just related to RAM but could involve other factors too.

The ping stops as a result of actions I take. At that moment, I open another socket connection to the board. Despite this, the board remains connected to MQTT and continues sending messages.

However, when I start using multiple active socket connections (Telnet, FTP, HTTP), the board freezes.

Here the watchdog is working:
CleanShot 2024-10-11 at 14 44 23@2x

@straga straga changed the title ESP32 without psram. Socket and [Errno 105] ENOBUFS ESP32 without psram. Socket and Memory Split or ... . Oct 11, 2024
@projectgus
Contributor

@straga Do you have a way for us to reproduce this hang?

@projectgus
Contributor

projectgus commented Oct 15, 2024

This fix may be relevant to the problem you're seeing #16015 (although unclear without a way to reproduce.) EDIT: Not this one, missed you weren't using TLS.

@projectgus
Contributor

The fix from #15952 may help with this issue. Please try the latest nightly build v1.24.0-preview.409.g82e69df33 (2024-10-10) .bin from the downloads, or any newer version.

@straga
Author

straga commented Oct 15, 2024

ESP32 board
Using the correct path seems to improve things, but I’m still experiencing random freezes when running multiple components simultaneously: mqtt_as, uftpd, and telnet for REPL. If I use only telnet and FTP without running any asyncio code or mqtt_as, the system works without issues.

However, when all components are active (which only involves up to six sockets), I encounter random freezes. The situation has improved compared to before, as the watchdog now restarts the board when it freezes.

•	Left side: An asyncio task feeds the watchdog and shows memory info every second.
•	Right side: A ping is sent to the board.
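For readers who want to reproduce the left-hand side of that setup, a task of this kind could look roughly like the sketch below. It is not the code from the attached project; the `heartbeat` name and its parameters are assumptions for illustration. The watchdog and the memory report are passed in as plain callables so the same coroutine can be tried off the board.

```python
import asyncio

async def heartbeat(feed, report, interval_s=1.0, rounds=None):
    """Call feed() and report() every interval_s seconds.

    rounds=None loops forever; a number stops after that many iterations.
    """
    count = 0
    while rounds is None or count < rounds:
        feed()      # on the board: wdt.feed
        report()    # on the board: micropython.mem_info
        count += 1
        await asyncio.sleep(interval_s)
    return count

# On an ESP32 this might be wired up as follows (not executed here):
#   import machine, micropython
#   wdt = machine.WDT(timeout=5000)   # timeout in milliseconds
#   asyncio.create_task(heartbeat(wdt.feed, micropython.mem_info))

# Off-board demo with stand-in callables.
events = []
done = asyncio.run(
    heartbeat(lambda: events.append("feed"),
              lambda: events.append("report"),
              interval_s=0, rounds=2)
)
print(done, events)
```

Because the feed happens inside the event loop, the watchdog fires whenever the loop itself stops being scheduled, which matches the reset behaviour described in this comment.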

When I attempt to connect to the board and download a 3KB file, the ping drops, and the left side freezes. You can see this behavior in this video: https://youtu.be/-Vujb0btDwc .

CleanShot 2024-10-15 at 11 01 37@2x

If I run the asyncio code in a separate thread, you can observe the behavior in this video: https://youtu.be/MfxXQfAYu9g. It shows memory leaking away because of the split between "max new split" and the MicroPython heap, but the system doesn’t freeze; it continues running. The code remains unchanged, with the only difference being that asyncio is now executed in a thread.

When running asyncio in a thread, the Wi-Fi occasionally stops working, even though ifconfig still shows an IP, and isconnected() == True. This behavior also seems to occur randomly.

CleanShot 2024-10-15 at 11 06 51@2x

@straga
Author

straga commented Oct 15, 2024

@straga Do you have a way for us to reproduce this hang?

@projectgus

Just set the SSID and password in the secret file, and the MQTT IP in board.yml.
Wait until everything has loaded and the board has connected to MQTT. Then connect over telnet and try to download a file from the board, for example "main.py".

main.py
TREAD = True  # True: run in a thread; False: do not run in a thread

ftp client setting: pycharm
CleanShot 2024-10-15 at 11 36 02@2x

board files:
CleanShot 2024-10-15 at 11 40 09@2x

for_esp32_board.zip

https://github.com/cpopp/MicroTelnetServer
https://github.com/robert-hh/FTP-Server-for-ESP8266-ESP32-and-PYBD/blob/master/uftpd.py

@projectgus
Contributor

@straga Does it still hang on the latest nightly build? See #14421 (comment)

@straga
Author

straga commented Oct 16, 2024

@projectgus That was:
CleanShot 2024-10-16 at 07 40 08@2x

@dpgeorge
Member

@straga There shouldn't be a "dirty" label in the tag if you used the firmware from the download page.

Did you build yourself with modifications?

@straga
Author

straga commented Oct 16, 2024

@dpgeorge Yes with my board configuration.

Now same with:
fw: v1.24.0-preview.447.g838f21298 (2024-10-15) .bin

video: https://youtu.be/3zo4YU3zZ-w

With only the binary sensor commented out:
If it is uncommented, Wi-Fi does not work after loading. Memory is at a minimum, but nothing reports that. https://youtu.be/9S36EsN63rA
CleanShot 2024-10-16 at 09 13 05@2x

This is with the nightly build. The result is the same: after everything has loaded, I try to upload or download a file over FTP, everything freezes, and I just wait for the watchdog reset.
CleanShot 2024-10-16 at 09 07 06@2x

@straga
Author

straga commented Dec 31, 2024

CleanShot 2024-12-31 at 10 43 48@2x

@projectgus
Contributor

@straga Some changes were recently merged (#16015) that may help with avoiding MemoryErrors in Python, so you could test with a recent nightly preview build and check for any improvement.

However, looking at this latest screenshot you are simply running out of RAM. Low memory in ESP-IDF is causing the other weird behaviours like dropped packets and lack of response.

There probably isn't anything MicroPython can do to help with this. You'll need to either rewrite your code to use less RAM (for example, by freezing modules into the firmware or other techniques) or you could look at switching to an ESP32 with PSRAM onboard, this will give you much more RAM to use.

@straga
Author

straga commented Feb 14, 2025

@projectgus I have an old board with an older version, maybe 1.21 or 1.22, that works the same way with no problems. That's why it confused me.

I am now looking at memory fragmentation. If the memory is heavily fragmented, the largest block that can be allocated is too small, although the total amount of free memory is enough. I just did a board reset.

I haven't figured out how to understand what went wrong at the system level.

@projectgus
Contributor

I am now looking at memory fragmentation. If the memory is heavily fragmented, the largest block that can be allocated is too small, although the total amount of free memory is enough. I just did a board reset.

Is this in the ESP-IDF heap?

@straga
Author

straga commented Feb 21, 2025

This is how I understand it:

esp32.idf_heap_info(0)
[
    (240, 4, 0, 4),        # Region 0: DMA memory
    (7288, 4, 0, 4),       # Region 1: 32-bit accessible memory
    (16648, 4, 0, 4),      # Region 2: Instruction cache
    (85008, 4, 0, 4),      # Region 3: Data cache
    (15072, 4, 0, 4),      # Region 4: IRAM (instruction RAM)
    (113840, 33744, 29696, 27544),  # Region 5: Main heap (DRAM)
    (21040, 20280, 19456, 20248)    # Region 6: RTC fast memory
]

Each tuple shows (total_size, free_size, largest_free_block, min_free_since_boot):

(113840, 33744, 29696, 27544),  # Region 5: Main heap (DRAM)

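import esp32
import machine

def check_memory():
    heaps = esp32.idf_heap_info(0)
    # Main region on this ESP32; index 5 is taken from the output above
    # and may differ on other chips or builds.
    main_heap = heaps[5]

    # Reset the board if the largest free block in the main heap is too small.
    if main_heap[2] < 1024:
        print(f"\nCRITICAL! Main heap fragmented! Largest block: {main_heap[2]} bytes")
        machine.reset()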

@projectgus
Contributor

This is how I understand it:

esp32.idf_heap_info(0)

If you only care about data allocations then it's better to call esp32.idf_heap_info(esp32.HEAP_DATA) and ignore the IRAM entries.

To determine fragmentation you can do something like:

info = esp32.idf_heap_info(esp32.HEAP_DATA)
largest_free_block = max(h[2] for h in info)
total_free = sum(h[1] for h in info)

This will give you the largest free block and the total free bytes across all heaps.

Comparing these two numbers will help you understand if this is actually fragmentation, or if you're simply running out of RAM (for fragmentation, total_free is high but largest_free_block is very low. If out of RAM, both numbers will be low.)
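That comparison could be wrapped in a small helper along these lines. This is only a sketch: the `diagnose` name and the `needed` parameter (the block size your next allocation requires) are assumptions for illustration, and the sample tuples at the bottom are made-up values, not measurements from this issue. The import is guarded so the snippet can also be tried off the board.

```python
try:
    import esp32  # only present on the MicroPython ESP32 port
except ImportError:
    esp32 = None

def diagnose(info, needed):
    """Classify ESP-IDF heap state from idf_heap_info() tuples.

    Each tuple is (total, free, largest_free_block, min_free_seen).
    needed is the contiguous block size, in bytes, that must be available.
    """
    largest_free_block = max(h[2] for h in info)
    total_free = sum(h[1] for h in info)
    if largest_free_block >= needed:
        state = "ok"
    elif total_free >= needed:
        # Enough bytes overall, but no single block is large enough.
        state = "fragmented"
    else:
        state = "out of RAM"
    return state, total_free, largest_free_block

if esp32 is not None:
    print(diagnose(esp32.idf_heap_info(esp32.HEAP_DATA), 4096))
else:
    # Made-up sample values, used only when running off the board.
    print(diagnose([(100000, 30000, 900, 800)], 4096))
    print(diagnose([(100000, 1200, 900, 800)], 4096))
```

A "fragmented" result points at allocation patterns, while "out of RAM" means nothing short of using less memory will help.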

@piotrwest

I found this discussion because I'm experiencing similar effects to @straga, although in a slightly different use case. I'm attempting to make an MQTT TLS connection, which fails with different random errors, or sometimes just hangs. However, it succeeds if I have more than approx. 40KB of "max new split" (e.g. by trimming down the code).

In my case, the memory usage is not high, but fragmented:

Total:143744 Free:99472 (69.20%)
stack: 2208 out of 15360
GC: total: 112000, used: 44496, free: 67504, max new split: 31744
 No. of 1-blocks: 357, 2-blocks: 206, max blk sz: 251, max free sz: 3249

The behavior of doubling MicroPython heap doesn't help either - I cross 56000 bytes of heap only upon initialization/precomputation, then fall back below that. Unfortunately, the heap remains at 112000 bytes.

One of the workarounds I'm thinking about is to allocate memory outside of MicroPython heap. This way, just after starting, one could allocate 100KB of memory, run initialization part of the code, which would effectively squeeze everything to available ~50KB, and then release those 100KB before connecting to MQTT. The end result would be a max new split of 100k, instead of 31k. I've tested this theory successfully with allocating a byte array (but unfortunately, since it's ending up on the heap, which I cannot resize back down, it's practically useless).

So, all in all, I'm wondering if there is a way to allocate memory outside of MicroPython heap? Could this be a viable workaround in some cases?

@piotrwest

Quick update, the idea above is certainly possible and works... at least in my case.
Before the "hack" I was getting to wifi connection code (i.e. just before connecting to wifi, after all initialization for business logic) with:

Total:143744 Free:99472 (69.20%)
stack: 2208 out of 15360
GC: total: 112000, used: 44496, free: 67504, max new split: 31744
 No. of 1-blocks: 357, 2-blocks: 206, max blk sz: 251, max free sz: 3249

After the "hack", the same code, same place (note I bumped the initial heap from 56k->70k as well):

Total:133504 Free:90016 (67.43%)
stack: 2208 out of 15360
GC: total: 70016, used: 43712, free: 26304, max new split: 63488
 No. of 1-blocks: 365, 2-blocks: 212, max blk sz: 251, max free sz: 570

This doesn't give the whole picture: the available blocks here are 64k, 40k and 14k! Connection to Wi-Fi and subsequent MQTT w/ TLS is now possible.

If this is useful for anyone, those are two functions which I added in micropython:

// Allocate outside MicroPython's heap using system malloc
static mp_obj_t mem_non_mp_heap_alloc(const mp_obj_t size_in) {
    mp_int_t size = mp_obj_get_int(size_in);
    void *ptr = malloc(size);
    if (ptr == NULL) {
        mp_raise_OSError(MP_ENOMEM);
    }
    // Return pointer as an integer
    return mp_obj_new_int_from_ull((uintptr_t)ptr);
}
static MP_DEFINE_CONST_FUN_OBJ_1(mem_non_mp_heap_alloc_obj, mem_non_mp_heap_alloc);

// Free previously allocated memory
static mp_obj_t mem_non_mp_heap_free(const mp_obj_t ptr_in) {
    void *ptr = (void *)(uintptr_t)mp_obj_get_int(ptr_in);
    if (ptr != NULL) {
        free(ptr);
    }
    return mp_const_none;
}
static MP_DEFINE_CONST_FUN_OBJ_1(mem_non_mp_heap_free_obj, mem_non_mp_heap_free);

and this is how I'm using them:

ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1 = None
ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2 = None
ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3 = None

def block_mem():
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1 = esp32.mem_non_mp_heap_alloc(64000)
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2 = esp32.mem_non_mp_heap_alloc(40000)
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3 = esp32.mem_non_mp_heap_alloc(14000)

def free_mem():
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2
    global ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3
    if ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1 is None:
        print("already released!!!")
        return
    esp32.mem_non_mp_heap_free(ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1)
    esp32.mem_non_mp_heap_free(ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2)
    esp32.mem_non_mp_heap_free(ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3)
    # Clear the stored pointers so a second call cannot free the same memory twice.
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_1 = None
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_2 = None
    ALLOC_TO_SQUEEZE_REST_OF_ALLOCATIONS_3 = None
    print("released blocked memory")

Of course, you will need to tweak the block sizes for your situation.

@projectgus
Contributor

projectgus commented Mar 10, 2025

The behavior of doubling MicroPython heap doesn't help either - I cross 56000 bytes of heap only upon initialization/precomputation, then fall back below that. Unfortunately, the heap remains at 112000 bytes.

We've actually been down the path of trying to tackle this a bit, before. I had this PR which added a micropython module function to tell the system to try and keep a certain amount of memory available for the ESP-IDF system (very similar to your "hack" but should have worked better in the limit as it will keep not growing them). However it's very complex and hard to reason about, which is why it wasn't merged.

What we added instead was MICROPY_GC_INITIAL_HEAP_SIZE. If you're building a custom firmware anyway, then I suggest setting this value to the amount of heap that you know you'll need, so MicroPython doesn't need to try and grow the heap during startup. You could even set MICROPY_GC_SPLIT_HEAP_AUTO to 0 as well, so the MicroPython heap never tries to grow.

attempting to make an MQTT TLS connection

If you're using v1.24 or earlier, there is another bug that causes TLS buffers to not be freed as early as they could be during a disconnect/reconnect cycle. The fix is in #16015 and is available in the nightly preview builds or the upcoming v1.25 release.
