GC: malloc failing to allocate due to late release of unreferenced allocated memory by forced GC sweep #7778
Comments
Does that also happen if you run gc.collect() in the main thread loop?
I went one step further and removed the gc.collect() call completely.
So I tried to replicate that fault here on a WEMOS ESP32 Lite (Generic ESP32, v1.17-20 firmware). I used your first script and pre-allocated 32000 bytes. It runs fine. How long do I have to wait until it fails?
Sorry, but I didn't understand your first comment. A couple of minutes is enough to trigger the issue on the ESP32. I think the issue is related to print allocating its internal buffers concurrently with the main thread.
So your script with threading and print has now been running here for ~20 minutes. The free heap is about 15k.
Well, considering that the ESP32 port uses IDF primitives, I'm not sure that is relevant. I'll now try to isolate the issue better.
So then this issue can be closed. The forum at https://forum.micropython.org/ is better suited for topics like these.
Could you try one last time with more memory in the allocated buffer? Say 3000 instead of 1500.
Examples (only main thread, heap monitoring thread disabled):
In the end, I think the issue may be related to heap fragmentation. I used gc_dump_alloc_table to print the heap at the moment malloc failed; here is the result:
Tomorrow I'll investigate this further, since it's strange: MicroPython's initialization shouldn't scatter data across the heap this much.
Yes, I can confirm the issue is most probably related to MicroPython's initialization scattering data across the heap that is never released.
Then, in boot.py: the difference is huge! At boot:
As the first line of main.py:
Already at this point the device has only 3 lines of contiguous free memory! After just 3 iterations even that part becomes unavailable, although some other sections have visibly been freed.
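Dumps like these can be summarised by counting free blocks and the longest free run. The sketch below is an illustration, not a MicroPython tool. It assumes a map in the style of `gc_dump_alloc_table()`, with one character per GC block, `.` for a free block and the address prefix already stripped. It also assumes 16-byte blocks, as on 32-bit ports such as the ESP32. If each dump line covers 64 blocks (an assumption about the dump layout), one line is about 1 KB, so "3 lines" of contiguous free memory is roughly 3 KB. `free_stats` and the toy map are hypothetical.

```python
# Sketch: measure free space in a heap map written in the style of
# MicroPython's gc_dump_alloc_table(), where '.' marks a free block.
# Assumptions (not taken from the thread): address prefixes are already
# stripped, and each GC block is 16 bytes (typical for 32-bit ports).

BLOCK_BYTES = 16  # assumed GC block size on a 32-bit port such as the ESP32


def free_stats(dump_lines):
    """Return (total_free_blocks, longest_contiguous_free_blocks)."""
    total = 0
    longest = 0
    run = 0
    # Join the lines: a free run may continue across a line break.
    for ch in "".join(dump_lines):
        if ch == ".":
            total += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # any allocated block ('h', '=', type letters) ends the run
    return total, longest


if __name__ == "__main__":
    # Hypothetical toy map: 'h' = head block, '=' = tail block, '.' = free.
    toy = [
        "hh==..h.....==h=",
        "....hh..........",
    ]
    total, longest = free_stats(toy)
    print("free:", total * BLOCK_BYTES, "bytes")
    print("largest contiguous:", longest * BLOCK_BYTES, "bytes")
```

The gap between the two printed numbers is what matters: an allocation succeeds only if it fits in the largest contiguous run, not in the total.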
@robert-hh please reconsider this as an issue, since it is quite a major optimization problem and completely independent of my code.
I do think this is related to MicroPython parsing all the files that main.py depends on. That's why you aren't seeing the issue: without the other modules you have much more spare space, and the GC manages to cope with it. It is still strange that malloc fails after a few loops when no other code is running, as if the GC sometimes fails to free an allocated block. I think in the end we can split this off and reformulate it as: the GC is not reliable when memory is nearly full.
Not sure that is the correct way to phrase it; the GC is reliable in that it is predictable. It's just that it doesn't defragment, and the total amount of free memory has almost nothing to do with the maximum size that can be allocated.
I disagree. The GC is reliable in the way it is implemented. It does not move blocks when running gc.collect(), so it can happen that many free blocks are scattered all over memory, adding up to a large fraction of the total, without any large contiguous block. That has been discussed several times.
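The point about free memory versus allocatable size can be seen in a toy non-moving, first-fit allocator. This is a hypothetical model, not MicroPython's `gc.c`: `ToyHeap` and the block counts are made up for illustration. Freeing every other 2-block object leaves half the heap free, yet no 3-block request can be satisfied, because nothing compacts the holes.

```python
# Toy non-moving, first-fit block allocator (a hypothetical model, not
# MicroPython's gc.c). Blocks never move, so free space can end up split
# into holes that are individually too small for a request.


class ToyHeap:
    def __init__(self, n_blocks):
        self.used = [False] * n_blocks  # one flag per fixed-size block

    def alloc(self, n):
        """First fit: return the start index of n free blocks, or None."""
        run = 0
        for i, used in enumerate(self.used):
            run = 0 if used else run + 1
            if run == n:
                start = i - n + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                return start
        return None  # no contiguous run long enough: allocation fails

    def free(self, start, n):
        for j in range(start, start + n):
            self.used[j] = False

    def total_free(self):
        return self.used.count(False)

    def largest_free_run(self):
        best = run = 0
        for used in self.used:
            run = 0 if used else run + 1
            best = max(best, run)
        return best


heap = ToyHeap(100)
# Fill the heap with 2-block objects, then free every other one.
objs = [heap.alloc(2) for _ in range(50)]
for start in objs[::2]:
    heap.free(start, 2)

print(heap.total_free())        # 50: half the heap is free...
print(heap.largest_free_run())  # 2: ...but only in 2-block holes
print(heap.alloc(3))            # None: so a 3-block request fails
```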
No, no, the problem is not about moving memory around. Please leave the initial example aside.
This should not happen, but somehow it does. It is visible in the GC dumps above. Before the loop:
After a few cycles, malloc fails and the dump clearly shows a big buffer that is still allocated even though it is not referenced.
To rerun a meaningful example, fill your heap until something like 5 KB of contiguous space is left at its end, then run the loop. It can be drawn like this: The issue can probably be renamed to something like:
The first case is just an optimization issue; the second case is a bug.
That can only happen if there is a reference to the memory somewhere. It may be a stray reference (e.g. somewhere on the C stack or in a machine register), but there must be a pointer to it somewhere in RAM or in the registers. Otherwise the GC will reclaim the memory.
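The stray-reference case can be illustrated with a toy conservative mark phase, in which any root word that equals a heap block's address keeps that block (and everything it points to) alive. Everything here is hypothetical: the heap layout, `HEAP_BASE`, the 16-byte `BLOCK` size and the `mark`/`sweep` helpers are a simplified model of conservative scanning, not MicroPython's implementation (for example, it only matches exact block addresses).

```python
# Toy conservative mark-and-sweep (a simplified model, not MicroPython's
# gc.c): any root word that happens to equal a heap block's address keeps
# that block alive, whether or not it is a "real" pointer.

HEAP_BASE = 0x3FFB0000  # hypothetical heap start address
BLOCK = 16              # hypothetical block size in bytes

# block address -> addresses stored in that block's own pointer fields
heap = {
    HEAP_BASE + 0 * BLOCK: [],
    HEAP_BASE + 1 * BLOCK: [HEAP_BASE + 2 * BLOCK],
    HEAP_BASE + 2 * BLOCK: [],
    HEAP_BASE + 3 * BLOCK: [],
}


def mark(roots):
    """Return the set of block addresses reachable from the root words."""
    marked = set()
    # Conservative: every root word matching a block address is a reference.
    stack = [w for w in roots if w in heap]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        stack.extend(p for p in heap[addr] if p in heap)
    return marked


def sweep(marked):
    """Return the block addresses that would be reclaimed."""
    return sorted(a for a in heap if a not in marked)


# Roots: one genuine variable pointing at block 1, plus a stale C-stack word
# left over from an earlier call that still holds block 3's address.
live_var = HEAP_BASE + 1 * BLOCK
stale_stack_word = HEAP_BASE + 3 * BLOCK
unrelated_int = 12345  # not a heap address, so it is ignored

marked = mark([live_var, stale_stack_word, unrelated_int])
freed = sweep(marked)
print([hex(a) for a in freed])  # ['0x3ffb0000']: block 3 survives via the stale word
```

Without the stale word, block 3 would be reclaimed in the same collection; nothing is "delayed", it is simply still reachable.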
This can never happen. If a chunk is freeable (not referenced anywhere), it is guaranteed to be freed by the GC collection (i.e. it is never delayed). BTW, did you try using
@dpgeorge there is no need to set the threshold, since the initial conditions of the test push the GC over the threshold every time.
That is why the following script should never fail! There are no references whatsoever, unless some internal MicroPython module is delaying the release.
So it's option 2: there is a bug in the GC (or in MicroPython's memory management) :) Here is an updated version of the script that fails after a few seconds (to make it fail faster, reduce the size of FREESPACE, but leave enough space for the a, b and c allocations!). boot.py:
The gc.dump is just:
The result of the script after 10k iterations (on my ESP32) is:
I can replicate that on my ESP32 board when the script is in boot.py or in a separate imported Python script. By the way, micropython.mem_info(1) produces the same output as your gc.dump().
Well @robert-hh, the heap is no longer fragmented in the latest example.
Cool that you could replicate it!
Hmm, I haven't tried the terminal yet; what would the difference be? Fewer loaded modules?
This would make me say something like: "The GC is late to free the 3K buffer in one cycle and manages to free it in the next one." But it could also mean that one 3K buffer is allocated and never released, and the loop keeps working only because the allocation can be satisfied somewhere else. If that were true, though, the loop would simply fail the same way after a longer time, since the heap would gradually fill up.
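One mechanism worth ruling out, independent of any GC bug, is the evaluation order of an assignment such as `buf = bytearray(N)`: the right-hand side is evaluated first, so the new buffer must be allocated while the name still references the old one. Peak demand in such a loop is therefore two buffers, not one. Whether this applies depends on the actual script, which is not reproduced here. The sketch below is a hypothetical model (`run_loop`, `first_fit` and the sizes in abstract units are made up): with a 5-unit region and a 3-unit buffer the second iteration fails, whereas dropping the old reference first, or having room for two buffers, keeps the loop going.

```python
# Toy model (hypothetical sizes, not MicroPython internals) of
# `buf = bytearray(N)` executed in a loop. The right-hand side is evaluated
# first, so the new buffer is allocated while `buf` still references the
# old one; the old buffer becomes garbage only after the rebinding.


def first_fit(used, n):
    """Return the start of n contiguous free units in `used`, or None."""
    run = 0
    for i, u in enumerate(used):
        run = 0 if u else run + 1
        if run == n:
            return i - n + 1
    return None


def run_loop(region, buf_size, iterations, delete_first):
    """Return the iteration at which allocation fails, or None."""
    used = [False] * region
    current = None  # start of the buffer that `buf` currently references
    for i in range(iterations):
        if delete_first and current is not None:
            # `del buf` before allocating: the old buffer is unreferenced and
            # a collection can reclaim it (modelled as an immediate free).
            used[current:current + buf_size] = [False] * buf_size
            current = None
        start = first_fit(used, buf_size)
        if start is None:
            # A collection cannot help: the old buffer is still referenced.
            return i
        used[start:start + buf_size] = [True] * buf_size
        if current is not None:
            # After rebinding, the old buffer is garbage; freeing it here is
            # equivalent to the collection a later failed allocation triggers.
            used[current:current + buf_size] = [False] * buf_size
        current = start
    return None


print(run_loop(5, 3, 10, delete_first=False))  # 1: old + new need 6 units
print(run_loop(5, 3, 10, delete_first=True))   # None
print(run_loop(6, 3, 10, delete_first=False))  # None: alternates between two slots
```

The 6-unit case alternates between two slots, which resembles the observation that the loop can keep working by allocating in another place.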
I renamed the issue to highlight the underlying MicroPython issue.
Dropping this, since we have stopped using MicroPython for now.
Add address_little_endian for epaper displays with little endian (low byte first) addresses. Also clears allocated display and display bus memory so it has a known state. The acep member wasn't always set so it varied accidentally. Fixes micropython#7560. May fix micropython#7778. Fixes micropython#5119.
Hello,
I'm currently trying to debug what seems to be a bug in the GC.
I discovered this while transferring a 500 KB file from a server to my device.
I managed to reduce the code needed to trigger the issue quite a lot.
What happens:
With what
The code to reproduce it is quite simple (though I hope it can be simplified further, since it currently uses threads):
After a while, you will see something like this:
So, something strange is happening.
The GC is supposed to run a sweep each time a heap threshold is reached.
Since the variables a, b and c are reassigned in the loop, the old memory should be freed each time this threshold is reached.
And judging by the values reported by the monitoring thread, this does seem to be the case.
On top of that, the heap monitor thread runs gc.collect() every second.
Any help debugging what is going on?
Thanks