Skip to content

GC: Malloc failing to allocate due to lately release of unreferenced allocated memory by forced GC swipe #7778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
KonssnoK opened this issue Sep 12, 2021 · 20 comments

Comments

@KonssnoK
Copy link

KonssnoK commented Sep 12, 2021

Hello,
I'm currently trying to actively debug what seems to be a bug in the GC.
I discover this while transferring a 500KB file to my device from a server.
I managed to reduce the required code to trigger the issue quite a lot

What happens:

  • the GC fails to allocate a contiguous buffer of 1500 when more than 20KB are reported as free

With what

  • ESP32-VROOM-32 -> compile with generic board
  • Micropython compiled from tag 1.17

The code to reproduce is quite simple (i hope it can be simplified more tho, since it currently uses threads)

import utime as time
import _thread
import gc

def thread_heap_monitor():
    while True:
        unreac = gc.collect()
        print("GC: Using %sB, %sB are free. %s unreachable"%(gc.mem_alloc(), gc.mem_free(), unreac))
        time.sleep_ms(1000)

def main():
    _thread.start_new_thread(thread_heap_monitor, ())
    while True:
        a = bytes(1501)
        b = bytes(16)
        c = bytes(40)
        la = len(a)
        print("A %s %s"%(a[0] , la))

if __name__ == "__main__":
    main()

After a while, you will see something like this:

GC: Using 40384B, 23616B are free. None unreachableA 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
A 0 1501
Traceback (most recent call last):
  File "main.py", line 85, in <module>
  File "main.py", line 65, in main
MemoryError: memory allocation failed, allocating 1502 bytes
MicroPython v1.16-285-g5be5707bd on 2021-09-12; ESP32 module with ESP32
Type "help()" for more information.
GC: Using 37744B, 26256B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable
GC: Using 37808B, 26192B are free. None unreachable

So,
something strange is happening.
the GC is supposed to run a sweep each time a heap threshold is reached.
Since the variables a, b and c are being overridden in the loop, the old memory should be freed each time this threshold is reached.
And by looking at the reported values from the monitoring thread, it should be true.
On top, the heap monitor thread is running gc.collect every second.

.. Any help debugging what is going on?
Thanks

@robert-hh
Copy link
Contributor

Does that happen as well if you run gc.collect() in the main thread loop?

@KonssnoK
Copy link
Author

KonssnoK commented Sep 12, 2021

i did one step more and removed the gc.collect() completely.


def thread_heap_monitor():
    while True:
        #unreac = gc.collect()
        print("GC: Using %sB, %sB are free."%(gc.mem_alloc(), gc.mem_free()))
        time.sleep_ms(1000)
A 0 1501
A 0 1501
A 0 1501
Traceback (most recent call last):
  File "main.py", line 85, in <module>
  File "main.py", line 65, in main
MemoryError: memory allocation failed, allocating 1502 bytes
MicroPython v1.16-285-g5be5707bd on 2021-09-12; ESP32 module with ESP32GC: Using 41728B, 22272B are free.

Type "help()" for more information.
>>> GC: Using 41840B, 22160B are free.
GC: Using 41920B, 22080B are free.
GC: Using 42000B, 22000B are free.
GC: Using 42080B, 21920B are free.

@robert-hh
Copy link
Contributor

So I try to replicate that fault here on a WEMOS esp32 lite, Generic ESP32, v1.17-20 firmware. I use your first script, and pre-allocated 32000 bytes. It runs well. How long must I wait until it should fail?

@KonssnoK
Copy link
Author

Sorry but i didn't understand your first comment.

A couple of minutes are enough to trigger the issue on the ESP32.
I'm now running the script with the heap monitoring thread commented out and for now i see no crashes.

I think the issue is related to the print allocating the internal buffers in concurrence with the main thread ..

@robert-hh
Copy link
Contributor

So you script with threading and print runs now here for ~20 Minutes. The free heap is about 15k.

@KonssnoK
Copy link
Author

well considering the port of ESP32 uses IDF primitives i'm not sure it may be relevant.
It helps narrow down for sure!

I'll now try to isolate better the issue

@robert-hh
Copy link
Contributor

So then this issue can be closed. The forum at https://forum.micropython.org/ is better suited for topics like these.

@KonssnoK
Copy link
Author

KonssnoK commented Sep 12, 2021

could you do a last try with more memory in the allocated buffer? like 3000 instead of 1500.
In my tests i managed to make it crash (out of mem) almost immediately with a larger buffer.
It's true that i also had a long lasting loop with 1500 without any crash for at least 10 minutes

CONF: Reading config v1
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
A 0 3000
Traceback (most recent call last):
  File "main.py", line 88, in <module>
  File "main.py", line 68, in main
MemoryError: memory allocation failed, allocating 3001 bytes
MicroPython v1.16-285-g5be5707bd on 2021-09-12; ESP32 module with ESP32B: Using 41616B, 22384B are free.

Type "help()" for more information.
>>> B: Using 41728B, 22272B are free.

Examples (only main thread, heap monitoring thread disabled):

  • 4000B buffer -> out of mem after 1 cycle (20KB free reported by GC)
A 0 4000 39520 24480
Traceback (most recent call last):
  File "main.py", line 88, in <module>
  File "main.py", line 68, in main
MemoryError: memory allocation failed, allocating 4001 bytes
MicroPython v1.16-285-g5be5707bd on 2021-09-12; ESP32 module with ESP32
Type "help()" for more information.
>>>
  • 3000B -> more or less 10 cycles

@KonssnoK
Copy link
Author

I think at the end the issue may be related to heap fragmentation...

I used gc_dump_alloc_table to print the memory at malloc failure, here is the result:


GC memory layout; from 3ffcf2d0:
00000: TMShhSMShDh=Sh=Shh=================.h=======h=======h=hh==T=.h==
00400: =====B..MD...........h=........h=====MD.hh==BM..hh..............
00800: ......DS........h...MSSh=======Sh=ShSh.DSD.....Sh=hh.hMDS..hh...
00c00: .............Sh==========h===.....D...h=======MD......h=.....Sh=
01000: Sh=.h.....D...h.h=======h==========h=======................h====
01400: ===h..h=h===============.DBB.h=====h=======h=h===h==BBh=h===h==h
01800: BBhh=........h========h=h==hh==h===S...Sh====================.D.
01c00: B......hh=...........S.....Sh===Sh=SSh=................h=.....M.
02000: .DS..........Sh=h=======D.M.B.BBB=BhhhhSh=.....Sh...............
02400: ...Sh==h===.....h====================......ShSh..ShSShhhS.h=====
02800: h=======h==h======h=................h===DhBBhh====h====.h====BBh
02c00: BBShBBBBB.BBB............h=======h========hh===================h
03000: =h===hh=hBh===B=BS...h========h====h=h========h===h===SS..BBBMDS
03400: .....h=====.Dh=SShBBB.h===h=hBBB.B=h===.MD...Sh=Sh=h=Sh=======h=
03800: ====D....h=...........S.h=...h=....M.hhSh=......................
03c00: h=Sh=Sh=.hShS...h=S..........................h.....MD.BBh===h===
04000: Sh=======h==hhh=.......................h=.h=======h====S.h=ShS.h
04400: .hSh.hSShS.h=Sh....h=============.DDS..........M...DB...S.h=====
04800: ======h=................Dh=.....S.......SSh=....h=D.SSh.........
04c00: .........DBBBBLBSh=.Sh=h===h==Bh===.h========================...
05000: .h========================================h=======Sh===.....h=h.
05400: .h==h=====BB.B=BB=S..hh========h===..h=h========================
05800: ============h===..........h=S.S..h=======hh....h=h=......B..DBBh
05c00: ==Sh.h=S.hhh........................h==Bh=h....h===BBBh===B=....
06000: .h===========================h===========h===h==......h===hhShhh
06400: .hSh==.hSh==..h====.......h=====h====.......h=h========.........
06800: ...............................h...........h=========...........
06c00: .h========hShh=hSh......................h==.....................
07000: ..........Shh=======hh=======h==ShShShShh=====h=ShShShh==hh=====
07400: ===h=ShShh==hh==========h=ShShShh========hh======hh====hh====hh=
07800: hh=hShh==hShSh...........h====h=========h=========h===hhh.......
07c00: ......h=====h===h=====================h=========================
08000: =======h=====hh=============.h=hh============h...h...........hhh
08400: hh......h.......................................................
08800: ..............h=================================================
08c00: =================================================...............
09000: .........................hh.........hh.....................h....
09400: ........h===========....................................h======h
09800: ========h=h====================hh======hh==========hh===========
09c00: =======h=h======================================================
0a000: =========.......................................................
0a400: ................................................................
0a800: ..........hhhhhhhhhhhhhh........hhhhhh................h=========
0ac00: ================================================================
0b000: =======........hhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhh.....
0b400: ...hhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhh........hhhhhhhhhhh
0b800: hhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhhhh........h
0bc00: hhhhhhhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhh........hhh
0c000: hhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhhh........hhhhh
0c400: hhhhhhhhhhhhhhh........hhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhh.
0c800: .......hhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhh........hhhhhhhhh
0cc00: hhhhhhh........hhhhhhhhhhhhhhhhhhhhh..h....hh....h=.............
0d000: .............hh.........hhhhhhhhhhhhh...........................
0d400: ........................h=====h===================h=h===========
0d800: ================================================================
0dc00: ================================================================
0e000: ================================================................
0e400: h========h============================================h====h====
0e800: ================================================================
0ec00: ================================================================
0f000: =======================================================.........
0f400: ................................................................
0f800: ................................

Tomorrow i'll investigate more on this front, since it's strange. mp initialization shouldn't scatter data so much in the heap.

@KonssnoK
Copy link
Author

Yes i confirm the issue is much probably related to the initialization of micropython scattering the heap with data that is not released.
To see this i made a quick change to modgc

STATIC mp_obj_t gc_dump(void) {

    gc_dump_alloc_table();
    return mp_const_none;
}
MP_DEFINE_CONST_FUN_OBJ_0(gc_dump_obj, gc_dump);

....
    { MP_ROM_QSTR(MP_QSTR_dump), MP_ROM_PTR(&gc_dump_obj) },
....

then in boot.py
and as first line in main.py i added
gc.dump()

The difference seen is huge!

At boot:

MPY: soft reboot
GC memory layout; from 3ffcf2d0:
00000: h=.hh.M..DB..h=.hh=================hh=======h=======h=hh========
00400: =======h==......h===............................................
       (60 lines all free)
0f800: ................................

as first line of main.py:

GC memory layout; from 3ffcf2d0:
00000: TMShhhMhhDBShh=hhh=================Sh=======h=======h=hh========
00400: =======TMDh=====h===h===============h=======BTMDShh==hhhBMh=h=DS
00800: hhSh=Sh=h=======Sh=ShSh=BDShhh===============Sh=TMh=============
00c00: ==SShhhDSShhhSSh=Sh=hhh==========h===h=======MDSh=Sh=BBDTMDShhhS
01000: h=Shhh=======h=h=======h=======h==========h=BBDBDBhhhhhBh=BBhhh=
01400: ==============h===============h==hhh=hhh=====h=======hh===h===h=
01800: =BBhh=Th==h==hh========h=h==hh==h===ShhBDTBhhh=h================
01c00: ====Sh=hhSSSh=BTTShShhhhhhSSSh==h=======SSh=LhLhBMShDShhSTMh=h==
02000: =============h=SSh=h==hhhhShhhhh=======hBTBBDTBBB=SShBh===h==h==
02400: h=Sh===h==h===BhSh=h==================hh===DhBBhBBSh=hShShh=Sh=h
02800: =====h=======ShShShh==hh======h=h===============h===============
02c00: h====h====Th====h======h====h=======h========hh=================
03000: ==h=h===hh=hhBBBBBBBShhhBBTh===BBB=BTSh=h========h===BBSh=SSh===
03400: =====h====h===TBBBMDShhSShh=Sh=h=======h===============DhBBBh===
03800: ==hSh=ShhBBBTB=TMDSSh=h===Dh=====SBh=h=Sh=Sh=h===============hSh
03c00: ===h===h=ShhhhSh=hh=Sh=hh=======MDBBBh===hhhh==hhSh===h===h==SSh
04000: =======h======Sh..h=h===============hhSSThh=======h====h=ShhShSh
04400: =hhhShhhhSh=ShBTMhhSShh=============BDDShhSh=Sh=hhSSh===h=h===BB
04800: DBT=TMhDh=Shh=h=======SSh=ShhSh=ShSShhh=SSh===Sh=h==============
04c00: =h=h======SBBh=DhBDBBBBLBSh=ShhhSh=h===============h===h==BB=BhS
05000: h==h========================================h==hhh==Sh===h==h===
05400: BTBShhhhhh============h=h====h=====B=SShhh===h===hhh=Sh=hh======
05800: ========================Sh==h===========Shh===h=================
05c00: =h====================h=Sh====h========BBTBDBBh==Shh======hh====
06000: ===h=hhSh===h===============h==h===BBBh===B=Bhh===h=======hhhh==
06400: ===h=hhShhhhhSh==hhSh==hh===========h===h==============h=h======
06800: =hh======h=h=h=h=hhhhh==.....hh=====hhh===hh==========h=========
06c00: h=======h===h========hShh=hShh====h=====h========..h==h=========
07000: =====================Shh=======hh=======h==ShShShShh=====h=ShShS
07400: hh==hh========h=ShShh==hh==========h=ShShShh========hh======hh==
07800: ==hh====hh=hh=hShh==hShShh=====h====h====h=========h=========h==
07c00: =hhhh=====h=====h=====================h=========================
08000: =======h=====h=============h===................hhh=.....hhhhhhhh
08400: h=............h...h==h==h=hh============h.......................
08800: ................................................h=====........hh
08c00: =======h========================================================
09000: =======h====================================.....hh.........hh..
09400: ..................hh.............................h=======h==h===
09800: =====h======h========h====================h======h==========hh==
09c00: ================h=........................................hh====
0a000: =====...........................................................
0a400: ................................................................
0a800: .................................hhhhhhhhhhhh........hhhhhhhh...
0ac00: ..........h=====================================================
0b000: ===========================........hhhhhhh........hhhhhhhhhhhhhh
0b400: hhhhhh........hhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhh........
0b800: hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhh
0bc00: hh........hhhhhhhhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhh
0c000: hh........hhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhhhh.
0c400: .......hhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhh........hhhhhhh
0c800: hhhhhhhhhhhhhh........hhhhhhhhhh........hhh........hhhhhhhhhhhhh
0cc00: hhhhhhh........hhhhhhhhhhhhhhhhhhhh........hhhhhhhhh...hhh...hh=
0d000: ......................hh=.......................................
0d400: .h==hhhhhhhhhhhhhhhhhh..hh.....h................................
0d800: ..................................................h=============
0dc00: ===============================================================h
0e000: =================================================h=====h========
0e400: ===========h========h===========================================
0e800: =h====..........................................................
       (3 lines all free)
0f800: ................................

Already here the device has only 3 lines of free contiguous memory!

After just 3 iterations also that part becomes unavailable, but it's visible that some other sections were freed.

A 0 3000 51424 12576
A 0 3000 41456 22544
A 0 3000 41456 22544
GC memory layout; from 3ffcf2d0:
00000: TMShh.M.hD...h=hhh=================Sh=======h=======h=h.......h=
00400: Sh==ST=SMDh=====.h=.................h=======B.MD.hh==hhh.M....DS
00800: .h......h=======Sh=ShSh=BDShh................Sh=.M..............
00c00: ..SShhhD....h.Sh=Sh=hhh==========h===h=======MDS.......D.MD.....
01000: ....hh=======h=h=======h=======h==========h=..D.DB.h..hBh=BBhh..
01400: ..............................h==..h=h.h=====h=======.h===h===h=
01800: =BBhh=........h========h=h==hh==h===S...D.B....h================
01c00: ====......SSh=...........hS.....h=======SSh=.....M..DS....Mh=...
02000: .............h=SSh=........hh.hh=======...B.D.BBB=.S.B..........
02400: h=S....h==h===.hSh=h==================hh===DhBBhBBSh=hShShh=Sh=h
02800: =====h=======ShShShh==hh======h=................................
02c00: h====h====.h====h======h====h=======h========hh=================
03000: ==h=h===hh=hhBBBBBBBS...BB.h===BBB=B.Sh=h========h===..Sh=SSh===
03400: =====h====h===.BBBMDS...Shh=Sh=h=======................DhBBBh===
03800: ==.Sh=ShhBBB.B=.MDSSh=h===Dh=====.........Sh=...................
03c00: ...h===h=Sh..hSh=...Sh=.h=======MD.BBh===.hh...hh.........h==SSh
04000: =======..............................hS..hh=======h====h=Sh.ShSh
04400: =.h.Sh..hSh=....M...S.h=============.DDS...........Sh===h=h===..
04800: DB...M.Dh=Shh=h=======.Sh=S.......SSh.h=S.....Sh=...............
04c00: ...h======...h=D..DBBBBLBSh=S...Sh=................h===h==BB=B..
05000: ...h========================================........Sh===h==h===
05400: B.BS....hh============.......h=====B=...hh===h===........h======
05800: ========================....h===========S.....h=================
05c00: =h====================h=......h========.B..DBBh==Shh======hh====
06000: ===h=h......................h==h===BBBh===B=Bh....h=======hh.h==
06400: ===h=hhShhh.hSh==.hSh==.h===========h===h==============h=h======
06800: =h.........h=h=h=hhhhh==h====.......hhh===h...........h=========
06c00: ............h========hShh=hShh====h=====h========..h==..........
07000: .....................Shh=======hh=======h==ShShShShh=====h=ShShS
07400: hh==hh========h=ShShh==hh==========h=ShShShh========hh======hh==
07800: ==hh====hh=hh=hShh==hShSh...........h====h=========h=========h==
07c00: =hhh......h=====h=====================h=========================
08000: =======h=====h=============h===.................h.........hhhhhh
08400: h=............h.........h=hh============h.......................
08800: ..............................................................hh
08c00: =======h========================================================
09000: =======h====================================.....hh.........hh..
09400: ...................h............................................
09800: .....h======h========h====================h======h==========hh==
09c00: ================h=........................................hh====
0a000: =====...........................................................
0a400: ................................................................
0a800: .................................hhhhhhhhhhhh........hhhhhhhh...
0ac00: ..........h=====================================================
0b000: ===========================........hhhhhhh........hhhhhhhhhhhhhh
0b400: hhhhhh........hhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhh........
0b800: hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhh
0bc00: hh........hhhhhhhhhhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhh
0c000: hh........hhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhhhhhhhhhhhhh.
0c400: .......hhhhhhhhhhhhhhhhhhh........hhhhhhhhhhhhhhh........hhhhhhh
0c800: hhhhhhhhhhhhhh........hhhhhhhhhh........hhh........hhhhhhhhhhhhh
0cc00: hhhhhhh........hhhhhhhhhhhhhhhhhhhh........hhhhhhhhh....hh....h=
0d000: .......................h=.......................................
0d400: ....hhhhhhhhhhhhhhhhhh..hh.....hh===============================
0d800: ================================================================
0dc00: ================================================================
0e000: ============================.....................h=====h========
0e400: ===========h========h===========================================
0e800: =h====h=========================================================
0ec00: ================================================================
0f000: ================================================================
0f400: ==..............................................................
0f800: ................................
Traceback (most recent call last):
  File "main.py", line 89, in <module>
  File "main.py", line 68, in main
MemoryError: memory allocation failed, allocating 3001 bytes
MicroPython v1.16-285-g5be5707bd-dirty on 2021-09-12; ESP32 module with ESP32
Type "help()" for more information.
>>>

@robert-hh please reconsider this as an issue, since quite a major optimization problem and completely independent from my code.
I'll continue to investigate meanwhile

@KonssnoK
Copy link
Author

I do think this is related to micropython parsing all the files dependent from main.py in the device.
So i guess the modules are being loaded, occupying let's say 80% of the available memory.
Then the GC runs, cleaning some of the loaded data but obviously not reallocating the loaded ones.
This recovers some space for small allocation but the contiguous memory free for use is reduced a lot.

This is why you are not seeing the issue, the fact that you don't have other modules gives much more spare space and the gc manages to cope with it.

It is still strange that the malloc fails after few loops, since no other code is running, as if the GC fails to free an allocated block sometimes.

I think that in the end we can split this and reformulate as:

The GC is not reliable when memory is close to being filled.

@stinos
Copy link
Contributor

stinos commented Sep 13, 2021

Not sure that is the correct way to phrase it; i.e. the GC is reliable in that it is predictable. It's just that it doesn't defragment and total amount of free memory is in almost no way related to maximum size which can be allocated.

@robert-hh
Copy link
Contributor

I disagree. GC is reliable in the way it is implemented. It does not move blocks when runnuign gc.collect(). So it cann happen that many free blocks are scattered all over the memory, amounting to a large fraction of the total memory, but no contiguous large memory block. That was discussed several times.
b.t.w.: I ran you sample code now with 3000 bytes alloc for an hour w.o. problems.

@KonssnoK
Copy link
Author

KonssnoK commented Sep 13, 2021

nono, problem is not moving around memory.
That is not the job of the GC.

Please leave aside the initial example.
This is what happens:

  • The script loads the submodules
  • Some contiguous memory is left free in the heap. enough for a 3K allocation but not enough for 4K
  • The loop starts, continuously reallocating those 3K -> GC collect is forced every loop during malloc
  • At some point, the GC does not free the 3K buffer in the malloc -> NO SPACE

This should not happen, but somehow it does.
This is why i'm calling the GC unreliable, it fails to deallocate a chunk of memory which is not referenced anymore in one loop.
Probably if more memory is available, the malloc will allocate the next block and the old one will be released the next cycle (or not? who knows).

This is visible in the GC dumps above:

Before the loop

0e400: ===========h========h===========================================
0e800: =h====..........................................................
       (3 lines all free)

after few cycles the Malloc fails and the dump shows clearly a big buffer still allocated -> which is not referenced

0e400: ===========h========h===========================================
0e800: =h====h=========================================================
0ec00: ================================================================
0f000: ================================================================
0f400: ==..............................................................
0f800: ................................

to rerun a meaningful example you should fill your heap until you have something like 5KB of contiguous space left at the end of your heap, then run the loop.

It can be drawn like this
FFFFFFFFFFFF................................
FFFFFFFFFFFFh======.......... -> 3KB Allocation
FFFFFFFFFFFF................................ -> forced collect on next loop
FFFFFFFFFFFFh======.......... -> 3KB allocation
FFFFFFFFFFFFh======.......... -> failed GC collect on next loop
FFFFFFFFFFFFh======.......... -> failed malloc allocation due to no space

where
. = free space
F = fixed allocated memory
h======== = 3K Buffer

The issue can be probably renamed something like
"GC failing to free unreferenced memory on forced collect"
but it should still be verified which of the 2 happens:

  • if there is enough memory, the chunk that is not freed would be freed the next cycle
  • the chunk is never freed

First case is just an optimization issue, second case is a bug.

@dpgeorge
Copy link
Member

GC failing to free unreferenced memory on forced collect

That can only happen if there is a reference somewhere to the memory. It may be a stray reference (eg somewhere on the C stack, in some machine register) but there must be a pointer somewhere in RAM/registers. Otherwise the GC will reclaim the memory.

if there is enough memory, the chunk that is not freed would be freed the next cycle

This can never happen. If a chunk is freeable (not referenced anywhere) then it is guaranteed to be freed in the GC collection (ie it's never delayed).

BTW, did you try using gc.threshold() to set a threshold, eg 5000? That can help a lot to keep the heap compacted.

@KonssnoK
Copy link
Author

KonssnoK commented Sep 14, 2021

@dpgeorge there is no need to put the threshold since the initial condition of the test will cause the GC to be over threshold every time.

That can only happen if there is a reference somewhere to the memory. It may be a stray reference (eg somewhere on the C stack, in some machine register) but there must be a pointer somewhere in RAM/registers. Otherwise the GC will reclaim the memory.

This is why the following script should never fail! No references whatsoever, unless some internal mp module is getting delayed.

This can never happen. If a chunk is freeable (not referenced anywhere) then it is guaranteed to be freed in the GC collection (ie it's never delayed).

So option 2, there is a bug in the GC/(mp memory management) :)

This is an updated version of the script that will fail after few seconds (if you want to make it fail faster reduce the size of FREESPACE, leaving enough space for a b c allocations tho!).

boot.py

# This file is executed on every boot (including wake-boot from deepsleep)
#import esp
#esp.osdebug(None)
import uos, machine
#uos.dupterm(None, 1) # disable REPL on UART(0)
import gc
#import webrepl
#webrepl.start()
gc.collect()

gc.dump()

FREESPACE = 8000
total = gc.mem_free()
d = bytes(total - FREESPACE)
i = 0

gc.dump()

while True:
    a = bytes(3000)
    b = bytes(16)
    c = bytes(40)
    la = len(a)
    print("A %s %s %s %s %s"%(a[0] , la, gc.mem_alloc(), gc.mem_free(), i))
    i+=1

the gc.dump is just:

STATIC mp_obj_t gc_dump(void) {

    gc_dump_alloc_table();
    return mp_const_none;
}
MP_DEFINE_CONST_FUN_OBJ_0(gc_dump_obj, gc_dump);

....
    { MP_ROM_QSTR(MP_QSTR_dump), MP_ROM_PTR(&gc_dump_obj) },
....

The result of the script after a 10k iterations (on my ESP32) is

0ec00: ====================h===========================================
0f000: ================================================================
0f400: ================================================================
0f800: ================................
Traceback (most recent call last):
  File "boot.py", line 19, in <module>
MemoryError: memory allocation failed, allocating 3001 bytes

@robert-hh
Copy link
Contributor

I can replicate that on my ESP32 board if it is in boot.py or in a separate imported python script.
MicroPython v1.17-20-g0a5107372 on 2021-09-13; ESP32 module with ESP32
Type "help()" for more information.
I can NOT replicate this code it it is called from the command line using pasting.
The alloc fail also disappears is the FREESPACE is e.g. increased to 12000, the getting into a stable loop of free_mem() figures.
So what you see is just the expected behavior of gc and memory allocation. If the heap is fragmented, then chances are high that you cannot allocate any more a large block of data.

b.t.w. micropython.mem_info(1) creates the same output as your gc.dump().

@KonssnoK
Copy link
Author

well @robert-hh , the heap is not fragmented anymore in the latest example,
the majority of it is taken by d, the remaining part is free for the loop.

I can replicate that on my ESP32 board if it is in boot.py or in a separate imported python script.

Cool that you could replicate!

I can NOT replicate this code it it is called from the command line using pasting.

Uhm, i haven't tried terminal yet, what would be the difference? less loaded modules?

The alloc fail also disappears is the FREESPACE is e.g. increased to 12000, the getting into a stable loop of free_mem() figures.

This would make me say something like "The GC is late to free the 3K buff in one cycle, and manages to free it the one after."
So basically you have enough space to sustain a delayed collection. This was my option 1 a couple of posts before.

But, it could also mean that one 3K buffer is allocated and not released, the fact the loop still works is because you can allocate in another place. IF this is true tho, the loop would just fail in the same way after a longer time, since you would gradually fill up the heap.

b.t.w. micropython.mem_info(1) creates the same output as your gc.dump().
Oh! thanks!

@KonssnoK KonssnoK changed the title Malloc failing with half of heap available GC: Malloc failing to allocate due to lately release of unreferenced allocated memory by forced GC swipe Sep 21, 2021
@KonssnoK
Copy link
Author

I renamed the issue to highlight the underlying Micropython issue.
Hope it's clear enough.

@KonssnoK
Copy link
Author

dropping this since we stopped using micropython for now

tannewt added a commit to tannewt/circuitpython that referenced this issue Apr 21, 2023
Add address_little_endian for epaper displays with little endian
(low byte first) addresses.

Also clears allocated display and display bus memory so it has a
known state. The acep member wasn't always set so it varied
accidentally.

Fixes micropython#7560. May fix micropython#7778. Fixes micropython#5119.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants