Skip to content

py/malloc: Fix failed memory allocations by more aggressively freeing buffers. #16015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Feb 3, 2025

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented Oct 15, 2024

Summary

This issue was originally reported on Discord by ondrej1024: on RPI_PICO_W a TLS connection via micropython-lib umqtt.simple would fail consistently on reconnect with OSError(ENOMEM). A workaround was to explicitly close the old socket before connecting again. This doesn't seem like it should be necessary.

Was able to generalise into the new test case in this PR: extmod/ssl_noleak. This fails when trying to allocate sequential SSL socket objects (one at a time) on rp2 and also other ports.

Different fixes were needed, in separate commits in this PR. First approach was replaced with a new approach at suggestion of @dpgeorge, see inlinecomments below.

Changes:

  • For rp2 and other mbedTLS bare metal ports, TLS nodes are "tracked" memory allocations associated with the SSL socket object. If an SSL socket object runs its finaliser during garbage collection then the "tracked" nodes can't be immediately freed (GC is locked) so they were left for the next GC. As a result, allocating a new SSL socket could fail even though enough memory was available (but not yet freed). Fix is to support gc_free() during a GC sweep by tracking gc collection separately from "heap is locked".
  • This change led to a double-free assertion failure during gc_sweep_all (soft-reset), as the memory buffer might already be free. Fix for this was to split gc_sweep into two phases - one which runs finalisers and then one which frees memory. This works around other potential finaliser issues when memory has already been freed, too. Due to the sparse nature of FTB entries, this could be optimised to more than trade-off adding a second pass, with only a small code size increase.
  • Similarly, GC mutex needed to be made recursive on ports which support MICROPY_PY_THREAD && !MICROPY_PY_THREAD_GIL. Have added an optional generic recursive mutex interface.
  • For esp32 something similar happens where the small SSL socket object in the MicroPython heap holds large buffers outside MicroPython in the ESP-IDF heap. Allocatinga new SSL socket object could fail to allocate new buffers as the ESP-IDF heap is too full, even if there are SSL socket objects eligible for garbage collection that would free up lots of ESP-IDF heap when garbage collected. Fix is to run a gc_collect() and retry if SSL socket setup fails.

Testing

  • New unit test extmod/ssl_noleak fails consistently on master (using RPI_NANO_W and ESP32_GENERIC_C3 boards). Probably also fails on other ports using MICROPY_MBEDTLS_CONFIG_BARE_METAL, unless they have huge amounts of RAM.
  • With fixes in this PR, new test passes (as well as all other unit tests).
  • Separate commit enables mbedTLS "bare metal" memory management in the coverage build, to cover these changes and increase the coverage of MicroPython-specific code (the default for unix port being libc calloc and free).

Performance

Performance benchmark run on PYB_V11 with all cache init commented out:

diff of scores (higher is better)
N=168 M=100                ./pybv11_master.txt -> ./pybv11_pr.txt         diff      diff% (error%)
bm_chaos.py                    240.23 ->     242.18 :      +1.95 =  +0.812% (+/-0.00%)
bm_fannkuch.py                  48.62 ->      48.98 :      +0.36 =  +0.740% (+/-0.01%)
bm_fft.py                     1479.57 ->    1492.25 :     +12.68 =  +0.857% (+/-0.00%)
bm_float.py                   4082.97 ->    4080.69 :      -2.28 =  -0.056% (+/-0.00%)
bm_hexiom.py                    31.85 ->      31.80 :      -0.05 =  -0.157% (+/-0.01%)
bm_nqueens.py                 2840.25 ->    2866.36 :     +26.11 =  +0.919% (+/-0.01%)
bm_pidigits.py                 372.74 ->     376.82 :      +4.08 =  +1.095% (+/-0.36%)
bm_wordcount.py                 30.88 ->      30.88 :      +0.00 =  +0.000% (+/-0.01%)
core_import_mpy_multi.py       361.54 ->     363.66 :      +2.12 =  +0.586% (+/-0.01%)
core_import_mpy_single.py       57.34 ->      57.34 :      +0.00 =  +0.000% (+/-0.00%)
core_locals.py                  24.96 ->      24.96 :      +0.00 =  +0.000% (+/-0.00%)
core_qstr.py                    78.47 ->      78.46 :      -0.01 =  -0.013% (+/-0.00%)
core_str.py                     12.56 ->      12.56 :      +0.00 =  +0.000% (+/-0.00%)
core_yield_from.py             220.01 ->     220.00 :      -0.01 =  -0.005% (+/-0.01%)
misc_aes.py                    268.31 ->     268.31 :      +0.00 =  +0.000% (+/-0.01%)
misc_mandel.py                2069.87 ->    2090.56 :     +20.69 =  +1.000% (+/-0.01%)
misc_pystone.py               1366.78 ->    1366.45 :      -0.33 =  -0.024% (+/-0.01%)
misc_raytrace.py               246.10 ->     248.10 :      +2.00 =  +0.813% (+/-0.01%)
viper_call0.py                 332.95 ->     332.96 :      +0.01 =  +0.003% (+/-0.01%)
viper_call1a.py                324.59 ->     324.59 :      +0.00 =  +0.000% (+/-0.00%)
viper_call1b.py                252.76 ->     252.76 :      +0.00 =  +0.000% (+/-0.01%)
viper_call1c.py                254.68 ->     254.67 :      -0.01 =  -0.004% (+/-0.01%)
viper_call2a.py                319.64 ->     319.65 :      +0.01 =  +0.003% (+/-0.01%)
viper_call2b.py                222.30 ->     222.30 :      +0.00 =  +0.000% (+/-0.01%)

Trade-offs and Alternatives

  • For future performance gains we could look at adding a similar "fast path" optimisation to the ATB lookups in GC, and/or expanding the ATB and FTB table entries to word size (currently individual bytes).
  • For bare metal, @dpgeorge pointed out that it's possible mbedTLS doesn't need "tracked node" allocations (i.e. if mbedTLS structures always contain pointers to the head of each buffer, then they could be tracked as normal GC objects). However the mbedTLS implementation has used "tracked node" allocations for as long as it's been in MicroPython, and seems like good insurance rather than auditing every mbedTLS allocation.
  • For ESP32, it's tempting to try and extend the "GC and then retry" behaviour to all ESP-IDF heap allocations. However, this is much more fiddly and has potential thread safety issues for allocations made from non-Python threads. Most of the large ESP-IDF memory allocations happen as part of mbedTLS, at least.

@projectgus projectgus added port-esp32 py-core Relates to py/ directory in source extmod Relates to extmod/ directory in source port-rp2 labels Oct 15, 2024
Copy link

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.59%. Comparing base (bfb1bee) to head (990f50f).
Report is 9 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16015   +/-   ##
=======================================
  Coverage   98.58%   98.59%           
=======================================
  Files         167      167           
  Lines       21590    21599    +9     
=======================================
+ Hits        21285    21295   +10     
+ Misses        305      304    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Oct 15, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:   -41 -0.022% 
   unix x64:  +428 +0.050% standard[incl +8(data)]
      stm32:    +8 +0.002% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +16 +0.002% RPI_PICO_W
       samd:    +8 +0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +40 +0.009% VIRT_RV32

@projectgus projectgus changed the title py/malloc,extmod/mbedtls: Fix failed memory allocations by more aggressively freeing buffers. py/malloc,port/esp32: Fix failed memory allocations by more aggressively freeing buffers. Oct 15, 2024
@projectgus projectgus changed the title py/malloc,port/esp32: Fix failed memory allocations by more aggressively freeing buffers. py/malloc: Fix failed memory allocations by more aggressively freeing buffers. Oct 15, 2024
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from cfb5d5c to d2b8f2f Compare October 16, 2024 03:35
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch 5 times, most recently from a6861ec to 6ef0256 Compare October 16, 2024 06:14
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from 6ef0256 to 21a4e5a Compare October 24, 2024 02:56
@projectgus projectgus requested a review from dpgeorge November 12, 2024 00:08
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from 21a4e5a to 0048e7c Compare December 3, 2024 00:46
@dpgeorge
Copy link
Member

dpgeorge commented Dec 3, 2024

This looks really clean @projectgus ! Thanks for the thorough implementation and testing.

That said, I can't help but think of the alternative where gc_free() could possibly be made to work when the GC is locked. That would solve the bare-metal case without needing a list of pending tracked nodes to free, and also maybe improve other cases where gc_free() doesn't work if the heap is locked.

It could be tricky to implement though, eg it probably needs a recursive mutex for GC_ENTER(). Or, could have a flag that's set when running a finaliser (eg gc_finaliser_active = true) and if this flag is set when gc_free() is called it runs only this bit of the function:

    // free head and all of its tail blocks
    do {
        ATB_ANY_TO_FREE(area, block);
        block += 1;
    } while (ATB_GET_KIND(area, block) == AT_TAIL);

That would be more of a self-contained fix, contained to gc.c.

@projectgus
Copy link
Contributor Author

gc_free() could possibly be made to work when the GC is locked

I think I started out with the same idea, and came to a sticking point that I couldn't seem to resolve. Unfortunately I no longer remember what that sticking point was, but I'll take a quick look and get back to you.

@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from 0048e7c to 677bc67 Compare December 4, 2024 04:18
@projectgus

This comment was marked as outdated.

@projectgus projectgus marked this pull request as draft December 4, 2024 04:48
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch 3 times, most recently from a5aff04 to 630629e Compare December 10, 2024 04:13
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch 3 times, most recently from ef241b3 to 732c047 Compare January 14, 2025 04:41
@projectgus projectgus marked this pull request as ready for review January 14, 2025 05:28
@projectgus projectgus requested a review from dpgeorge January 14, 2025 05:28
@projectgus
Copy link
Contributor Author

Note for anyone reviewing: There are enough semi-related changes here that it might be worth reviewing as individual commits.

@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from 732c047 to 90f3111 Compare January 28, 2025 04:58
@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch 3 times, most recently from 475d355 to 7a7bf31 Compare January 28, 2025 05:26
Copy link
Member

@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, this has come a long way since its first version!

Please can you just modify the commit subject line for the two "gc: ..." commits to "py/gc: ..." for consistency (I still need to revisit #15823 for that).

@projectgus projectgus force-pushed the bugfix/ssl_tracked_node_gc branch from 5da2c06 to c037c15 Compare January 29, 2025 03:19
@projectgus
Copy link
Contributor Author

Please can you just modify the commit subject line for the two "gc: ..." commits to "py/gc: ..." for consistency (I still need to revisit #15823 for that).

Done!

Test is for an issue reported on the micropython-lib Discord as
effecting the rp2 port umqtt.simple interface when reconnecting with TLS,
however it's a more generic problem.

Currently this test fails on RPI_PICO_W and ESP32_GENERIC_C3 (and no doubt
others). Fixes are in the subsequent commits.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
On mbedTLS ports with non-baremetal configs (mostly esp32, technically also
unix port), mbedTLS memory is allocated from the libc heap. This means an
old SSL socket may be holding large SSL buffers and preventing a new SSL
socket from being allocated.

As a workaround, trigger a GC pass and retry before failing outright.

This was originally implemented as a global mbedTLS calloc function, but
there is complexity around the possibility of C user modules calling into
mbedTLS without holding the GIL. It would be interesting to try making a
generic version for any malloc which fails, but this would require checking
for a Python thread and probably making the GIL recursive.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
This allows coverage to test MicroPython-specific features such as
the tracked alloc cleanups added in the parent commit.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Currently a finalizer may run and access memory which has already been
freed. (This happens mostly during gc_sweep_all() but could happen during
any garbage collection pass.)

Includes some speed improvement tweaks to skip empty FTB blocks. These help
compensate for the inherent slowdown of having to walk the heap twice.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Do this by tracking being inside gc collection with a
separate flag, GC_COLLECT_FLAG. In gc_free(),
ignore this flag when determining if the heap is locked.

* For finalisers calling gc_free() when heap is otherwise unlocked,
  this allows memory to be immediately freed (potentially
  avoiding a MemoryError).

* Hard IRQs still can't call gc_free(), as heap will be locked via
  gc_lock().

* If finalisers are disabled then all of this code can be compiled
  out to save some code size.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Enabled by default if using threading and no GIL

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Necessary for GC support, also refactored pendsv usage.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Allows refactoring the existing thread_mutex atomic section support to use
the new recursive mutex type.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
- Renamed gc_sweep to gc_sweep_free_blocks.
- Call gc_sweep_run_finalisers from top level.
- Reordered the gc static functions to be in approximate
  runtime sequence (with forward declarations) rather than
  in declaration order.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@dpgeorge dpgeorge force-pushed the bugfix/ssl_tracked_node_gc branch from c037c15 to 990f50f Compare February 3, 2025 04:03
@dpgeorge dpgeorge merged commit 990f50f into micropython:master Feb 3, 2025
65 checks passed
@projectgus projectgus deleted the bugfix/ssl_tracked_node_gc branch February 4, 2025 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extmod Relates to extmod/ directory in source port-esp32 port-rp2 py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants