Increased memory usage with mimalloc #135153
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs); performance (Performance or resource usage); type-bug (An unexpected behavior, bug, or error)
This issue was originally going to be titled "Increased memory usage with free-threaded build". However, some investigation suggests that the increased memory usage is largely due to mimalloc rather than to behavior specific to free-threading. This issue will therefore focus on mimalloc causing increased memory usage.
Here are some techniques I used to investigate this, which might be useful to others. Note that these instructions are specific to Linux.
1. In Include/internal/mimalloc/mimalloc/types.h, uncomment the MI_TRACK_VALGRIND define. This allows Valgrind's "massif" tool to trace mimalloc allocations and deallocations. Note that you must compile with GCC, not Clang.
2. In Objects/mimalloc/alloc.c, comment out the line containing "reallocation still fits and not more than 50% waste". This makes mimalloc always shrink an object on resize, even if the decrease is less than 50% of the original size. This is not a major cause of extra memory usage, but it reduces the gap between pymalloc and mimalloc. pymalloc shrinks small blocks more readily: it moves a block to a smaller size class whenever at least 25% of its size can be shaved off.
3. Run valgrind --tool=massif ./python <script>. This creates a massif.out.NNN file, a text report of the allocations and where they happened.
4. As an alternative to massif, you can also use memray. To make it run with the free-threaded build, you need to uncomment one line (the definition of Py_LIMITED_API in src/memray/_memray/inject.cpp). You have to rely on --trace-python-allocators as well as --native, because mimalloc allocates a 1 GB region right at process startup. In any case, the results of tracing Python allocations are what you really want to see.

Here is a run of pyperformance done by Thomas Wouters, showing the memory usage of benchmarks compared to a baseline:
pyperformance results
Based on the pyperformance memory usage statistics, the bm_tomli_loads benchmark uses quite a lot of extra memory under the free-threaded build compared to the default build. To make it easier to run, I wrote a simpler version of the benchmark that uses the built-in tomllib module instead. It shows a similar amount of extra memory usage:
bench_tomllib_mem.py.txt
Some additional scripts to dump memory usage stats:
memory_stats.py.txt
count_used_pages.py.txt
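The attached memory_stats and count_used_pages scripts are not reproduced here. As a minimal Linux-only sketch of a similar measurement, the kernel's per-process counters in /proc/self/status can be read directly; FIELDS, parse_status and memory_stats are hypothetical names. RssAnon is the most telling field for this comparison, because heap memory from both pymalloc and mimalloc lives in anonymous mappings, so it excludes file-backed pages such as shared libraries (RssAnon is only present on reasonably recent kernels).

```python
from pathlib import Path

# Counters of interest from /proc/<pid>/status (all reported in kB).
FIELDS = ("VmHWM", "VmRSS", "RssAnon", "RssFile")


def parse_status(text: str) -> dict[str, int]:
    """Extract the FIELDS counters from the text of /proc/<pid>/status."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in FIELDS:
            # Lines look like "VmRSS:\t   42000 kB"; keep the number only.
            stats[name] = int(rest.split()[0])
    return stats


def memory_stats(pid: str = "self") -> dict[str, int]:
    """Read the counters for a live process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    for name, kb in memory_stats().items():
        print(f"{name:8} {kb:>10} kB")
```

Calling memory_stats() at the end of a benchmark gives VmHWM as the peak RSS and RssAnon as the anonymous part of the current RSS.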
Statistics on RSS (resident set size) when running this script under various Python builds:
Note that to run the default build with mimalloc, you must set PYTHONMALLOC=mimalloc in the environment. Otherwise, pymalloc is used, even if mimalloc is available in the build. The FT build always uses mimalloc.

Using the massif reports, we can examine where the memory is being allocated from. Below is a summary for three different builds.
Based on these results, I suspect that mimalloc can waste quite a lot of memory compared to pymalloc when many small allocations are made, because most of the extra memory usage appears to come from unicode_subscript calls. For the pymalloc build, that memory must be accounted for under the "PyUnicode_New, others" category (massif doesn't show the unicode_subscript call explicitly). More work is needed to confirm this theory. Another theory is that mimalloc does not release freed memory back to the OS as aggressively as pymalloc does.
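One way to probe the small-allocation theory is to run the same slicing-heavy workload in child interpreters with different PYTHONMALLOC values and compare their peak RSS. This is a sketch, not one of the attached scripts: WORKLOAD and peak_rss_with are hypothetical names, the workload (many 8-character str slices, which go through unicode_subscript) is an assumption about what exercises the suspected path, and it assumes Linux, where ru_maxrss is in KiB. Comparing the FT build against the default build means passing the other interpreter's path as the python argument.

```python
import os
import subprocess
import sys

# Hypothetical workload: about a million small str objects created by slicing.
# It prints the child's own peak RSS so each allocator is measured in isolation.
WORKLOAD = """
import resource
text = "abcdefghij" * 100_000
pieces = [text[i:i + 8] for i in range(0, len(text) - 8)]
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss_with(allocator: str, python: str = sys.executable) -> int:
    """Run WORKLOAD in a child interpreter with PYTHONMALLOC set; return its peak RSS in KiB."""
    env = dict(os.environ, PYTHONMALLOC=allocator)
    out = subprocess.run(
        [python, "-c", WORKLOAD], env=env, capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    for name in sys.argv[1:] or ["pymalloc", "mimalloc"]:
        try:
            print(f"{name:10} peak RSS: {peak_rss_with(name)} KiB")
        except subprocess.CalledProcessError as exc:
            # e.g. a build without mimalloc is expected to reject PYTHONMALLOC=mimalloc
            print(f"{name:10} failed: {exc.stderr.strip()}")
```

Measuring inside each child, rather than through RUSAGE_CHILDREN in the parent, avoids mixing the peaks of several children into one maximum.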