Increased memory usage with mimalloc #135153

Open
nascheme opened this issue Jun 5, 2025 · 0 comments
Labels: interpreter-core, performance, type-bug

nascheme commented Jun 5, 2025

This was originally going to be titled "with free-threaded build". However, based on some investigation, the increased memory usage appears to be largely due to the use of mimalloc rather than to free-threading-specific behaviors. This issue will focus on mimalloc causing increased memory usage.

Here are some techniques I used to investigate this, which might be useful to others. Note that these instructions are specific to Linux.

  • In Include/internal/mimalloc/mimalloc/types.h, uncomment the MI_TRACK_VALGRIND define. This allows the "massif" tool of Valgrind to trace the mimalloc allocations and deallocations. Also note that you must compile with GCC, not Clang.
  • In Objects/mimalloc/alloc.c, comment out the line containing the text "reallocation still fits and not more than 50% waste". This makes mimalloc always shrink an object on a downward resize, even if the decrease is less than 50% of the original size. This is not a major cause of extra memory usage, but it reduces the gap between pymalloc and mimalloc, since pymalloc always resizes downwards.
  • Your benchmark script can be run with valgrind --tool=massif ./python <script>. This creates a massif.out.NNN file (NNN is the process ID), a plain-text report of the allocations and where they happened.
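To pull a single number out of a massif report without reading it by hand, something like the following could be used. This is a minimal sketch, not part of the investigation above: the function name peak_heap_bytes is hypothetical, and it assumes the usual massif.out layout of snapshot=N headers followed by mem_heap_B=N lines.

```python
def peak_heap_bytes(massif_text: str) -> tuple[int, int]:
    """Return (snapshot_number, heap_bytes) for the snapshot with the largest heap.

    Assumes the massif.out text format, where each snapshot starts with a
    "snapshot=N" line and reports its heap size on a "mem_heap_B=N" line.
    Returns (-1, -1) if no snapshot is found.
    """
    best = (-1, -1)
    snapshot = -1
    for line in massif_text.splitlines():
        if line.startswith("snapshot="):
            snapshot = int(line.split("=", 1)[1])
        elif line.startswith("mem_heap_B="):
            # "mem_heap_extra_B=" does not match this prefix, so only the
            # payload heap size is considered here.
            heap = int(line.split("=", 1)[1])
            if heap > best[1]:
                best = (snapshot, heap)
    return best
```

Reading the file with pathlib.Path("massif.out.NNN").read_text() and passing the result to this function gives the peak snapshot; the ms_print tool that ships with Valgrind gives a fuller, human-readable view of the same file.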

As an alternative to "massif", you can also use memray. To make it run with the free-threaded build, you need to uncomment one line (the definition of Py_LIMITED_API in src/memray/_memray/inject.cpp). You have to use --trace-python-allocators as well as --native because mimalloc allocates a 1 GB region right at process startup; in any case, tracing the Python allocations is what you really want to see.
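The memray invocation described above could be wrapped as follows. This is only a sketch: the helper names and the default output file name are hypothetical, and it assumes memray is installed for the interpreter being tested so that it can be started with -m memray.

```python
import subprocess
import sys


def memray_run_command(script: str, output: str = "memray-out.bin") -> list[str]:
    """Build the argv for tracing `script` with both flags mentioned above."""
    return [
        sys.executable, "-m", "memray", "run",
        "--native",                    # record native (C-level) stack frames
        "--trace-python-allocators",   # trace Python allocator calls as well
        "--output", output,            # hypothetical default file name
        script,
    ]


def run_under_memray(script: str, output: str = "memray-out.bin") -> None:
    """Run the benchmark under memray; raises CalledProcessError on failure."""
    subprocess.run(memray_run_command(script, output), check=True)
```

The resulting capture file can then be turned into a report with one of memray's reporters, for example memray flamegraph.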

Here is a run of pyperformance done by Thomas Wouters, showing memory usage of benchmarks compared to a baseline.
pyperformance results

Based on the pyperformance memory usage statistics, the bm_tomli_loads benchmark uses quite a lot of extra memory under the free-threaded build compared to the default build. To make it easier to run, I wrote a simpler version of the benchmark that uses the builtin tomllib module instead. It shows a similar amount of extra memory usage.

bench_tomllib_mem.py.txt

Some additional scripts, to dump out memory usage stats:

memory_stats.py.txt
count_used_pages.py.txt

Statistics on RSS (resident set size) when running this script under various Python builds:

Build                      RSS (kB)   Increase
Default build, pymalloc     131,192   -
Default build, mimalloc     196,796   1.5x
FT build, mimalloc          213,408   1.6x
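One way to collect an RSS figure like these on Linux is to read the VmRSS field from /proc/self/status. The sketch below is an assumption about how such a measurement could be taken, not a description of the attached memory_stats.py script; all function names are hypothetical.

```python
def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     131192 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def current_rss_kb() -> int:
    """Return the resident set size of this process in kB (Linux only)."""
    with open("/proc/self/status", encoding="ascii") as f:
        return parse_vmrss_kb(f.read())


def increase(rss_kb: int, baseline_kb: int) -> float:
    """Ratio relative to the baseline build, rounded to one decimal
    as in the Increase column."""
    return round(rss_kb / baseline_kb, 1)
```

Calling current_rss_kb() at the end of the benchmark, while the parsed data is still referenced, gives a number comparable across builds.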

Note that to run the default build with mimalloc, you must set PYTHONMALLOC=mimalloc in the environment. Otherwise, pymalloc is used, even if mimalloc is available in the build. The FT build always uses mimalloc.

Using the massif reports, we can examine where the memory is being allocated from. Below is a summary for three different builds.

Build                    Mem (MB)  Allocation site
default, pymalloc            67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
                              0.2  PyUnicode_New, others
                             16.8  _PyBytes_Resize
                              3.9  new_keys_object
                             88.5  sub-total
                             90.3  total heap usage reported by massif
default, mimalloc            67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
                             16.8  _PyBytes_FromSize
                             37.8  PyUnicode_New, unicode_subscript
                              8.5  new_keys_object
                            130.6  sub-total
                            136.5  total heap usage reported by massif
free-threaded, mimalloc      67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
                             16.8  _PyBytes_FromSize
                             47.7  PyUnicode_New, unicode_subscript
                              8.7  new_keys_object
                            140.7  sub-total
                            147.0  total heap usage reported by massif

Based on these results, my suspicion is that mimalloc can waste quite a lot of memory compared to pymalloc when many small allocations are made. I suspect this because most of the extra memory usage seems to come from the unicode_subscript calls. For the pymalloc build, that memory must be accounted for under the "PyUnicode_New, others" category (massif doesn't show the unicode_subscript call explicitly). More work is needed to confirm this theory. Another theory is that mimalloc is not releasing freed memory back to the OS as aggressively as pymalloc does.
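The per-category differences behind that suspicion can be worked out directly from the massif figures above. The sketch below only redoes that arithmetic. The category keys are hypothetical labels, and it makes two assumptions: that the unicode_subscript rows correspond to "PyUnicode_New, others" in the pymalloc build (as suggested above), and that the _PyBytes_Resize and _PyBytes_FromSize rows describe the same allocations.

```python
# Heap usage in MB per allocation site, copied from the massif summary.
MB = {
    "default, pymalloc": {
        "unicode_writer": 67.5, "unicode_other": 0.2, "bytes": 16.8, "dict_keys": 3.9,
    },
    "default, mimalloc": {
        "unicode_writer": 67.5, "unicode_other": 37.8, "bytes": 16.8, "dict_keys": 8.5,
    },
    "free-threaded, mimalloc": {
        "unicode_writer": 67.5, "unicode_other": 47.7, "bytes": 16.8, "dict_keys": 8.7,
    },
}


def extra_by_category(build: str, baseline: str = "default, pymalloc") -> dict[str, float]:
    """Extra MB per category relative to the baseline build."""
    return {cat: round(MB[build][cat] - MB[baseline][cat], 1) for cat in MB[build]}


def share_of_extra(build: str, category: str) -> float:
    """Fraction of the total extra memory attributable to one category."""
    extra = extra_by_category(build)
    return extra[category] / sum(extra.values())
```

Under those assumptions, the small-string category accounts for roughly 90% of the extra heap in both mimalloc builds, with the remainder coming from new_keys_object.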

nascheme added the performance label Jun 5, 2025
picnixz added the type-bug and interpreter-core labels Jun 7, 2025