Skip to content

Conversation

AJMansfield
Copy link
Contributor

@AJMansfield AJMansfield commented Aug 29, 2025

Summary

This PR implements the required functionality to allow the garbage collector to run user-defined __del__ methods as finalisers.

Testing

I've added full-coverage tests of this feature that I've tested for sensitivity across both the Unix port and my RP2040 and RP2350 boards. All tests pass on my RP2 boards, and with one very difficult exception noted below all tests also pass on unix.

tests/basics/class_del_postadded.py only passes on the unix builds I've built with DEBUG=1; but fails on my DEBUG=0 builds. I suspect that this is an underlying garbage-collector bug, rather than a bug with this implementation of __del__ specifically. So far I've failed to discover exactly why it is this occurs, though I've confirmed using the debugger on a non-debug non-stripped build that everything up to and including the call to mp_obj_malloc_var_with_finaliser does run as expected.

Trade-offs and Alternatives

This PR claims two additional bits from the mp_obj_type_t.flags field, for MP_TYPE_FLAG_IS_INSTANCED and MP_TYPE_FLAG_HAS_FINALISER. The latter bit is required in order to track whether instances should be allocated using the _with_finaliser allocators, while the former is required to track whether or it's safe to allow the __del__ method to be inserted into a class after-the-fact. It's possible that MP_TYPE_FLAG_IS_INSTANCED could be eliminated by forbidding this type of insertion entirely.

Alternatively, almost all of the code complexity in this PR could be eliminated by unconditionally allocating objects from user-defined classes with mp_obj_malloc_var_with_finaliser; however the performance penalty that this would introduce with all of the failed attribute lookups on non-using classes, leads me to favor this more-complex approach.

Copy link

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 98.39%. Comparing base (af745a5) to head (9c37ce3).
⚠️ Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
extmod/vfs.c 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #18005   +/-   ##
=======================================
  Coverage   98.38%   98.39%           
=======================================
  Files         171      171           
  Lines       22296    22314   +18     
=======================================
+ Hits        21937    21955   +18     
  Misses        359      359           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 29, 2025

Note that the first 2 commits here are common with #18004; debugging this feature was the main driver for finding those cleanups.

Copy link

github-actions bot commented Aug 29, 2025

Code size report:

   bare-arm:   +16 +0.028% 
minimal x86:   +24 +0.013% 
   unix x64:  +136 +0.016% standard
      stm32:  +104 +0.026% PYBV10
     mimxrt:  +104 +0.028% TEENSY40
        rp2:  +128 +0.014% RPI_PICO_W
       samd:  +140 +0.052% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +272 +0.060% VIRT_RV32

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 29, 2025

Remaining TODO:

  • Find a means to discriminate MICROPY_ENABLE_FINALISER to avoid test failures on minimal ports.
  • Determine the root cause for the Unix-specific test failures and fix it.
  • Revise documentation to communicate this capability instead of warning its absence.
  • Develop and test performance benchmarks to profile runtime impact.

Cleans up 2423493, and fixes builds
that include LFS but not MICROPY_ENABLE_FINALISER.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@dlech
Copy link
Contributor

dlech commented Aug 29, 2025

Other than for freeing resources obtained through FFI, what use case would it have in MicroPython?

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 29, 2025

Other than for freeing resources obtained through FFI, what use case would it have in MicroPython?

What use cases do finalisers have in MicroPython at all? Anything you could want to do with them in C, you could have a perfectly valid reason to want to do with python. __del__ isn't just about letting you close or destruct non-python external resources, it's also for managing the lifetimes of other python objects in designs that involve caching or ECS-like systems.

The original impetus for this feature was related to @agatti's current work implementing machine.Pin on the unix port, in order to preserve the standard machine.Pin(*a) is machine.Pin(*a) invariant with pins backed by /dev/gpiochipX ioctls. Not all device files necessarily even exist yet when the interpreter starts, so they really do need to be dynamically allocated, and freed too when user code no longer holds a reference, in order to not leak memory over multiple hotplug cycles of a gpio device.

I've got another library I've been developing that would benefit from this, that's for managing I2C devices --- with efficient transaction queueing, caching, and automatic fusion of adjacent transactions.

When a sqlite returns a handle to a stored procedure it's memoized, if you're using a python-level ORM object to manage that procedure, it ought to be able to set a __del__ method that instructs sqlite to discard that entry from its stored procedure table, too.

More broadly, this applies to any sorts of code that uses ECS/SoA data layout, i.e. with user-defined type with a non-pointer value indexing some kind of table; or when doing any kind of database that needs to track reach-ability across non-trivial foreign-key relationships.

In many cases, context managers are the "clean code approved" way to manage external lifetimes --- but if every problem was one with lifetime boundaries that were that simple, we'd already all just be writing Rust instead of Python. Plus, to implement a context manager that's robust against bugs like e.g. #2621, that too also needs to make use of __del__, as a second-resort cleanup behind __exit__.

I'm currently working on an implementation of the weakref module, which is also a tool often used in caching and ECS-like situations. With __del__, it should be possible to do this as part of micropython's python-language standard library, without necessarily requiring other c-language primitives (except maybe one for de/obfuscating object pointers from the garbage collector).

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 29, 2025

The calls signature for gc_malloc was changed in
5ed578e, without cleaning up existing
code on the rationale that the previous bool is automatically converted
to an int with the same meaning.

This commit goes back and cleans up existing invocations to make their
behavior more readable in a modern context.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Copy link
Contributor Author

@AJMansfield AJMansfield Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f7bc55c
This subtle tweak to the memory layout is the only distinction between a failing and passing version of this exact test, on otherwise exactly the same commits.

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 30, 2025

Ok, found the culprit! 1f093b7
Skipping finalisers on non-AT_HEAD blocks seems to not actually be a valid optimization here, though I definitely still need to shore up my understanding for why that is...

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 30, 2025

Ok, so this original ATB_GET_KIND(block) == AT_HEAD condition derives from the original del implementation in 4f7e9f5, all the way back in v1.0-rc1 --- put in the AT_HEAD of the switch-case on the block type. This was preserved across 12bab72 replacing MPOBJ table with the FTB table, and then again across the refactor in 8a2ff2c that moved finalisers into the separate gc_sweep_run_finalisers pass.

But, why isn't it the case that first block of the allocation is marked as AT_HEAD in the broken case? That seems like an invariant that, then as now, should be holding; the only change in that step of allocating an object since 2014 has been parameterizing out area:

    // mark first block as used head
    ATB_FREE_TO_HEAD(area, start_block);

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 30, 2025

I have an answer: it's not marked as AT_HEAD, because at that point in the algorithm, mark-and-sweep already clobbered it with AT_MARK! I can now confidently say that this is a garbage collector bug, introduced in 8a2ff2c.

AJMansfield added a commit to AJMansfield/micropython that referenced this pull request Aug 30, 2025
This check is a remnant of the original finaliser implementation from
4f7e9f5, all the way from pre-1.0.
This was a valid optimisation back then, but after its ordering with
other finaliser code changed in 8a2ff2c
the invariant it exploits is no longer valid, and it breaks down
completely in the case of variable-length allocations made with
`mp_obj_malloc_var_with_finaliser`.

This resulted in a latent bug in the VFS system, where depending on the
precise memory layout, VFS finaliser routines could sometimes be
inadvertently missed. This bug was discovered when encountering an
identical issue with an attempt to implement `__del__` for user classes
in micropython#18005.

As can be seen from the test runs with the instrumentation in micropython#18013,
this branch is also never actually hit by normal code outside of VFS.
Therefore, whatever the original intent and whether or not it was ever,
it's not actually an optimisation now; removing this condition is a
strict improvement to garbage collector code size and performance.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
AJMansfield added a commit to AJMansfield/micropython that referenced this pull request Aug 30, 2025
This check is a remnant of the original finaliser implementation from
4f7e9f5, all the way from pre-1.0.
This was a valid optimisation back then, but after its ordering with
other finaliser code changed in 8a2ff2c
the invariant it exploits is no longer valid, and it breaks down
completely in the case of variable-length allocations made with
`mp_obj_malloc_var_with_finaliser`.

This resulted in a latent bug in the VFS system, where depending on the
precise memory layout, VFS finaliser routines could sometimes be
inadvertently missed. This bug was discovered when encountering an
identical issue with an attempt to implement `__del__` for user classes
in micropython#18005.

This perhaps should be removed completely for the reasons discussed
in micropython#18014, but removing it appears to induce some other issue with the
tls module; this version just expands the check in the interest of
avoiding that.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@AJMansfield
Copy link
Contributor Author

AJMansfield commented Aug 31, 2025

After much tribulation, I've determined that the bug I thought I was tracing is actually something else entirely. Not finalising AT_MARK blocks is the correct behavior --- that's how the mark-and-sweep instructs the rest of the GC machinery which blocks should be left unfreed, and finalising a block that the GC still thinks is reachable definitely is a bug.

Why, then, does GC think that object foo from class_del_postadded.py is still reachable? i.e. why is there still some kind of pointer sitting latent somewhere that GC tries to trace from?

(I guess there's also two other competing hypotheses: foo is sharing its ATB block with another object that's still reachable; or there actually is some other bug in GC with its reachability traces; both seem unlikely though.)

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@AJMansfield
Copy link
Contributor Author

I've benchmarked this PR against the benchmarks in #18019

Results from Pico2, Cortex M33 @ 300 MHz
internal_bench/arrayop:
    0.050s (+00.00%) internal_bench/arrayop-1-list_inplace.py
    0.103s (+108.21%) internal_bench/arrayop-2-list_map.py
    0.058s (+17.42%) internal_bench/arrayop-3-bytearray_inplace.py
    0.120s (+141.35%) internal_bench/arrayop-4-bytearray_map.py
internal_bench/bytealloc:
    0.142s (+00.00%) internal_bench/bytealloc-1-bytes_n.py
    0.353s (+149.29%) internal_bench/bytealloc-2-repeat.py
internal_bench/bytebuf:
    0.058s (+00.00%) internal_bench/bytebuf-1-inplace.py
    0.459s (+688.80%) internal_bench/bytebuf-2-join_map_bytes.py
    0.120s (+105.47%) internal_bench/bytebuf-3-bytarray_map.py
internal_bench/class_create:
    0.203s (+00.00%) internal_bench/class_create-0-empty.py
    0.249s (+22.63%) internal_bench/class_create-1-slots.py
    0.251s (+23.59%) internal_bench/class_create-1.1-slots5.py
    0.242s (+19.28%) internal_bench/class_create-2-classattr.py
    0.383s (+88.39%) internal_bench/class_create-2.1-classattr5.py
    0.487s (+139.75%) internal_bench/class_create-2.3-classattr5objs.py
    0.245s (+20.86%) internal_bench/class_create-3-instancemethod.py
    0.269s (+32.62%) internal_bench/class_create-4-classmethod.py
    0.245s (+20.67%) internal_bench/class_create-4.1-classmethod_implicit.py
    0.270s (+32.91%) internal_bench/class_create-5-staticmethod.py
    0.244s (+20.00%) internal_bench/class_create-6-getattribute.py
    0.244s (+20.08%) internal_bench/class_create-6.1-getattr.py
    0.257s (+26.37%) internal_bench/class_create-6.2-property.py
    0.245s (+20.81%) internal_bench/class_create-6.3-descriptor.py
    0.211s (+03.70%) internal_bench/class_create-7-inherit.py
    0.211s (+03.91%) internal_bench/class_create-7.1-inherit_initsubclass.py
    0.316s (+55.42%) internal_bench/class_create-8-metaclass_setname.py
    0.597s (+193.89%) internal_bench/class_create-8.1-metaclass_setname5.py
internal_bench/class_instance:
    0.107s (+00.00%) internal_bench/class_instance-0-object.py
    0.109s (+02.13%) internal_bench/class_instance-0.1-object-gc.py
    0.253s (+136.51%) internal_bench/class_instance-1-empty.py
    0.255s (+137.94%) internal_bench/class_instance-1.1-classattr.py
    0.254s (+137.76%) internal_bench/class_instance-1.2-func.py
    0.255s (+138.61%) internal_bench/class_instance-1.3-empty-gc.py
    0.548s (+412.49%) internal_bench/class_instance-2-init.py
    0.862s (+706.22%) internal_bench/class_instance-2.1-init_super.py
    0.799s (+646.79%) internal_bench/class_instance-2.2-new.py
    0.389s (+263.42%) internal_bench/class_instance-3-del.py
    0.433s (+304.64%) internal_bench/class_instance-3.1-del-gc.py
    0.249s (+132.90%) internal_bench/class_instance-4-slots.py
    0.249s (+132.92%) internal_bench/class_instance-4.1-slots5.py
internal_bench/from_iter:
    0.018s (+00.00%) internal_bench/from_iter-1-list_bound.py
    0.075s (+322.42%) internal_bench/from_iter-2-list_unbound.py
    0.019s (+09.26%) internal_bench/from_iter-3-tuple_bound.py
    0.075s (+321.30%) internal_bench/from_iter-4-tuple_unbound.py
    0.021s (+15.95%) internal_bench/from_iter-5-bytes_bound.py
    0.084s (+372.09%) internal_bench/from_iter-6-bytes_unbound.py
    0.020s (+11.80%) internal_bench/from_iter-7-bytearray_bound.py
    0.098s (+449.90%) internal_bench/from_iter-8-bytearray_unbound.py
internal_bench/func_args:
    0.742s (+00.00%) internal_bench/func_args-1.1-pos_1.py
    0.778s (+04.86%) internal_bench/func_args-1.2-pos_3.py
    0.802s (+08.03%) internal_bench/func_args-2-pos_default_2_of_3.py
    0.920s (+23.95%) internal_bench/func_args-3.1-kw_1.py
    1.186s (+59.84%) internal_bench/func_args-3.2-kw_3.py
internal_bench/func_builtin:
    0.084s (+00.00%) internal_bench/func_builtin-1-enum_pos.py
    0.090s (+06.77%) internal_bench/func_builtin-2-enum_kw.py
internal_bench/funcall:
    0.183s (+00.00%) internal_bench/funcall-1-inline.py
    1.230s (+573.07%) internal_bench/funcall-2-funcall.py
    0.995s (+444.66%) internal_bench/funcall-3-funcall-local.py
internal_bench/loop_count:
    0.177s (+00.00%) internal_bench/loop_count-1-range.py
    0.102s (-42.46%) internal_bench/loop_count-2-range_iter.py
    0.183s (+03.38%) internal_bench/loop_count-3-while_up.py
    0.182s (+02.63%) internal_bench/loop_count-4-while_down_gt.py
    0.198s (+11.65%) internal_bench/loop_count-5-while_down_ne.py
    0.199s (+12.40%) internal_bench/loop_count-5.1-while_down_ne_localvar.py
internal_bench/var:
    0.198s (+00.00%) internal_bench/var-1-constant.py
    0.241s (+21.55%) internal_bench/var-2-global.py
    0.184s (-07.07%) internal_bench/var-3-local.py
    0.183s (-07.41%) internal_bench/var-4-arg.py
    0.435s (+119.51%) internal_bench/var-5-class-attr.py
    0.245s (+23.92%) internal_bench/var-6-instance-attr.py
    0.245s (+23.92%) internal_bench/var-6.1-instance-attr-5.py
    0.245s (+23.92%) internal_bench/var-6.2-instance-speciallookup.py
    2.106s (+963.57%) internal_bench/var-6.3-instance-property.py
    2.436s (+1130.06%) internal_bench/var-6.4-instance-descriptor.py
    2.732s (+1279.38%) internal_bench/var-6.5-instance-getattr.py
    2.150s (+985.57%) internal_bench/var-7-instance-meth.py
    0.298s (+50.50%) internal_bench/var-8-namedtuple-1st.py
    0.319s (+61.28%) internal_bench/var-8.1-namedtuple-5th.py
    0.734s (+270.55%) internal_bench/var-9-getattr.py
    1.285s (+548.93%) internal_bench/var-9.1-getattr_default.py
    1.194s (+502.84%) internal_bench/var-9.2-getattr_default_special.py
    1.154s (+482.63%) internal_bench/var-9.3-except_ok.py
    3.277s (+1554.88%) internal_bench/var-9.4-except_selfinduced.py
    -1.000s (-604.98%) internal_bench/var-9.5-except_error.py
    -1.000s (-604.98%) internal_bench/var-9.6-except_error_special.py
11 tests performed (85 individual testcases)

The comparisons I think are most instructive here are these:

test case
from db0c677
master
on 704c70c
del
on 87b9bde
internal_bench/class_instance-0.1-object-gc.py 0.109s 0.109s
internal_bench/class_instance-1.3-empty-gc.py 0.237s 0.255s
internal_bench/class_instance-2-init.py 0.683s 0.548s
internal_bench/class_instance-3.1-del-gc.py 0.237s 0.433s

Note that the cost of adding a __del__ method to an object is actually even lower than the cost of adding an __init__ method, by a small margin but quite consistently. (I suspect it has to do with the possibility of __init__ being variadic.)

@AJMansfield AJMansfield force-pushed the del branch 3 times, most recently from 6829e35 to f21dc52 Compare August 31, 2025 18:46
@AJMansfield
Copy link
Contributor Author

@dpgeorge @mattytrentini thoughts on this?

@andrewleech
Copy link
Contributor

I tried adding an example of del for user C modules in #13011 and ran into issues I never resolved with the lack of predictability on when they would actually be run; often requiring multiple collect's and /or run "extra code" then collect again to get any sembalance of reliability. From you investigations above you're already looking a lot deeper into the gc than I ever got but perhaps some of the comments in my old PR support your findings.

@AJMansfield
Copy link
Contributor Author

AJMansfield commented Sep 1, 2025

I tried adding an example of del for user C modules in #13011 and ran into issues I never resolved with the lack of predictability on when they would actually be run; often requiring multiple collect's and /or run "extra code" then collect again to get any sembalance of reliability. From you investigations above you're already looking a lot deeper into the gc than I ever got but perhaps some of the comments in my old PR support your findings.

Thanks for the reference! I was kinda able to observe this happening with some of the objects in the VFS LFS module, but that code has its own complexities that muddy the waters --- it's super useful having a MCVE of this issue using just the C api.

Probably gonna have to buckle down and write some proper instrumentation code here soon to visualize the heap structure and what pointers it actually is, keeping these objects alive here...

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants