Implementing finalization (del) support with GC #245

pfalcon · 2014-01-31T15:08:35Z

A good discussion went in comment to another PullReq, adding top-level ticket to be able to find it later: #238 (comment)

dpgeorge · 2014-02-17T23:31:47Z

The relevant place is in gc_sweep in py/gc.c. In the first case of the switch, AT_HEAD, you could call there your custom cleanup function, with pointer to object computed from block. Main problem is that you don't know if the memory is a uPy object or just some other allocated memory (eg just a string, or array of bytes). It's needs a type tag to distinguish objects from non-objects.

iabdalkader · 2014-02-22T02:28:36Z

I see, so, if I understand correctly, gc_alloc will add a tag to every block ? but other than the tag memory overhead, how does that impact the performance ? you mentioned it will slow down gc.. I can probably get around allocating memory for now, but what about closing FatFS file descriptors ? I can think of some other use cases too...

pfalcon · 2014-02-26T00:34:41Z

but what about closing FatFS file descriptors ? I can think of some other use cases too...

So, let's first think about usecases first. Closing an underlying file descriptor doesn't sound like a grave usecase for this feature. You can just close it explicitly with .close() methods. If that sounds too manual, Python offers lexical-scope context management using with statement (note that it likely don't work in uPy - implementing it would be nice thing to do to get complete Python semantics).

So, managing handles to generic external resources can be well done without __del__ IMHO. Managing memory external memory resources is different matter, but with bare-metal ports, there's only GC heap, so anything allocated will be freed eventually. It's different situation with unix port, where there's both Python GC heap and system heap. 3rd party libs will of course allocate on system heap, so it must be explicitly freed (or leaked). I have this problem already in FFI module - that's why I brought up this ticket in the first place. But even there, it's bearable situation - it's because FFI objects are intended to be module-global, so they live for the duration of program execution, so freeing them is not that important.

So, overall, while I brought up this issue, I don't see any pressing need to implement it right away, and am personally not going to work on this soon. YMMV.

pfalcon · 2014-02-26T00:41:26Z

Now how'd I go for implementing it after akk:

Currently, uPy GC uses 2-bit vector to store memory block flags. I'd split those into separate 1-bit vectors. Intuitively, that will make alloc algos simpler (and thus will allow to add more complex features to it without overcomplicating the code). It will also allow to add more such bit vectors easily. So, I'd add 3rd one, to mark heads of blocks which contain mp_obj_t objects (vs all the other allocations). Then, it's easy to call __del__ when GCing such objects (and of course, the method slot would need to be added to mp_obj_type_t - that can be down separately, and sooner).

iabdalkader · 2014-02-26T10:00:56Z

Python offers lexical-scope context management using with statement

Yes those are really nice, but any Python user (or any other high level language) expects the language to take care of cleaning up resources (fixing this now avoids lots of tickets later like, opening files causes resource leaks, creating handle x has to be closed manually...)

Managing memory external memory resources is different matter, but with bare-metal ports, there's only GC heap

not necessarily, this is a very special case, but in my current situation, the 64KB CCM is managed by GC and I'm linking with newlib, so it's possible to have malloc and friends (or any other lightweight allocator) manage the main RAM block on the main RAM block...

Now how'd I go for implementing it after akk:

I will give this a try, I don't usually mess around with py/ but I will try..

iabdalkader · 2014-03-28T19:16:12Z

this is just crossed my mind, would this require a special function to alloc python objects ?

pfalcon · 2014-03-28T19:22:47Z

We kinda have that already -

#define m_new_obj(type) ...
#define m_new_obj_var(obj_type, var_type, var_num) ...

Just would need to do extra check they are used in all cases for objects and not used for non-objects.

dpgeorge · 2014-03-30T21:43:47Z

Yes, you'd need a separate allocation function. I would not reuse m_new_obj, but rather rename it to m_new_mp_obj, or something.

iabdalkader · 2014-03-30T23:48:03Z

okay, thanks for the info, I will keep this in mind, hopefully I will have more time the next few days to work on this :)

iabdalkader · 2014-04-03T01:15:21Z

what if we reduce the blocks per ATB to 2 blocks ? 4 bits per block, this should do the trick with minimal changes, what do you think ?

dpgeorge · 2014-04-03T08:46:11Z

That's too wasteful. As per @pfalcon's suggestion, I would create a second segment, the "contents segment", holding 1 bit per block. If the bit is set, then that block contains a uPy object, otherwise, not.

pfalcon · 2014-04-03T09:33:52Z

Per my suggestion, I would refactor existing code to use independent segments, each holding 1 bit per block, then just add yet another such segment for "is object" flagging.

Having 1 bit per block will open up more opportunities for fast allocation optimization. (Well, one could optimize existing code, but it would be pretty messy.)

Reference which advocates same approach (having independent bitstrings of flags): http://wiki.luajit.org/New-Garbage-Collector

dpgeorge · 2014-04-03T10:36:02Z

I would refactor existing code to use independent segments, each holding 1 bit per block

Such a change would need a critical analysis, probably by implementing the 1-bit per block algo and doing proper profiling for speed (and code size). The reason I think it's not a good idea is because 1 bit does not have enough information in it to decide anything meaningful for the block. To detect a free block you need to check both bits are 0, for example. All accesses of 1 bit will need the other bit.

pfalcon · 2014-04-03T10:54:55Z

Such a change would need a critical analysis, probably by implementing the 1-bit per block algo and doing proper profiling for speed (and code size).

That's why I don't take this task so far ;-).

1 bit does not have enough information in it to decide anything meaningful for the block

Luajit paper argues the opposite. And indeed, intuitively, one bit - free/non-free, another bit - head-nonhead, and only one special case (head mark) requires consulting both, but happens only during GC.

Anyway, you're probably right, it should be done step by step, e.g.:

Introduce slot by __del__, make obj.__del__() work, and probably del obj as a temporary workaround to be removed later.
Write 1-bit block handling routines.
Use them for object flagging.
Do needed analysis and get rid of 2-bit block code.

iabdalkader · 2014-04-03T12:26:25Z

I was going to use separate bit vectors initially, as per @pfalcon suggestion, but other than complicating things a lot, it consumes the same amount memory ? with separate bit vectors, we need 4 bits per block (1-bit for head, tail, mark and mp-obj) which is the same as using 4 bits per block (2 blocks per ATB) except with 4 bits per block (in half a byte) we have 2 extra flags ?

iabdalkader · 2014-04-03T12:37:27Z

actually 3bits vs 4bits but if we add another flag, it's still 4bits (separate bit vectors) vs 4bits (2 blocks per ATB), so we could use 2 blocks per ATB, for now, and if there's a need for a 3rd flag, in the future, we move to separate bit vectors ?

dpgeorge · 2014-04-03T13:49:43Z

I don't understand why you need 4 bits. On top of what is there now, you just need 1 extra bit for whether it's an mp_obj_t or not. And 3 bits are easier to store using 3 times 1-bit segments, or a 2-bit plus a 1-bit segment (keeping current 2-bit segment and adding a new 1-bit segment).

iabdalkader · 2014-04-03T14:10:47Z

yes I know we just need one extra bit (for now), I wanted to keep all flags in one place by reducing blocks per ATB, adding another bit vector only saves 1 bit per block but it's okay too I guess

iabdalkader · 2014-04-03T21:59:00Z

can you review this please ?
https://github.com/iabdalkader/micropython/compare/del

iabdalkader · 2014-04-03T22:05:18Z

btw no need to implement del to test:

pfalcon · 2014-04-04T15:04:09Z

From a quick glance, looks pretty good. But we really should have tests for the as much functionality as possible. Would be nice to have a test, but I guess you leave implementing __del__ for class instances for later? Then my guess it's worth a pull request for more detailed review/testing.

iabdalkader · 2014-04-04T15:43:26Z

a set of standard tests for GC would be nice to have, anyway, should del be added somewhere else ? also I'm thinking those macros should be renamed to something more generic ? if we decide to split the 2bits later, like ATB_SET_FLAG(block, bit_vec) something like that ?

dpgeorge · 2014-04-04T15:47:11Z

Since __del__ is not common it should go in the locals_dict table, and not in the mp_obj_type_t structure. Doing it this way would also enable user defined __del__ methods.

iabdalkader · 2014-04-04T15:50:31Z

so I should do a lookup in locals_dict if it's not NULL and call del ?

dpgeorge · 2014-04-04T15:53:52Z

Easiest to use existing functions for this:

mp_load_method(base, QSTR___del__, dest)
mp_call_method_x(0, 0, dest)

Problem is that we would then need to set a flag that the GC is running, so that the above calls don't allocate memory while del is being called.

pfalcon · 2014-04-04T15:58:12Z

Since del is not common it should go in the locals_dict table, and not in the mp_obj_type_t structure.

Surely no! Do you want to do expensive dictionary lookups during such time-critical phase as garbage collection? GC is very slow on its own already.

iabdalkader · 2014-04-04T16:07:09Z

so what's the final call :D ?

dpgeorge · 2014-04-04T16:07:40Z

Surely no! Do you want to do expensive dictionary lookups during such time-critical phase as garbage collection? GC is very slow on its own already.

Would it work if we set the "I'm a uPy object" tag bit in the GC only if that object had a __del__ method? This could be determined at compile time for built in objects.

pfalcon · 2014-04-04T16:19:24Z

Would it work if we set the "I'm a uPy object" tag bit in the GC only if that object

I don't know, that's again (re: globals/locals discussion) seems like complicated half-way to optimize things. I personally never doubted that __del__ would require a slot in type structure - it's really performance critical. So, IMHO we should support 2 cases (via compile-time option): no finalization support, and straightforward implementation (if we want to optimize things for user classes later, we may set del slot in their type structs only if __del__ method is defined). Of course, I may be wrong here.

dpgeorge · 2014-04-04T16:37:45Z

I don't know, that's again (re: globals/locals discussion) seems like complicated half-way to optimize things.

No, I think it would work very well. The optimisation can be done in the gc_alloc_mp_obj function: if the caller says that the object is an mp_obj_t then the GC can check to see if it has a __del__ slot at that point. Only if it does, does it set the tag. That is a very straight forward, and transparent, optimisation.

pfalcon · 2014-04-04T16:40:58Z

Ok, sounds good. Then the flag won't be "object is allocated here", but "call finalization for object here". Hope it won't make any differences for possible future usages of that flag.

iabdalkader · 2014-04-04T17:08:20Z

so set flag only if it has __del__ ? but what if we would like to do some optimization later with uPy objects ?
what's wrong with adding del to mp_obj_type_t ? just to get a better understanding of the issue..

pfalcon · 2014-04-04T17:13:53Z

what's wrong with adding del to mp_obj_type_t ? just to get a better understanding of the issue..

It will be used in like 1% of types/classes, but will take 4 bytes in each and every type structure (i.e. in 99% cases just waste memory).

dpgeorge · 2014-04-04T17:14:49Z

Actually, looking for __del__ in the allocator is the same expense as looking for it in the GC sweep function (if an object is allocated, it's deallocated at most once). So you would need to change the allocation function to read something like:

void *gc_alloc(int bytes, bool is_mp_obj_and_may_have_finaliser);

iabdalkader · 2014-04-04T17:23:28Z

how about:
void *gc_alloc(int bytes, machine_uint_t hints);
pass hints about the allocation and let gc decide what to do with that? something like GC_ALLOC_MP_OBJ|GC_ALLOC_HAVE_FINALISER etc...

pfalcon · 2014-04-04T17:29:22Z

@iabdalkader : no need to complicate design beyond realistic requirements - we unlikely will need to have other flags ;-). And just think about typing all that stuff! ;-)

iabdalkader · 2014-04-04T17:37:56Z

what if I need to know that it's an mp obj and doesn't not have a finaliser ?

KeithJRome · 2014-04-04T17:55:55Z

It will be used in like 1% of types/classes, but will take 4 bytes in each and every type structure (i.e. in 99% cases just waste memory).

How many types are expected to be loaded? Python itself doesn't have all that many types, and neither will programs. Especially since things are modular and you only pay for what you need. So exactly how high is this cost that is attempting to be avoided? 40 bytes? 400 bytes? 4k?

Or do you mean this requires 4 more bytes per object instance?

iabdalkader · 2014-04-04T17:58:40Z

no it's per type, in stm/ for example, there are ~19 types (or 76 bytes):

grep -r mp_obj_type_t * | wc -l
19

pfalcon · 2014-04-04T18:05:45Z

No, per type, each user class being a type. As I told, I personally think this is worth a slot in a type struct. But on the other hand, we already have type struct which is ~80% empty for most of types.

40 bytes? 400 bytes? 4k?

The ultimate difference is qualitative, not quantitative - it's between being able and not being able to run a particular program, and in general, being able and not being able to use Python on constrained systems. Without scrutinizing each case and fighting for each byte, we'll soon end up in the "not" land...

KeithJRome · 2014-04-04T18:45:49Z

I hear you, but it smells like a premature optimisation. A user program with enough declared types to make this a real problem is almost certainly going to run into some other platform limitations first.

If it's really in fact a problem, then there may be other ways to solve it too that trade off cpu cycles for memory efficiency, such as a dictionary of finalizers keyed on the type pointer (if finalizers really are that rarely used), or an alternative "light" base class that triggers special gc handling that could optionally be used by user programs when a class doesn't need finalization (or other features that might bloat the type data). But ultimately, I don't think that would be needed - one extra 32-bit value per type IMO isn't likely to jeopardize viability of the platform except in very rare situations.

iabdalkader · 2014-04-05T13:51:46Z

I agree, and +1 for smaller base type, anyway, I moved del to locals for now.

dpgeorge · 2014-10-06T15:25:49Z

Finalisation was implemented a while back, so closing this issue.

pfalcon · 2014-10-06T19:09:40Z

Was it? Is it possible to define __del__ method in Python and have it called when object is proven non-live? There're no tests for this functionality.

dpgeorge · 2014-10-06T19:51:58Z

__del__ support is implemented in the GC, as per the topic of this issue.

dpgeorge · 2014-10-06T19:52:51Z

If __del__ support is needed for custom objects, open a separate issue with a test case that fails.

pmp-p · 2019-09-23T18:23:14Z

There're no tests for this functionality.

and no examples in documentation sadly to help solving some points for #4993 that would need group finalizers.

iabdalkader mentioned this issue Mar 13, 2014

Fix gc_realloc to expand in place #334

Merged

iabdalkader mentioned this issue Apr 4, 2014

Implement del #425

Merged

pfalcon changed the title ~~Implementing finalizition (__del__) support with GC~~ Implementing finalization (__del__) support with GC Apr 4, 2014

dpgeorge closed this as completed Oct 6, 2014

pfalcon reopened this Oct 6, 2014

dpgeorge closed this as completed Oct 6, 2014

pmp-p mentioned this issue Sep 2, 2019

py: __del__ special method not implemented for user-defined classes #1878

Open

pmp-p mentioned this issue Sep 11, 2019

Do we want to have weakref support? #646

Open

pmp-p mentioned this issue Sep 24, 2019

request: C finalizers v923z/micropython-usermod#1

Open

Implementing finalization (__del__) support with GC #245

Implementing finalization (__del__) support with GC #245

Comments

pfalcon commented Jan 31, 2014

dpgeorge commented Feb 17, 2014

iabdalkader commented Feb 22, 2014

pfalcon commented Feb 26, 2014

pfalcon commented Feb 26, 2014

iabdalkader commented Feb 26, 2014

iabdalkader commented Mar 28, 2014

pfalcon commented Mar 28, 2014

dpgeorge commented Mar 30, 2014

iabdalkader commented Mar 30, 2014

iabdalkader commented Apr 3, 2014

dpgeorge commented Apr 3, 2014

pfalcon commented Apr 3, 2014

dpgeorge commented Apr 3, 2014

pfalcon commented Apr 3, 2014

iabdalkader commented Apr 3, 2014

iabdalkader commented Apr 3, 2014

dpgeorge commented Apr 3, 2014

iabdalkader commented Apr 3, 2014

iabdalkader commented Apr 3, 2014

iabdalkader commented Apr 3, 2014

pfalcon commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

dpgeorge commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

dpgeorge commented Apr 4, 2014

pfalcon commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

dpgeorge commented Apr 4, 2014

pfalcon commented Apr 4, 2014

dpgeorge commented Apr 4, 2014

pfalcon commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

pfalcon commented Apr 4, 2014

dpgeorge commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

pfalcon commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

KeithJRome commented Apr 4, 2014

iabdalkader commented Apr 4, 2014

pfalcon commented Apr 4, 2014

KeithJRome commented Apr 4, 2014

iabdalkader commented Apr 5, 2014

dpgeorge commented Oct 6, 2014

pfalcon commented Oct 6, 2014

dpgeorge commented Oct 6, 2014

dpgeorge commented Oct 6, 2014

pmp-p commented Sep 23, 2019

Implementing finalization (del) support with GC #245

Implementing finalization (del) support with GC #245