Skip to content

Implementing finalization (__del__) support with GC #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pfalcon opened this issue Jan 31, 2014 · 47 comments
Closed

Implementing finalization (__del__) support with GC #245

pfalcon opened this issue Jan 31, 2014 · 47 comments
Labels
rfc Request for Comment

Comments

@pfalcon
Copy link
Contributor

pfalcon commented Jan 31, 2014

A good discussion went in comment to another PullReq, adding top-level ticket to be able to find it later: #238 (comment)

@dpgeorge
Copy link
Member

The relevant place is in gc_sweep in py/gc.c. In the first case of the switch, AT_HEAD, you could call there your custom cleanup function, with pointer to object computed from block. Main problem is that you don't know if the memory is a uPy object or just some other allocated memory (eg just a string, or array of bytes). It's needs a type tag to distinguish objects from non-objects.

@iabdalkader
Copy link
Contributor

I see, so, if I understand correctly, gc_alloc will add a tag to every block ? but other than the tag memory overhead, how does that impact the performance ? you mentioned it will slow down gc.. I can probably get around allocating memory for now, but what about closing FatFS file descriptors ? I can think of some other use cases too...

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 26, 2014

but what about closing FatFS file descriptors ? I can think of some other use cases too...

So, let's first think about usecases first. Closing an underlying file descriptor doesn't sound like a grave usecase for this feature. You can just close it explicitly with .close() methods. If that sounds too manual, Python offers lexical-scope context management using with statement (note that it likely don't work in uPy - implementing it would be nice thing to do to get complete Python semantics).

So, managing handles to generic external resources can be well done without __del__ IMHO. Managing memory external memory resources is different matter, but with bare-metal ports, there's only GC heap, so anything allocated will be freed eventually. It's different situation with unix port, where there's both Python GC heap and system heap. 3rd party libs will of course allocate on system heap, so it must be explicitly freed (or leaked). I have this problem already in FFI module - that's why I brought up this ticket in the first place. But even there, it's bearable situation - it's because FFI objects are intended to be module-global, so they live for the duration of program execution, so freeing them is not that important.

So, overall, while I brought up this issue, I don't see any pressing need to implement it right away, and am personally not going to work on this soon. YMMV.

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 26, 2014

Now how'd I go for implementing it after akk:

Currently, uPy GC uses 2-bit vector to store memory block flags. I'd split those into separate 1-bit vectors. Intuitively, that will make alloc algos simpler (and thus will allow to add more complex features to it without overcomplicating the code). It will also allow to add more such bit vectors easily. So, I'd add 3rd one, to mark heads of blocks which contain mp_obj_t objects (vs all the other allocations). Then, it's easy to call __del__ when GCing such objects (and of course, the method slot would need to be added to mp_obj_type_t - that can be down separately, and sooner).

@iabdalkader
Copy link
Contributor

Python offers lexical-scope context management using with statement

Yes those are really nice, but any Python user (or any other high level language) expects the language to take care of cleaning up resources (fixing this now avoids lots of tickets later like, opening files causes resource leaks, creating handle x has to be closed manually...)

Managing memory external memory resources is different matter, but with bare-metal ports, there's only GC heap

not necessarily, this is a very special case, but in my current situation, the 64KB CCM is managed by GC and I'm linking with newlib, so it's possible to have malloc and friends (or any other lightweight allocator) manage the main RAM block on the main RAM block...

Now how'd I go for implementing it after akk:

I will give this a try, I don't usually mess around with py/ but I will try..

@iabdalkader
Copy link
Contributor

this is just crossed my mind, would this require a special function to alloc python objects ?

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 28, 2014

We kinda have that already -

#define m_new_obj(type) ...
#define m_new_obj_var(obj_type, var_type, var_num) ...

Just would need to do extra check they are used in all cases for objects and not used for non-objects.

@dpgeorge
Copy link
Member

Yes, you'd need a separate allocation function. I would not reuse m_new_obj, but rather rename it to m_new_mp_obj, or something.

@iabdalkader
Copy link
Contributor

okay, thanks for the info, I will keep this in mind, hopefully I will have more time the next few days to work on this :)

@iabdalkader
Copy link
Contributor

what if we reduce the blocks per ATB to 2 blocks ? 4 bits per block, this should do the trick with minimal changes, what do you think ?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 3, 2014

That's too wasteful. As per @pfalcon's suggestion, I would create a second segment, the "contents segment", holding 1 bit per block. If the bit is set, then that block contains a uPy object, otherwise, not.

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 3, 2014

Per my suggestion, I would refactor existing code to use independent segments, each holding 1 bit per block, then just add yet another such segment for "is object" flagging.

Having 1 bit per block will open up more opportunities for fast allocation optimization. (Well, one could optimize existing code, but it would be pretty messy.)

Reference which advocates same approach (having independent bitstrings of flags): http://wiki.luajit.org/New-Garbage-Collector

@dpgeorge
Copy link
Member

dpgeorge commented Apr 3, 2014

I would refactor existing code to use independent segments, each holding 1 bit per block

Such a change would need a critical analysis, probably by implementing the 1-bit per block algo and doing proper profiling for speed (and code size). The reason I think it's not a good idea is because 1 bit does not have enough information in it to decide anything meaningful for the block. To detect a free block you need to check both bits are 0, for example. All accesses of 1 bit will need the other bit.

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 3, 2014

Such a change would need a critical analysis, probably by implementing the 1-bit per block algo and doing proper profiling for speed (and code size).

That's why I don't take this task so far ;-).

1 bit does not have enough information in it to decide anything meaningful for the block

Luajit paper argues the opposite. And indeed, intuitively, one bit - free/non-free, another bit - head-nonhead, and only one special case (head mark) requires consulting both, but happens only during GC.

Anyway, you're probably right, it should be done step by step, e.g.:

  1. Introduce slot by __del__, make obj.__del__() work, and probably del obj as a temporary workaround to be removed later.
  2. Write 1-bit block handling routines.
  3. Use them for object flagging.
  4. Do needed analysis and get rid of 2-bit block code.

@iabdalkader
Copy link
Contributor

I was going to use separate bit vectors initially, as per @pfalcon suggestion, but other than complicating things a lot, it consumes the same amount memory ? with separate bit vectors, we need 4 bits per block (1-bit for head, tail, mark and mp-obj) which is the same as using 4 bits per block (2 blocks per ATB) except with 4 bits per block (in half a byte) we have 2 extra flags ?

@iabdalkader
Copy link
Contributor

actually 3bits vs 4bits but if we add another flag, it's still 4bits (separate bit vectors) vs 4bits (2 blocks per ATB), so we could use 2 blocks per ATB, for now, and if there's a need for a 3rd flag, in the future, we move to separate bit vectors ?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 3, 2014

I don't understand why you need 4 bits. On top of what is there now, you just need 1 extra bit for whether it's an mp_obj_t or not. And 3 bits are easier to store using 3 times 1-bit segments, or a 2-bit plus a 1-bit segment (keeping current 2-bit segment and adding a new 1-bit segment).

@iabdalkader
Copy link
Contributor

yes I know we just need one extra bit (for now), I wanted to keep all flags in one place by reducing blocks per ATB, adding another bit vector only saves 1 bit per block but it's okay too I guess

@iabdalkader
Copy link
Contributor

can you review this please ?
https://github.com/iabdalkader/micropython/compare/del

@iabdalkader
Copy link
Contributor

btw no need to implement del to test:
s331

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

From a quick glance, looks pretty good. But we really should have tests for the as much functionality as possible. Would be nice to have a test, but I guess you leave implementing __del__ for class instances for later? Then my guess it's worth a pull request for more detailed review/testing.

@iabdalkader
Copy link
Contributor

a set of standard tests for GC would be nice to have, anyway, should del be added somewhere else ? also I'm thinking those macros should be renamed to something more generic ? if we decide to split the 2bits later, like ATB_SET_FLAG(block, bit_vec) something like that ?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2014

Since __del__ is not common it should go in the locals_dict table, and not in the mp_obj_type_t structure. Doing it this way would also enable user defined __del__ methods.

@iabdalkader
Copy link
Contributor

so I should do a lookup in locals_dict if it's not NULL and call del ?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2014

Easiest to use existing functions for this:

mp_load_method(base, QSTR___del__, dest)
mp_call_method_x(0, 0, dest)

Problem is that we would then need to set a flag that the GC is running, so that the above calls don't allocate memory while del is being called.

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

Since del is not common it should go in the locals_dict table, and not in the mp_obj_type_t structure.

Surely no! Do you want to do expensive dictionary lookups during such time-critical phase as garbage collection? GC is very slow on its own already.

@iabdalkader
Copy link
Contributor

so what's the final call :D ?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2014

Surely no! Do you want to do expensive dictionary lookups during such time-critical phase as garbage collection? GC is very slow on its own already.

Would it work if we set the "I'm a uPy object" tag bit in the GC only if that object had a __del__ method? This could be determined at compile time for built in objects.

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

Would it work if we set the "I'm a uPy object" tag bit in the GC only if that object

I don't know, that's again (re: globals/locals discussion) seems like complicated half-way to optimize things. I personally never doubted that __del__ would require a slot in type structure - it's really performance critical. So, IMHO we should support 2 cases (via compile-time option): no finalization support, and straightforward implementation (if we want to optimize things for user classes later, we may set del slot in their type structs only if __del__ method is defined). Of course, I may be wrong here.

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2014

I don't know, that's again (re: globals/locals discussion) seems like complicated half-way to optimize things.

No, I think it would work very well. The optimisation can be done in the gc_alloc_mp_obj function: if the caller says that the object is an mp_obj_t then the GC can check to see if it has a __del__ slot at that point. Only if it does, does it set the tag. That is a very straight forward, and transparent, optimisation.

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

Ok, sounds good. Then the flag won't be "object is allocated here", but "call finalization for object here". Hope it won't make any differences for possible future usages of that flag.

@iabdalkader
Copy link
Contributor

so set flag only if it has __del__ ? but what if we would like to do some optimization later with uPy objects ?
what's wrong with adding del to mp_obj_type_t ? just to get a better understanding of the issue..

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

what's wrong with adding del to mp_obj_type_t ? just to get a better understanding of the issue..

It will be used in like 1% of types/classes, but will take 4 bytes in each and every type structure (i.e. in 99% cases just waste memory).

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2014

Actually, looking for __del__ in the allocator is the same expense as looking for it in the GC sweep function (if an object is allocated, it's deallocated at most once). So you would need to change the allocation function to read something like:

void *gc_alloc(int bytes, bool is_mp_obj_and_may_have_finaliser);

@pfalcon pfalcon changed the title Implementing finalizition (__del__) support with GC Implementing finalization (__del__) support with GC Apr 4, 2014
@iabdalkader
Copy link
Contributor

how about:
void *gc_alloc(int bytes, machine_uint_t hints);
pass hints about the allocation and let gc decide what to do with that? something like GC_ALLOC_MP_OBJ|GC_ALLOC_HAVE_FINALISER etc...

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

@iabdalkader : no need to complicate design beyond realistic requirements - we unlikely will need to have other flags ;-). And just think about typing all that stuff! ;-)

@iabdalkader
Copy link
Contributor

what if I need to know that it's an mp obj and doesn't not have a finaliser ?

@KeithJRome
Copy link

It will be used in like 1% of types/classes, but will take 4 bytes in each and every type structure (i.e. in 99% cases just waste memory).

How many types are expected to be loaded? Python itself doesn't have all that many types, and neither will programs. Especially since things are modular and you only pay for what you need. So exactly how high is this cost that is attempting to be avoided? 40 bytes? 400 bytes? 4k?

Or do you mean this requires 4 more bytes per object instance?

@iabdalkader
Copy link
Contributor

no it's per type, in stm/ for example, there are ~19 types (or 76 bytes):

grep -r mp_obj_type_t * | wc -l
19

@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 4, 2014

No, per type, each user class being a type. As I told, I personally think this is worth a slot in a type struct. But on the other hand, we already have type struct which is ~80% empty for most of types.

40 bytes? 400 bytes? 4k?

The ultimate difference is qualitative, not quantitative - it's between being able and not being able to run a particular program, and in general, being able and not being able to use Python on constrained systems. Without scrutinizing each case and fighting for each byte, we'll soon end up in the "not" land...

@KeithJRome
Copy link

I hear you, but it smells like a premature optimisation. A user program with enough declared types to make this a real problem is almost certainly going to run into some other platform limitations first.

If it's really in fact a problem, then there may be other ways to solve it too that trade off cpu cycles for memory efficiency, such as a dictionary of finalizers keyed on the type pointer (if finalizers really are that rarely used), or an alternative "light" base class that triggers special gc handling that could optionally be used by user programs when a class doesn't need finalization (or other features that might bloat the type data). But ultimately, I don't think that would be needed - one extra 32-bit value per type IMO isn't likely to jeopardize viability of the platform except in very rare situations.

@iabdalkader
Copy link
Contributor

I agree, and +1 for smaller base type, anyway, I moved del to locals for now.

@dpgeorge
Copy link
Member

dpgeorge commented Oct 6, 2014

Finalisation was implemented a while back, so closing this issue.

@dpgeorge dpgeorge closed this as completed Oct 6, 2014
@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 6, 2014

Was it? Is it possible to define __del__ method in Python and have it called when object is proven non-live? There're no tests for this functionality.

@pfalcon pfalcon reopened this Oct 6, 2014
@dpgeorge
Copy link
Member

dpgeorge commented Oct 6, 2014

__del__ support is implemented in the GC, as per the topic of this issue.

@dpgeorge dpgeorge closed this as completed Oct 6, 2014
@dpgeorge
Copy link
Member

dpgeorge commented Oct 6, 2014

If __del__ support is needed for custom objects, open a separate issue with a test case that fails.

@pmp-p
Copy link
Contributor

pmp-p commented Sep 23, 2019

There're no tests for this functionality.

and no examples in documentation sadly to help solving some points for #4993 that would need group finalizers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc Request for Comment
Projects
None yet
Development

No branches or pull requests

5 participants