Implementing finalization (__del__) support with GC #245
The relevant place is in
I see, so if I understand correctly, gc_alloc will add a tag to every block? But other than the tag's memory overhead, how does that impact performance? You mentioned it will slow down GC. I can probably get around allocating memory for now, but what about closing FatFS file descriptors? I can think of some other use cases too...
So, let's think about use cases first. Closing an underlying file descriptor doesn't sound like a grave use case for this feature. You can just close it explicitly with .close() methods. If that sounds too manual, Python offers lexical-scope context management using the `with` statement.

So, managing handles to generic external resources can be done well without `__del__`.

So, overall, while I brought up this issue, I don't see any pressing need to implement it right away, and am personally not going to work on this soon. YMMV.
Now here's how I'd go about implementing it after all: currently, the uPy GC uses a 2-bit vector to store memory block flags. I'd split those into separate 1-bit vectors. Intuitively, that will make the alloc algorithms simpler (and thus allow adding more complex features without overcomplicating the code). It will also make it easy to add more such bit vectors later. So, I'd add a 3rd one, to mark the heads of blocks which contain mp_obj_t objects (vs all the other allocations). Then, during sweep, it's easy to call the finalizer on just those blocks.
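For illustration, a minimal sketch of what such split 1-bit vectors might look like; the table names, `BLOCKS_PER_HEAP` and the helper are hypothetical, not the existing py/gc.c layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_HEAP 4096  // example: 64KB heap / 16-byte blocks

// Hypothetical per-block flag tables, 1 bit per GC block each.
static uint8_t gc_free_table[BLOCKS_PER_HEAP / 8];  // 1 = block is free
static uint8_t gc_head_table[BLOCKS_PER_HEAP / 8];  // 1 = block starts an allocation
static uint8_t gc_obj_table[BLOCKS_PER_HEAP / 8];   // 1 = head block holds an mp_obj_t

#define BIT_GET(tbl, blk) (((tbl)[(blk) >> 3] >> ((blk) & 7)) & 1)
#define BIT_SET(tbl, blk) ((tbl)[(blk) >> 3] |= (uint8_t)(1 << ((blk) & 7)))

// During sweep, only heads flagged as objects would have their finaliser called.
static bool block_needs_finalising(size_t blk) {
    return BIT_GET(gc_head_table, blk) && BIT_GET(gc_obj_table, blk);
}
```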
Yes, those are really nice, but any Python user (or user of any other high-level language) expects the language to take care of cleaning up resources (fixing this now avoids lots of tickets later like "opening files causes resource leaks", "handle x has to be closed manually"...)
Not necessarily; this is a very special case, but in my current situation the 64KB CCM is managed by the GC and I'm linking with newlib, so it's possible to have malloc and friends (or any other lightweight allocator) manage the main RAM block...
I will give this a try. I don't usually mess around with py/ but I will try...
This just crossed my mind: would this require a special function to alloc Python objects?
We kinda have that already - m_new_obj().
We'd just need an extra check that they are used in all cases for objects and not used for non-objects.
Yes, you'd need a separate allocation function. I would not reuse m_new_obj, but rather rename it to m_new_mp_obj, or something.
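A hedged sketch of how that could look; `gc_alloc_tagged` and the `is_obj` flag are hypothetical, and `m_new_mp_obj` is just the rename suggested above:

```c
#include <stdbool.h>
#include <stddef.h>

// Assumed GC entry point that records whether the block holds an mp_obj_t.
void *gc_alloc_tagged(size_t n_bytes, bool is_obj);

// All object allocations funnel through here so the GC can tag the block;
// plain buffers keep using an untagged path.
#define m_new_mp_obj(type)  ((type *)gc_alloc_tagged(sizeof(type), true))
#define m_new_bytes(n)      ((char *)gc_alloc_tagged((n), false))
```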
Okay, thanks for the info. I will keep this in mind; hopefully I will have more time the next few days to work on this :)
What if we reduce the blocks per ATB to 2 blocks, i.e. 4 bits per block? This should do the trick with minimal changes; what do you think?
That's too wasteful. As per @pfalcon's suggestion, I would create a second segment, the "contents segment", holding 1 bit per block. If the bit is set, then that block contains a uPy object, otherwise, not.
Per my suggestion, I would refactor the existing code to use independent segments, each holding 1 bit per block, then just add yet another such segment for "is object" flagging. Having 1 bit per block will open up more opportunities for fast allocation optimization. (Well, one could optimize the existing code, but it would be pretty messy.) A reference which advocates the same approach (having independent bitstrings of flags): http://wiki.luajit.org/New-Garbage-Collector
Such a change would need critical analysis, probably by implementing the 1-bit-per-block algorithm and doing proper profiling for speed (and code size). The reason I think it's not a good idea is that 1 bit does not carry enough information to decide anything meaningful about a block. To detect a free block you need to check that both bits are 0, for example. All accesses of 1 bit will need the other bit.
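For context, the current allocation table packs four states into 2 bits per block, roughly as sketched below (the state values follow py/gc.c; the helpers are only illustrative):

```c
// 2 bits per block, 4 blocks packed per ATB byte.
#define AT_FREE (0)  // 00: block is free
#define AT_HEAD (1)  // 01: first block of an allocation
#define AT_TAIL (2)  // 10: continuation of an allocation
#define AT_MARK (3)  // 11: head that survived the mark phase

static inline int block_state(const unsigned char *atb, unsigned int block) {
    return (atb[block / 4] >> ((block & 3) * 2)) & 3;
}

// Even a simple "is it free?" query has to look at both bits of the pair.
static inline int block_is_free(const unsigned char *atb, unsigned int block) {
    return block_state(atb, block) == AT_FREE;
}
```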
That's why I haven't taken on this task so far ;-).
The LuaJIT paper argues the opposite. And indeed, intuitively: one bit for free/non-free, another bit for head/non-head, and only one special case (the head mark) requires consulting both, and that happens only during GC. Anyway, you're probably right, it should be done step by step, e.g.:
I was going to use separate bit vectors initially, as per @pfalcon's suggestion, but other than complicating things a lot, doesn't it consume the same amount of memory? With separate bit vectors we need 4 bits per block (1 bit each for head, tail, mark and mp-obj), which is the same as using 4 bits per block (2 blocks per ATB), except with 4 bits per block (in half a byte) we have 2 extra flags?
Actually it's 3 bits vs 4 bits, but if we add another flag it's still 4 bits (separate bit vectors) vs 4 bits (2 blocks per ATB). So we could use 2 blocks per ATB for now, and if there's a need for a 3rd flag in the future, we move to separate bit vectors?
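To put rough numbers on it (assuming 16-byte GC blocks and a 64KB heap, i.e. 4096 blocks): the current 2-bit ATB costs 1024 bytes; keeping it and adding one extra 1-bit vector costs 1536 bytes; going to 2 blocks per ATB byte (4 bits per block) costs 2048 bytes, but leaves a spare flag bit per block.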
I don't understand why you need 4 bits. On top of what is there now, you just need 1 extra bit for whether it's an mp_obj_t or not. And 3 bits are easier to store using 3 separate 1-bit segments, or a 2-bit plus a 1-bit segment (keeping the current 2-bit segment and adding a new 1-bit segment).
Yes, I know we just need one extra bit (for now). I wanted to keep all flags in one place by reducing blocks per ATB; adding another bit vector only saves 1 bit per block, but it's okay too I guess.
Can you review this please?
From a quick glance, looks pretty good. But we really should have tests for as much functionality as possible. Would be nice to have a test, but I guess you leave implementing
A set of standard tests for GC would be nice to have. Anyway, should `__del__` be added somewhere else? Also, I'm thinking those macros should be renamed to something more generic in case we decide to split the 2 bits later, like ATB_SET_FLAG(block, bit_vec), something like that?
Since `__del__` would be defined on the class, it ends up in the type's locals_dict.
So I should do a lookup in locals_dict, if it's not NULL, and call `__del__`?
Easiest to use existing functions for this:
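Something along these lines - a sketch using runtime helpers from py/runtime.h, not necessarily the exact snippet originally posted here:

```c
#include "py/runtime.h"  // mp_load_method_maybe, mp_call_method_n_kw

// Look up __del__ on a collected object and call it if present.
static void gc_call_finaliser(mp_obj_t obj) {
    mp_obj_t dest[2];
    mp_load_method_maybe(obj, MP_QSTR___del__, dest);
    if (dest[0] != MP_OBJ_NULL) {
        mp_call_method_n_kw(0, 0, dest);
    }
}
```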
Problem is that we would then need to set a flag that the GC is running, so that the above calls don't allocate memory while `__del__` is being called.
Surely no! Do you want to do expensive dictionary lookups during such a time-critical phase as garbage collection? GC is very slow on its own already.
So what's the final call :D?
Would it work if we set the "I'm a uPy object" tag bit in the GC only if that object had a `__del__` method?
I don't know; that again (re: the globals/locals discussion) seems like a complicated half-way approach to optimizing things. I personally never doubted that
No, I think it would work very well. The optimisation can be done in the gc_alloc_mp_obj function: if the caller says that the object is an mp_obj_t, then the GC can check to see if it has a `__del__` method.
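A hedged sketch of that check; gc_alloc_mp_obj is the hypothetical allocator discussed above, and the locals_dict access is simplified relative to the real type struct:

```c
#include <stdbool.h>
#include "py/obj.h"  // mp_obj_type_t, mp_map_lookup, MP_MAP_LOOKUP

// Decide at allocation time whether the block needs the
// "call finaliser for this object" bit.
static bool type_has_del(const mp_obj_type_t *type) {
    if (type->locals_dict == NULL) {
        return false;
    }
    mp_map_elem_t *elem = mp_map_lookup(&type->locals_dict->map,
        MP_OBJ_NEW_QSTR(MP_QSTR___del__), MP_MAP_LOOKUP);
    return elem != NULL;
}
```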
Ok, sounds good. Then the flag won't be "object is allocated here", but "call finalization for object here". Hope it won't make any difference for possible future usages of that flag.
So, set the flag only if it has `__del__`?
It will be used in maybe 1% of types/classes, but will take 4 bytes in each and every type structure (i.e. in 99% of cases it just wastes memory).
Actually, looking for
How about:
@iabdalkader: no need to complicate the design beyond realistic requirements - we're unlikely to need other flags ;-). And just think about typing all that stuff! ;-)
What if I need to know that it's an mp_obj and doesn't have a finaliser?
How many types are expected to be loaded? Python itself doesn't have all that many types, and neither will programs, especially since things are modular and you only pay for what you need. So exactly how high is this cost we're trying to avoid? 40 bytes? 400 bytes? 4k? Or do you mean this requires 4 more bytes per object instance?
No, it's per type; in stm/ for example, there are ~19 types (or 76 bytes):
No, per type, each user class being a type. As I said, I personally think this is worth a slot in a type struct. But on the other hand, we already have a type struct which is ~80% empty for most types.
The ultimate difference is qualitative, not quantitative - it's between being able and not being able to run a particular program, and in general, between being able and not being able to use Python on constrained systems. Without scrutinizing each case and fighting for each byte, we'll soon end up in the "not" land...
I hear you, but it smells like premature optimisation. A user program with enough declared types to make this a real problem is almost certainly going to run into some other platform limitation first. If it really is a problem, then there may be other ways to solve it that trade CPU cycles for memory efficiency, such as a dictionary of finalizers keyed on the type pointer (if finalizers really are that rarely used), or an alternative "light" base class that triggers special GC handling, which user programs could opt into when a class doesn't need finalization (or other features that might bloat the type data). But ultimately, I don't think that would be needed - one extra 32-bit value per type IMO isn't likely to jeopardize the viability of the platform except in very rare situations.
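If per-type memory ever did become the bottleneck, the "dictionary of finalizers keyed on the type pointer" idea could look roughly like this; it is entirely hypothetical, not an existing MicroPython API:

```c
#include <stddef.h>
#include "py/obj.h"  // mp_obj_t, mp_obj_type_t

typedef void (*finaliser_fn_t)(mp_obj_t self);

// Side table: only the few types that actually define a finaliser get an
// entry, so the common case pays nothing per type.
typedef struct {
    const mp_obj_type_t *type;
    finaliser_fn_t fin;
} finaliser_entry_t;

static finaliser_entry_t finaliser_table[8];  // small fixed table for the sketch

static finaliser_fn_t find_finaliser(const mp_obj_type_t *type) {
    for (size_t i = 0; i < sizeof(finaliser_table) / sizeof(finaliser_table[0]); ++i) {
        if (finaliser_table[i].type == type) {
            return finaliser_table[i].fin;
        }
    }
    return NULL;
}
```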
I agree, and +1 for a smaller base type. Anyway, I moved `__del__` to locals for now.
Finalisation was implemented a while back, so closing this issue.
Was it? Is it possible to define `__del__`?
And sadly there are no examples in the documentation to help with some points for #4993, which would need group finalizers.
A good discussion happened in the comments of another pull request; adding a top-level ticket to be able to find it later: #238 (comment)