-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Towards possibility of precise garbage collector #1161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It's difficult to be 100% precise because you also need to track the "type" of objects on the stack and in registers. |
Well, I don't know what to say here, but there gotta be way to deal with that. E.g., make sure that any pointers are flushed to the heap or static rootset at checkpoints when GC may happen. There're surely many steps to have memory-efficient compacted heap, and having obvious deficiencies in current memalloc preclude even experimenting with them. |
We can learn from high quality GC implementations such as V8. See eg this blog post http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection for a nice overview. |
Sure, but what about deciding on the down-to-earth question this ticket poses: how to distinguish object and non-object allocation? |
I have thought a lot about how to improve the memory manager and the reason I haven't done anything concrete yet is because of these issues of being 100% precise and of compacting in the presence of interrupts. I wanted to understand how to fully solve the problems before going in a particular direction, and writing code that may end up being unused.
I don't see a problem with having more functions (they are mostly macro wrappers anyway). A more specific malloc function gives more information about your intentions for that memory, and may allow further optimisations with other kinds of memory managers.
Yes. For raw memory regions could make a special uPy type which tags them (eg mp_type_raw_memory) and then all allocated chunks within the heap start with a 1-word pointer to an object of type struct mp_obj_type_t. |
Yes, I thought a lot too, but I didn't try to do much, because even for experimenting, a lot of prerequisites are required. Now that there're even new people appearing who would like to experiment, maybe it's to move in that direction ;-).
I don't have problems with having many functions as we have already, but doubling them - I'm not sure. One of the aim of the refactoring is to make memory management/GC more pluggable, and just understanding (re)defining couple of dozens of functions is a chore and not really welcoming. So, what about 2nd approach I propose - we keep existing functions for objects, and get just couple of functions for "raw" allocations. The latter will of course require auditing and possible updating of existing code, and by the amount of refactoring required, we can see if it makes sense, and if we lose any additional semantics by using bare "malloc" for raw allocations.
Yes, that's the idea (to make that possible, why at the same time not enforce it, and work with existing conservative C with minimal changes). |
Well, not that easy. What I was thinking is that mp_raw_alloc() would allocate block with type header, offset it, and return. But for GC, ptrs stored in objects should be to the head of block. So "transparent" scheme won't work, all requesters of raw memory should be fully aware that they receive a kind of "handle", use that to store in ptr fields, but do explicit cast to access the underlying memory with something like MP_RAW_DATA(). Yeah, it's not easy to retrofit more advanced memory management into existing code, but the longer we won't do that, the more it will fall behind the competition. And we should have the same cast-accessor to get an mp_obj_t from memory (because in memory it may be stored in other form than just a full pointer). |
Why? Why can't the invariant be that all ptrs to head allocated RAM point to 1 word beyond the head? |
Invariant for all can be anything. But here we talk about how to differentiate pointers to objects from pointers to non-objects (both of which should be differentiated from non-pointers). And you can't store pointers to objects as ptr-to-head and to non-objects as ptr-past-head. |
I still don't understand. Aren't all pointers into the heap now pointers to objects? (Because all allocated chunks on the heap start with a type identification word.) |
Hello, On Tue, 24 Mar 2015 03:29:47 -0700
Just have a look at "typedef struct _mp_obj_str_t": "const byte *data;" So, we have 2 choices:
Best regards, |
Ok I now understand, thanks for clarification! My original thinking was that type-tagging of memory chunks would be transparent to the application, but I see what you are describing is a tight coupling between the memory manager and the application (which it needs to be if we want efficiency). We can do the type-tagging many different ways. For example:
The first 2 approaches are transparent to the application. Approach 2 is more complicated than approach 1 but uses less RAM overhead. Approach 3 is not transparent to the application and potentially uses more ROM to do the offsetting at many points in the code. Anyway, anyway, I'm sure there are many other ways of doing it. Point is that we should keep an open mind and make sure the macro wrappers we use can accommodate all these different schemes, since it's not clear a priori which is better. |
Or one (or more) bitfield arrays which indicate the type or other useful aspect. (unreferenced so can be collected, pointers that start, pointers that start inside, type, etc) |
Was wondering: is it ok for a malloc-like allocator function to return pointers which are word-aligned? |
As opposed to..? The specs are that malloc returns a pointer "suitably aligned for any built-in type". |
True, AFAIK, every malloc implementation works as @dpgeorge says.
|
Ok, just asking. |
This issue dead? It's been 3+ years. |
It's still open/active, and still interesting. |
It seems that a breaking change in pointer management and API must be introduced because of need for double indirection. Is this something you’d consider? |
For sure there needs to be a big change to make the GC precise. And even more change to get to a compacting collector. IMO this can't be done by incremental changes to master, but rather it would be worked on in a separate branch to investigate how feasible it is first, and find out if there are any big issues that have not yet been understood. |
Nrf52 neopixel
|
It seems that many approaches of further evolution of GC in uPy depend on being able to do precise GC, i.e. tell which fields with an object are pointers are which are not. To achieve this, each allocated memory block has to be explicitly typed (and from type, internal layout can be inferred). Way to achieve this while staying compatible with conservative GC and without updating code largely is to have to set of memory allocation routines: one set is to deal with uPy objects (which already have type header is structure member) and another set to deal with "raw" memory allocations (which will need type header added implicitly for precise GC, and nothing added for conservative GC).
We currently have ~dozen functions to do memalloc. The above means doubling them. Other approach would be to reserve all current functions to uPy objects, add just add 2 funcs for "raw" memory: m_malloc() (will use implicit "all pointers inside" type) and m_malloc_typed() which takes explicit type for allocation.
Other thoughts?
The text was updated successfully, but these errors were encountered: