-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
WIP: use a root stack to explicitly track root pointers #4723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Running unix coverage tests yields: 709 tests 709 passed 32 skipped can get vfs_fat_ramdisklarge passing (not skipped) with 640k heap remaining skipped are native based working on pyboard with pystack enabled there
I've looked at this, and think I understand the principle. I have some comments which are not thoroughly thought out and may be wrong or impractical (this is written before breakfast), but see what you think. For simplicity, I'm assuming a 32-bit word, when I really mean 32 or 64 (basic word or pointer size). Root pointers A root pointer is a pointer to an object that should not be gc'd even if nothing else is pointing to it. There are two kinds of root pointers:
If I understand correctly, this scheme handles the second case, so that pointers on the stack will not be gc'd if a gc is done during that existence of that stack frame. To do that, you keep a temporary list (a "stack") of those pointers. The "problem" that's being solved is that there are mixed MicroPython and non-MicroPython things on the stack: there are object pointers, MPy ints, etc., and then there are plain C pointers, ints, etc. When you scan a stack frame, you can't tell the difference. Bitmap indicating object pointers on the stack Suppose instead of using a root pointer stack, you used a bitmap, where each bit represented whether a word on the stack was an MPy pointer or not. You'd set and clear these bits instead of pushing and popping pointers on the root pointer stack. The bitmap memory size be 1/32nd of the maximum stack size. This may or may not save RAM: it depends on how much room you need for the root pointer stack. Extending the bitmap to the heap and static space Suppose that the bitmap above were extended to all of RAM, or at least all of RAM that could contain MPy objects. Then you could mark whether each 32-bit word was an MPy object pointer or not. Further, if you used pairs of bits instead of single bits, it could also be the gc mark table. (EDIT: original parts of this were just wrong - should be talking the object blocks themselves, not pointers to them. I've adjusted with other suggestions, but it may be less interesting.)
I think with this scheme, you could get rid of heap allocation blocks, and allocate memory with smaller granularity. The overhead is 1/16th of heap+stack. Maybe you already considered and discarded this for time or space reasons. |
This is usually called shadow stack.
Well, usually one go thru a trouble like that to implement a compacting GC.
The question is also how much performance it costs.
This requires thinking ahead. PUSH()/POP() should be macros, as fundamental as TO_OBJ()/FROM_OBJ(). POP(obj) should assign pointer from stack (which may be relocated in the meantime) back to the object. And even without popping, it should be reloaded from shadow stack after each checkpoint (call to a function which performs memory allocation).
This again raises concerns of efficiency. Doing So, maybe supporting 2 types of entries is smart. But one still may wonder if going with "store shadow stack on the main stack" is still smarter. E.g.:
That's quite a bunch of boilerplate of course, and dealing with register var optimization will only add more (2 explicit vars, one on shadow stack, one cached in register, reloading as needed). All in all, even PUSH()/POP() stuff adds enough boilerplate (and failure points!), that's why such things are usually done in C++, e.g. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/GC_Rooting_Guide |
This is still relevant, but no further progress has been made. |
I'll close this because there's still a lot of work to make it mergeable, and it's also very difficult to prove correctness of code that uses a root stack (puts a big burden on users that write C extensions). There are also other reasons, see #4131 (comment) |
This is something I was working on for a while, but only now got it to a point where it works well enough to be useful.
The idea is to remove the need to scan the C stack and machine registers for root pointers, and instead track root pointers explicitly throughout the code by maintaining a stack of root pointers, a "root stack".
The reason to look at it now is because of issues like #4652, #4705, and the problem with the Javascript port not being able to trace machine registers. The patch here should be a full solution to those issues.
The code here is not really mergeable (yet) but instead shows what is necessary to do this, to track the root pointers manually.
Running unix coverage tests yields:
709 tests
709 passed
32 skipped
It also works on pyboard with pystack enabled there.
Some points to note: