Skip to content

WIP: use a root stack to explicitly track root pointers #4723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dpgeorge
Copy link
Member

This is something I was working on for a while, but only now got it to a point where it works well enough to be useful.

The idea is to remove the need to scan the C stack and machine registers for root pointers, and instead track root pointers explicitly throughout the code by maintaining a stack of root pointers, a "root stack".

The reason to look at it now is because of issues like #4652, #4705, and the problem with the Javascript port not being able to trace machine registers. The patch here should be a full solution to those issues.

The code here is not really mergeable (yet) but instead shows what is necessary to do this, to track the root pointers manually.

Running unix coverage tests yields:

709 tests
709 passed
32 skipped

It also works on pyboard with pystack enabled there.

Some points to note:

  • the patch here is mostly just a bunch of push/pop pairs at various places to make sure root pointers are reachable
  • the feature can be completely disabled if not needed and traditional scanning of root pointers (using the C stack and machine registers) used instead
  • to test, aggressive GC collection was enabled (collection at each opcode, at each memory allocation, and at some other locations)
  • code size increases by about 3000 bytes
  • an additional region of memory is needed for the root stack
  • the GC is now much more precise and could be on the path to becoming 100% precise with this patch (and a lot of extra work)
  • it may now be possible to do some very basic form of moving of objects in the heap to defragment it

Running unix coverage tests yields:

709 tests
709 passed
32 skipped

can get vfs_fat_ramdisklarge passing (not skipped) with 640k heap
remaining skipped are native based

working on pyboard with pystack enabled there
@dhalbert
Copy link
Contributor

dhalbert commented Apr 26, 2019

I've looked at this, and think I understand the principle. I have some comments which are not thoroughly thought out and may be wrong or impractical (this is written before breakfast), but see what you think.

For simplicity, I'm assuming a 32-bit word, when I really mean 32 or 64 (basic word or pointer size).

Root pointers

A root pointer is a pointer to an object that should not be gc'd even if nothing else is pointing to it. There are two kinds of root pointers:

  1. Static root pointers, like singletons, etc.
  2. Object pointers created on the stack, which may or may not be temporary for the life of the stack frame. If not temporary, they will be returned or added to another object.

If I understand correctly, this scheme handles the second case, so that pointers on the stack will not be gc'd if a gc is done during that existence of that stack frame. To do that, you keep a temporary list (a "stack") of those pointers.

The "problem" that's being solved is that there are mixed MicroPython and non-MicroPython things on the stack: there are object pointers, MPy ints, etc., and then there are plain C pointers, ints, etc. When you scan a stack frame, you can't tell the difference.

Bitmap indicating object pointers on the stack

Suppose instead of using a root pointer stack, you used a bitmap, where each bit represented whether a word on the stack was an MPy pointer or not. You'd set and clear these bits instead of pushing and popping pointers on the root pointer stack. The bitmap memory size be 1/32nd of the maximum stack size. This may or may not save RAM: it depends on how much room you need for the root pointer stack.

Extending the bitmap to the heap and static space

Suppose that the bitmap above were extended to all of RAM, or at least all of RAM that could contain MPy objects. Then you could mark whether each 32-bit word was an MPy object pointer or not.

Further, if you used pairs of bits instead of single bits, it could also be the gc mark table.

(EDIT: original parts of this were just wrong - should be talking the object blocks themselves, not pointers to them. I've adjusted with other suggestions, but it may be less interesting.)

  • 0b00 - The corresponding address is not an MPy pointer.

  • 0b01 - The corresponding address is an MPy root pointer. Do not gc. (EDIT: changed from original post)

  • 0b1m - The corresponding address is an MPy pointer that can be considered for gc. The m bit is set during gc mark.

  • 0b00 - The corresponding address is free memory or the continuation of an allocated object.

  • 0b01 - The corresponding address is an MPy root pointer. Do not gc.

  • 0b1m - The corresponding address is the beginning of an allocated object. The m bit is set during gc mark.

I think with this scheme, you could get rid of heap allocation blocks, and allocate memory with smaller granularity. The overhead is 1/16th of heap+stack. Maybe you already considered and discarded this for time or space reasons.

@pfalcon
Copy link
Contributor

pfalcon commented Apr 26, 2019

a "root stack".

This is usually called shadow stack.

The reason to look at it now is because of issues like #4652, #4705, and the problem with the Javascript port not being able to trace machine registers.

Well, usually one go thru a trouble like that to implement a compacting GC.

code size increases by about 3000 bytes

The question is also how much performance it costs.

it may now be possible to do some very basic form of moving of objects in the heap to defragment it

This requires thinking ahead. PUSH()/POP() should be macros, as fundamental as TO_OBJ()/FROM_OBJ(). POP(obj) should assign pointer from stack (which may be relocated in the meantime) back to the object. And even without popping, it should be reloaded from shadow stack after each checkpoint (call to a function which performs memory allocation).

typedef struct _root_stack_elem_t {
    int kind;
    void *ptr;
} root_stack_elem_t;

This again raises concerns of efficiency. Doing if on critical path of collecting shadow stack probably hits performance. Common sense would say that stick with either direct-copy shadow stack or indirect ptr shadow stack would make sense. Sticking with direct-copy leads to potential degenerated cases like mp_obj_t foo[100] - copies of 100 pointers would need to be added to shadowstack. With indirect scheme, could add (addr, num_entries). Which would degenerate for a common case of adding just 1 var. Also, taking an address of var will kill register optimization for it.

So, maybe supporting 2 types of entries is smart. But one still may wonder if going with "store shadow stack on the main stack" is still smarter. E.g.:

void foo() {
  volatile struct foo_frame {
    mp_obj_t a, b;
    void *prev_frame;
  } _roots;
#define a _roots.a
#define b _roots.b
  _roots.prev_frame = shadow_top;
  shadow_top = &_roots;
...
  shadow_top = _roots.prev_frame;
}

That's quite a bunch of boilerplate of course, and dealing with register var optimization will only add more (2 explicit vars, one on shadow stack, one cached in register, reloading as needed).

All in all, even PUSH()/POP() stuff adds enough boilerplate (and failure points!), that's why such things are usually done in C++, e.g. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/GC_Rooting_Guide

@dpgeorge
Copy link
Member Author

dpgeorge commented Jul 6, 2021

This is still relevant, but no further progress has been made.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Nov 30, 2021
@dpgeorge
Copy link
Member Author

I'll close this because there's still a lot of work to make it mergeable, and it's also very difficult to prove correctness of code that uses a root stack (puts a big burden on users that write C extensions).

There are also other reasons, see #4131 (comment)

@dpgeorge dpgeorge closed this Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants