WIP: use a root stack to explicitly track root pointers #4723

dpgeorge · 2019-04-26T01:23:40Z

This is something I was working on for a while, but only now got it to a point where it works well enough to be useful.

The idea is to remove the need to scan the C stack and machine registers for root pointers, and instead track root pointers explicitly throughout the code by maintaining a stack of root pointers, a "root stack".

The reason to look at it now is because of issues like #4652, #4705, and the problem with the Javascript port not being able to trace machine registers. The patch here should be a full solution to those issues.

The code here is not really mergeable (yet) but instead shows what is necessary to do this, to track the root pointers manually.

Running unix coverage tests yields:

709 tests
709 passed
32 skipped

It also works on pyboard with pystack enabled there.

Some points to note:

the patch here is mostly just a bunch of push/pop pairs at various places to make sure root pointers are reachable
the feature can be completely disabled if not needed and traditional scanning of root pointers (using the C stack and machine registers) used instead
to test, aggressive GC collection was enabled (collection at each opcode, at each memory allocation, and at some other locations)
code size increases by about 3000 bytes
an additional region of memory is needed for the root stack
the GC is now much more precise and could be on the path to becoming 100% precise with this patch (and a lot of extra work)
it may now be possible to do some very basic form of moving of objects in the heap to defragment it

Running unix coverage tests yields: 709 tests 709 passed 32 skipped can get vfs_fat_ramdisklarge passing (not skipped) with 640k heap remaining skipped are native based working on pyboard with pystack enabled there

dhalbert · 2019-04-26T13:54:39Z

I've looked at this, and think I understand the principle. I have some comments which are not thoroughly thought out and may be wrong or impractical (this is written before breakfast), but see what you think.

For simplicity, I'm assuming a 32-bit word, when I really mean 32 or 64 (basic word or pointer size).

Root pointers

A root pointer is a pointer to an object that should not be gc'd even if nothing else is pointing to it. There are two kinds of root pointers:

Static root pointers, like singletons, etc.
Object pointers created on the stack, which may or may not be temporary for the life of the stack frame. If not temporary, they will be returned or added to another object.

If I understand correctly, this scheme handles the second case, so that pointers on the stack will not be gc'd if a gc is done during that existence of that stack frame. To do that, you keep a temporary list (a "stack") of those pointers.

The "problem" that's being solved is that there are mixed MicroPython and non-MicroPython things on the stack: there are object pointers, MPy ints, etc., and then there are plain C pointers, ints, etc. When you scan a stack frame, you can't tell the difference.

Bitmap indicating object pointers on the stack

Suppose instead of using a root pointer stack, you used a bitmap, where each bit represented whether a word on the stack was an MPy pointer or not. You'd set and clear these bits instead of pushing and popping pointers on the root pointer stack. The bitmap memory size be 1/32nd of the maximum stack size. This may or may not save RAM: it depends on how much room you need for the root pointer stack.

Extending the bitmap to the heap and static space

Suppose that the bitmap above were extended to all of RAM, or at least all of RAM that could contain MPy objects. Then you could mark whether each 32-bit word was an MPy object pointer or not.

Further, if you used pairs of bits instead of single bits, it could also be the gc mark table.

(EDIT: original parts of this were just wrong - should be talking the object blocks themselves, not pointers to them. I've adjusted with other suggestions, but it may be less interesting.)

0b00 - The corresponding address is not an MPy pointer.
~~0b01 - The corresponding address is an MPy root pointer. Do not gc. (EDIT: changed from original post)~~
0b1m - The corresponding address is an MPy pointer that can be considered for gc. The m bit is set during gc mark.
0b00 - The corresponding address is free memory or the continuation of an allocated object.
0b01 - The corresponding address is an MPy root pointer. Do not gc.
0b1m - The corresponding address is the beginning of an allocated object. The m bit is set during gc mark.

I think with this scheme, you could get rid of heap allocation blocks, and allocate memory with smaller granularity. The overhead is 1/16th of heap+stack. Maybe you already considered and discarded this for time or space reasons.

pfalcon · 2019-04-26T21:42:37Z

a "root stack".

This is usually called shadow stack.

The reason to look at it now is because of issues like #4652, #4705, and the problem with the Javascript port not being able to trace machine registers.

Well, usually one go thru a trouble like that to implement a compacting GC.

code size increases by about 3000 bytes

The question is also how much performance it costs.

it may now be possible to do some very basic form of moving of objects in the heap to defragment it

This requires thinking ahead. PUSH()/POP() should be macros, as fundamental as TO_OBJ()/FROM_OBJ(). POP(obj) should assign pointer from stack (which may be relocated in the meantime) back to the object. And even without popping, it should be reloaded from shadow stack after each checkpoint (call to a function which performs memory allocation).

typedef struct _root_stack_elem_t {
    int kind;
    void *ptr;
} root_stack_elem_t;

This again raises concerns of efficiency. Doing if on critical path of collecting shadow stack probably hits performance. Common sense would say that stick with either direct-copy shadow stack or indirect ptr shadow stack would make sense. Sticking with direct-copy leads to potential degenerated cases like mp_obj_t foo[100] - copies of 100 pointers would need to be added to shadowstack. With indirect scheme, could add (addr, num_entries). Which would degenerate for a common case of adding just 1 var. Also, taking an address of var will kill register optimization for it.

So, maybe supporting 2 types of entries is smart. But one still may wonder if going with "store shadow stack on the main stack" is still smarter. E.g.:

void foo() {
  volatile struct foo_frame {
    mp_obj_t a, b;
    void *prev_frame;
  } _roots;
#define a _roots.a
#define b _roots.b
  _roots.prev_frame = shadow_top;
  shadow_top = &_roots;
...
  shadow_top = _roots.prev_frame;
}

That's quite a bunch of boilerplate of course, and dealing with register var optimization will only add more (2 explicit vars, one on shadow stack, one cached in register, reloading as needed).

All in all, even PUSH()/POP() stuff adds enough boilerplate (and failure points!), that's why such things are usually done in C++, e.g. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/GC_Rooting_Guide

dpgeorge · 2021-07-06T04:22:54Z

This is still relevant, but no further progress has been made.

dpgeorge · 2022-09-23T03:14:32Z

I'll close this because there's still a lot of work to make it mergeable, and it's also very difficult to prove correctness of code that uses a root stack (puts a big burden on users that write C extensions).

There are also other reasons, see #4131 (comment)

WIP: use a root stack to explicitly track root pointers

a462689

Running unix coverage tests yields: 709 tests 709 passed 32 skipped can get vfs_fat_ramdisklarge passing (not skipped) with 640k heap remaining skipped are native based working on pyboard with pystack enabled there

This was referenced Apr 26, 2019

MicroPython and emscripten #3474

Closed

Win32 micropython builds corrupt memory upon repeated execution of a particular function #4652

Closed

GeorgeWort mentioned this pull request May 31, 2019

Use a root stack to explicitly track root pointers microbit-foundation/micropython-simulator#2

Open

pmp-p mentioned this pull request Jul 5, 2019

port-javascript: can't push twice #4860

Closed

pmp-p mentioned this pull request Aug 8, 2019

Convert Micropython to JavaScript in order to create an online simulator lvgl/lvgl#792

Closed

5 tasks

dpgeorge mentioned this pull request Aug 17, 2021

tests/extmod/vfs_fat_finaliser.py: Fix on unix qemu-arm (WIP) #7659

Closed

dpgeorge added the py-core Relates to py/ directory in source label Nov 30, 2021

dpgeorge closed this Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

WIP: use a root stack to explicitly track root pointers #4723

WIP: use a root stack to explicitly track root pointers #4723

Uh oh!

dpgeorge commented Apr 26, 2019

Uh oh!

dhalbert commented Apr 26, 2019 •

edited

Loading

Uh oh!

pfalcon commented Apr 26, 2019

Uh oh!

dpgeorge commented Jul 6, 2021

Uh oh!

dpgeorge commented Sep 23, 2022

Uh oh!

Uh oh!

Uh oh!

WIP: use a root stack to explicitly track root pointers #4723

WIP: use a root stack to explicitly track root pointers #4723

Uh oh!

Conversation

dpgeorge commented Apr 26, 2019

Uh oh!

dhalbert commented Apr 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pfalcon commented Apr 26, 2019

Uh oh!

dpgeorge commented Jul 6, 2021

Uh oh!

dpgeorge commented Sep 23, 2022

Uh oh!

Uh oh!

dhalbert commented Apr 26, 2019 •

edited

Loading