Skip to content

Fix inconsistent supervisor heap #3482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 17, 2020
Merged

Fix inconsistent supervisor heap #3482

merged 3 commits into from
Oct 17, 2020

Conversation

cwalther
Copy link

There is a logic error in the supervisor memory allocation functions that causes the supervisor heap to get into an inconsistent state, eventually resulting in crashes, when allocations are freed in a different order from the reverse of how they were allocated (leaving holes). Apparently nobody has been doing that so far, but now my recent experiments on #1084 at one point did.

free_memory() relies on having allocations in order, but allocate_memory() does not guarantee that: It reuses the first allocation with a NULL ptr without ensuring that it is between low_address and high_address. When it belongs to a hole in the allocated memory, such an allocation is not really free for reuse, because free_memory() still needs its length.

Here’s a sketch that hopefully explains better than words what goes wrong in a specific sequence of calls: c = allocate(8) reuses an out-of-order slot, then free(b) moves low_address outside of the heap, and the next allocation after that will probably crash when the caller tries to use it.

image

The simplest way of fixing that I found is to explicitly mark allocations available for reuse with a special (invalid) value in the length field. Only allocations that lie between low_address and high_address are marked that way.

There is an open question on what value to use for that special marker. Using zero saves a couple of bytes of code for initialization (and possibly the comparisons use fewer instructions too), 28 bytes shorter code on a SAMD51 in my tests. The drawback is that it makes it impossible to support zero-length allocations (allocate_memory() needs to check for that and return NULL). Such allocations are mostly useless, because while you get a valid pointer, you can’t write to it, but clients that don’t try to do that might have gotten away with it. I have identified one possible zero-length caller, rgbmatrix_rgbmatrix_make_new(), but there may be other sloppily programmed ones in the future. Do we want to be safe against sloppy code, or do we want it to crash?

@jepler
Copy link

jepler commented Sep 28, 2020

This is good sleuthing work!

I think there's little reason to support a zero-byte supervisor allocation, and every reason to avoid having to give the supervisor heap a non-zero initialization.

Will this change allow a supervisor block that is deallocated from the middle to be allocated again, if the new allocation fits? This would be very beneficial for the framebuffer displays, RGBMatrix and SharpMemoryDisplay.

@cwalther
Copy link
Author

This is good sleuthing work!

It taught me how to use a Raspberry Pi with OpenOCD as an SWD debugger, which I’m sure is going to come in handy again! :)

I think there's little reason to support a zero-byte supervisor allocation, and every reason to avoid having to give the supervisor heap a non-zero initialization.

OK, will force-push the other solution. Should I try to adjust rgbmatrix_rgbmatrix_make_new() to that situation? I didn’t actually test it (so my analysis may be wrong), because my board (pewpew_m4) doesn’t seem to have that module enabled.

Will this change allow a supervisor block that is deallocated from the middle to be allocated again, if the new allocation fits? This would be very beneficial for the framebuffer displays, RGBMatrix and SharpMemoryDisplay.

At this stage, no. It could be done if the allocation fits exactly into the hole. I don’t think it can be done if the new allocation is smaller than the hole, because that needs an extra slot for the new, smaller hole, but I’d have to think about it some more.

@jepler
Copy link

jepler commented Sep 28, 2020

Can you tell me more about what you see in rgbmatrix?

In the cases I'm interested in the new allocations would likely match previously-freed ones, but of course not necessarily in the same order.

@cwalther
Copy link
Author

It looks like rgbmatrix_rgbmatrix_make_new can be called from Python with argument width = 0, and then that goes through common_hal_rgbmatrix_rgbmatrix_construct (bufsize) → common_hal_rgbmatrix_rgbmatrix_reconstructcommon_hal_rgbmatrix_allocator_implallocate_memory. Although, thinking about it now, when calling from Python the Python heap will be available, so common_hal_rgbmatrix_allocator_impl would go there, not to the supervisor heap. So maybe it’s impossible after all.

@jepler
Copy link

jepler commented Sep 28, 2020

width=0 or height=0 should be errors, feel free to make it so!

@cwalther
Copy link
Author

Okay, but not today. Meanwhile, here’s the zero-marker version. The nonzero-marker one is still at cwalther:alloc-nonzero.

@tannewt tannewt requested a review from jepler September 28, 2020 22:23
jepler added a commit to jepler/circuitpython that referenced this pull request Sep 29, 2020
In adafruit#3482, @cwalther noted that, hypothetically, a zero byte allocation
could be made in the RGBMatrix constructor.  Ensure that width is positive.
Height was already checked against the number of RGB pins if it was specified,
so zero is ruled out there as well.
@jepler jepler mentioned this pull request Sep 29, 2020
@cwalther
Copy link
Author

I added reuse of holes that fit exactly, however without checking in what circumstances this is actually used.

My first attempt recovered the ptr of a reused allocation by adding up neighbor lengths, which works but is a bit involved. So I changed things to keep the pointers around all the time. To mark an allocation as freed by the client, I no longer set the ptr to NULL, but store that marker in the length too. We still need the length, but because a valid length is always divisible by 4, its lowest bit can be used for that.

With #3485 split out, I think this is good to go from my side.

@cwalther cwalther marked this pull request as ready for review September 29, 2020 21:30
jepler added a commit to jepler/circuitpython that referenced this pull request Sep 30, 2020
In adafruit#3482, @cwalther noted that, hypothetically, a zero byte allocation
could be made in the RGBMatrix constructor.  Ensure that width is positive.
Height was already checked against the number of RGB pins if it was specified,
so zero is ruled out there as well.
Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott asked me to review this PR particularly to decide whether it should go into the next beta release which is coming real soon.

At this time, I think we should skip incorporating this PR, and reevaluate after beta.1 goes out:

  • I was not really able to give this code a thorough code review, I only did some testing
  • It is not fixing a specific known crash, but rather seeks to address a deficiency of some internal core code that we have (accidentally or by design) mostly worked around without understanding it
  • It may be unfair, but because I knew it was likely to exercise supervisor allocations in a different way than "normal" (and thus, differently than @cwalther had checked) I tested sharpdisplay + framebufferio. I found that it made the memory corruption of Memory corruption: feather nrf52840 + sharp memory display 400x240 #3473 much easier to trigger

@cwalther Thank you for your work on this thus far! I just think this is not the right moment for it, since I identified a regression (though without relating it to a specific problem in your PR). I hope that after beta.1 I will be able to look into the matter further. I'm grateful that you have taken a look into this and the result of my testing your code "in the neighborhood of" #3473 reinforces my preconception that the issue there is related to the supervisor heap.

My testing procedure: I placed the following content in setup_display.py:

import displayio
import board
import framebufferio
import sharpdisplay
import gc

gc.collect();
a, f = gc.mem_alloc(), gc.mem_free()
print(a+f, a, f)

displayio.release_displays()
framebuffer = sharpdisplay.SharpMemoryFramebuffer(board.SPI(),
        board.RX,
        400,
        240)
display = framebufferio.FramebufferDisplay(framebuffer,
        auto_refresh=True)

gc.collect()
a, f = gc.mem_alloc(), gc.mem_free()
print(a+f, a, f)

and then imported it at the REPL, exited, and repeated the process two or three times. I used a debug build on an nrf52840 feather. Generally, the result is a reset into safe mode due to some kind of memory corruption:

Breakpoint 1, reset_into_safe_mode (reason=reason@entry=MICROPY_FATAL_ERROR)
    at ../../supervisor/shared/safe_mode.c:95
95	void __attribute__((noinline,)) reset_into_safe_mode(safe_mode_t reason) {
(gdb) where
#0  reset_into_safe_mode (reason=reason@entry=MICROPY_FATAL_ERROR)
    at ../../supervisor/shared/safe_mode.c:95
#1  0x00044256 in __fatal_error (msg=msg@entry=0x8da40 "Assertion failed")
    at ../../main.c:560
#2  0x00044270 in __assert_func (file=file@entry=0x81480 "../../py/qstr.c", 
    line=line@entry=140, func=func@entry=0x84a94 <__func__.13814> "find_qstr", 
    expr=expr@entry=0x81458 "q - pool->total_prev_len < pool->len")
    at ../../main.c:567
#3  0x000289d8 in find_qstr (q=<optimized out>) at ../../py/qstr.c:141
#4  0x00028c28 in qstr_data (q=<optimized out>, len=len@entry=0x2003fae4)
    at ../../py/qstr.c:269
#5  0x00029676 in mp_vprintf (print=0x2003fb34, print@entry=0x2003fb4c, 
    fmt=0x2003fb17 "q' is not defined", 
    fmt@entry=0x2003fb10 "name '%q' is not defined", args=...)
    at ../../py/mpprint.c:483
#6  0x00035de6 in mp_obj_new_exception_msg_vlist (
    exc_type=0x894e8 <mp_type_NameError>, fmt=0x93258 <v>, 
    fmt@entry=0x20010588 <_pystack+72>, ap=..., ap@entry=...)
    at ../../py/objexcept.c:440
#7  0x00035e7e in mp_obj_new_exception_msg_varg (exc_type=<optimized out>, 
    fmt=0x93258 <v>) at ../../py/objexcept.c:392
#8  0x00031928 in mp_load_global (qst=qst@entry=534) at ../../py/runtime.c:183
#9  0x00031964 in mp_load_name (qst=534) at ../../py/runtime.c:161
#10 0x0003f8f8 in mp_execute_bytecode (
    code_state=code_state@entry=0x20010560 <_pystack+32>, 
    inject_exc=<optimized out>, inject_exc@entry=0x0) at ../../py/vm.c:275
#11 0x000367ba in fun_bc_call (self_in=0x20014d70, n_args=<optimized out>, 
    n_kw=0, args=0x0) at ../../py/objfun.c:286
#12 0x00031cc4 in mp_call_function_n_kw (fun_in=0x20014d70, 
    n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
    at ../../py/runtime.c:624
#13 0x00031cf0 in mp_call_function_0 (fun=<optimized out>)
    at ../../py/runtime.c:598
#14 0x00031d2e in mp_parse_compile_execute (lex=lex@entry=0x20014ea0, 
    parse_input_kind=parse_input_kind@entry=MP_PARSE_FILE_INPUT, 
    globals=globals@entry=0x20039aa0, locals=locals@entry=0x20039aa0)
    at ../../py/runtime.c:1488
#15 0x0003e5d0 in do_load_from_lexer (module_obj=module_obj@entry=0x20039ab0, 
    lex=0x20014ea0) at ../../py/builtinimport.c:152
#16 0x0003e608 in do_load (module_obj=module_obj@entry=0x20039ab0, 
    file=file@entry=0x2003fe0c) at ../../py/builtinimport.c:245
#17 0x0003e8c8 in mp_builtin___import__ (n_args=n_args@entry=5, 
    args=args@entry=0x2003fe4c) at ../../py/builtinimport.c:480
#18 0x00031a16 in mp_import_name (name=<optimized out>, 
    fromlist=<optimized out>, level=<optimized out>) at ../../py/runtime.c:1392
#19 0x000406ec in mp_execute_bytecode (
    code_state=code_state@entry=0x20010540 <_pystack>, 
    inject_exc=<optimized out>, inject_exc@entry=0x0) at ../../py/vm.c:1219
#20 0x000367ba in fun_bc_call (self_in=0x20014d40, n_args=<optimized out>, 
    n_kw=0, args=0x0) at ../../py/objfun.c:286
#21 0x00031cc4 in mp_call_function_n_kw (fun_in=fun_in@entry=0x20014d40, 
    n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
    at ../../py/runtime.c:624
#22 0x00031cf0 in mp_call_function_0 (fun=fun@entry=0x20014d40)
    at ../../py/runtime.c:598
#23 0x00050e4a in parse_compile_execute (source=source@entry=0x2003ffb8, 
    input_kind=<optimized out>, exec_flags=exec_flags@entry=22, 
    result=result@entry=0x0) at ../../lib/utils/pyexec.c:114
#24 0x0005110e in pyexec_friendly_repl () at ../../lib/utils/pyexec.c:520
#25 0x00044158 in run_repl () at ../../main.c:447
#26 0x000441ce in main () at ../../main.c:512

Typical repl output:

Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-beta.0-82-g009574dda on 2020-09-30; Adafruit Feather nRF52840 Express with nRF52840
>>> 
>>> import setup_display
152448 768 151680
152448 14448 138000
>>> 
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.



Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-beta.0-82-g009574dda on 2020-09-30; Adafruit Feather nRF52840 Express with nRF52840
>>> import setup_display
151296 768 150528
Assertion '

(the final line is cut off)

@cwalther
Copy link
Author

cwalther commented Oct 1, 2020

Thanks for the comments! I have no problem with waiting until after beta.1, or even after the 6.0 release.

There is some likelihood that my work on #1084 will trigger the bug fixed here, and therefore become dependent on the fix, but I don’t think that is a problem.

Maybe I can do some testing on #3473 too if I’m getting impatient, although I don’t have an nRF52 and a Sharp display. As far as debugging is concerned, making the bug easier to trigger seems like a good thing…

jepler added a commit to jepler/circuitpython that referenced this pull request Oct 1, 2020
This is enabled by adafruit#3482

I was unable to determine why previously I had added sizeof(void*)
to the GC heap allocation, so I removed that code as a mistake.
jepler added a commit to jepler/circuitpython that referenced this pull request Oct 1, 2020
If `supervisor_start_terminal` is called twice in a row without
`supervisor_stop_terminal`, it would lose track of a supervisor
allocation.

This can occur when setting the rotation of a display, including the
way that a FramebufferDisplay sets rotation _AFTER_ initial construction,
first with a stack like
```
#0  supervisor_start_terminal
#1  in displayio_display_core_construct
#2  in common_hal_framebufferio_framebufferdisplay_construct
```
and then with a stack like
```
#0  supervisor_start_terminal
#1  in common_hal_framebufferio_framebufferdisplay_construct
#2  in framebufferio_framebufferdisplay_make_new
```
.. without an intervening stop_terminal call.

For reasons I didn't fully explore, this did not become a problem until
the ability to re-allocate a freed supervisor allocation was
implemented in adafruit#3482.  Demonstrating the problem requires adafruit#3482 + adafruit#3497
+ this PR.
When allocations were freed in a different order from the reverse of how they were allocated (leaving holes), the heap would get into an inconsistent state, eventually resulting in crashes.

free_memory() relies on having allocations in order, but allocate_memory() did not guarantee that: It reused the first allocation with a NULL ptr without ensuring that it was between low_address and high_address. When it belongs to a hole in the allocated memory, such an allocation is not really free for reuse, because free_memory() still needs its length.

Instead, explicitly mark allocations available for reuse with a special (invalid) value in the length field. Only allocations that lie between low_address and high_address are marked that way.
This requires recovering the pointer of the allocation, which could be done by adding up neighbor lengths, but the simpler way is to stop NULLing it out in the first place and instead mark an allocation as freed by the client by setting the lowest bit of the length (which is always zero in a valid length).
@cwalther
Copy link
Author

cwalther commented Oct 2, 2020

Found a small improvement for allocate_memory(): When there is some free space in the middle, but not enough, but there is a matching hole on the other side of it, use the hole instead of giving up.

This is unlikely to matter in current usage, as the only thing that is ever allocated on the high side is the stack, but my current work-in-progress might need it.

(Plus rebased the first two commits, unmodified.)

@tannewt tannewt requested a review from jepler October 2, 2020 22:39
@jepler
Copy link

jepler commented Oct 7, 2020

I did further testing and found some other problems around supervisor allocations and displays, which I'm trying to address in #3498. I think the fix in #3498 has to go in when this does, or there will be a new bug at least with FramebufferDisplays.

@cwalther
Copy link
Author

cwalther commented Oct 7, 2020

OK. The more generic system of movable allocations that I’m currently working on should give us better tools to meet the requirements of both sharpdisplay (#3498) and rgbmatrix (#3499), and I intend to do that as a proof of concept before I consider it ready, so from that point of view that might obsolete at least part of #3498. That rework should probably not go into the 6.0 release at this point though. So if you want to merge both this and #3498 before the release, that’s fine with me. I’ll take a look at the new bug you’re describing there.

@jepler
Copy link

jepler commented Oct 7, 2020

Thanks, doing further improvements to the supervisor heap in an additional PR is probably a good plan.

@cwalther
Copy link
Author

cwalther commented Oct 9, 2020

As per my analysis in #3498 (comment), I concur that this PR should only go in together with #3498 (or an alternative fix for the double supervisor_start_terminal issue).

jepler added a commit to jepler/circuitpython that referenced this pull request Oct 12, 2020
…rminal

A call to supervisor_start_terminal remained in
common_hal_displayio_display_construct and was copied to other display
_construct functions, even though it was also being done in
displayio_display_core_construct when that was factored out.

Originally, this was harmless, except it created an extra allocation.
When investigating adafruit#3482, I found that this bug became harmful,
especially for displays that were created in Python code, because it
caused a supervisor allocation to leak.

I believe that it is safe to merge adafruit#3482 after this PR is merged.
@jepler
Copy link

jepler commented Oct 12, 2020

OK, we had #3542 go in. @cwalther shouldn't that fix the only problem we've identified that prevented us from merging this PR before now?

@cwalther
Copy link
Author

I can still trigger a double call to supervisor_start_terminal(), and thereby the leak, by constructing two displays. That requires setting CIRCUITPY_DISPLAY_LIMIT to 2, which in the wild is only the case on the monster_m4sk board. Is that a relevant case?

Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK we got the problems this exposed fixed and merged, so this is good to go. Thanks for turning up and helping us (me) understand the issues!

@jepler jepler merged commit 3c05a8c into adafruit:main Oct 17, 2020
@cwalther
Copy link
Author

Thanks for the thorough reviews!

jepler added a commit to jepler/circuitpython that referenced this pull request Oct 22, 2020
This is enabled by adafruit#3482

I was unable to determine why previously I had added sizeof(void*)
to the GC heap allocation, so I removed that code as a mistake.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants