Memory corruption: feather nrf52840 + sharp memory display 400x240 #3473

Closed
jepler opened this issue Sep 25, 2020 · 2 comments · Fixed by #3497

jepler commented Sep 25, 2020

I have a large-ish program (RPN calculator with decimal arithmetic) that runs on this hardware combination. After reloading it an indeterminate number of times, I get an inexplicable traceback like

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 4, in <module>
  File "__init__.py", line 5433, in <module>
RuntimeError: Corrupt raw code

or I'll get a full hang after trying to enter safe mode. The safe mode entry looks like nonsense:

Breakpoint 1, reset_into_safe_mode (reason=reason@entry=HARD_CRASH)
    at ../../supervisor/shared/safe_mode.c:95
95	void __attribute__((noinline,)) reset_into_safe_mode(safe_mode_t reason) {
(gdb) n
96	    if (current_safe_mode > BROWNOUT && reason > BROWNOUT) {
(gdb) where
#0  reset_into_safe_mode (reason=reason@entry=HARD_CRASH)
    at ../../supervisor/shared/safe_mode.c:96
#1  0x0004457c in HardFault_Handler () at supervisor/port.c:347
#2  <signal handler called>
#3  0x00033226 in mp_map_lookup (map=0x200389a4, index=0x22080f00, lookup_kind=<optimized out>)
    at ../../py/obj.h:171
#4  0x000551bc in irq_handler (p_reg=0x200127ac <usb_callback>, instance_id=279173, 
    channel_count=283785) at nrfx/drivers/src/nrfx_rtc.c:308
#5  0x200389a4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

As yet, I haven't captured any better debugging information than this. I also haven't distilled down a reproducer smaller than my full calculator program.

I have jumped to the possibly-incorrect conclusion that it's something about the Sharp display or framebufferio that allows memory corruption to occur during board reset, but I don't have proof yet. Nothing else that my program does is new core functionality.

@DavePutz
Collaborator

@jepler, are you able to attach the program you are running so I can test?

jepler commented Sep 29, 2020

Sadly my program is too big and ugly to post as-is. If I find a reduced test case I'll post it.

jepler added a commit to jepler/circuitpython that referenced this issue Oct 1, 2020
It was incorrect to NULL out the pointer to our heap allocated buffer in
`reset`, because subsequent to framebuffer_reset, but while
the heap was still active, we could call `get_bufinfo` again,
leading to a fresh allocation on the heap that is about to be destroyed.

Typical stack trace:
```
#1  0x0006c368 in sharpdisplay_framebuffer_get_bufinfo
#2  0x0006ad6e in _refresh_display
#3  0x0006b168 in framebufferio_framebufferdisplay_background
#4  0x00069d22 in displayio_background
#5  0x00045496 in supervisor_background_tasks
#6  0x000446e8 in background_callback_run_all
#7  0x00045546 in supervisor_run_background_tasks_if_tick
#8  0x0005b042 in common_hal_neopixel_write
#9  0x00044c4c in clear_temp_status
#10 0x000497de in spi_flash_flush_keep_cache
#11 0x00049a66 in supervisor_external_flash_flush
#12 0x00044b22 in supervisor_flash_flush
#13 0x0004490e in filesystem_flush
#14 0x00043e18 in cleanup_after_vm
#15 0x0004414c in run_repl
#16 0x000441ce in main
```
When this happened -- which was inconsistent -- the display would keep
some heap allocation across reset, which is exactly what we need to avoid.

NULLing the pointer in reconstruct follows what RGBMatrix does, and that
code is a bit more battle-tested anyway.

If I had a motivation for structuring the SharpMemory code differently,
I can no longer recall it.

Testing performed: Ran my complicated calculator program over multiple
iterations without observing signs of heap corruption.

Closes: adafruit#3473
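
The lifetime problem described in the commit message can be sketched with a small stand-alone C model. This is not the CircuitPython source: every `toy_*` name, the generation counter, and the exact ordering of calls are assumptions made for illustration. The model only mirrors the sequence reported above (reset, then a background refresh while the old heap is still active, then heap teardown) and compares clearing the buffer pointer in `reset` with clearing it in `reconstruct`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the VM heap (hypothetical): each reload starts a new
 * "generation", and anything allocated in an older one counts as freed. */
static unsigned heap_generation = 1;
static bool heap_active = true;
static unsigned char backing_store[64];

typedef struct {
    unsigned char *buf;       /* lazily allocated frame buffer */
    unsigned buf_generation;  /* heap generation the buffer was allocated in */
} toy_framebuffer_t;

/* Hypothetical allocator: only valid while the heap is active. */
static unsigned char *toy_heap_alloc(void) {
    assert(heap_active);
    return backing_store;
}

/* Models get_bufinfo: allocate on first use, then reuse the buffer. */
static unsigned char *toy_get_bufinfo(toy_framebuffer_t *fb) {
    if (fb->buf == NULL) {
        fb->buf = toy_heap_alloc();
        fb->buf_generation = heap_generation;
    }
    return fb->buf;
}

/* Models reset, which runs while the old heap is still active. */
static void toy_reset(toy_framebuffer_t *fb, bool null_in_reset) {
    if (null_in_reset) {
        fb->buf = NULL;  /* the behaviour the commit message calls incorrect */
    }
}

/* Models reconstruct, assumed here to run after the old heap is gone. */
static void toy_reconstruct(toy_framebuffer_t *fb, bool null_in_reset) {
    if (!null_in_reset) {
        fb->buf = NULL;  /* the behaviour the commit message adopts */
    }
}

/* Returns true if, after one simulated reload, the framebuffer still holds
 * a pointer into the heap generation that was just destroyed. */
bool dangling_after_reload(bool null_in_reset) {
    toy_framebuffer_t fb = { NULL, 0 };
    heap_generation = 1;
    heap_active = true;

    toy_get_bufinfo(&fb);            /* normal use while the program runs */
    toy_reset(&fb, null_in_reset);   /* VM cleanup begins */
    toy_get_bufinfo(&fb);            /* background refresh, heap still active */

    heap_active = false;             /* old heap is torn down */
    heap_generation++;
    toy_reconstruct(&fb, null_in_reset);
    heap_active = true;              /* fresh heap for the next run */

    return fb.buf != NULL && fb.buf_generation != heap_generation;
}
```

Called from a `main`, `dangling_after_reload(true)` models clearing the pointer in `reset` and reports a stale pointer, because the background refresh re-allocates from the heap that is about to be destroyed. `dangling_after_reload(false)` models clearing it in `reconstruct` and reports none.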