M4 Express can deadlock on certain complex import chains #1283

Closed
klardotsh opened this issue Oct 18, 2018 · 5 comments
Comments

@klardotsh

klardotsh commented Oct 18, 2018

I don't have my board handy to provide a proper repro case right now, so I'll do what I can to describe the scenario until I can provide said repro case (and/or crack out my JLINK and just dive in):

  • Normally, when stack depth is exceeded (too many nested imports), a RuntimeError is raised (and if this happens in main.py, safe mode should be triggered)

  • It appears some cases of deeply nested import trees will bypass the RuntimeError entirely and simply lock the device. After a while, the serial console disconnects, and dmesg starts complaining that it needs to reset the device, but that the USB device won't respond to addresses (meaning all execution of anything on the device has stopped, probably including the supervisor)

  • If this happens in a REPL, a simple reboot of the device (through the button on ItsyBitsy/Feather) gets you back to a safe state. If this happens in main.py, the board is soft-bricked until the internal flash is wiped, which requires a custom build of CircuitPython to be flashed over UF2 that forcibly recreates the internal filesystem, and then another flash of "actual stable" CircuitPython (lest your filesystem be wiped every boot from then on)
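
For reference, the "healthy" failure mode from the first bullet can be sketched on a desktop Python. This is only an illustration, not code from the project: the `nest` and `survives` helpers are made-up names, and plain recursion stands in for a chain of nested imports. The point is that the interpreter refuses to go deeper and raises an exception that can be caught, which is exactly what the deadlock case never gets to do.

```python
def nest(depth):
    """Recurse `depth` levels deep, standing in for a chain of nested imports."""
    if depth == 0:
        return 0
    return 1 + nest(depth - 1)


def survives(depth):
    """Return True if nesting this deep completes, False if the VM refuses."""
    try:
        nest(depth)
    except RuntimeError:
        # CPython raises RecursionError here, which is a RuntimeError subclass,
        # so this clause catches it as well.
        return False
    return True
```

Calling `survives(10)` completes normally, while a very large depth such as `survives(10**6)` trips the interpreter's depth check under default settings and returns False instead of crashing.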

Some context:

  • CircuitPython 4.0.0-alpha2, atmel-samd SAMD51 (tested against Feather M4 Express)
  • The files in question vary in size anywhere from 2KB to 25KB
  • None of the files are frozen into the ROM; they're all copied onto the MSC flash device and compiled at import-time
  • Import depth of my worst case was about 9-10 (which is enough to cause a RuntimeError easily)

Also interestingly, updating the max stack size in boot.py does not fix this. Setting the value to anything below 650 results in the RuntimeError. With anything over 700, the modules (I assume) don't have enough heap space to actually compile, and they fail to import. With anything in between, I'd deadlock (if I recall correctly; this was a few days ago).

The project branch that triggered this is available at https://github.com/KMKfw/kmk_firmware/tree/topic-planck-klaranck. In kmk/firmware.py I hack around this issue and things work. I believe removing the giant block at the top of the file (everything before "Thanks for sticking around. Now let's do real work, starting below") may repro one or both of the symptoms described above when user_keymaps/klardotsh/klarank_featherm4.py is used as main.py.

@tannewt
Member

tannewt commented Oct 18, 2018

I've seen a similar failure when the internal C code creates a stack that's bigger than the allocated stack space. It then writes onto the heap, and the moment the overwritten object is referenced, it can cause a hard fault. This case may be different though.

@tannewt tannewt added this to the 4.0.0 - Bluetooth milestone Oct 18, 2018
@dhalbert
Collaborator

We could add MP_STACK_CHECK() before the import code to see if we could catch this. This "manually" checks for the stack overflowing into the guard region, I believe.
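
To make the idea concrete, here is a toy Python model of the two behaviours discussed above: a stack that silently grows into the heap, and a check that refuses the frame first. It is only a sketch with made-up names (`ToyMemory`, `run_frames`) and sizes, not the actual MicroPython or CircuitPython implementation. In particular, the toy checks usage before each push, which is a simplification of however the real macro measures the stack.

```python
class ToyMemory:
    """Toy RAM: heap sits at the bottom, stack grows down from the top."""

    HEAP_BYTE = 0xAA
    STACK_BYTE = 0x55

    def __init__(self, size, heap_size, stack_limit):
        self.ram = bytearray(size)
        self.ram[:heap_size] = bytes([self.HEAP_BYTE]) * heap_size
        self.heap_end = heap_size        # heap occupies [0, heap_end)
        self.stack_top = size            # stack starts at the top of RAM
        self.stack_ptr = size            # next frame is written below this
        self.stack_limit = stack_limit   # bytes the stack is allowed to use

    def stack_used(self):
        return self.stack_top - self.stack_ptr

    def stack_check(self, needed):
        """Raise before a frame would push the stack past its limit."""
        if self.stack_used() + needed > self.stack_limit:
            raise RuntimeError("maximum recursion depth exceeded")

    def push_frame(self, frame_size, checked):
        if checked:
            self.stack_check(frame_size)
        new_ptr = self.stack_ptr - frame_size
        if new_ptr < 0:
            raise MemoryError("ran off the bottom of toy RAM")
        # Writing the frame clobbers whatever was stored there, heap included.
        self.ram[new_ptr:self.stack_ptr] = bytes([self.STACK_BYTE]) * frame_size
        self.stack_ptr = new_ptr

    def heap_intact(self):
        return all(b == self.HEAP_BYTE for b in self.ram[:self.heap_end])


def run_frames(count, frame_size, checked):
    """Push `count` frames; return (outcome, heap_intact)."""
    mem = ToyMemory(size=1024, heap_size=512, stack_limit=512)
    try:
        for _ in range(count):
            mem.push_frame(frame_size, checked)
    except RuntimeError:
        return "RuntimeError", mem.heap_intact()
    return "completed", mem.heap_intact()
```

With the check enabled, nine 64-byte frames end in a RuntimeError and the heap is untouched. With it disabled, the same nine frames "complete" but the ninth has overwritten heap bytes, which is the kind of silent corruption that only shows up later when the damaged object is used.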

@dhalbert
Collaborator

@klardotsh Do you have a commit in your repo that provokes the problem? I'd like to test against it.

@klardotsh
Author

There's no commit for it explicitly (it got rebased away when I was squashing down my then-WIP branch). However, removing everything above line 37 in https://github.com/KMKfw/kmk_firmware/blob/master/kmk/firmware.py (master branch of KMKfw/kmk_firmware) should trigger at least the RuntimeError. I'm not sure if it repros the deadlock (it may?), and I didn't end up with time this weekend to construct an independent repro example, sadly.

Flashing instructions for KMK are available at https://github.com/KMKfw/kmk_firmware/blob/master/docs/flashing.md. The process rsyncs over the kmk folder, a main.py, and one dependency from micropython-lib (the string standard library polyfill). Using USER_KEYMAP=user_keymaps/klardotsh/klarank_featherm4.py, you'll end up in the same state as what I use at home.

If that doesn't repro, I'll try to assemble a specific and shrunken-down repro example this week.

@tannewt
Member

tannewt commented Feb 5, 2019

I don't think this is an issue anymore because 1) we now check that the stack hasn't overwritten the heap and go into safe mode if it has, and 2) we can enter safe mode manually by clicking reset when the status neopixel is yellow.

@tannewt tannewt closed this as completed Feb 5, 2019