Skip to content

Metro-M4 problems with gcc 7.2.1 #500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jerryneedell opened this issue Dec 24, 2017 · 9 comments · Fixed by #506
Closed

Metro-M4 problems with gcc 7.2.1 #500

jerryneedell opened this issue Dec 24, 2017 · 9 comments · Fixed by #506
Assignees
Milestone

Comments

@jerryneedell
Copy link
Collaborator

After updating my linux system to the latest arm-none-eabi-gcc toolset (7.2.1) I have been experiencing some odd behavior when executing a build of CP 3.0 master on my Metro-m4-express.

The compile and upload run with no problems and I can enter the REPL on the M4. When I execute some scripts, the system will run for awhile then apparently hang and after about 30 seconds, it disconnects the USB port. It can be restored by a RESET.

I recompiled CP 3.0 master with gcc 6.3.1 and everything works properly.
I have compiled CP 2.x for several M0 boards with gcc 7.2.1 and have not seen any problems.
Also I have compiled CP 3.0 master for the feather nrf52 board (also an M4) and have not experienced any problems (but my testing of it has been minimal).

Note that CP 3.0 master compiled with gcc 7.2.1 for the metro-m0-express also works without any problems.

I am attaching 2 scripts that cause hangs - the BMP280_test.py runs an report 3 full samples then on the 4th sample it hangs after reporting the Temperature. -- this is repeatable ...

the ring.py script displays a color wheel on a 16 element neopixel ring. It hangs immediately and none of the neopixels even get illuminated.

I have also caused it to hang via the REPL by setting up the BMP280 and making repeade reads. After several successful reads, it hung as before.

bmp280_test:

import board
import digitalio
import busio
import time
import adafruit_bmp280

# Create library object using our Bus I2C port
i2c = busio.I2C(board.SCL, board.SDA)
bmp280 = adafruit_bmp280.Adafruit_BMP280_I2C(i2c)

# OR create library object using our Bus SPI port
#spi = busio.SPI(board.SCK, board.MOSI, board.MISO)
#bmp_cs = digitalio.DigitalInOut(board.D10)
#bmp280 = adafruit_bmp280.Adafruit_BMP280_SPI(spi, bmp_cs)

# change this to match the location's pressure (hPa) at sea level
bmp280.seaLevelhPa = 1013.25

while True:
    print("\nTemperature: %0.1f C" % bmp280.temperature)
    print("Pressure: %0.1f hPa" % bmp280.pressure)
    print("Altitude = %0.2f meters" % bmp280.altitude)
    time.sleep(2)

ring.py:

 
from digitalio import *
from board import *
import neopixel
import time
 
pixpin = D2
numpix = 16
 
#led = DigitalInOut(D13)
#led.direction = Direction.OUTPUT
 
strip = neopixel.NeoPixel(pixpin, numpix, brightness=0.2,auto_write=False)
 
 
def wheel(pos):
    # Input a value 0 to 255 to get a color value.
    # The colours are a transition r - g - b - back to r.
    if (pos < 0):
        return (0, 0, 0)
    if (pos > 255):
        return (0, 0, 0)
    if (pos < 85):
        return (int(pos * 3), int(255 - (pos*3)), 0)
    elif (pos < 170):
        pos -= 85
        return (int(255 - pos*3), 0, int(pos*3))
    else:
        pos -= 170
        return (0, int(pos*3), int(255 - pos*3))
 
def rainbow_cycle(wait):
    for j in range(255):
        for i in range(len(strip)):
            idx = int ((i * 256 / len(strip)) + j)
            strip[i] = wheel(idx & 255)
        strip.show()
        time.sleep(wait)
 
try:
    while True:        
        rainbow_cycle(0.001)

except:
    pass

finally:
    for i in range(len(strip)):
        strip[i] = (0,0,0)
    strip.show()
@dhalbert
Copy link
Collaborator

I kept pruning the program back. The problem appears to be much more fundamental, and appears to have to do with storage allocation or garbage collection. This program hangs around i=479:

for i in range(10000):
    print(i)
    l = [0]*50

Interestingly, it doesn't fail in the repl, but does fail if named as, say gcc7crash.py and then I do import gcc7crash. Don't make it main.py: it will crash before you get to the REPL, and CIRCUITPY will be nonfunctional quickly. I had to reload a gcc6 build to make it not be main.py.

@tannewt
Copy link
Member

tannewt commented Dec 27, 2017

Have you tried this with a debugger hooked up to see where it is in C land?

@dhalbert
Copy link
Collaborator

going to do that - just wanted to redirect where to look. It's doesn't seem like it's IO or USB or something like that. I have a feeling it's something that got optimized out or misplaced even though it's "asm"d or something like that.

@tannewt tannewt added this to the 3.0 Beta milestone Dec 27, 2017
@tannewt
Copy link
Member

tannewt commented Dec 27, 2017

Yup, thats my feeling too. I'll assign to you.

@dhalbert
Copy link
Collaborator

(this diagnosis filed as issue in (also filed in micropython#3526)

We recently starting using the gcc-arm-none-eabi-7-2017-q4-major toolchain, which uses gcc 7.2.1. We also use -flto. On M0+, it works OK, but on M4, I found an issue having to do with mp_call_function_1_protected(), nlr_push() and nlr_push_tail(). The assembly code emitted is below. Note that mp_call_function_1_protected() saves r0 in r3 while it calls nlr_push(). nlr_push() is OK, but nlr_push_tail() thinks that it's OK to use r3 as a scratch register, and smashes it.

The ARM calling conventions are that r0-r3 are for arg passing, and r4 and up are for locals.

In the M0 assembly code, and in non-LTO compilations, r0 gets saved in r4, which is safer.

I'm not sure if this is an actual compiler bug or simply a pathological case due to asm-jumping from a naked function into a regular one (nlr_push_tail), and the optimizer or register allocator don't notice.

If I put in a regular call to something else before nlr_push, then the problem goes away (E.g., when I put in a mp_printf()!).

I'm writing this down as a warning to others for now. I have to think of some way to fix this. I was thinking about using local register variables and rewriting nlr_push as C code that stores the registers into nlr_buf. Or I can just put in a dummy call as mentioned above to force safer code to be generated. Comments welcome.

00006bcc <mp_call_function_1_protected>:
    6bcc:	b500      	push	{lr}
    6bce:	b08d      	sub	sp, #52	; 0x34
    6bd0:	4603      	mov	r3, r0
    6bd2:	4668      	mov	r0, sp
    6bd4:	f7ff fbe8 	bl	63a8 <nlr_push>
    6bd8:	b938      	cbnz	r0, 6bea <mp_call_function_1_protected+0x1e>
    6bda:	4618      	mov	r0, r3
    6bdc:	f7ff ffeb 	bl	6bb6 <mp_call_function_1>
    6be0:	f7ff fbd2 	bl	6388 <nlr_pop>
    6be4:	b00d      	add	sp, #52	; 0x34
    6be6:	f85d fb04 	ldr.w	pc, [sp], #4
    6bea:	9801      	ldr	r0, [sp, #4]
    6bec:	f01a fe76 	bl	218dc <mp_obj_print_exception.constprop.61>
    6bf0:	e7f8      	b.n	6be4 <mp_call_function_1_protected+0x18>
000063a8 <nlr_push>:
    63a8:	60c4      	str	r4, [r0, #12]
    63aa:	6105      	str	r5, [r0, #16]
    63ac:	6146      	str	r6, [r0, #20]
    63ae:	6187      	str	r7, [r0, #24]
    63b0:	f8c0 801c 	str.w	r8, [r0, #28]
    63b4:	f8c0 9020 	str.w	r9, [r0, #32]
    63b8:	f8c0 a024 	str.w	sl, [r0, #36]	; 0x24
    63bc:	f8c0 b028 	str.w	fp, [r0, #40]	; 0x28
    63c0:	f8c0 d02c 	str.w	sp, [r0, #44]	; 0x2c
    63c4:	f8c0 e008 	str.w	lr, [r0, #8]
    63c8:	f7ff bfe6 	b.w	6398 <nlr_push_tail>
00006398 <nlr_push_tail>:
    6398:	4b02      	ldr	r3, [pc, #8]	; (63a4 <nlr_push_tail+0xc>)
    639a:	689a      	ldr	r2, [r3, #8]
    639c:	6002      	str	r2, [r0, #0]
    639e:	6098      	str	r0, [r3, #8]
    63a0:	2000      	movs	r0, #0
    63a2:	4770      	bx	lr
    63a4:	1228      	asrs	r0, r5, #8
    63a6:	2002      	movs	r0, #2

@dhalbert
Copy link
Collaborator

I believe I have fixed this by using the asm(...) "clobbers" setting to mark r1, r2, and r3 as clobbered, so that the compiler won't use r3 as a temp save register. Note: don't mark r0 as clobbered; that doesn't work and causes odd behavior.

Other things I tried included removing the naked attribute on nlr_push() and combinding nlr_push() and nlr_push_tail(). That didn't work. Also declaring the registers as C: int register r1 asm("r1"); to make them appear to be used didn't work.

The best way to look at the assembly code is to use objdump, because you can see the absolute final assembler (better and easier than gcc -S)

arm-none-eabi-objdump -D -Mforce-thumb build-metro_firmware.elf >firmware.asm

@dhalbert
Copy link
Collaborator

dhalbert commented Dec 31, 2017

Code after fix:

00006bd0 <mp_call_function_1_protected>:
    6bd0:	b530      	push	{r4, r5, lr}
    6bd2:	b08d      	sub	sp, #52	; 0x34
    6bd4:	4604      	mov	r4, r0
    6bd6:	4668      	mov	r0, sp
    6bd8:	460d      	mov	r5, r1
    6bda:	f7ff fbe7 	bl	63ac <nlr_push>
    6bde:	b938      	cbnz	r0, 6bf0 <mp_call_function_1_protected+0x20>
    6be0:	4629      	mov	r1, r5
    6be2:	4620      	mov	r0, r4
    6be4:	f7ff ffe9 	bl	6bba <mp_call_function_1>
    6be8:	f7ff fbd0 	bl	638c <nlr_pop>
    6bec:	b00d      	add	sp, #52	; 0x34
    6bee:	bd30      	pop	{r4, r5, pc}
    6bf0:	9801      	ldr	r0, [sp, #4]
    6bf2:	f01a fe79 	bl	218e8 <mp_obj_print_exception.constprop.61>
    6bf6:	e7f9      	b.n	6bec <mp_call_function_1_protected+0x1c>
000063ac <nlr_push>:
    63ac:	60c4      	str	r4, [r0, #12]
    63ae:	6105      	str	r5, [r0, #16]
    63b0:	6146      	str	r6, [r0, #20]
    63b2:	6187      	str	r7, [r0, #24]
    63b4:	f8c0 801c 	str.w	r8, [r0, #28]
    63b8:	f8c0 9020 	str.w	r9, [r0, #32]
    63bc:	f8c0 a024 	str.w	sl, [r0, #36]	; 0x24
    63c0:	f8c0 b028 	str.w	fp, [r0, #40]	; 0x28
    63c4:	f8c0 d02c 	str.w	sp, [r0, #44]	; 0x2c
    63c8:	f8c0 e008 	str.w	lr, [r0, #8]
    63cc:	f7ff bfe6 	b.w	639c <nlr_push_tail>
    63d0:	2000      	movs	r0, #0
0000639c <nlr_push_tail>:
    639c:	4b02      	ldr	r3, [pc, #8]	; (63a8 <nlr_push_tail+0xc>)
    639e:	689a      	ldr	r2, [r3, #8]
    63a0:	6002      	str	r2, [r0, #0]
    63a2:	6098      	str	r0, [r3, #8]
    63a4:	2000      	movs	r0, #0
    63a6:	4770      	bx	lr
    63a8:	1228      	asrs	r0, r5, #8
    63aa:	2002      	movs	r0, #2

@jerryneedell
Copy link
Collaborator Author

Is any of this specific to the M4 ? Rephrased, is there some reason why the M0 code does not generate these errors or were we just lucky.

@dhalbert
Copy link
Collaborator

It should be just about the same (both are thumb), but it's a different cpu setting, so maybe there's some difference in the cpu description tables. I did check the asm for both, and the M0 didn't have the mistake that the M4 compilation did. I may look further at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants