Skip to content

gcc7 ARM M4 compiler change breaks nlr_push() and nlr_push_tail() #3526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dhalbert opened this issue Dec 29, 2017 · 13 comments
Closed

gcc7 ARM M4 compiler change breaks nlr_push() and nlr_push_tail() #3526

dhalbert opened this issue Dec 29, 2017 · 13 comments

Comments

@dhalbert
Copy link
Contributor

dhalbert commented Dec 29, 2017

We recently starting using the gcc-arm-none-eabi-7-2017-q4-major toolchain, which uses gcc 7.2.1. We also use -flto. On M0+, it works OK, but on M4, I found an issue having to do with mp_call_function_1_protected(), nlr_push() and nlr_push_tail(). The assembly code emitted is below. Note that mp_call_function_1_protected() saves r0 in r3 while it calls nlr_push(). nlr_push() is OK, but nlr_push_tail() thinks that it's OK to use r3 as a scratch register, and smashes it.

The ARM calling conventions are that r0-r3 are for arg passing, and r4 and up are for locals.

In the M0 assembly code, and in non-LTO compilations, r0 gets saved in r4, which is safer.

I'm not sure if this is an actual compiler bug or simply a pathological case due to asm-jumping from a naked function into a regular one (nlr_push_tail), and the optimizer or register allocator don't notice.

If I put in a regular call to something else before nlr_push, then the problem goes away (E.g., when I put in a mp_printf()!).

I'm writing this down as a warning to others for now. I have to think of some way to fix this. I was thinking about using local register variables and rewriting nlr_push as C code that stores the registers into nlr_buf. Or I can just put in a dummy call as mentioned above to force safer code to be generated. Comments welcome.

00006bcc <mp_call_function_1_protected>:
    6bcc:	b500      	push	{lr}
    6bce:	b08d      	sub	sp, #52	; 0x34
    6bd0:	4603      	mov	r3, r0
    6bd2:	4668      	mov	r0, sp
    6bd4:	f7ff fbe8 	bl	63a8 <nlr_push>
    6bd8:	b938      	cbnz	r0, 6bea <mp_call_function_1_protected+0x1e>
    6bda:	4618      	mov	r0, r3
    6bdc:	f7ff ffeb 	bl	6bb6 <mp_call_function_1>
    6be0:	f7ff fbd2 	bl	6388 <nlr_pop>
    6be4:	b00d      	add	sp, #52	; 0x34
    6be6:	f85d fb04 	ldr.w	pc, [sp], #4
    6bea:	9801      	ldr	r0, [sp, #4]
    6bec:	f01a fe76 	bl	218dc <mp_obj_print_exception.constprop.61>
    6bf0:	e7f8      	b.n	6be4 <mp_call_function_1_protected+0x18>
000063a8 <nlr_push>:
    63a8:	60c4      	str	r4, [r0, #12]
    63aa:	6105      	str	r5, [r0, #16]
    63ac:	6146      	str	r6, [r0, #20]
    63ae:	6187      	str	r7, [r0, #24]
    63b0:	f8c0 801c 	str.w	r8, [r0, #28]
    63b4:	f8c0 9020 	str.w	r9, [r0, #32]
    63b8:	f8c0 a024 	str.w	sl, [r0, #36]	; 0x24
    63bc:	f8c0 b028 	str.w	fp, [r0, #40]	; 0x28
    63c0:	f8c0 d02c 	str.w	sp, [r0, #44]	; 0x2c
    63c4:	f8c0 e008 	str.w	lr, [r0, #8]
    63c8:	f7ff bfe6 	b.w	6398 <nlr_push_tail>
00006398 <nlr_push_tail>:
    6398:	4b02      	ldr	r3, [pc, #8]	; (63a4 <nlr_push_tail+0xc>)
    639a:	689a      	ldr	r2, [r3, #8]
    639c:	6002      	str	r2, [r0, #0]
    639e:	6098      	str	r0, [r3, #8]
    63a0:	2000      	movs	r0, #0
    63a2:	4770      	bx	lr
    63a4:	1228      	asrs	r0, r5, #8
    63a6:	2002      	movs	r0, #2
@dhalbert dhalbert changed the title gcc7 ARM M4 compiler change with breaks nlr_push() and nlr_push_tail() gcc7 ARM M4 compiler change breaks nlr_push() and nlr_push_tail() Dec 29, 2017
@dhalbert
Copy link
Contributor Author

dhalbert commented Jan 2, 2018

I believe I've fixed this in CircuitPython by adding some "clobbered register" info to the asm(...) in `nlr_push(...).

    :                               // output operands
    : "r" (nlr)                     // input operands
    : "r1", "r2", "r3"        // clobbers

See adafruit#506 and adafruit#500.

@Cheaterman
Copy link

I've tried your workarounds (both using mp_printf() before the nlr_push() call and clobbering extra registers) ; the first didn't work, and I wasn't able to do the second.

I'm running into this issue on x86_64 with GCC 5.4.0. Please help :-)

@dhalbert
Copy link
Contributor Author

dhalbert commented Jan 15, 2018

I'm not sure what to suggest for gcc 5.4.0. We haven't done deveopment with our fork on gcc 5 or below. We were fine with gcc 6.3.1 and encountered the problem with gcc 7. If you could at least upgrade to gcc 6 the problem may go away.

@Cheaterman
Copy link

Hi dhalbert,

LoBo's tests seem to lead similar results:
loboris/MicroPython_ESP32_psRAM_LoBo#19

It might indeed be a good idea that I upgrade to 6.x.

Thanks!

@Cheaterman
Copy link

Cheaterman commented Jan 15, 2018

I just tried with gcc version 6.4.0 and the problem persists. I have no idea what is going on - however it is worth noting I'm using Gentoo Hardened which might add PIE to GCC. I'll get in touch with the community and see if they have any idea.

@stinos
Copy link
Contributor

stinos commented Jan 15, 2018

To get an idea about a proper fix in assembly code you could check out the disassembly of setjmp/longjmp as it is basically the same functionally

@Cheaterman
Copy link

Thanks stinos, this sounds like a reasonable idea. I'll try that and come back with what I find :-)

@Cheaterman
Copy link

I finally found it! It was SSP, not PIE. export CFLAGS_MOD='-fno-stack-protector' solves the issue :-)

I'm not sure if that's the right solution for upstream though?

@dhalbert
Copy link
Contributor Author

@Cheaterman could you open a different issue for this? Your problems sound different from the specific ARM M4 issues I reported in the original issue.

@Cheaterman
Copy link

Yes, I apologize for hijacking your issue @dhalbert. Opening a new issue right now!

@dpgeorge
Copy link
Member

I'm not sure if this is an actual compiler bug or simply a pathological case due to asm-jumping from a naked function into a regular one (nlr_push_tail), and the optimizer or register allocator don't notice.

My guess is that it's a limitation of the compiler/linker, that you can't use LTO and also inline asm.

Does the problem go away if you write nlr_push in pure assembler as a .s (or .S) file (using exactly the same code as in nlrthumb.c)?

I believe I've fixed this in CircuitPython by adding some "clobbered register" info to the asm(...) in `nlr_push(...).

According to the gcc docs this is not allowed, in naked functions you must only have basic asm statements, which means you can't use a clobbered register list; see https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html

@dhalbert
Copy link
Contributor Author

Thanks for your comments. Extended asm is not allowed in naked functions, so says the doc, yet it does work, and even fixes the problem. (E pur si muove.) At this point I consider this a workaround. Because the M0 output does not have the problem, and the M4 does, I wonder if there's an issue with the gcc M4 architecture description, and we are hitting an edge case.

There has been some discussion of extended vs basic asm and clobbers on the gcc list, for instance: https://www.mail-archive.com/gcc@gcc.gnu.org/msg78703.html. But the "naked" attribute has not been part of the discussion, as far as I can see, except that it's an exception.

I'll try the .S file thing at some point, but I think I will bring up the issue on the gcc mailing list as well to clarify the issue. I have tried coming up with a small test case to file a gcc bug report, but didn't succeed (yet).

@dpgeorge
Copy link
Member

I think this has been fixed by 5591bd2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants