-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/nlrthumb: Add returns_twice attribute to nlr_push #3889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
After the change of using mp_load_method_protected in 7241d90 it looks like the code change triggered an issue with nlr_push when using -flto. How related this specific commit is to the issue is unclear. At least it is the commit that has been tracked down to be the first which made the issue visible. The reason for the behaviour is unknown and is only triggered when using LTO. The strange thing about the issue is that it is not consistent accross build targets. The code have been tested on both cortex-m0 (nrf51822) and cortex-m4 (nrf52832/nrf52840), and it only fails on cortex-m4. Adding the attribute "returns_twice" on the declaration of nlr_push seems to resolve the issue seen on cortex-m4. Toolchain used, arm-none-eabi-gcc 7.2.1. Also, a thanks to @aykevl for proposing to try this attribute.
@@ -36,7 +36,7 @@ | |||
// For reference, arm/thumb callee save regs are: | |||
// r4-r11, r13=sp | |||
|
|||
__attribute__((naked)) unsigned int nlr_push(nlr_buf_t *nlr) { | |||
__attribute__((naked)) __attribute__((returns_twice)) unsigned int nlr_push(nlr_buf_t *nlr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: you can merge these attributes with __attribute__((naked,returns_twice))
.
For some background as to why this is needed, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49243 Also see #3526, which may be the same bug (miscompilation with LTO in gcc 7 on Cortex-M4). |
Thanks for raising this issue. I did see kind-of-similar behaviour on SPARC where gcc would treat the built-in setjmp specially and if one renamed setjmp to something else (but with an equivalent implementation) then that would break uPy. Dissasembly there showed the issue and the fix was to simply save and restore more registers in the custom setjmp/longjmp handler. I think it's important to properly understand what is going on here with the Cortex-M4 and LTO. What is the difference in the disassembly with and without using the Also, wouldn't it make more sense to declare this attribute in nlr.h, so all the callers of nlr_push can see that it it'll return twice and so forbid the relevant optimisations? |
Thanks for the feedback and comments. I did an attempt on declaring the attribute in nlr.h and that gave me also a failing result. This give me 3 compilation scenarios:
For reference the code used is an upmerge of tralamazza/master at 187cb09 with micropython/master at 34344a4 As a note, the
Next, an user of these functions. I selected the Working - attribute in nlrthumb.c:
Failing - old/current code:
Failing - attribute in nlr.h:
|
@aykevl, you might be right that its the same as reported in #3526. I tested the patch proposed in that reference and it seems to solve the issue without the patch in this PR. @dpgeorge Should we close this PR and go for the "clobbered register" patch instead? For comparison, i'll add the disassembly of the
The |
Maybe we're on the wrong track. Maybe the simplest option is to simply disable LTO for this file only (nlrthumb.c) as it has given a lot of problems, similar to how gc.c uses -O3. It could be something else that happened to fix this bug, like a missing clobbered register or a GCC bug. I'm certainly a bit lost after I learned that putting it on the declaration results in yet another version of the output while breaking it again. Nonetheless I think the attribute should be added because it is intended for this purpose (double returns, setjmp like functions). |
Thanks @glennrub for the disassembly. It's obvious what the problem is: the failing So the question is, why does it not save these registers in the other cases? I guess because LTO is being too optimistic and assuming that
IIRC, from previous discussions it was concluded that a clobbered register list cannot appear here because it's a naked function.
If you can do that I think that's the best option at this point.
If we do that it should be added to the header nlr.h, and probably only for gcc builds. Also it would be good to see that this actually did something rather than nothing. |
The gcc documentation says that, but it actually works anyway, and we've been using that fix as I mentioned in #3526 in CircuitPython for several months: adafruit#500.
I'm not sure that would work, because LTO is a multiple file thing, but it's worth trying. (Maybe all the files have to be LTO or not). |
Not particularily easy: https://stackoverflow.com/questions/49018738/gcc-lto-limit-scope-of-optimization |
In my experience LTO and non-LTO object files can simply be linked together without problems. |
Looks like adding Some size comparisons on nrf52832: Could adding For completeness, this is the generated assembly if
source: https://gcc.gnu.org/wiki/LinkTimeOptimizationFAQ |
Yes, that's exactly what I was thinking about. |
I think this is the right solution but should be done on a per-port basis. Mainly because -fno-lto may not be accepted by non-gcc compilers. You should be able to add this to the port's Makefile:
Does that work? |
Yes, that worked. I guess keeping it in the port specific Makefile is the most neat way of handling this. Thanks to all for the help and guidance. |
Ok, sounds good. We should keep in mind this issue for future use of LTO (it's not yet used by stm32, but maybe one day...). |
After the change of using mp_load_method_protected in
7241d90 it looks like the code
change triggered an issue with nlr_push when using -flto. How
related this specific commit is to the issue is unclear. At least
it is the commit that has been tracked down to be the first which
made the issue visible.
The reason for the behaviour is unknown and is only triggered
when using LTO. The strange thing about the issue is that it is
not consistent accross build targets. The code have been tested
on both cortex-m0 (nrf51822) and cortex-m4 (nrf52832/nrf52840),
and it only fails on cortex-m4.
Adding the attribute "returns_twice" on the declaration of
nlr_push seems to resolve the issue seen on cortex-m4.
Toolchain used, arm-none-eabi-gcc 7.2.1.
Also, a thanks to @aykevl for proposing to try this attribute.