Skip to content

Conversation

jepler
Copy link

@jepler jepler commented Jun 6, 2018

.. setting it based on the ad-hoc stack pointer calculation of mp_stack_ctrl_init() meant that the stack used above main() counts against the 1KiB safety factor that the mp_stack_set_limit call tries to establish. It turns out, at least on M4, that over half of the safety factor is used up by stack-above-main()!

In the case of the basics/gen_stack_overflow.py test, which blows the stack on purpose, it turns out that gc would be called while handling the "maximum recursion depth exceeded" error, and this needed more stack than was left.

Closes: #900

.. setting it based on the ad-hoc stack pointer calculation of
mp_stack_ctrl_init() meant that the stack used above main() counts
against the 1KiB safety factor that the mp_stack_set_limit call tries
to establish.  It turns out, at least on M4, that over half of the
safety factor is used up by stack-above-main()!

In the case of the basics/gen_stack_overflow.py test,
which blows the stack on purpose, it turns out that gc would be called
while handling the "maximum recursion depth exceeded" error, and this
needed more stack than was left.

Closes: adafruit#900
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic find @jepler ! Any idea what is above the stack? Maybe we're not setting it up right. Either way this is an improvement!

@tannewt tannewt merged commit aa63684 into adafruit:master Jun 6, 2018
@jepler
Copy link
Author

jepler commented Jun 6, 2018

hmmmm.. I am not a pro at reading ARM assembly yet but it looks like main is allocating quite a bit of stack:

(gdb) disas main
Dump of assembler code for function main:
   0x0000fb9c <+0>:	stmdb	sp!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
...
   0x0000fbda <+62>:	sub.w	sp, sp, #612	; 0x264

somewhat confirmed by looking at the stack pointer on arrival in main:

Breakpoint 2, main () at ../../main.c:236
236	int __attribute__((used)) main(void) {
1: $sp = (void *) 0x2002fff0
(gdb) p _estack
$3 = {<text variable, no debug info>} 0x2002fff8

so we start just 8 bytes below estack. I am guessing that this local array is the main culprit:

        if (!have_boot_py && f_open(fs, boot_output_file, CIRCUITPY_BOOT_OUTPUT_FILE, FA_READ) == FR_OK) {
            char file_contents[512];
            UINT chars_read = 0;
            f_read(boot_output_file, file_contents, 512, &chars_read);

indeed, when getting rid of CIRCUITPY_BOOT_OUTPUT_FILE, the stack size of main shrinks quite a bit:

   0x0001755e <+62>:	sub	sp, #116	; 0x74

but that code was new in May so I don't know if this yet explains the Rosie CI failures you have alluded to, which I thought were happening for a lot longer than a month...

@dhalbert
Copy link
Collaborator

dhalbert commented Jun 6, 2018

main() reads the first 512 bytes of boot_out.txt to see whether the version information in the file is the same as what would be written to boot_out.txt. This could be reduced to, say, 160 bytes, without harm. But I thought C semantics is that the lifetime of file_contents[] would be just inside the if { } block. This could be checked by looking at the stack pointer for a call inside the if { } block vs just after it.

@dhalbert
Copy link
Collaborator

dhalbert commented Jun 6, 2018

I guess the maximum stack used by main() is the important thing.

@jepler
Copy link
Author

jepler commented Jun 6, 2018

I agree that under the semantics of C, the storage duration of file_contents is limited to the body of the if() block. But I think gcc is doing something clever to minimize the number of operations on the stack pointer, by finding the largest amount of storage that is ever required, and adjusting the stack pointer just once based on that value. This is probably controlled by this optimizer flag, which is turned on at any optimization level:

'-fcombine-stack-adjustments'
     Tracks stack adjustments (pushes and pops) and stack memory
     references and then tries to find ways to combine them.

     Enabled by default at '-O1' and higher.

@jepler jepler deleted the stack-overflow-crash branch August 2, 2018 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants