lib/libm/math: Add fabsf function. #3590

aykevl · 2018-01-31T21:27:25Z

This translates to this very short sequence (ARM Cortex M0):

00007b2c <fabs_funcf>:
    7b2c:       0040            lsls    r0, r0, #1
    7b2e:       0840            lsrs    r0, r0, #1
    7b30:       4770            bx      lr

Some background: https://www.doc.ic.ac.uk/~eedwards/compsys/float/nan.html

dpgeorge · 2018-02-01T12:29:24Z

This looks fine, but it's strange that it's needed for Cortex M0... on the micro:bit port it seems that the compiler is generating the code for fabsf inline, so it must know what the definition is.

glennrub · 2018-02-01T19:30:37Z

This PR is triggered by the issue raised in the nrf51/52 port. After some investigation today, i see that we do not really need this patch as described in the issue tracker in the nrf port. However, i believe this might still be a valid patch.

The reason for this function missing during linking was wrong flags to the compiler for nrf51 (cortex-m0), setting "-fno-builtins". Removing this flag solves the problem and fabsf is found. And now it starts getting interesting. Because, if -fno-builtins is removed, and this patch is not applied, it will increase the size of the .text by approximatly 1.2k .text.

More investigation also shows that this can benefit cortex-m4 as well with approximately the same decrease of .text usage. If -fno-builtin is enabled on cortex-m4, it lacks two new float functions, one is already present in math.c and can be enabled (copysignf), the other is nanf. By implementing a dummy function of this i get the following results:

NRF52/pca10040 (cortex-m4)

-fno-builtin not set:

   text	   data	    bss	    dec	    hex	filename
 137764	    280	  35668	 173712	  2a690	build-pca10040/firmware.elf

fabsf() patch + nanf() dummy + copysignf() enabled + -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 136504	    280	  35668	 172452	  2a1a4	build-pca10040/firmware.elf

NRF51/microbit (cortex-m0)

-fno-builtin not set:

   text	   data	    bss	    dec	    hex	filename
 158384	    252	  13104	 171740	  29edc	build-microbit/firmware.elf

fabsf() + -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 157148	    256	  13104	 170508	  29a0c	build-microbit/firmware.elf

Summarized:
NRF52 .text decrease = 1260 (not including full implementation of nanf)
NRF51 .text decrease = 1236

aykevl · 2018-02-01T22:44:50Z

I've changed the PR a bit and added nanf.

nRF52, before the change:

   text	   data	    bss	    dec	    hex	filename
 137752	    280	  35668	 173700	  2a684	build-pca10040/firmware.elf

nRF52, with this patch, copysignf enabled and -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 136504	    280	  35668	 172452	  2a1a4	build-pca10040/firmware.elf

So for me it saves 1248 bytes.

dpgeorge · 2018-02-02T01:28:51Z

More investigation also shows that this can benefit cortex-m4 as well with approximately the same decrease of .text usage. If -fno-builtin is enabled on cortex-m4 ....

According to the gcc docs, -fno-builtin will disable all builtins, like memcpy, alloca and others. So that could be what is leading to a significant change in code size with that option enabled. It looks like you can disable specific functions, eg -fno-builtin-fabsf. I'd suggest trying that to isolate the issue with fabs/nanf and see if providing custom ones makes the code smaller.

But, regardless of the outcome of such an investigation, including these functions in the repo here is a good idea. They are obviously useful in cases that the compiler/toolchain doesn't provide them. The main question is: does including them in math.c mean they will be always used even if the compiler provides a built-in version? Eg does the output of the stm32 build change with this patch?

dhalbert · 2018-02-02T04:18:44Z

The main question is: does including them in math.c mean they will be always used even if the compiler provides a built-in version? Eg does the output of the stm32 build change with this patch?

In the atmel-samd port in CircuitPython, we only include SRC_LIBM if a make flag INTERNAL_LIBM is set:

ifeq ($(INTERNAL_LIBM),1)
SRC_LIBM = $(addprefix lib/,\
	libm/math.c \
...
	libm/atan2f.c \
	)
endif

This guarantees inclusion or not. Elsewhere in the Makefile we have:

# Use toolchain libm if we're not using our own.
ifndef INTERNAL_LIBM
LIBS += -lm
endif

We found that using the MPy libm saved a lot of space over the toolchain libm. But we had to make sure all double-precision computations were removed so as not to drag in the software implementations of double-precision. See adafruit#325.

aykevl · 2018-02-18T02:02:32Z

I've discovered something interesting: this has only a major effect when LTO is enabled.

	before this PR with builtins	this PR without builtins	without FP (`-no-builtin`)
M0 with LTO	141000	139736	118636 (default for nrf51)
M0 without LTO	140196	140188	118404
M4 with LTO	137364 (default for nrf52)	136168*	119580
M4 without LTO	136252	136216*	118900

* I had to implement my own copysignf that may work but is untested.

As you can see, without LTO this PR (and avoiding builtins) only brings a small improvement. But when LTO is enabled, it brings a much bigger improvement: ~1.2K on the nrf52 (M4) and ~1.3K on the nrf51 (M0). So somehow, link-time optimisation affects this, making it much harder to find the root cause.

Additionally, it looks like LTO actually increases code size on both the nrf52 (M4) and nrf51. I'm sure I've seen a decrease before - not sure why it's increasing now.

fix CPB SPI pin definitions

dpgeorge · 2021-05-11T04:53:52Z

Closing due to inactivity (and the nrf boards build OK at the moment without this addition).

If these functions are needed in the future then we can add them then.

aykevl force-pushed the fabsf branch 2 times, most recently from cafbf84 to e22ea4d Compare February 1, 2018 22:44

lib/libm/math: Add fabsf and nanf functions.

e22ea4d

tannewt added a commit to tannewt/circuitpython that referenced this pull request Oct 26, 2020

Merge pull request micropython#3590 from jerryneedell/jerryn_cpb_spi

a899433

fix CPB SPI pin definitions

dpgeorge closed this May 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

lib/libm/math: Add fabsf function. #3590

lib/libm/math: Add fabsf function. #3590

Uh oh!

aykevl commented Jan 31, 2018 •

edited

Loading

Uh oh!

dpgeorge commented Feb 1, 2018

Uh oh!

glennrub commented Feb 1, 2018 •

edited

Loading

Uh oh!

aykevl commented Feb 1, 2018

Uh oh!

dpgeorge commented Feb 2, 2018

Uh oh!

dhalbert commented Feb 2, 2018 •

edited

Loading

Uh oh!

aykevl commented Feb 18, 2018

Uh oh!

dpgeorge commented May 11, 2021

Uh oh!

Uh oh!

Uh oh!

lib/libm/math: Add fabsf function. #3590

lib/libm/math: Add fabsf function. #3590

Uh oh!

Conversation

aykevl commented Jan 31, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dpgeorge commented Feb 1, 2018

Uh oh!

glennrub commented Feb 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aykevl commented Feb 1, 2018

Uh oh!

dpgeorge commented Feb 2, 2018

Uh oh!

dhalbert commented Feb 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aykevl commented Feb 18, 2018

Uh oh!

dpgeorge commented May 11, 2021

Uh oh!

Uh oh!

aykevl commented Jan 31, 2018 •

edited

Loading

glennrub commented Feb 1, 2018 •

edited

Loading

dhalbert commented Feb 2, 2018 •

edited

Loading