Skip to content

lib/libm/math: Add fabsf function. #3590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

aykevl
Copy link
Contributor

@aykevl aykevl commented Jan 31, 2018

See: tralamazza#122

This translates to this very short sequence (ARM Cortex M0):

00007b2c <fabs_funcf>:
    7b2c:       0040            lsls    r0, r0, #1
    7b2e:       0840            lsrs    r0, r0, #1
    7b30:       4770            bx      lr

Some background: https://www.doc.ic.ac.uk/~eedwards/compsys/float/nan.html

@dpgeorge
Copy link
Member

dpgeorge commented Feb 1, 2018

This looks fine, but it's strange that it's needed for Cortex M0... on the micro:bit port it seems that the compiler is generating the code for fabsf inline, so it must know what the definition is.

@glennrub
Copy link
Contributor

glennrub commented Feb 1, 2018

This PR is triggered by the issue raised in the nrf51/52 port. After some investigation today, i see that we do not really need this patch as described in the issue tracker in the nrf port. However, i believe this might still be a valid patch.

The reason for this function missing during linking was wrong flags to the compiler for nrf51 (cortex-m0), setting "-fno-builtins". Removing this flag solves the problem and fabsf is found. And now it starts getting interesting. Because, if -fno-builtins is removed, and this patch is not applied, it will increase the size of the .text by approximatly 1.2k .text.

More investigation also shows that this can benefit cortex-m4 as well with approximately the same decrease of .text usage. If -fno-builtin is enabled on cortex-m4, it lacks two new float functions, one is already present in math.c and can be enabled (copysignf), the other is nanf. By implementing a dummy function of this i get the following results:

NRF52/pca10040 (cortex-m4)

-fno-builtin not set:

   text	   data	    bss	    dec	    hex	filename
 137764	    280	  35668	 173712	  2a690	build-pca10040/firmware.elf

fabsf() patch + nanf() dummy + copysignf() enabled + -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 136504	    280	  35668	 172452	  2a1a4	build-pca10040/firmware.elf

NRF51/microbit (cortex-m0)

-fno-builtin not set:

   text	   data	    bss	    dec	    hex	filename
 158384	    252	  13104	 171740	  29edc	build-microbit/firmware.elf

fabsf() + -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 157148	    256	  13104	 170508	  29a0c	build-microbit/firmware.elf

Summarized:
NRF52 .text decrease = 1260 (not including full implementation of nanf)
NRF51 .text decrease = 1236

@aykevl aykevl force-pushed the fabsf branch 2 times, most recently from cafbf84 to e22ea4d Compare February 1, 2018 22:44
@aykevl
Copy link
Contributor Author

aykevl commented Feb 1, 2018

I've changed the PR a bit and added nanf.

nRF52, before the change:

   text	   data	    bss	    dec	    hex	filename
 137752	    280	  35668	 173700	  2a684	build-pca10040/firmware.elf

nRF52, with this patch, copysignf enabled and -fno-builtin:

   text	   data	    bss	    dec	    hex	filename
 136504	    280	  35668	 172452	  2a1a4	build-pca10040/firmware.elf

So for me it saves 1248 bytes.

@dpgeorge
Copy link
Member

dpgeorge commented Feb 2, 2018

More investigation also shows that this can benefit cortex-m4 as well with approximately the same decrease of .text usage. If -fno-builtin is enabled on cortex-m4 ....

According to the gcc docs, -fno-builtin will disable all builtins, like memcpy, alloca and others. So that could be what is leading to a significant change in code size with that option enabled. It looks like you can disable specific functions, eg -fno-builtin-fabsf. I'd suggest trying that to isolate the issue with fabs/nanf and see if providing custom ones makes the code smaller.

But, regardless of the outcome of such an investigation, including these functions in the repo here is a good idea. They are obviously useful in cases that the compiler/toolchain doesn't provide them. The main question is: does including them in math.c mean they will be always used even if the compiler provides a built-in version? Eg does the output of the stm32 build change with this patch?

@dhalbert
Copy link
Contributor

dhalbert commented Feb 2, 2018

The main question is: does including them in math.c mean they will be always used even if the compiler provides a built-in version? Eg does the output of the stm32 build change with this patch?

In the atmel-samd port in CircuitPython, we only include SRC_LIBM if a make flag INTERNAL_LIBM is set:

ifeq ($(INTERNAL_LIBM),1)
SRC_LIBM = $(addprefix lib/,\
	libm/math.c \
...
	libm/atan2f.c \
	)
endif

This guarantees inclusion or not. Elsewhere in the Makefile we have:

# Use toolchain libm if we're not using our own.
ifndef INTERNAL_LIBM
LIBS += -lm
endif

We found that using the MPy libm saved a lot of space over the toolchain libm. But we had to make sure all double-precision computations were removed so as not to drag in the software implementations of double-precision. See adafruit#325.

@aykevl
Copy link
Contributor Author

aykevl commented Feb 18, 2018

I've discovered something interesting: this has only a major effect when LTO is enabled.

before this PR with builtins this PR without builtins without FP (-no-builtin)
M0 with LTO 141000 139736 118636 (default for nrf51)
M0 without LTO 140196 140188 118404
M4 with LTO 137364 (default for nrf52) 136168* 119580
M4 without LTO 136252 136216* 118900

* I had to implement my own copysignf that may work but is untested.

As you can see, without LTO this PR (and avoiding builtins) only brings a small improvement. But when LTO is enabled, it brings a much bigger improvement: ~1.2K on the nrf52 (M4) and ~1.3K on the nrf51 (M0). So somehow, link-time optimisation affects this, making it much harder to find the root cause.

Additionally, it looks like LTO actually increases code size on both the nrf52 (M4) and nrf51. I'm sure I've seen a decrease before - not sure why it's increasing now.

tannewt added a commit to tannewt/circuitpython that referenced this pull request Oct 26, 2020
@dpgeorge
Copy link
Member

Closing due to inactivity (and the nrf boards build OK at the moment without this addition).

If these functions are needed in the future then we can add them then.

@dpgeorge dpgeorge closed this May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants