-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/mkrules.mk: workaround fused multiply-add inaccuracy #5995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you fix codestyle (run tools/codeformat.py), add tests for this exact behavior, and review why the current float_parse tests fail on some platforms? |
On arm64, `print(float('.' + '9' * 70))` resulted in `1.000000000000001` rather that `1.0`. This is because dec_val = 10 * dec_val + dig; was converted into a single fused multiply-add instruction (fmadd) rather than two (mulsd and addsd on x86), thus caused a slightly different rounding loss when `dec_val` is about to overflow. The parsing process of `'.' + '9' * 70` would look like this on x86 dec_val *(uint64_t*)&dec_val exp_extra ... 10000000000000003053453312.000000 45208b2a2c280292 -25 100000000000000021944598528.000000 4554adf4b7320336 -26 1000000000000000288165462016.000000 4589d971e4fe8404 -27 ... and on arm64 ... 10000000000000003053453312.000000 45208b2a2c280292 -25 100000000000000039124467712.000000 4554adf4b7320337 -26 1000000000000000425604415488.000000 4589d971e4fe8405 -27 ... Note the difference when `exp_extra` = -26. Unfortunately there are no easy ways to fix this problem [1], furthermore GCC does not recognize `#pragma STDC FP_CONTRACT OFF`, while it does enable FP_CONTRACT as soon as -O2 (or higher). The only feasible solution appears to me is only `-ffp-contract=off`. [1] https://stackoverflow.com/a/42134261 [2] https://lists.freedesktop.org/archives/libreoffice/2017-December/079046.html
Come up with another solution. Now tests/float/float_parse.py passes on arm64. |
Just for cross-reference, this looks like the same code that is causing #5831 |
Thanks for the patch, but I don't think this is the right way to fix the problem. The |
If you want to work around the issue for ARM64 builds, then it's possible to pass |
Thanks and I see. But I think I'd rather add that flag in makefile to prevent those problems before someone does actually come up with a bugfree |
These flags put in But, do these flags fix more cases of floating port parsing then they break? Previous attempts to fix the |
At lease it does not break any tests under tests/. It should be enabled for all systems since there may be more fused operations hidden in the code, which are both hard to point out and somewhat impossible to avoid, and x86 SIMD has fused instructions. Anyway it's all up to you and I'd happy to see any real fix. |
But are these operations bad? Some systems may be relying on them in other code (eg FFT) and having the option in The usual case is that it is up to a port (eg unix port) to apply relevant compiler flags depending on the target system/arch/OS/etc. To make progress here it'd be good to be able to reproduce the issue, eg using a virtualbox image or qemu. |
#include <math.h>
#include <stdio.h>
void __attribute__((optimize("-O0"))) madd1(double a[3][2], double * restrict d) {
for (int i = 0; i < 2; i++) {
d[i] = a[0][i] * a[1][i] + a[2][i];
}
}
void __attribute__((optimize("-O3"), target("fma"))) madd2(double a[3][2], double * restrict d) {
for (int i = 0; i < 2; i++) {
d[i] = a[0][i] * a[1][i] + a[2][i];
}
}
int __attribute__((optimize("-O0"))) main(){
double a[3][2], d1[2], d2[2];
a[1][0] = 10;
a[2][0] = 9;
d2[0] = 9;
for (int j = 2; j < 60; j++) {
a[0][0] = d2[0];
madd1(a, d1);
madd2(a, d2);
printf("%s %.20f %.20f\n", d1[0] == d2[0] ? "yay" : "nah", d1[0] / pow(10, j), d2[0] / pow(10, j));
}
} Abused SIMD to demonstrate FMA inaccuracy. But since floating-point arithmetic comes with more or less error, it's merely one's preference to choose between speed or cross-platform consistency. |
I was able to reproduce this issue on Aarch64 (RPi 3 Model B+ running in 64-bit mode). But I was also able to find a floating point number that is parsed correctly with fp-contraction enabled, but then incorrectly when fp-contraction is disabled. Ie the commit here fixes some floats but breaks others. Eg the test: import array
print(bytes(array.array('d', [float("." + "9" * 70)])))
print(bytes(array.array('d', [float("52306490527514614E49")]))) run under CPython 3.8.2 gives:
under MicroPython unix port (current master) which does have fp contraction:
(note that first is wrong, second is right) and under MicroPython unix port (current master) with fp contraction turned off:
(note that first is now right, but second is now wrong). The function |
For an alternative to this, which attempts to fix the FP parsing routine, see #6024 . This has not yet been tested on Aarch64 though. |
As mentioned above, this PR fixes some float parsing instances but breaks others. Better to work on a solution like #6024. |
On arm64,
print(float('.' + '9' * 70))
resulted in1.000000000000001
rather that1.0
.This is because
dec_val = 10 * dec_val + dig;
was converted into a single fused multiply-add instruction (fmadd) rather than two (mulsd and addsd on x86), thus caused a slightly different rounding loss when
dec_val
is about to overflow.The parsing process of
'.' + '9' * 70
would look like this on x8610000000000000003053453312.000000 45208b2a2c280292 -25
100000000000000021944598528.000000 4554adf4b7320336 -26
1000000000000000288165462016.000000 4589d971e4fe8404 -27
and on arm64
10000000000000003053453312.000000 45208b2a2c280292 -25
100000000000000039124467712.000000 4554adf4b7320337 -26
1000000000000000425604415488.000000 4589d971e4fe8405 -27
Detect the overflow earily to avoid the rounding loss.