py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

yangfl · 2020-05-01T07:08:30Z

On arm64, print(float('.' + '9' * 70)) resulted in 1.000000000000001 rather that 1.0.

This is because

dec_val = 10 * dec_val + dig;

was converted into a single fused multiply-add instruction (fmadd) rather than two (mulsd and addsd on x86), thus caused a slightly different rounding loss when dec_val is about to overflow.

The parsing process of '.' + '9' * 70 would look like this on x86

10000000000000003053453312.000000 45208b2a2c280292 -25
100000000000000021944598528.000000 4554adf4b7320336 -26
1000000000000000288165462016.000000 4589d971e4fe8404 -27

and on arm64

10000000000000003053453312.000000 45208b2a2c280292 -25
100000000000000039124467712.000000 4554adf4b7320337 -26
1000000000000000425604415488.000000 4589d971e4fe8405 -27

Detect the overflow earily to avoid the rounding loss.

stinos · 2020-05-01T07:30:00Z

Can you fix codestyle (run tools/codeformat.py), add tests for this exact behavior, and review why the current float_parse tests fail on some platforms?

On arm64, `print(float('.' + '9' * 70))` resulted in `1.000000000000001` rather that `1.0`. This is because dec_val = 10 * dec_val + dig; was converted into a single fused multiply-add instruction (fmadd) rather than two (mulsd and addsd on x86), thus caused a slightly different rounding loss when `dec_val` is about to overflow. The parsing process of `'.' + '9' * 70` would look like this on x86 dec_val *(uint64_t*)&dec_val exp_extra ... 10000000000000003053453312.000000 45208b2a2c280292 -25 100000000000000021944598528.000000 4554adf4b7320336 -26 1000000000000000288165462016.000000 4589d971e4fe8404 -27 ... and on arm64 ... 10000000000000003053453312.000000 45208b2a2c280292 -25 100000000000000039124467712.000000 4554adf4b7320337 -26 1000000000000000425604415488.000000 4589d971e4fe8405 -27 ... Note the difference when `exp_extra` = -26. Unfortunately there are no easy ways to fix this problem [1], furthermore GCC does not recognize `#pragma STDC FP_CONTRACT OFF`, while it does enable FP_CONTRACT as soon as -O2 (or higher). The only feasible solution appears to me is only `-ffp-contract=off`. [1] https://stackoverflow.com/a/42134261 [2] https://lists.freedesktop.org/archives/libreoffice/2017-December/079046.html

yangfl · 2020-05-01T09:36:18Z

Come up with another solution. Now tests/float/float_parse.py passes on arm64.

dlech · 2020-05-01T14:58:50Z

dec_val = 10 * dec_val + dig;

Just for cross-reference, this looks like the same code that is causing #5831

dpgeorge · 2020-05-03T03:37:22Z

Thanks for the patch, but I don't think this is the right way to fix the problem. The mp_parse_num_decimal() function is not perfect (as described by a few other issues/PRs) and would probably need a substantial rewrite to make it so.

dpgeorge · 2020-05-03T03:43:29Z

If you want to work around the issue for ARM64 builds, then it's possible to pass CFLAGS_EXTRA=-ffp-contract=off when building with make.

yangfl · 2020-05-03T04:19:59Z

Thanks and I see. But I think I'd rather add that flag in makefile to prevent those problems before someone does actually come up with a bugfree mp_parse_num_decimal which won't be easy as described in the links.

dpgeorge · 2020-05-03T04:52:14Z

But I think I'd rather add that flag in makefile to prevent those problems

These flags put in py/mkrules.mk affect all systems. If anything the flags could go in ports/unix/Makefile and be selective based on the architecture (since it's currently ok for eg x86).

But, do these flags fix more cases of floating port parsing then they break? Previous attempts to fix the mp_parse_num_decimal() have made certain numbers parse correctly but break parsing of numbers which used to work.

yangfl · 2020-05-03T07:14:02Z

At lease it does not break any tests under tests/. It should be enabled for all systems since there may be more fused operations hidden in the code, which are both hard to point out and somewhat impossible to avoid, and x86 SIMD has fused instructions. Anyway it's all up to you and I'd happy to see any real fix.

dpgeorge · 2020-05-03T12:48:55Z

It should be enabled for all systems since there may be more fused operations hidden in the code, which are both hard to point out and somewhat impossible to avoid,

But are these operations bad? Some systems may be relying on them in other code (eg FFT) and having the option in py/mkrules.mk means it disables them for all code, including 3rd party libraries.

The usual case is that it is up to a port (eg unix port) to apply relevant compiler flags depending on the target system/arch/OS/etc.

To make progress here it'd be good to be able to reproduce the issue, eg using a virtualbox image or qemu.

yangfl · 2020-05-03T15:33:06Z

#include <math.h>
#include <stdio.h>

void __attribute__((optimize("-O0"))) madd1(double a[3][2], double * restrict d) {
  for (int i = 0; i < 2; i++) {
    d[i] = a[0][i] * a[1][i] + a[2][i];
  }
}

void __attribute__((optimize("-O3"), target("fma"))) madd2(double a[3][2], double * restrict d) {
  for (int i = 0; i < 2; i++) {
    d[i] = a[0][i] * a[1][i] + a[2][i];
  }
}

int __attribute__((optimize("-O0"))) main(){
  double a[3][2], d1[2], d2[2];
  a[1][0] = 10;
  a[2][0] = 9;
  d2[0] = 9;
  for (int j = 2; j < 60; j++) {
    a[0][0] = d2[0];
    madd1(a, d1);
    madd2(a, d2);
    printf("%s %.20f %.20f\n", d1[0] == d2[0] ? "yay" : "nah", d1[0] / pow(10, j), d2[0] / pow(10, j));
  }
}

Abused SIMD to demonstrate FMA inaccuracy. But since floating-point arithmetic comes with more or less error, it's merely one's preference to choose between speed or cross-platform consistency.

dpgeorge · 2020-05-06T15:52:23Z

I was able to reproduce this issue on Aarch64 (RPi 3 Model B+ running in 64-bit mode).

But I was also able to find a floating point number that is parsed correctly with fp-contraction enabled, but then incorrectly when fp-contraction is disabled. Ie the commit here fixes some floats but breaks others.

Eg the test:

import array
print(bytes(array.array('d', [float("." + "9" * 70)])))
print(bytes(array.array('d', [float("52306490527514614E49")])))

run under CPython 3.8.2 gives:

b'\x00\x00\x00\x00\x00\x00\xf0?'
b'\xdf \xd6[\x00\xde\x93M'

under MicroPython unix port (current master) which does have fp contraction:

b'\x04\x00\x00\x00\x00\x00\xf0?'
b'\xdf \xd6[\x00\xde\x93M'

(note that first is wrong, second is right) and under MicroPython unix port (current master) with fp contraction turned off:

b'\x00\x00\x00\x00\x00\x00\xf0?'
b'\xde \xd6[\x00\xde\x93M'

(note that first is now right, but second is now wrong).

The function mp_parse_num_decimal() is (I think...) always correct up to the least-significant-bit of the mantissa (bit 0). But see in the test above that the error for the first number is wrong in bit 2 of the mantissa (there's a 0x04 instead of 0x00 in the first byte). With fp-contraction turned off the case that was working that is now wrong is only wrong by bit 0. So I can see how turning fp-contraction off might be the right thing to do, to reduce error to bit 0 in all cases. More tests would need to be run to see if that's the case.

dpgeorge · 2020-05-11T11:51:46Z

For an alternative to this, which attempts to fix the FP parsing routine, see #6024 . This has not yet been tested on Aarch64 though.

dpgeorge · 2021-05-18T04:40:07Z

As mentioned above, this PR fixes some float parsing instances but breaks others. Better to work on a solution like #6024.

yangfl changed the title ~~py/parsenum.c: use safe way to parse decimal~~ py/mkrules.mk: workaround fused multiply-add inaccuracy May 1, 2020

yangfl mentioned this pull request May 1, 2020

Test failed for linux on !x86 archs #5997

Open

dpgeorge added the py-core Relates to py/ directory in source label May 3, 2020

This was referenced May 18, 2021

ports/unix/Makefile: make float() constistent across cpu architectures #4246

Closed

Trailing .0 in float literal can cause rounding error #5831

Closed

dpgeorge closed this May 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

yangfl commented May 1, 2020

stinos commented May 1, 2020

yangfl commented May 1, 2020

dlech commented May 1, 2020

dpgeorge commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 6, 2020

dpgeorge commented May 11, 2020

dpgeorge commented May 18, 2021

py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

Conversation

yangfl commented May 1, 2020

stinos commented May 1, 2020

yangfl commented May 1, 2020

dlech commented May 1, 2020

dpgeorge commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 3, 2020

yangfl commented May 3, 2020

dpgeorge commented May 6, 2020

dpgeorge commented May 11, 2020

dpgeorge commented May 18, 2021