From 9e5532738f51e8c16ece14018bec454f25de0f55 Mon Sep 17 00:00:00 2001
From: yangfl
Date: Fri, 1 May 2020 14:59:57 +0800
Subject: [PATCH] py/mkrules.mk: work around fused multiply-add inaccuracy

On arm64, `print(float('.' + '9' * 70))` resulted in `1.000000000000001`
rather than `1.0`.

This is because

    dec_val = 10 * dec_val + dig;

was compiled into a single fused multiply-add instruction (fmadd) rather
than two separate instructions (mulsd and addsd on x86). The fused form
rounds once instead of twice, which causes a slightly different rounding
error when `dec_val` is about to overflow.

The parsing process of `'.' + '9' * 70` looks like this on x86:

    dec_val                              *(uint64_t*)&dec_val  exp_extra
    ...
    10000000000000003053453312.000000    45208b2a2c280292      -25
    100000000000000021944598528.000000   4554adf4b7320336      -26
    1000000000000000288165462016.000000  4589d971e4fe8404      -27
    ...

and like this on arm64:

    ...
    10000000000000003053453312.000000    45208b2a2c280292      -25
    100000000000000039124467712.000000   4554adf4b7320337      -26
    1000000000000000425604415488.000000  4589d971e4fe8405      -27
    ...

Note the difference when `exp_extra` = -26.

Unfortunately, there is no easy way to fix this problem [1]. Furthermore,
GCC does not recognize `#pragma STDC FP_CONTRACT OFF`, yet it enables
FP_CONTRACT as soon as -O2 (or higher) is used. The only feasible solution
appears to be `-ffp-contract=off`.

[1] https://stackoverflow.com/a/42134261
[2] https://lists.freedesktop.org/archives/libreoffice/2017-December/079046.html
---
 py/mkrules.mk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/py/mkrules.mk b/py/mkrules.mk
index c37c25db4bd2f..508269cbceb60 100644
--- a/py/mkrules.mk
+++ b/py/mkrules.mk
@@ -14,6 +14,9 @@ OBJ_EXTRA_ORDER_DEPS += $(HEADER_BUILD)/compressed.data.h
 CFLAGS += -DMICROPY_ROM_TEXT_COMPRESSION=1
 endif
 
+# Disable FP contraction: fused multiply-add changes float parsing results (seen on arm64).
+CFLAGS += -ffp-contract=off
+
 # QSTR generation uses the same CFLAGS, with these modifications.
 # Note: := to force evalulation immediately.
 QSTR_GEN_CFLAGS := $(CFLAGS)