
py/parsenum: Implement exact float parsing using integer mpz (WIP) #6024


Open
wants to merge 3 commits into master

Conversation

dpgeorge
Member

This PR attempts to implement exact parsing of floats in mp_parse_num_decimal(). There have been quite a few issues related to this, and parsing floats correctly is important, so I thought it'd be worth trying to fix this properly.

Other implementations I could find (eg musl's) are pretty complex and use a lot of stack. The implementation here is designed and written from scratch (though it's probably not original) and reuses the existing MicroPython big-int functions to do the bulk of the conversion.

The point is that all operations during the conversion must be exact (ie no rounding, or else the rounding errors will compound), except that as an optimisation the very last step can use the FP hardware because it'll round in the correct way.

The idea is to parse the input number as a big-int and then apply the exponent, also using big-int arithmetic. The trick is that the exponent 10**e can be factored into (2*5)**e, of which the 2**e part can be done exactly in hardware because 2 is the natural base of the FP representation. So only the 5**e calculation needs to be done using big-ints. If e is negative then the big-int is pre-multiplied by a large enough power of two (namely 8**|e|) so it can then be divided by 5**|e| without loss.
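As a rough Python sketch of this factoring (an illustration only, not the PR's C code; the 64-bit guard count and round-to-nearest division below are a simplification standing in for the PR's 8**|e| pre-multiply plus normalisation step):

```python
from math import ldexp

def parse_exact(digits: int, e: int) -> float:
    """Sketch of exact decimal-to-float: digits * 10**e, with 10**e = 2**e * 5**e."""
    if e >= 0:
        # digits * 5**e is exact big-int arithmetic; ldexp applies the
        # 2**e part exactly (it only adjusts the FP exponent)
        return ldexp(float(digits * 5**e), e)
    n = -e
    # Scale up before dividing by 5**n so no bits are lost; the extra
    # guard bits and rounded division are a simplification of the PR's
    # 8**|e| pre-multiplication and its normalisation step.
    shift = 3 * n + 64
    q, r = divmod(digits << shift, 5**n)
    if 2 * r >= 5**n:          # round the quotient to nearest
        q += 1
    # value = q * 2**-shift * 2**-n
    return ldexp(float(q), -shift - n)

assert parse_exact(74, 46) == float("74e46")
assert parse_exact(1, -1) == 0.1
```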

The entire parsing could be done without any FP hardware, ie just using big-ints to construct the bits in the floating point representation (mantissa, exponent, sign bit). But it seems OK to do the last bit using standard float operations (converting the big-int to a float then adjusting by the power-of-2 exponent) and this makes the code a bit simpler (reuses existing functions, like ldexp).

Tested against a set of known difficult numbers (from CPython test suite) and it parses all of them correctly (in double precision mode).

TODO/outstanding:

  • do more tests
  • test with 32-bit floats
  • don't allocate heap memory for temporary mpz's, use the stack instead
  • benchmark to see if it has acceptable performance

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label May 10, 2020
@dpgeorge
Member Author

For an example of a number that did not parse correctly before this PR, but parses correctly now, consider 74e46. This used to parse to b'\xee\xad\xa0\xec\xd73\xe0I' but now parses correctly to b'\xef\xad\xa0\xec\xd73\xe0I' (note LSB). And getting this correct is hard because 74 can be represented exactly in a double, so the error is introduced by applying the exponent, ie the multiplication by 10**46. This exponent cannot be represented exactly on its own so pow(10, 46) is inexact, and then doing the multiplication 74 * pow(10, 46) propagates the error.

The way this number is parsed in this PR is:

  • 74 parsed as a big-int, with exponent 46 parsed as a machine int
  • 5**exp computed as a big-int
  • 74 * 5**exp computed as a big-int (still exact)
  • normalisation of 74 * 5**exp which does correct rounding (this step is not strictly needed though)
  • convert 74 * 5**exp to a float, no longer exact but as close as possible to the true value (due to IEEE properties of FP)
  • multiply by 2**exp via ldexp(float(74 * 5**exp), exp), which is an exact FP operation (it just adjusts the FP exponent of 74 * 5**exp)
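For reference, the steps above can be replayed in CPython (whose parser is correctly rounded) to confirm the expected byte string:

```python
# Re-running the 74e46 steps above in CPython to check the result.
import struct
from math import ldexp

exp = 46
p = 5**exp              # 5**exp computed as a big int
n = 74 * p              # 74 * 5**exp, still exact
f = float(n)            # the first (and only) rounding step
result = ldexp(f, exp)  # exact: only the FP exponent is adjusted

# matches CPython's own correctly rounded parse, byte for byte
assert struct.pack("<d", result) == struct.pack("<d", float("74e46"))
assert struct.pack("<d", result) == b"\xef\xad\xa0\xec\xd73\xe0I"
```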

@stinos
Contributor

stinos commented May 13, 2020

This looks really good. I didn't check the code in detail, but the logic is sound and if tests pass the code should be correct :) With that in mind: it's probably worth adding as many 'special' cases as needed to cover a good range. Then if the performance isn't abysmal this is the way forward I think.

@dpgeorge
Member Author

With that in mind: it's probably worth adding as many 'special' cases as needed to cover a good range.

Not sure what you mean by this, which "special cases"?

@stinos
Contributor

stinos commented May 13, 2020

The ones which don't parse correctly now like in your previous comment. And picked across the whole range of doubles, i.e. also very large and very small ones.

@dpgeorge
Member Author

Aah, I see, you mean add specific tests. Yes indeed.

For 32-bit parsing it should be possible to test every value (although not as part of the test suite, it'd take too long).
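Such an exhaustive check can be sketched in CPython by walking the 32-bit patterns (a hypothetical helper, not part of the PR; only a few samples are run here since the full 2**32 sweep takes hours):

```python
# Sketch: exhaustively verifying single-precision parsing by bit pattern.
import struct

def roundtrip_ok(bits: int) -> bool:
    """Unpack a float32 bit pattern, repr it, re-parse, repack, compare."""
    (f,) = struct.unpack("<f", struct.pack("<I", bits))
    if f != f:                  # NaN payloads: identity doesn't apply
        return True
    # repr() of the (exactly converted) double gives enough digits to
    # recover the original float32 value on re-parse
    (g,) = struct.unpack("<f", struct.pack("<f", float(repr(f))))
    return g == f

# A full sweep is `for bits in range(2**32)`; sample a few patterns here:
# zero, smallest subnormal, largest subnormal, smallest normal, 1.0,
# largest finite, +inf.
for bits in (0x00000000, 0x00000001, 0x007FFFFF, 0x00800000,
             0x3F800000, 0x7F7FFFFF, 0x7F800000):
    assert roundtrip_ok(bits)
```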

dpgeorge added 3 commits May 16, 2020 16:36
Float parsing should now be exact.  Costs about 200 bytes on stm32.

TODO:
- test more
- don't allocate heap memory, use the stack instead
- benchmark to see if it's acceptable
@dpgeorge dpgeorge force-pushed the py-parsenum-exact-float branch from 489efde to 1c5238c Compare May 16, 2020 07:38
@ddiminnie

ddiminnie commented Jun 5, 2020

If you would like some additional 'special case' double-precision tests, feel free to use these (or feel free to bin them - won't hurt my feelings :-) )

(All cases refer to 17-digits-of-precision decimal inputs (sans trailing zeros), with expected results expressed as (little-endian) IEEE-754 binary64 values.)

Overflow/largest double boundary:
1.7976931348623159e308 (should overflow, 0x000000000000f07f )
1.7976931348623158e308 (should yield the max double, 0xffffffffffffef7f )

Normalized/denormalized double boundary:
2.2250738585072012e-308 (should yield smallest normalized double, 0x0000000000001000 )
2.2250738585072011e-308 (should yield largest denormalized double, 0xffffffffffff0f00 )

Shortest (up to) 17-digit input that converts to smallest denormalized double:
5e-324 (should yield smallest denormalized double, 0x0100000000000000 )

Closest 17-digit input to the smallest denormalized double:
4.9406564584124654e-324 (should yield smallest denormalized double, 0x0100000000000000 )

The next boundary will depend on how good the ldexp implementation is on the target platform:
Smallest denormalized double/underflow boundary:

2.4703282292062328e-324 (should yield smallest denormalized double, 0x0100000000000000 )
(Note that this value is greater than 2**-1075 and therefore should round up. 64-bit CPython 3.7.5 on win32 gets this right. Your mileage may vary, since the 54 most significant bits of the result are 0b1.00000000000000000000000000000000000000000000000000000 x 2**-1075.)

2.4703282292062327e-324 (should underflow to zero: 0x0000000000000000 )
(Note that this value is less than 2**-1075 and therefore should round down to zero.)
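These cases can be checked directly against CPython's correctly rounded parser with struct; the expected values are the little-endian binary64 byte strings quoted above (the last two cases depend on correct subnormal rounding, which CPython's own dtoa provides):

```python
# ddiminnie's boundary cases, checked against CPython's parser.
import struct

cases = {
    "1.7976931348623159e308": "000000000000f07f",   # overflows to +inf
    "1.7976931348623158e308": "ffffffffffffef7f",   # max finite double
    "2.2250738585072012e-308": "0000000000001000",  # smallest normal
    "2.2250738585072011e-308": "ffffffffffff0f00",  # largest subnormal
    "5e-324": "0100000000000000",                   # smallest subnormal
    "2.4703282292062328e-324": "0100000000000000",  # > 2**-1075, rounds up
    "2.4703282292062327e-324": "0000000000000000",  # < 2**-1075, underflows
}
for text, expected in cases.items():
    assert struct.pack("<d", float(text)) == bytes.fromhex(expected)
```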

@projectgus
Contributor

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.
