py/parsenum: Implement exact float parsing using integer mpz (WIP) #6024
Conversation
For an example of a number that did not parse correctly before this PR, but parses correctly now, consider 74e46. The way this number is parsed in this PR is:
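For reference, a hedged sketch of that conversion in plain Python (big ints standing in for mpz, following the 2**e / 5**e factoring described in the PR description below); this is an illustration, not the PR's actual C code:

```python
import math

# 74e46 = 74 * 10**46 = (74 * 5**46) * 2**46
big = 74 * 5**46                 # exact: only the 5**46 factor goes into the big-int
approx = float(big)              # a single correctly-rounded conversion to binary64
result = math.ldexp(approx, 46)  # scaling by 2**46 is exact (no overflow here)

print(result)                    # 7.4e+47
assert result == float("74e46")  # agrees with CPython's correctly-rounded parser
```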
This looks really good. Didn't check the code in detail, but the logic is sound and if the tests pass the code should be correct :) With that in mind: it's probably worth adding as many 'special' cases as needed to cover a good range. Then if the performance isn't abysmal, this is the way forward I think.
Not sure what you mean by this, which "special cases"?
The ones which don't parse correctly now, like in your previous comment. And picked across the whole range of doubles, i.e. also very large and very small ones.
Aah, I see, you mean add specific tests. Yes indeed. For 32-bit parsing it should be possible to test every value (although not as part of the test suite, it'd take too long).
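A rough sketch of what such an exhaustive single-precision check could look like (an assumed standalone harness, not part of this PR; written here in CPython terms, where struct is used to enumerate the 2**32 bit patterns):

```python
import struct

# Interpret every 32-bit pattern as a binary32 float, format it, re-parse it,
# and require the round trip to recover the same bit pattern.  Far too slow
# for the regular test suite; intended as a one-off soak test.
for bits in range(2**32):
    (f,) = struct.unpack("<f", struct.pack("<I", bits))
    if f != f:                 # NaNs never compare equal; skip them
        continue
    reparsed = float(repr(f))  # on the target build this is what exercises the parser
    (back,) = struct.unpack("<I", struct.pack("<f", reparsed))
    assert back == bits, (bits, repr(f))
```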
Float parsing should now be exact. Costs about 200 bytes on stm32. TODO:
- test more
- don't allocate heap memory, use the stack instead
- benchmark to see if it's acceptable
Force-pushed from 489efde to 1c5238c
If you would like some additional 'special case' double-precision tests, feel free to use these (or feel free to bin them - won't hurt my feelings :-) ). All cases refer to 17-digits-of-precision decimal inputs (sans trailing zeros), with expected results expressed as (little-endian) IEEE-754 binary64 values.
- Overflow/largest double boundary:
- Normalized/denormalized double boundary:
- Shortest (up to) 17-digit input that converts to smallest denormalized double:
- Closest 17-digit input to the smallest denormalized double:
- The next boundary will depend on how good the ldexp implementation is on the target platform:
  - 2.4703282292062328e-324 (should yield smallest denormalized double, 0x0100000000000000)
  - 2.4703282292062327e-324 (should underflow to zero: 0x0000000000000000)
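One possible way to turn the fully-specified cases above into assertions is sketched below (a hypothetical checker, not from the PR; shown with CPython's struct module, and on a MicroPython double-precision build the float() call is what would exercise this PR's parser):

```python
import struct

# (decimal input, expected little-endian IEEE-754 binary64 bytes), taken from
# the two boundary cases spelled out above.
CASES = [
    ("2.4703282292062328e-324", bytes.fromhex("0100000000000000")),  # smallest denormal
    ("2.4703282292062327e-324", bytes.fromhex("0000000000000000")),  # underflows to zero
]

for text, expected in CASES:
    packed = struct.pack("<d", float(text))  # "<d" packs as little-endian binary64
    assert packed == expected, (text, packed.hex(), expected.hex())
print("boundary cases OK")
```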
This is an automated heads-up that we've just merged a Pull Request that removes the STATIC macro from MicroPython's C code. See #13763. A search suggests this PR might apply the STATIC macro to some C code. If it does, please update those uses to plain static. Although this is an automated message, feel free to @-reply to me directly if you have any questions.
This PR attempts to implement exact parsing of floats in mp_parse_num_decimal(). There have been quite a few issues related to this, and parsing floats correctly is important, so I thought it'd be worth trying to fix this properly. Other implementations I could find (eg musl's) are pretty complex and use a lot of stack. The implementation here is designed and written from scratch (but it's probably not original) and reuses the existing MicroPython big-int functions to do the bulk of the conversion.

The point is that all operations during the conversion must be exact (ie not do any rounding, or else the rounding errors will compound), except that as an optimisation the very last step can use the FP hardware because it'll round in the correct way.

The idea is to parse the input number as a big-int and then apply the exponent, also using big-int arithmetic. The trick is that the exponent factor 10**e can be factored into (2*5)**e, of which the 2**e part can be done exactly in hardware because that's the natural base of the FP representation. So only the 5**e calculation needs to be done using big-ints. If e is negative then the big-int is pre-multiplied by a large enough number (namely 8**e) so it can then be divided by 5**e without loss.

The entire parsing could be done without any FP hardware, ie just using big-ints to construct the bits of the floating point representation (mantissa, exponent, sign bit). But it seems OK to do the last step using standard float operations (converting the big-int to a float, then adjusting by the power-of-2 exponent), and this makes the code a bit simpler (it reuses existing functions, like ldexp).

Tested against a set of known difficult numbers (from the CPython test suite) and it parses all of them correctly (in double precision mode).
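To make the factoring concrete, here is a rough Python sketch of the same idea, using Python's built-in big ints in place of mpz. It is only an illustration under the assumptions above, not the PR's C code, and it glosses over the remainder checking that a fully exactly-rounded implementation needs in the negative-exponent branch:

```python
import math

def parse_decimal(mantissa, exp10):
    # mantissa is the integer formed from all the digits of the input, exp10
    # the combined decimal exponent, so the value is mantissa * 10**exp10.
    if exp10 >= 0:
        # 10**e = 2**e * 5**e: fold the 5**e factor into the big-int exactly,
        # and leave the 2**e factor for ldexp, which is exact in binary FP.
        big = mantissa * 5**exp10
        exp2 = exp10
    else:
        # For a negative exponent, pre-multiply by 8**n (n = -exp10) before
        # dividing by 5**n, so the quotient keeps more bits than the original
        # mantissa.  A fully exact implementation must also inspect the
        # discarded remainder near rounding boundaries; omitted here.
        n = -exp10
        big = (mantissa * 8**n) // 5**n
        exp2 = exp10 - 3 * n  # undo the 8**n = 2**(3*n) pre-multiply, apply 2**exp10
    # One correctly-rounded big-int-to-float conversion, then an exact
    # power-of-two scaling.
    return math.ldexp(float(big), exp2)

print(parse_decimal(74, 46))  # 7.4e+47, the example discussed above
```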
TODO/outstanding: