Skip to content

py/formatfloat: Calculate reference values with pow(). #8985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

dpwe
Copy link
Contributor

@dpwe dpwe commented Jul 29, 2022

Formerly, formatfloat struggled with formatting exact powers of 10
as created by parsenum because of small differences in how those
powers of 10 were calculated: formatfloat multiplied together a few
values of 10^(2^Y) from a lookup table, but parsenum used
pow(10, X). This was mitigated in #8905 by adding 2eps of extra
margin when comparing values ("aggressive rounding up").

This patch instead eliminates the discrepency by using pow() in
formatfloat also. This eliminates the need for the 2eps margin, as
well as the lookup tables. It is likely slower to run, however.

In addition, this patch directly estimates the power-of-10 exponent from
the power-of-2 exponent in the floating-point representation.

This change surfaced a previously-noted problem with parsing numbers
including trailing zeros. The fix to that problem, #8980, is included
in this PR to ensure the tests pass.

Signed-off-by: Dan Ellis dan.ellis@gmail.com

@dpwe dpwe force-pushed the format_float_with_pow branch 7 times, most recently from 20938e0 to 2e8e515 Compare August 4, 2022 17:05
@dpwe
Copy link
Contributor Author

dpwe commented Aug 5, 2022

@dpgeorge Any hints as to why this is failing on appveyor? All I can see is the build error outputs, which seem catastrophic - my code prints nothing for numbers smaller than 1. But I can't see how this is happening (and can't reproduce, obviously). The Appveyor build appears to be under Windows using an MS compiler - is there some reason why mp_float_union_t wouldn't work there? I got rid of the special case float_union_t (for accessing the bits in a 32-bit float) from formatfloat since it appeared redundant with mp_float_union_t, but now I'm doubting my actions. In particular, the new code computes directly from the exp bits extracted from mp_float_union_t, maybe these read as signed under the MS compiler or something?

Edit: This was fixed by directly accessing the bits in the uint member of the union, rather than trying to use the bit-sliced struct members. So I guess there's something weird with how those structs behave under the MS compiler.

@dpwe
Copy link
Contributor Author

dpwe commented Aug 5, 2022

Although the tests pass, there's a big problem with this. Using a float32 build,

>>> print("%.8e" % 1e-13)
0.99999990e-13

That's a malformed "e"-format number, the leading digit cannot be 0 in "e"-format.

The problem occurs with a mismatch between how the leading digit place is calculated, and how the leading digit itself is calculated. For numbers smaller than 1.0, we decide the leading digit by comparing the input number with pow(10, -e) for increasing values of e until we get something less than or equal to the input number. In the case of 1e-13, that returns e = 13 as expected.

We then scale up the input by pow(10, e), and the leading digit is however many times we can subtract 1 from this value and not go negative. Unfortunately, for some values of e (specifically, 13, 17, 19, 28, 31, and 38), the product pow(10, e) * pow(10, -e) is smaller than 1, e.g.:

>>> print("%.12f" % ((1e13) * (1e-13)))
0.999999904633

Thus, when we come to see how many units we can fit in the scaled-up input, it's zero, and we print the incorrectly-normalized "e"-format number above.

I think this was being hidden by the hard FPROUND_TO_ONE logic that I didn't understand and removed in this PR, but I'm looking for a cleaner solution.

Edit: This was fixed by dividing by pow(10, -e) instead of multiplying by pow(10, e), which is obvious in hindsight. Hooray for rubber-duck debugging!

@dpwe dpwe force-pushed the format_float_with_pow branch from 963fe59 to caaf141 Compare August 5, 2022 18:05
@dpwe
Copy link
Contributor Author

dpwe commented Aug 5, 2022

This code is more understandable, I believe more accurate (no aggressive rounding), and should even be smaller (although the changes to parsenum will have increase code size there). PTAL.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Aug 11, 2022
@dpgeorge
Copy link
Member

Code size diff for this PR (which includes #8980):

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  -192 -0.031% 
unix nanbox:  -208 -0.037% 
      stm32:   -52 -0.013% PYBV10
     cc3200:    -8 -0.004% 
    esp8266:   -48 -0.007% GENERIC
      esp32:  +168 +0.011% GENERIC[incl -48(data)]
     mimxrt:  -184 -0.052% TEENSY40
 renesas-ra:   -48 -0.008% RA6M2_EK
        nrf:   -32 -0.017% pca10040
        rp2:  -176 -0.034% PICO
       samd:   -52 -0.037% ADAFRUIT_ITSYBITSY_M4_EXPRESS

That's great!

@dpgeorge
Copy link
Member

This looks like a good improvement, because it reduces code size and fixes the trailing 0 problem. And as you mention it no longer does the aggressive rounding up.

Can you please rebase this on master and clean up the commits, with at least a separate commit to fix the trailing zero problem, and then 1 or more extra commits for the other changes. (At the moment the commits conflict, it seems it was merged to itself.)

@dpwe dpwe force-pushed the format_float_with_pow branch 2 times, most recently from 64d3e17 to 912334c Compare August 11, 2022 10:42
dpwe and others added 2 commits August 11, 2022 11:45
Rework the conversion of floats to decimal strings so it aligns
precisely with the conversion of strings to floats in parsenum.c.
This is to avoid rendering 1eX as 9.99999eX-1 etc.

Signed-off-by: Dan Ellis <dan.ellis@gmail.com>
@dpwe dpwe force-pushed the format_float_with_pow branch from 912334c to 0419279 Compare August 11, 2022 10:50
@dpwe
Copy link
Contributor Author

dpwe commented Aug 11, 2022

Sounds good. I have rebased and cleaned up the commit history as you requested. I have rebased to HEAD, then there are now two commits, one adds the changes to parsenum.c, and the second adds the changes to formatfloat.c and the tests.

I'm getting more successful with git rebase; however, there are a couple of formatting changes to parsenum in the second commit that don't belong there. I'm hoping this is OK.

@dpgeorge
Copy link
Member

Thanks for updating the commits.

Rebased and merged in 6cd2e41 and 6f4d424


Final code size diff for this PR is:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  -144 -0.023% standard
unix nanbox:  -208 -0.037% nanbox
      stm32:   -44 -0.011% PYBV10
     cc3200:    +0 +0.000% 
    esp8266:   -44 -0.006% GENERIC
      esp32:  +156 +0.010% GENERIC[incl -48(data)]
     mimxrt:  -176 -0.049% TEENSY40
 renesas-ra:   -48 -0.008% RA6M2_EK
        nrf:   -24 -0.013% pca10040
        rp2:  -152 -0.030% PICO
       samd:   -44 -0.031% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants