WIP: ENH: print float scalars using double_to_string instead of printf #9932

Closed
wants to merge 2 commits into from

Conversation

ahaldane
Member

Currently, numpy scalars of floating type print differently from both float-array-elements and Python floats:

>>> 0.3, np.float64(0.3), str(np.array([0.3]))
(0.3, 0.29999999999999999, '[ 0.3]')

Similarly, scalars of complex type print differently:

>>> complex(1,np.inf), np.complex128(complex(1,np.inf)), str(np.array([complex(1,np.inf)]))
((1+infj), (1+inf*j), '[ 1.+infj]')

This is because scalars use the OS's printf to print floats, whereas Python floats and numpy arrays use CPython's version of the dtoa library to print accurate, human-friendly floats (see the discussion in #9919).
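The difference can be reproduced in plain Python, as a rough illustration. The sketch below assumes the old scalar output corresponds to a fixed 17-significant-digit printf format (the exact format string numpy used is not stated here), and compares it with repr, which picks the shortest string that round-trips:

```python
x = 0.3

# Fixed 17 significant digits: always enough to round-trip a float64,
# but usually more digits than a reader needs.
fixed = '%.17g' % x

# Python's repr uses the shortest string that still round-trips.
short = repr(x)

print(fixed)  # 0.29999999999999999
print(short)  # 0.3

# Both strings parse back to exactly the same float64.
print(float(fixed) == float(short) == x)  # True
```

Both strings identify the same float64; they differ only in how many digits are shown.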

This PR rewrites the scalar float printing code to use PyOS_double_to_string, which uses Python's dtoa algorithm, and tweaks the complex repr too. Thus, scalars now use the same algorithm as Python floats and arrays. Scalars will not print exactly the same as Python floats because numpy prints with increased precision, but otherwise the behavior is the same, e.g. the rounding and trimming of trailing zeros.

One exception is for printing longfloats, since the dtoa algorithm cannot handle these. In this case we fall back to the OS's printf, but I've also added the zero-trimming behavior from dtoa so it is still more similar to python floats than before.
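As a minimal sketch of the zero-trimming idea (written in Python for brevity, not the C code in this PR), the helper below strips trailing zeros from a printf-style string while leaving any exponent alone. The function name `trim_zeros` and its `tail` parameter are hypothetical, chosen only for this illustration:

```python
def trim_zeros(s, tail='0'):
    """Strip trailing zeros from the fractional part of a formatted float.

    tail (hypothetical parameter) decides what is left after the point
    when every fractional digit was a zero:
      '0' -> keep one zero ('2.0'), '.' -> keep a bare point ('2.'),
      ''  -> drop the point as well ('2').
    """
    # Split off an exponent such as 'e+10' so it is never trimmed.
    mantissa, sep, exponent = s.partition('e')
    if '.' not in mantissa:
        return s  # integers, 'inf', 'nan': nothing to trim
    mantissa = mantissa.rstrip('0')
    if mantissa.endswith('.'):
        if tail == '0':
            mantissa += '0'
        elif tail == '':
            mantissa = mantissa[:-1]
    return mantissa + sep + exponent


print(trim_zeros('%.10f' % 1.5))      # 1.5
print(trim_zeros('%.4e' % 25000.0))   # 2.5e+04
print(trim_zeros('2.0000', tail='.')) # 2.
```

Trimming the printf output this way keeps whatever rounding the OS performed; it only removes digits that carry no information.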

I've also generalized these functions so it is easy to change the trimming behavior, precision, and format, which may be useful in #9919.

This is a WIP. I think the behavior is finished, but I need to write comments and tests, and I'll also write up some implementation notes in a comment below at some point. I'll ping when I'm done.

@ahaldane ahaldane added this to the 1.14.0 release milestone Oct 27, 2017
@ahaldane ahaldane force-pushed the dtoa_scalars branch 3 times, most recently from a019e96 to 7f3ca0d Compare October 27, 2017 05:15
}
}
else {
/* we found a trailing nonzero-digit instead of '.' */
Member

Not necessarily - this might also be the one and only zero, right?

while (repr[epos] != '\0') {
repr[nzpos++] = repr[epos++];
}
repr[nzpos] = repr[epos];
Member

Could just use a do while here

Contributor

@mhvk mhvk left a comment

Did not have a thorough look at all, so these are a bit random.

* * format_code: similar to the argument to PyOS_double_to_string,
* * prec: same as the argument to PyOS_double_to_string
* * sign: boolean value, controls whether sign is always printed
* * tail: one of '\0', '.' or '0', to control what happens for
Contributor

Should include 'r', no? You test for it below.

flags |= Py_DTSF_ALT;
}

/* 'g' format precision is 1 greater, so decr for consistency with f,e */
Contributor

I'm a bit confused why we do this if the goal is to be more similar to PyOS_double_to_string. Is it just that we do it elsewhere too?
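For reference, the off-by-one between the format codes can be seen with Python's % formatting, which follows the C printf conventions for 'e' and 'g'. This is only an illustration of the precision semantics, not of the code under review:

```python
x = 1.2345678e10

# 'e': precision counts digits AFTER the decimal point.
e3 = '%.3e' % x
# 'g': precision counts TOTAL significant digits.
g3 = '%.3g' % x
g4 = '%.4g' % x

print(e3)  # 1.235e+10
print(g3)  # 1.23e+10
print(g4)  # 1.235e+10
```

So 'g' needs a precision one larger than 'e' to show the same number of significant digits.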

@ahaldane ahaldane force-pushed the dtoa_scalars branch 5 times, most recently from 8f78ad5 to 36fc0cb Compare October 27, 2017 17:54
@ahaldane
Member Author

Hmm, it turns out to be quite tricky to get everything to work using PyOS_double_to_string. The problem is that it has a specially coded "r" mode designed to print float64, which is what is used to print python floats, but which doesn't allow us to control precision. But in numpy we need to control precision, and we want to print things besides float64 with proper rounding. So PyOS_double_to_string is not good enough.

Furthermore, I noticed that in a lot of non-scalar places numpy prints ugly representations. E.g.,

>>> np.arange(10., dtype='f4')/10
array([ 0.        ,  0.1       ,  0.2       ,  0.30000001,  0.40000001,
        0.5       ,  0.60000002,  0.69999999,  0.80000001,  0.89999998], dtype=float32)

Those trailing digits are unnecessary. This leads me to think we should include custom float-printing code in numpy.
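To make "unnecessary" concrete, here is a brute-force sketch in pure Python that finds the shortest decimal string which still round-trips through float32. It is not Dragon4 and not numpy code: the helper names are hypothetical, float32 rounding is emulated with `struct`, and parsing goes through a Python float first, so it is an approximation of a true shortest-repr algorithm:

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 (returned as a float)."""
    return struct.unpack('f', struct.pack('f', x))[0]


def shortest_float32_repr(x):
    """Shortest '%g' string that parses back to the same float32 value."""
    x32 = to_float32(x)
    # 9 significant digits always suffice to identify a float32.
    for prec in range(1, 10):
        s = '%.*g' % (prec, x32)
        if to_float32(float(s)) == x32:
            return s
    return '%.9g' % x32  # only reached for nan, which never compares equal


print(repr(to_float32(0.3)))       # 0.30000001192092896
print(shortest_float32_repr(0.3))  # 0.3
```

The extra digits in the array output above come from printing the float32 value with more precision than is needed to identify it uniquely.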

After some investigation, the Dragon4 algorithm seems promising to me. Ryan Juckett has written it up in C++ here with a license which I think is numpy-compatible. It looks easy to port to C. I already tried modifying it to get float16 and float128 printing with "correct" rounding, which was quite easy to do.

Unless someone already has an objection, I'm probably going to close this PR and work on another one to include Dragon4 printing in numpy. This should improve float reprs for 1. float scalars and 2. non-float64 types in general. I.e., the example above will print the way you would want.

@mhvk
Contributor

mhvk commented Oct 28, 2017

If you're up for it, by all means!

@charris charris added the 57 - Close? Issues which may be closable unless discussion continued label Oct 28, 2017
@ahaldane
Member Author

ahaldane commented Nov 5, 2017

Closed in favor of #9941

@ahaldane ahaldane closed this Nov 5, 2017
Labels
01 - Enhancement 51 - In progress 57 - Close? Issues which may be closable unless discussion continued component: numpy._core

4 participants