Skip to content

WIP: MAINT: Rewrite LongFloatFormat, trim zeros in scientific notation output #9919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Oct 24, 2017

The main purpose of this PR is to implement the new longfloat arrayprint formatter, as promised in #9139.

Now, longfloat arrays will print essentially identically to float and double arrays, using the same code paths. This means longfloat arrays now align nicely and have the right number of spaces.

This PR does two additional things which are somewhat unrelated, but are in the same code: I removed trailing zeros in scientific notation, so 1.0000e+100 becomes 1.e+100, and I made the longfloat and half formatter precision customizable independently from the float/double precision.

I think this is mostly done, I just need to add a few more tests, eg to test the "locale"-related code, and to test the new precision customization.

Sidenote: I implemented format_longfloat using a "trick" involving temporarily setting the "C" locale (if necessary) which avoids all the complicated manipulation of decimals used in NumpyOS_ascii_format_double (which was copied from the CPython code). Resetting the locale was recommended on the gcc page on locales.

@ahaldane ahaldane force-pushed the longfloat_formatter branch 2 times, most recently from a195592 to e5b9ed6 Compare October 24, 2017 23:06
@ahaldane ahaldane force-pushed the longfloat_formatter branch from e5b9ed6 to 958914b Compare October 24, 2017 23:41
@ahaldane ahaldane added this to the 1.14.0 release milestone Oct 25, 2017
def __call__(self, x):
r = self.real_format(x.real)
i = self.imag_format(x.imag)
return r + i + 'j'
Copy link
Member

@eric-wieser eric-wieser Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this different from the ComplexFormat version? Which one is preferable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice they were different. I'll probably copy/subclass the ComplexFloat version.

@eric-wieser
Copy link
Member

Is there a reason not just to modify format_longdouble and friends, adding extra arguments for the new features? That way, the code path is mostly shared between scalar repr and array repr.

elif strip_zeros:
z = s.rstrip('0')
s = z + ' '*(len(s)-len(z))
return s


class LongFloatFormat(FloatFormat):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to make the base class FloatingFormatter to match np.floating, and use a derived class for FloatFormatter too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@ahaldane
Copy link
Member Author

I left format_longdouble alone since I didn't want to modify the scalar printing, which appears to be intentionally different from the array element printing (it includes a trailing 0). I also don't see much point in adding a lot of generalized code for the scalars if we will never use it.

Also, format_longfoat was already its own special function in the multiaraymodule, so I am just leaving it that way.

@ahaldane
Copy link
Member Author

ahaldane commented Oct 25, 2017

Also, just a comment on the state of float printing in numpy and python: It's messy.

We use C's printf to print numpy scalars, but we use python's printf-formatting to print array elements, and the two are not the same. CPython actually uses the dtoa algorithm for float printing, I think because they decided that the OS printf was unreliable and sometimes inaccurate. dtoa is advertised to be a highly accurate and also human-friendly way to print floats.

Using C's printf is not ideal since we have to to a lot of string manipulation to reformat the OS printf output to be what we want (see numpyos.c), to account for locale, exponent digits, and other things. CPython falls back to using the same methods if dtoa.c is not available for some reason.

Other languages like Julia, Go, Rust, javascript in some browsers, have decided that the OS printf was unreliable, and use specific float printing algorithms. They seem to use some combination of dtoa, an algorithm called grisu3 (dtoa is apparently based on grisu2), and one called dragon4.

Here are some links to discussion of these issues:

Blog Post about Dragon4 and Grisu3
Swift discussion of grisu3 vs dtoa, with more links
Python issue discussing dtoa improvements

Probably in an ideal world numpy would include a good float-printing algorithm, perhaps customized for our requirements (eg, for output alignment). But we have lived without one until now, so I don't think it's very high priority.

@ahaldane
Copy link
Member Author

Hmm, some of the numpy problems in my last comment might be solved by using calls to PyOS_double_to_string instead of PyOS_snprintf in the numpy scalar float formatting code, since the former uses the dtoa algorithm. We would need to write wrapper code to account for whitespace padding, exponent padding, like CPython does somewhere in the formatting code too.

@ahaldane
Copy link
Member Author

ahaldane commented Oct 25, 2017

One last comment:

I don't think the dtoa library can accurately print IEEE-extended precision floats (ie 80-bit long double on x64). We may have to rely on the OS printf for that, doubly so because the format of long double is arch-dependent.

@theodoregoetz
Copy link
Contributor

note: this PR is addressing issue #9699

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants