WIP: MAINT: Rewrite LongFloatFormat, trim zeros in scientific notation output #9919

ahaldane · 2017-10-24T22:24:26Z

The main purpose of this PR is to implement the new longfloat arrayprint formatter, as promised in #9139.

Now, longfloat arrays will print essentially identically to float and double arrays, using the same code paths. This means longfloat arrays now align nicely and have the right number of spaces.

This PR does two additional things which are somewhat unrelated, but are in the same code: I removed trailing zeros in scientific notation, so 1.0000e+100 becomes 1.e+100, and I made the longfloat and half formatter precision customizable independently from the float/double precision.
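To make the zero-trimming concrete, here is a minimal Python sketch of the transformation described above. The helper name `trim_sci_zeros` is hypothetical and this is not the PR's actual code; it only assumes the input is a printf-style string such as `'1.0000e+100'`.

```python
def trim_sci_zeros(s):
    """Strip trailing zeros from the mantissa of a string like '1.0000e+100'."""
    mantissa, sep, exponent = s.partition('e')
    if not sep or '.' not in mantissa:
        return s  # not scientific notation, or no fractional part to trim
    # Keep the decimal point itself, matching the '1.e+100' style.
    return mantissa.rstrip('0') + sep + exponent


print(trim_sci_zeros('%.4e' % 1e100))    # 1.e+100
print(trim_sci_zeros('%.4e' % 1.25e-3))  # 1.25e-03
```

Strings without an exponent pass through unchanged, so the helper can be applied to a mixed column of values.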

I think this is mostly done. I just need to add a few more tests, e.g. for the "locale"-related code and for the new precision customization.

Sidenote: I implemented format_longfloat using a "trick" involving temporarily setting the "C" locale (if necessary) which avoids all the complicated manipulation of decimals used in NumpyOS_ascii_format_double (which was copied from the CPython code). Resetting the locale was recommended on the gcc page on locales.
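The PR does this in C, but the same save/switch/restore pattern can be sketched in Python with the standard `locale` module. The context manager name `c_numeric_locale` is hypothetical, and the sketch assumes single-threaded use, since `setlocale` changes process-wide state.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_numeric_locale():
    """Temporarily switch LC_NUMERIC to "C", restoring the old value after."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    if saved != 'C':
        locale.setlocale(locale.LC_NUMERIC, 'C')
    try:
        yield
    finally:
        if saved != 'C':
            locale.setlocale(locale.LC_NUMERIC, saved)


with c_numeric_locale():
    # locale.format_string is locale-aware, so '.' is guaranteed here.
    text = locale.format_string('%.3f', 1.5)
print(text)  # 1.500
```

The locale is only touched "if necessary", mirroring the description above: when the numeric locale is already "C", the context manager does nothing.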

eric-wieser · 2017-10-25T08:13:14Z

numpy/core/arrayprint.py

+    def __call__(self, x):
+        r = self.real_format(x.real)
+        i = self.imag_format(x.imag)
+        return r + i + 'j'


Why is this different from the ComplexFormat version? Which one is preferable?

I didn't notice they were different. I'll probably copy/subclass the ComplexFormat version.

eric-wieser · 2017-10-25T08:21:56Z

Is there a reason not just to modify format_longdouble and friends, adding extra arguments for the new features? That way, the code path is mostly shared between scalar repr and array repr.

eric-wieser · 2017-10-25T08:23:26Z

numpy/core/arrayprint.py

        elif strip_zeros:
            z = s.rstrip('0')
            s = z + ' '*(len(s)-len(z))
        return s

+
+class LongFloatFormat(FloatFormat):


I'd be inclined to make the base class FloatingFormatter to match np.floating, and use a derived class for FloatFormatter too.
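For readers unfamiliar with the `strip_zeros` branch shown in the diff context, here is a standalone sketch of the same idea: trailing zeros are replaced by spaces so that every element keeps its width and the column stays aligned. The function name `strip_zeros_padded` is hypothetical.

```python
def strip_zeros_padded(s):
    """Replace trailing zeros with spaces so the string keeps its width."""
    z = s.rstrip('0')
    return z + ' ' * (len(s) - len(z))


# Fixed-width formatting first, then zero stripping, keeps decimal points aligned.
column = ['%7.4f' % v for v in (1.5, 0.125, 10.0)]
for item in column:
    print(repr(strip_zeros_padded(item)))
# ' 1.5   '
# ' 0.125 '
# '10.    '
```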

ahaldane · 2017-10-25T15:36:47Z

I left format_longdouble alone since I didn't want to modify the scalar printing, which appears to be intentionally different from the array element printing (it includes a trailing 0). I also don't see much point in adding a lot of generalized code for the scalars if we will never use it.

Also, format_longfloat was already its own special function in the multiarraymodule, so I am just leaving it that way.

ahaldane · 2017-10-25T16:14:29Z

Also, just a comment on the state of float printing in numpy and python: It's messy.

We use C's printf to print numpy scalars, but we use Python's printf-style formatting to print array elements, and the two are not the same. CPython actually uses the dtoa algorithm for float printing, I think because they decided that the OS printf was unreliable and sometimes inaccurate. dtoa is advertised to be a highly accurate and also human-friendly way to print floats.

Using C's printf is not ideal since we have to do a lot of string manipulation to reformat the OS printf output into what we want (see numpyos.c), to account for locale, exponent digits, and other things. CPython falls back to using the same methods if dtoa.c is not available for some reason.
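A small illustration of the difference between fixed-precision printf-style output and the "human-friendly" shortest output, using plain Python floats. This assumes a CPython build where the dtoa-based shortest repr is in use, which is the usual case.

```python
x = 0.1

# printf-style formatting: the caller must choose a precision up front.
print('%.17g' % x)  # 0.10000000000000001
print('%.8f' % x)   # 0.10000000

# repr() yields the shortest string that round-trips to the same double.
print(repr(x))      # 0.1
assert float(repr(x)) == x
```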

Other languages and runtimes, such as Julia, Go, Rust, and JavaScript in some browsers, have also decided that the OS printf is unreliable and use dedicated float-printing algorithms. They seem to use some combination of dtoa, an algorithm called grisu3 (dtoa is apparently based on grisu2), and one called dragon4.

Here are some links to discussion of these issues:

Blog Post about Dragon4 and Grisu3
Swift discussion of grisu3 vs dtoa, with more links
Python issue discussing dtoa improvements

Probably in an ideal world numpy would include a good float-printing algorithm, perhaps customized for our requirements (e.g. for output alignment). But we have lived without one until now, so I don't think it's very high priority.

ahaldane · 2017-10-25T16:34:30Z

Hmm, some of the numpy problems in my last comment might be solved by calling PyOS_double_to_string instead of PyOS_snprintf in the numpy scalar float formatting code, since the former uses the dtoa algorithm. We would need to write wrapper code to handle whitespace padding and exponent padding, as CPython does somewhere in its formatting code.
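As a rough idea of what such wrapper code might look like, here is a Python-level sketch that starts from the shortest repr and then applies exponent padding and whitespace padding. The helper `pad_float_repr` and its `width` and `exp_digits` parameters are hypothetical; a real implementation would live in C around PyOS_double_to_string.

```python
def pad_float_repr(x, width=12, exp_digits=3):
    """Hypothetical helper: shortest repr, padded exponent, right-justified."""
    s = repr(float(x))
    mantissa, sep, exponent = s.partition('e')
    if sep:
        # The exponent looks like '+100' or '-07': keep the sign, pad the digits.
        sign, digits = exponent[0], exponent[1:]
        s = mantissa + 'e' + sign + digits.zfill(exp_digits)
    return s.rjust(width)


for value in (1.5, 1e-7, 1e100):
    print(repr(pad_float_repr(value)))
# '         1.5'
# '      1e-007'
# '      1e+100'
```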

ahaldane · 2017-10-25T16:42:18Z

One last comment:

I don't think the dtoa library can accurately print IEEE extended-precision floats (i.e. 80-bit long double on x64). We may have to rely on the OS printf for that, doubly so because the format of long double is arch-dependent.
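The arch-dependence is easy to observe from Python with the standard `ctypes` module. This sketch only reports storage sizes; it assumes nothing about which long double format the platform actually uses, and the storage size can include padding beyond the significant bits.

```python
import ctypes
import sys

double_size = ctypes.sizeof(ctypes.c_double)
longdouble_size = ctypes.sizeof(ctypes.c_longdouble)

# The long double size differs between platforms and compilers,
# so no particular value is assumed here.
print('double:      %d bytes' % double_size)
print('long double: %d bytes' % longdouble_size)
print('double mantissa bits: %d' % sys.float_info.mant_dig)
```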

theodoregoetz · 2017-10-25T19:49:31Z

note: this PR is addressing issue #9699

ahaldane force-pushed the longfloat_formatter branch 2 times, most recently from a195592 to e5b9ed6 Compare October 24, 2017 23:06

MAINT: Rewrite LongFloatFormat, trim zeros in scientific notation output

958914b

ahaldane force-pushed the longfloat_formatter branch from e5b9ed6 to 958914b Compare October 24, 2017 23:41

ahaldane added 01 - Enhancement component: numpy._core labels Oct 25, 2017

ahaldane added this to the 1.14.0 release milestone Oct 25, 2017

eric-wieser reviewed Oct 25, 2017


This was referenced Oct 27, 2017

WIP: ENH: print float scalars using double_to_string instead of printf #9932

Closed

ENH: Use Dragon4 algorithm to print floating values #9941

Merged

charris closed this in 9ab9e8b Nov 5, 2017
