MAINT: Remove newline before dtype in repr of arrays #10032

ahaldane · 2017-11-15T21:18:17Z

This PR more carefully chooses whether to put a newline before the dtype= part of ndarray reprs. If adding the dtype would push the last line of output past the max_line_width formatter option, the dtype goes on a new line. Otherwise it stays on the same line. Supports 1.13 legacy mode (must merge #10030 first).

The old behavior was to always keep the dtype on the same line for non-flexible-typed arrays (sometimes going past the max_line_width), and always put it on a new line for flexible-typed arrays.

In the output below I've wrapped the lines as if in an 80-char terminal.

Old behavior:

>>> np.arange(10,20,dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], dtype=float32)
>>> np.arange(10,24., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.], dt
ype=float32)
>>> np.ones(3, dtype='S4')
array(['1', '1', '1'],
      dtype='|S4')
>>> np.ones(11, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')
>>> np.ones(12, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')

New behavior:

>>> np.arange(10,20., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], dtype=float32)
>>> np.arange(10,24., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
      dtype=float32)
>>> np.ones(3, dtype='S4')
array(['1', '1', '1'], dtype='|S4')
>>> np.ones(11, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'], dtype='|S4')
>>> np.ones(12, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')
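
The rule described above can be sketched in plain Python. This is a minimal illustration rather than the code in numpy/core/arrayprint.py: the helper names `append_dtype` and `ones_repr` are hypothetical, and the width of 75 is the default linewidth mentioned in this thread.

```python
def append_dtype(arr_str, dtype_str, max_line_width=75, class_name="array"):
    """Join the array text and the dtype text, wrapping only when needed.

    arr_str   -- everything up to and including the trailing comma,
                 e.g. "array([1., 2.],"
    dtype_str -- the closing part, e.g. "dtype=float32)"
    """
    # Length of the last line of arr_str. rfind returns -1 when there is
    # no newline, in which case this is simply len(arr_str).
    last_line_len = len(arr_str) - (arr_str.rfind("\n") + 1)

    # The +1 accounts for the single space separating the two parts.
    if last_line_len + 1 + len(dtype_str) > max_line_width:
        # Indent the dtype so it lines up under the opening bracket.
        spacer = "\n" + " " * len(class_name + "(")
    else:
        spacer = " "
    return arr_str + spacer + dtype_str


def ones_repr(n):
    # Hypothetical stand-in for the array text of np.ones(n, dtype='S4').
    return "array([" + ", ".join(["'1'"] * n) + "],"


print(append_dtype(ones_repr(11), "dtype='|S4')"))  # fits in 75 columns
print(append_dtype(ones_repr(12), "dtype='|S4')"))  # dtype moves to a new line
```

With 11 elements the joined line is exactly 75 characters, so the dtype stays inline; the twelfth element pushes it over and the dtype wraps, matching the "New behavior" output above.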

Relatedly, I think that a lot of the linewidth-related code should be rewritten, because there are lots of cases where it can go past the user-requested line width. For instance, if you look in _formatArray you can see it does not take into account trailing commas or a trailing ], so these can go past the max_line_width. But I am somewhat loath to try to fix that right now. Note that the default linewidth is 75 instead of the standard 80, which I might guess is because of these problems.

Maybe we can leave those problems alone, and just change the newline for now?

eric-wieser · 2017-11-17T05:42:50Z

numpy/core/arrayprint.py

@@ -1209,6 +1211,31 @@ def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
            lf = '\n'+' '*len(class_name + "(")
        return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)

+
+    if issubclass(arr.dtype.type, flexible):


This entire if along with prefix can be placed before the if _format_options['legacy'] to save on code duplication. That way, the legacy case becomes simply:

    if _format_options['legacy'] == '1.13':
        if issubclass(arr.dtype.type, flexible):
            lf = '\n'+' '*len(class_name + "(")
        else:
            lf = ''
        suffix = "{}dtype={})".format(lf, typename)
    else:
        # the longer computation of suffix

eric-wieser · 2017-11-17T05:50:27Z

numpy/core/arrayprint.py

+    if last_line_len + len(typename) + len(' dtype=)') > max_line_width:
+        suffix = "\n{}dtype={})".format(' '*len(class_name + "("), typename)
+    else:
+        suffix = " dtype={})".format(typename)


Too much duplication of dtype= here for my liking. If you assembled this in smaller pieces, you wouldn't need to keep building the same piece. So:

    prefix = "{}(".format(class_name)
    suffix = "dtype={})".format(typename)

eric-wieser · 2017-11-17T05:55:44Z

I'd maybe impose a stronger constraint of "If any line wrapped, then wrap the dtype too" in addition to the "if dtype would cause the line to wrap". IMO this:

array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
       b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'],
      dtype='|S4')

is better than

array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
       b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'], dtype='|S4')

so essentially, if putting dtype= at the end of the line would cause there to be something above it, put it on a newline instead.

array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
       "dtype='<U11", "dtype='<U11", "dtype='<U11"],
      dtype='<U11')

is much better than

array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
       "dtype='<U11", "dtype='<U11", "dtype='<U11"], dtype='<U11')

too!

ahaldane · 2017-11-18T18:47:25Z

Cleaned up the code, and tried out your suggestion for adding a newline on all multi-line arrays. See the tests for before/after changes.

Both behaviors have benefits and disadvantages. My original one used up fewer lines, but yours more clearly distinguishes the dtype.

eric-wieser · 2017-11-18T19:34:35Z

I don't feel strongly either way on the newline, just thought I'd make the suggestion.

Cleanup looks good.

eric-wieser · 2017-11-18T19:37:42Z

numpy/core/arrayprint.py

-        return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)
+            spacer = '\n' + ' '*len(class_name + "(")
+    elif newline_ind == -1 and len(prefix) + len(suffix) + 1 <= max_line_width:
+        spacer = ' '


This isn't quite the rule I was proposing. I was also intending to keep the following:

>>> np.zeros((3, 1), 'f2')
array([[0.],
       [0.],
       [0.]], dtype=float16)

So the test would be something like

max(len(l) for l in prefix.split('\n')) + len(suffix) + 1 <= max_line_width
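
To make the difference between the two tests concrete, here is a small sketch that evaluates both on plain strings. The function names are hypothetical, `prefix` stands for the array text up to the trailing comma, `suffix` for the `dtype=...)` part, and 75 is assumed as the line width.

```python
def fits_single_line_only(prefix, suffix, max_line_width=75):
    # Rule from the diff under review: keep the dtype inline only when the
    # array text has no newline at all and the joined line fits.
    return "\n" not in prefix and len(prefix) + len(suffix) + 1 <= max_line_width


def fits_beside_longest_line(prefix, suffix, max_line_width=75):
    # Suggested rule: keep the dtype inline when even the longest line of
    # the array text would leave room for it, so nothing sits above it.
    longest = max(len(line) for line in prefix.split("\n"))
    return longest + len(suffix) + 1 <= max_line_width


# Column vector: several short lines.
column = "array([[0.],\n       [0.],\n       [0.]],"
# Wide 1-d array: the first line nearly fills the width.
wide = ("array([" + ", ".join(["b'1'"] * 11) + ",\n       "
        + ", ".join(["b'1'"] * 8) + "],")

print(fits_single_line_only(column, "dtype=float16)"))     # False
print(fits_beside_longest_line(column, "dtype=float16)"))  # True
print(fits_beside_longest_line(wide, "dtype='|S4')"))      # False
```

Under the suggested test the column vector keeps its dtype on the last line, while the wide array, whose first line would sit above the dtype, gets the dtype on a new line.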

ahaldane · 2017-11-19T18:07:27Z

I played around with it some more. I've ultimately set it back to the behavior where it stays on the last line as long as there is space. I think my slight preference is one less line even if it means the dtype has data above it.

mhvk

This is very nice. I see the argument for both cases, but am happy with the choice of just minimizing space.

One nitpick...

mhvk · 2017-11-19T23:25:11Z

numpy/core/tests/test_arrayprint.py

@@ -459,6 +457,21 @@ def test_legacy_mode_scalars(self):
                     '1.1234567891234568')
        assert_equal(str(np.complex128(complex(1, np.nan))), '(1+nanj)')

+    def test_dtype_linwdith_wrappiing(self):


Two typos in the function name

three, even

eric-wieser · 2017-11-19T23:27:46Z

numpy/core/tests/test_arrayprint.py

@@ -199,8 +198,7 @@ def test_unstructured_void_repr(self):
        assert_equal(str(a[0]), r"b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'")
        assert_equal(repr(a),
            r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n"
-            r"       b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n"
-            r"      dtype='|V8')")
+            r"       b'\x1B\x5B\x33\x31\x6D\x52\x65\x64'], dtype='|V8')")


Since you touched it anyway, this test might be a lot clearer with textwrap.dedent(r"""\ ... """) to handle the multiline indented string

Yeah I didn't know about that until I saw you use it in the maskedarray PR. Good idea!

I think this line is better without dedent because it uses the r raw mode, which prevents me from starting the string with """\.

The lines below are better with dedent, I'll change those.

Raw should work with """, some of the docstrings are raw strings.

The problem I'm having is that textwrap.dedent needs the starting newline to be escaped, i.e. the arg should start with """\ (escaped newline). But in a raw string I can't escape the newline.

For instance the following script:

    from __future__ import print_function
    import textwrap
    print(textwrap.dedent(r"""\
        aaa
        bbb
        ccc"""))

prints

    \
        aaa
        bbb
        ccc

(curiously the ipython shell seems to correctly escape the newline, but not the plain python shell nor scripts. Edit: That's actually an ipython bug: ipython/ipython#5828)

~~But anyway, I can fix it by just avoiding the newline... will fix.~~ sorry I got confused.. I still don't see a better solution than what's there.

You're right, I hadn't thought about the raw string interfering with the initial newline. Thanks for pointing that out to me before I run into it elsewhere! textwrap.dedent(...).strip() would do the job, but it's not so clear an improvement.
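
The three variants discussed in this thread can be compared side by side. This is an illustrative sketch using only the standard library; the variable names are made up for the example.

```python
import textwrap

# Non-raw string: the backslash escapes the first newline, so the text
# starts directly with the indented lines and dedent can strip the margin.
plain = textwrap.dedent("""\
    aaa
    bbb
    ccc""")

# Raw string: the backslash is kept literally. The first line is then a
# bare "\" with no indentation, so the common margin is empty and dedent
# removes nothing.
raw_escaped = textwrap.dedent(r"""\
    aaa
    bbb
    ccc""")

# Workaround mentioned above: drop the backslash and strip the leading
# newline after dedenting.
stripped = textwrap.dedent(r"""
    aaa
    bbb
    ccc""").strip()

print(repr(plain))
print(repr(raw_escaped))
print(repr(stripped))
```

Note that `.strip()` also removes trailing whitespace, which is harmless for a repr comparison but would matter if the expected string ended in a newline.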

eric-wieser · 2017-11-19T23:28:49Z

numpy/core/tests/test_arrayprint.py

+            "array(['1', '1', '1'], dtype='{}')".format(styp))
+        assert_equal(repr(np.ones(12, dtype=styp)),
+            ("array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],\n"
+             "      dtype='{}')").format(styp))


Ditto for textwrap.dedent here

eric-wieser · 2017-11-19T23:29:42Z

Sticking with what you had is fine by me. Only nits above.

ahaldane · 2017-11-20T01:16:45Z

Updated.

eric-wieser · 2017-11-20T03:37:20Z

numpy/core/arrayprint.py

+
+    # compute whether we should put dtype on a new line: Do so if adding the
+    # dtype would extend the last line past max_line_width.
+    last_line_len = len(prefix) - prefix.rfind('\n') - 1


Is this correct when \n is not in prefix, and rfind returns -1?

I considered that, and I think it's ok:

    [1]: f = lambda x: len(x) - x.rfind("\n") - 1
    [2]: f("AAA")
    3
    [3]: f("AA\nA")
    1
    [4]: f("AA\n")
    0

Huh, you're right. Would be a little clearer as len(prefix) - (prefix.rfind('\n') + 1), and could probably do with a comment that -1 is actually being handled correctly.
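
For reference, a short sketch of the parenthesised form with the kind of comment being asked for. The function name `last_line_len` mirrors the variable in the diff, but wrapping it in a function is only for illustration.

```python
def last_line_len(text):
    # str.rfind returns -1 when "\n" is absent, so (rfind + 1) is 0 and the
    # whole string counts as the last line. Otherwise (rfind + 1) is the
    # index of the first character after the final newline.
    return len(text) - (text.rfind("\n") + 1)


for sample in ["AAA", "AA\nA", "AA\n", ""]:
    # Cross-check against the more obvious (but allocating) formulation.
    assert last_line_len(sample) == len(sample.split("\n")[-1])
    print(repr(sample), last_line_len(sample))
```

Both formulations agree on every case tried here, including the empty string and a string that ends in a newline.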

eric-wieser · 2017-11-20T05:13:44Z

Will go ahead and merge if you can either add a comment about the rfind magic, or rewrite that expression in a less magic way.

Fixes numpy#9717

ahaldane · 2017-11-20T05:22:59Z

Added a comment.

eric-wieser

Feel free to merge once tests pass.

We should probably go over the release notes after all these formatting things are in, but let's leave that till #10058

ahaldane added 01 - Enhancement component: numpy._core labels Nov 15, 2017

ahaldane added this to the 1.14.0 release milestone Nov 15, 2017

ahaldane force-pushed the remove_flexible_newline branch from 36989d7 to 16cedfb Compare November 15, 2017 23:07

eric-wieser reviewed Nov 17, 2017

ahaldane force-pushed the remove_flexible_newline branch from 16cedfb to 7b5ed99 Compare November 18, 2017 18:43

ahaldane force-pushed the remove_flexible_newline branch from 7b5ed99 to 0c98980 Compare November 18, 2017 18:56

eric-wieser reviewed Nov 18, 2017

ahaldane force-pushed the remove_flexible_newline branch 2 times, most recently from c88d6aa to 1ec0789 Compare November 19, 2017 16:23

ahaldane mentioned this pull request Nov 19, 2017

ENH: Various improvements to Maskedarray repr #9792

Merged

mhvk approved these changes Nov 19, 2017

eric-wieser reviewed Nov 19, 2017

ahaldane force-pushed the remove_flexible_newline branch from 1ec0789 to 82e15c7 Compare November 20, 2017 01:16

eric-wieser reviewed Nov 20, 2017

ahaldane added 2 commits November 20, 2017 00:17

MAINT: Output appropriate newline before dtype in array reprs

46c82c4

Fixes numpy#9717

TST: Update tests for changed flexible newline

25bc65d

ahaldane force-pushed the remove_flexible_newline branch from 82e15c7 to 25bc65d Compare November 20, 2017 05:18

eric-wieser approved these changes Nov 20, 2017

eric-wieser merged commit 3d5d12f into numpy:master Nov 20, 2017
