Skip to content

MAINT: Remove newline before dtype in repr of arrays #10032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 20, 2017

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Nov 15, 2017

Fixes #9717

This PR more carefully chooses whether to put a newline before the dtype= part of ndarray reprs. If adding the dtype would put the last line of output past the max_line_width formatter option, then put the dtype on the new line. Otherwise keep the dtype on the same line. Supports 1.13 legacy mode (must merge #10030 first).

The old behavior was to always keep the dtype on the same line for non-flexible-typed arrays (sometimes going past the max_line_width), and always put it on a new line for flexible-typed arrays.

In the output below I've wrapped the lines as if in an 80-char terminal.

Old behavior:

>>> np.arange(10,20,dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], dtype=float32)
>>> np.arange(10,24., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.], dt
ype=float32)
>>> np.ones(3, dtype='S4')
array(['1', '1', '1'],
      dtype='|S4')
>>> np.ones(11, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')
>>> np.ones(12, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')

New behavior:

>>> np.arange(10,20., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], dtype=float32)
>>> np.arange(10,24., dtype='f4')
array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
      dtype=float32)
>>> np.ones(3, dtype='S4')
array(['1', '1', '1'], dtype='|S4')
>>> np.ones(11, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'], dtype='|S4')
>>> np.ones(12, dtype='S4')
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S4')

Relatedly, I think that a lot of the linewidth related code should be rewritten, because there are lots of cases it can go past the user-requested line width. For instance, if you look in _formatArray you can see it does not take into account trailing commas, or trailing ], so these can go past the max_line_width. But I am somewhat loath to try to fix that right now. Note that the default linewidth is 75 instead of the standard 80, which I might guess is because of these problems.

Maybe we can leave those problems alone, and just change the newline for now?

@ahaldane ahaldane added this to the 1.14.0 release milestone Nov 15, 2017
@ahaldane ahaldane force-pushed the remove_flexible_newline branch from 36989d7 to 16cedfb Compare November 15, 2017 23:07
@@ -1209,6 +1211,31 @@ def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
lf = '\n'+' '*len(class_name + "(")
return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)


if issubclass(arr.dtype.type, flexible):
Copy link
Member

@eric-wieser eric-wieser Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire if along with prefix can be placed before the if _format_options['legacy'] to save on code duplication. That way, the legacy case becomes simply:

if _format_options['legacy'] == '1.13':
    if issubclass(arr.dtype.type, flexible):
        lf =  '\n'+' '*len(class_name + "(")
    else:
        lf = ''
    suffix = "{}dtype={})".format(lf, typename)
else:
    # the longer computation of suffix

if last_line_len + len(typename) + len(' dtype=)') > max_line_width:
suffix = "\n{}dtype={})".format(' '*len(class_name + "("), typename)
else:
suffix = " dtype={})".format(typename)
Copy link
Member

@eric-wieser eric-wieser Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much duplication of dtype= here for my liking. If you assembled this in smaller pieces, you wouldn't need to keep building the same piece. So:

prefix = "{}(".format(class_name)
suffix = "dtype={})".format(typename)

@eric-wieser
Copy link
Member

eric-wieser commented Nov 17, 2017

I'd maybe impose a stronger constraint of "If any line wrapped, then wrap the dtype too" in addition to the "if dtype would cause the line to wrap". IMO this:

array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
       b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'],
      dtype='|S4')

is better than

array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
       b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'], dtype='|S4')

so essentially, if putting dtype= at the end of the line would cause there to be something above it, put it on a newline instead.


array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
       "dtype='<U11", "dtype='<U11", "dtype='<U11"],
      dtype='<U11')

is much better than

array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
       "dtype='<U11", "dtype='<U11", "dtype='<U11"], dtype='<U11')

too!

@ahaldane ahaldane force-pushed the remove_flexible_newline branch from 16cedfb to 7b5ed99 Compare November 18, 2017 18:43
@ahaldane
Copy link
Member Author

Cleaned up the code, and tried out your suggestion for adding a newline on all multi-line arrays. See the tests for before/after changes.

Both behaviors have benefits and disadvantages. My original one used up fewer lines, but yours more clearly distinguishes the dtype.

@ahaldane ahaldane force-pushed the remove_flexible_newline branch from 7b5ed99 to 0c98980 Compare November 18, 2017 18:56
@eric-wieser
Copy link
Member

I don't feel strongly either way on the newline, just though I'd make the suggestion.

Cleanup looks good.

return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)
spacer = '\n' + ' '*len(class_name + "(")
elif newline_ind == -1 and len(prefix) + len(suffix) + 1 <= max_line_width:
spacer = ' '
Copy link
Member

@eric-wieser eric-wieser Nov 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite the rule I was proposing. I was also intending to keep the following:

>>> np.zeros((3, 1), 'f2')
array([[0.],
       [0.],
       [0.]], dtype=float16)

So the test would be something like

max(len(l) for l in prefix.split('\n')) + len(suffix) + 1 <= max_line_width

@ahaldane ahaldane force-pushed the remove_flexible_newline branch 2 times, most recently from c88d6aa to 1ec0789 Compare November 19, 2017 16:23
@ahaldane
Copy link
Member Author

I played around with it some more. I've ultimately set it back to the behavior where it it stays on the last line as long as there is space. I think my slight preference is one less line even if it means the dtype has data above it.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice. I see the argument for both cases, but am happy with the choice of just minimizing space.

One nitpick...

@@ -459,6 +457,21 @@ def test_legacy_mode_scalars(self):
'1.1234567891234568')
assert_equal(str(np.complex128(complex(1, np.nan))), '(1+nanj)')

def test_dtype_linwdith_wrappiing(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two typos in the function name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

three, even

@@ -199,8 +198,7 @@ def test_unstructured_void_repr(self):
assert_equal(str(a[0]), r"b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'")
assert_equal(repr(a),
r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n"
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n"
r" dtype='|V8')")
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64'], dtype='|V8')")
Copy link
Member

@eric-wieser eric-wieser Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you touched it anyway, this test might be a lot clearer with textwrap.dedent(r"""\ ... """) to handle the multiline indented string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't know about that until I saw you use it in the maskedarray PR. Good idea!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is better without dedent because it uses the r raw mode, which prevents me from starting the string with """\.

The lines below are better with dedent, I'll change those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raw should work with """, some of the docstrings are raw strings.

Copy link
Member Author

@ahaldane ahaldane Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I'm having is that textwrap.dedent needs the starting newline to be escaped, ie the arg should start with """\ (escaped newline). But in a raw string I can't escape the newline.

For instance the following script:

from __future__ import print_function
import textwrap
print(textwrap.dedent(r"""\
    aaa
    bbb
    ccc"""))

prints

\
    aaa
    bbb
    ccc

(curiously the ipython shell seems to correctly escape the newline, but not the plain python shell nor scripts. Edit: That's actually an ipython bug: ipython/ipython#5828)

Copy link
Member Author

@ahaldane ahaldane Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But anyway, I can fix it by just avoiding the newline... will fix. sorry I got confused.. I still don't see a better solution than what's there.

Copy link
Member

@eric-wieser eric-wieser Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I hadn't though about the raw string interfering with the initial newline, Thanks for pointing that out to me before I run into it elsewhere! textwrap.dedent(...).strip() would do the job, but it's not so clear an improvement.

"array(['1', '1', '1'], dtype='{}')".format(styp))
assert_equal(repr(np.ones(12, dtype=styp)),
("array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],\n"
" dtype='{}')").format(styp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for textwrap.dedent here

@eric-wieser
Copy link
Member

Sticking with what you had is fine by me. Only nits above.

@ahaldane ahaldane force-pushed the remove_flexible_newline branch from 1ec0789 to 82e15c7 Compare November 20, 2017 01:16
@ahaldane
Copy link
Member Author

Updated.


# compute whether we should put dtype on a new line: Do so if adding the
# dtype would extend the last line past max_line_width.
last_line_len = len(prefix) - prefix.rfind('\n') - 1
Copy link
Member

@eric-wieser eric-wieser Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct when \n is not in prefix, and rfind returns -1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, and I think it's ok:

[1]: f = lambda x: len(x) - x.rfind("\n") - 1
[2]: f("AAA")
3
[3]: f("AA\nA")
1
[4]: f("AA\n")
0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, you're right. Would be a little clearer as len(prefix) - (prefix.rfind('\n') + 1), and could probably do with a comment that -1 is actually being handled correctly.

@eric-wieser
Copy link
Member

Will go ahead and merge if you can either add a comment about the rfind magic, or rewrite that expression in a less magic way.

@ahaldane ahaldane force-pushed the remove_flexible_newline branch from 82e15c7 to 25bc65d Compare November 20, 2017 05:18
@ahaldane
Copy link
Member Author

Added a comment.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge once tests pass.

We should probably go over the release notes after all these formatting things are in, but let's leave that till #10058

@eric-wieser eric-wieser merged commit 3d5d12f into numpy:master Nov 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove newline in repr of arrays with flexible datatype?
4 participants