-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: Remove newline before dtype in repr of arrays #10032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
36989d7
to
16cedfb
Compare
numpy/core/arrayprint.py
Outdated
@@ -1209,6 +1211,31 @@ def array_repr(arr, max_line_width=None, precision=None, suppress_small=None): | |||
lf = '\n'+' '*len(class_name + "(") | |||
return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename) | |||
|
|||
|
|||
if issubclass(arr.dtype.type, flexible): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire if
along with prefix
can be placed before the if _format_options['legacy']
to save on code duplication. That way, the legacy case becomes simply:
if _format_options['legacy'] == '1.13':
if issubclass(arr.dtype.type, flexible):
lf = '\n'+' '*len(class_name + "(")
else:
lf = ''
suffix = "{}dtype={})".format(lf, typename)
else:
# the longer computation of suffix
numpy/core/arrayprint.py
Outdated
if last_line_len + len(typename) + len(' dtype=)') > max_line_width: | ||
suffix = "\n{}dtype={})".format(' '*len(class_name + "("), typename) | ||
else: | ||
suffix = " dtype={})".format(typename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much duplication of dtype=
here for my liking. If you assembled this in smaller pieces, you wouldn't need to keep building the same piece. So:
prefix = "{}(".format(class_name)
suffix = "dtype={})".format(typename)
I'd maybe impose a stronger constraint of "If any line wrapped, then wrap the dtype too" in addition to the "if dtype would cause the line to wrap". IMO this: array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'],
dtype='|S4') is better than array([b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1',
b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'], dtype='|S4') so essentially, if putting array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
"dtype='<U11", "dtype='<U11", "dtype='<U11"],
dtype='<U11') is much better than array(["dtype='<U11", "dtype='<U11", "dtype='<U11", "dtype='<U11",
"dtype='<U11", "dtype='<U11", "dtype='<U11"], dtype='<U11') too! |
16cedfb
to
7b5ed99
Compare
Cleaned up the code, and tried out your suggestion for adding a newline on all multi-line arrays. See the tests for before/after changes. Both behaviors have benefits and disadvantages. My original one used up fewer lines, but yours more clearly distinguishes the dtype. |
7b5ed99
to
0c98980
Compare
I don't feel strongly either way on the newline, just though I'd make the suggestion. Cleanup looks good. |
numpy/core/arrayprint.py
Outdated
return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename) | ||
spacer = '\n' + ' '*len(class_name + "(") | ||
elif newline_ind == -1 and len(prefix) + len(suffix) + 1 <= max_line_width: | ||
spacer = ' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite the rule I was proposing. I was also intending to keep the following:
>>> np.zeros((3, 1), 'f2')
array([[0.],
[0.],
[0.]], dtype=float16)
So the test would be something like
max(len(l) for l in prefix.split('\n')) + len(suffix) + 1 <= max_line_width
c88d6aa
to
1ec0789
Compare
I played around with it some more. I've ultimately set it back to the behavior where it it stays on the last line as long as there is space. I think my slight preference is one less line even if it means the dtype has data above it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice. I see the argument for both cases, but am happy with the choice of just minimizing space.
One nitpick...
numpy/core/tests/test_arrayprint.py
Outdated
@@ -459,6 +457,21 @@ def test_legacy_mode_scalars(self): | |||
'1.1234567891234568') | |||
assert_equal(str(np.complex128(complex(1, np.nan))), '(1+nanj)') | |||
|
|||
def test_dtype_linwdith_wrappiing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two typos in the function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
three, even
@@ -199,8 +198,7 @@ def test_unstructured_void_repr(self): | |||
assert_equal(str(a[0]), r"b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'") | |||
assert_equal(repr(a), | |||
r"array([b'\x1B\x5B\x32\x4B\x07\x41\x0A\x08'," "\n" | |||
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64']," "\n" | |||
r" dtype='|V8')") | |||
r" b'\x1B\x5B\x33\x31\x6D\x52\x65\x64'], dtype='|V8')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you touched it anyway, this test might be a lot clearer with textwrap.dedent(r"""\
... """)
to handle the multiline indented string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't know about that until I saw you use it in the maskedarray PR. Good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is better without dedent because it uses the r
raw mode, which prevents me from starting the string with """\
.
The lines below are better with dedent, I'll change those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raw should work with """
, some of the docstrings are raw strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I'm having is that textwrap.dedent
needs the starting newline to be escaped, ie the arg should start with """\
(escaped newline). But in a raw string I can't escape the newline.
For instance the following script:
from __future__ import print_function
import textwrap
print(textwrap.dedent(r"""\
aaa
bbb
ccc"""))
prints
\
aaa
bbb
ccc
(curiously the ipython shell seems to correctly escape the newline, but not the plain python shell nor scripts. Edit: That's actually an ipython bug: ipython/ipython#5828)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But anyway, I can fix it by just avoiding the newline... will fix. sorry I got confused.. I still don't see a better solution than what's there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I hadn't though about the raw string interfering with the initial newline, Thanks for pointing that out to me before I run into it elsewhere! textwrap.dedent(...).strip()
would do the job, but it's not so clear an improvement.
numpy/core/tests/test_arrayprint.py
Outdated
"array(['1', '1', '1'], dtype='{}')".format(styp)) | ||
assert_equal(repr(np.ones(12, dtype=styp)), | ||
("array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],\n" | ||
" dtype='{}')").format(styp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for textwrap.dedent
here
Sticking with what you had is fine by me. Only nits above. |
1ec0789
to
82e15c7
Compare
Updated. |
numpy/core/arrayprint.py
Outdated
|
||
# compute whether we should put dtype on a new line: Do so if adding the | ||
# dtype would extend the last line past max_line_width. | ||
last_line_len = len(prefix) - prefix.rfind('\n') - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct when \n
is not in prefix
, and rfind
returns -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, and I think it's ok:
[1]: f = lambda x: len(x) - x.rfind("\n") - 1
[2]: f("AAA")
3
[3]: f("AA\nA")
1
[4]: f("AA\n")
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, you're right. Would be a little clearer as len(prefix) - (prefix.rfind('\n') + 1)
, and could probably do with a comment that -1 is actually being handled correctly.
Will go ahead and merge if you can either add a comment about the |
82e15c7
to
25bc65d
Compare
Added a comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to merge once tests pass.
We should probably go over the release notes after all these formatting things are in, but let's leave that till #10058
Fixes #9717
This PR more carefully chooses whether to put a newline before the
dtype=
part of ndarray reprs. If adding the dtype would put the last line of output past themax_line_width
formatter option, then put the dtype on the new line. Otherwise keep the dtype on the same line. Supports 1.13 legacy mode (must merge #10030 first).The old behavior was to always keep the dtype on the same line for non-flexible-typed arrays (sometimes going past the
max_line_width
), and always put it on a new line for flexible-typed arrays.In the output below I've wrapped the lines as if in an 80-char terminal.
Old behavior:
New behavior:
Relatedly, I think that a lot of the
linewidth
related code should be rewritten, because there are lots of cases it can go past the user-requested line width. For instance, if you look in_formatArray
you can see it does not take into account trailing commas, or trailing]
, so these can go past themax_line_width
. But I am somewhat loath to try to fix that right now. Note that the defaultlinewidth
is 75 instead of the standard 80, which I might guess is because of these problems.Maybe we can leave those problems alone, and just change the newline for now?