Skip to content

ENH: remove unneeded spaces in float/bool reprs, fixes 0d str #9139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 26, 2017

Conversation

ahaldane
Copy link
Member

See discussion in #8983

This removes the leading space in the string representation of arrays of floating-point type if it is not needed (ie, all positive values), eg:

old behavior:

>>> np.array([1., 2., 3.])
array([ 1.,  2.,  3.])

New behavior:

>>> np.array([1., 2., 3.])
array([1., 2., 3.])

This noticebly improves the appearance of 0d arrays, so that np.array(1.0) prints as itself, instead of as np.array( 1.0) as happens after #8983.

It does the same for boolean arrays with only one element, so 0d boolean arrays print as array(True, dtype=bool) instead of array( True, dtype=bool).

(longfloats still have a space I think, but I would rather not tackle that here because I think it requires a big rewrite of longfloat formatter)

@@ -624,6 +618,7 @@ def fillFormat(self, data):
from . import numeric as _nc

with _nc.errstate(all='ignore'):
hassign = self.sign or any(data < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be _nc.any

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And probably has_sign not hassign

@ahaldane ahaldane force-pushed the print_no_leading_space branch 2 times, most recently from 3d81fd9 to 53cfc62 Compare May 18, 2017 20:58
@charris
Copy link
Member

charris commented May 19, 2017

This could break doc tests. Doc tests are evil, but some use them anyway...

@ahaldane
Copy link
Member Author

@mhvk just for your info as we decide how to make the repr changes, I ran this PR though astropy tests, and it messes up a lot of tests:

79 failed, 11325 passed, 574 skipped, 68 xfailed, 1 xpassed, 1 pytest-warnings

Most of them are things like

Expected:
    <Quantity 15.0 m / s>
Got:
    <Quantity 15. m / s>

or

Expected:
    array([ 0. ,  0.4,  0.6,  0.8,  1. ])
Got:
    array([0. , 0.4, 0.6, 0.8, 1. ])

@mhvk
Copy link
Contributor

mhvk commented May 19, 2017

@ahaldane - yes, but I've already got a PR that fixes those tests: astropy/astropy#6090; and am happy to fix things further if we go that route, as I think this PR made printing better in general. Just don't want to have to do it for each numpy version!

@mhvk
Copy link
Contributor

mhvk commented May 19, 2017

@ahaldane - I clearly did not pay much attention to the PR in which you commented... Even up to redoing your work here in #9144 (now closed). In any case, my comment above stands, if slightly differently: I gladly fix the astropy doctests if we end up with more consistent numpy printing -- after all, I think that after this it is not going to change again... But ideally we do make all changes in one go for 1.14!

@mhvk
Copy link
Contributor

mhvk commented May 21, 2017

If we try not to break too many things, one option for this would be to expose the sign option in set_printoptions, maybe having as options ' ', '+', '' (i.e., match what is used by the format mini-language). Though it would be a bit annoying that there is no way not to break compatibility if this is applied to both integers and float, since for integer the default would have to be '', while for floats it would be ' '.

@ahaldane
Copy link
Member Author

ahaldane commented May 24, 2017

Updated based on discussion in #9143.

Also, while we are annoying everyone by changing the spaces in array printing, we should try to do it as thoroughly as possible now to avoid any future changes. Specifically, I'd like to add another commit or two to 1. remove spaces from longfloat, and 2. I think the repr of np.array([100., 2., -1.]) can still be improved (see unit test for this): We should better account for whether the sign is on the longest value or not, currently this PR still has an unneeded space in this case.

(I also still need to update the release notes and other little things)

assert_equal(str(np.array('café', np.unicode_)), 'café')
assert_equal(repr(np.array('café', np.unicode_)),
"array('café',\n dtype='<U4')")
assert_equal(str(np.array('café', np.unicode_)), 'café')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is identical to the one two above it, isn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmf not sure how that happened.. will fix


# 0d arrays are *not* affected by printoptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now wrong!

@ahaldane ahaldane force-pushed the print_no_leading_space branch 2 times, most recently from dbec3f1 to 85a8c45 Compare May 31, 2017 16:43
@ahaldane
Copy link
Member Author

Updated.

I updated the treatment of signed values so the extra whitespace for the sign only appears if the longest elements have a sign. Release note and tests also updated.

I haven't updated the longfloat spacing, but I think I want to leave that for another possible PR. I'd like to get this one through so upstream projects can get their tests to stop complaining.

@ahaldane ahaldane force-pushed the print_no_leading_space branch from 85a8c45 to 5767ac1 Compare May 31, 2017 17:17
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, modulo a few text issues in the changelog, and an unneeded list construction (which was there already, but might as well get rid of it).

``np.set_string_function``, unlike most other numpy scalars. This is no longer
the case.

removed an unneeded whitespace in float and bool array printing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem, and also delete "an"

Has no effect, do not use.

.. deprecated:: 1.14.0
outputA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Piece missing here. Change to "output correctly."?

if len(non_zero):
precision = max([_digits(x, self.precision, format)
for x in non_zero])
precision = max([_trailing_nonzero(x, self.precision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to construct a list inside max -- btw, unrelated to this PR, but this seems a really slow way for large arrays...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I worried about that too, but then I worked out that data is going to be a small array.

data only contains the elements that are actually printed on the screen (ie, not the interior which is replaced by ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

----------------------------------------------------
0d arrays now use the array2string formatters to print their elements, like
other arrays. The `style` argument of array2string is now non-functional.
printing of 0d arrays changed to be more consistent with scalars/ndarrays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial, but capitalize first word.

``None`` now prints the array using the ``formatter`` specified using
``np.set_printoptions``, otherwise ``style`` should be a function which accepts
a numpy scalar and returns a string, and ``style(a[()])`` is returned. The
default value is now ``style=None``.

integer scalars are now unaffected by ``np.set_string_function``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

for x in non_zero])
precision = max([_trailing_nonzero(x, self.precision)
for x in abs_non_zero])
precision = min(self.precision, precision)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this line is unnecessary, will remove.

@ahaldane ahaldane force-pushed the print_no_leading_space branch from 5767ac1 to d13d48e Compare May 31, 2017 17:55
@ahaldane
Copy link
Member Author

All right, reworked the trailing_nonzero part, hopefully the code is not too dense.

@mhvk
Copy link
Contributor

mhvk commented May 31, 2017

Looks good to me.

@mhvk
Copy link
Contributor

mhvk commented Jun 1, 2017

I'd be quite happy to merge this, as it seems a follow-up of gh-8983, and finishes making the print options much more logical. Obviously, this will break more doctests, but it does seem best to at least do it in one go. @eric-wieser, @charris: any objections to merging this?

}
return ret;

item_str = PyObject_Str(item);
Copy link
Member

@eric-wieser eric-wieser Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think I said before, this isn't safe, since item can (and likely will for custom dtypes) return a copy of self, causing infinite recursion.

What actually falls back to this case anyway?
Seems to me that it would be better to just inherit object.__str__, making if obvious that custom dtype users have work to do.

Copy link
Member Author

@ahaldane ahaldane Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, you are talking about this comment.

Currently, besides potential user-defined types, only numpy only void types fall back to this. But they no longer will after #8981.

I just tried your suggestion by simply removing gentype_repr and gentype_str, in the last commit pushed here.

Locally (not in the pushed commit) I also temporarily modified test_rational.c to remove the str and repr of rational methods as a check, and can confirm I get reasonable behavior:

>>> from numpy.core.test_rational import rational
>>> a = np.array([1,2,3], dtype=rational)
>>> a
array([<rational object at 0x7f37f1b4e918>,
       <rational object at 0x7f37f1b4e918>,
       <rational object at 0x7f37f1b4e918>], dtype=rational)

I'm not sure how to implement a test of this without recreating a new test_rational type missing the str and repr, which seems a bit heavy-handed.

@@ -641,7 +635,8 @@ def fillFormat(self, data):

if self.exp_format:
self.large_exponent = 0 < min_val < 1e-99 or max_val >= 1e100
self.max_str_len = 8 + self.precision
pad_sign = self.sign or _nc.any(non_zero < 0)
self.max_str_len = 6 + self.precision + pad_sign
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this decrease by one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simple answer is that there appeared to be an unnecessary space.

The more complex answer is that actually I'm not sure why that extra space was there... I'll take another look.

Copy link
Member Author

@ahaldane ahaldane Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After investigation, I still don't understand why the extra space was there.

I count "6" extra characters more than the floating part and optional sign: x.e+xx where x are digits.

@ahaldane ahaldane force-pushed the print_no_leading_space branch from fd4ea55 to 2cbe4c4 Compare September 21, 2017 00:12
min_val = minimum.reduce(non_zero)
max_val = np.max(abs_non_zero)
min_val = np.min(abs_non_zero)
min_val_sgn = np.min(non_zero)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a bit odd that we're importing np directly for these, yet importing other stuff indirectly. Not something you need to fix in this PR though

Copy link
Member Author

@ahaldane ahaldane Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like all the indirect imports can be replaced by the single np import - I just tried it, seems fine.

I won't do that in this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never understood quite how, or why, that worked. But it generally does.

Copy link
Member Author

@ahaldane ahaldane Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it works because we never use or call any numpy functions or variables in this file at import time, only at runtime.

At the moment arrayprint is imported from the main__init__ file, the numpy module has been created but not populated, so we are allowed to import numpy, but not allowed refer to any of its functions or variables. However, we can use it inside of function definitions since by the time they are called the numpy module will have been populated.

(Actually, I see a few exceptions: At module import time we call multiarray.set_string_function and we refer to int_, float_, complex_, intc and longlong. Therefore these will still need to be directly imported).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but it might be safer to leave these all the way they are: What if another module calls arrayprint functions at import time? Then we would have a problem. Ideally no other modules would, though.

(And this isn't theoretical: We recently had a problem because getlimits.pywas calling array2string at module import time, though we've now fixed it so it no longer does (#9113))

def __init__(self, data, **kwargs):
# add an extra space so " True" and "False" have the same length and
# array elements align nicely when printed, but only for arrays with
# more than one element (0d and nd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment no longer matches code

@ahaldane ahaldane force-pushed the print_no_leading_space branch from 2cbe4c4 to 7be5048 Compare September 21, 2017 14:34
@ahaldane
Copy link
Member Author

Fixed the comment, plus the two doctrings describing the sign argument.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the iteration.

Let's leave this a day or two to collect any final comments

@eric-wieser
Copy link
Member

@mhvk: good to go from your perspective?

@mhvk
Copy link
Contributor

mhvk commented Sep 25, 2017

Looked over it once more and, yes, good to go! Though we should ensure that LongFloat gets done before 1.14 as well...

@ahaldane ahaldane force-pushed the print_no_leading_space branch from 7be5048 to 5810349 Compare September 25, 2017 15:49
@ahaldane
Copy link
Member Author

ahaldane commented Sep 25, 2017

I just tweaked the release note a little. I changed the title about Unneeded whitespace in float and bool array printing removed to Unneeded whitespace in float array printing removed and edited the note.

@ahaldane
Copy link
Member Author

Otherwise, I checked over it again, I don't have more changes.

@eric-wieser
Copy link
Member

I intend to put this in just to see who screams ;)

@charris, I think it's screaming time

@ahaldane ahaldane force-pushed the print_no_leading_space branch from 5810349 to 710e032 Compare September 26, 2017 01:25
@ahaldane
Copy link
Member Author

Rebased for conflicts with #9688

@charris charris merged commit d8be978 into numpy:master Sep 26, 2017
@charris
Copy link
Member

charris commented Sep 26, 2017

In it goes. Good job all.

@charris
Copy link
Member

charris commented Sep 26, 2017

There look to be four remaining PR relevant to this. It would be helpful if folks could keep Allan's list current and close any PRs that are no longer relevant.

xiaoqiangwang added a commit to CaChannel/CaChannel that referenced this pull request May 22, 2018
numpy 1.14 "fixes" the extra space in float array prints.(numpy/numpy#9139)
Use char type to keep back/forward compatible.
bchretien added a commit to bchretien/roboptim-core-python that referenced this pull request Jul 4, 2018
Before:

[[ 0.  0.  0. ]
 [ 0.  0.  0. ]
 [ 0.  0.  0. ]]

With recent NumPy:

[[0.  0.  0.]
 [0.  0.  0.]
 [0.  0.  0.]]

See numpy/numpy#9139
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants