Skip to content

WIP: MAINT: print 0d arrays using scalar str/repr #9143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented May 19, 2017

Continued from discussion at #8983, to investigate option 3:

  • The old behavior for 0d arrays was to print str(x.item()).
  • This PR's behavior is to print arrayprint.formatter(x).
  • A third unexplored option is to print str(x), ie rely on the string functions in scalartypes.c.src.

Two further comments on option 3:

For option 3 to work will need to modify the scalar fallbacks gentype_repr and gentype_str to not promote to 0d arrays, to avoid recursion issues noted in that PR. Here I made them print str(x.item()). After the proposed definitions of genint_type_repr and voidtype_repr in #8981 and #8983, gentype_repr and str will be unused by numpy itself so this shouldn't cause problems.

I am also having trouble deciding between str(x[()]) or repr(x[()]) for 0d arrays (see the line changed in _formatArray). Both of them break current behavior. I'm still working on this but I'll paste the current failed tests in comments below.

@ahaldane
Copy link
Member Author

ahaldane commented May 19, 2017

Using str(a[()]), I get 5 failues:

click to view
======================================================================
FAIL: test_structure_format (test_arrayprint.TestArray2String)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 183, in test_structure_format
    assert_equal(np.array2string(array_scalar), "( 1.,  2.12345679,  3.)")
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: '(1.0, 2.1234567890123457, 3.0)'
 DESIRED: '( 1.,  2.12345679,  3.)'

======================================================================
FAIL: test_arrayprint.TestArrayRepr.test_self_containing
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 41, in test_self_containing
    'array(array(..., dtype=object), dtype=object)')
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 'array(..., dtype=object)'
 DESIRED: 'array(array(..., dtype=object), dtype=object)'

======================================================================
FAIL: test_arrayprint.TestPrintOptions.test_0d_arrays
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 241, in test_0d_arrays
    "array('2005-02-25', dtype='datetime64[D]')")
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: "array(2005-02-25, dtype='datetime64[D]')"
 DESIRED: "array('2005-02-25', dtype='datetime64[D]')"

======================================================================
FAIL: test_fancy_printoptions (test_core.TestMaskedArray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/tests/test_core.py", line 700, in test_fancy_printoptions
    assert_equal(str(t_2d0), control)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/testutils.py", line 130, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: '(0, array([[--, 0.0, --],\n       [0.0, 0.0, --]], dtype=object), 0.0)'
 DESIRED: '(0, [[--, 0.0, --], [0.0, 0.0, --]], 0.0)'

======================================================================
FAIL: test_mvoid_multidim_print (test_core.TestMaskedArray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/tests/test_core.py", line 811, in test_mvoid_multidim_print
    assert_(str(t_ma[0]) == "([1, --, 3],)")
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 92, in assert_
    raise AssertionError(smsg)
AssertionError

(note that the third test is not too important since it was introduced in #8983)

@ahaldane
Copy link
Member Author

ahaldane commented May 19, 2017

Using repr(a[()]) I get 4 failures:

click to view ``` ====================================================================== FAIL: test_structure_format (test_arrayprint.TestArray2String) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 183, in test_structure_format assert_equal(np.array2string(array_scalar), "( 1., 2.12345679, 3.)") File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '(1.0, 2.1234567890123457, 3.0)' DESIRED: '( 1., 2.12345679, 3.)'

======================================================================
FAIL: test_arrayprint.TestPrintOptions.test_0d_arrays

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 241, in test_0d_arrays
"array('2005-02-25', dtype='datetime64[D]')")
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
ACTUAL: "array(numpy.datetime64('2005-02-25'), dtype='datetime64[D]')"
DESIRED: "array('2005-02-25', dtype='datetime64[D]')"

======================================================================
FAIL: test_fancy_printoptions (test_core.TestMaskedArray)

Traceback (most recent call last):
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/tests/test_core.py", line 700, in test_fancy_printoptions
assert_equal(str(t_2d0), control)
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/testutils.py", line 130, in assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
ACTUAL: '(0, array([[--, 0.0, --],\n [0.0, 0.0, --]], dtype=object), 0.0)'
DESIRED: '(0, [[--, 0.0, --], [0.0, 0.0, --]], 0.0)'

======================================================================
FAIL: test_mvoid_multidim_print (test_core.TestMaskedArray)

Traceback (most recent call last):
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/ma/tests/test_core.py", line 811, in test_mvoid_multidim_print
assert_(str(t_ma[0]) == "([1, --, 3],)")
File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 92, in assert_
raise AssertionError(smsg)
AssertionError


</details> 

@charris
Copy link
Member

charris commented May 19, 2017

For reference, this did fix the scipy problem when I tested it.

@mhvk
Copy link
Contributor

mhvk commented May 19, 2017

I see that str(x[()] also changed the structured array scalar change format... Avoid that! I think that change was absolutely the correct one and I don't want to change back the many changes I had to made to the astropy doctests...

@charris
Copy link
Member

charris commented May 19, 2017

Note that is is also being fixed on the scipy end scipy/scipy#7419, but that won't help in the near future as we need to maintain backwards compatibility.

@eric-wieser
Copy link
Member

eric-wieser commented May 19, 2017

Not a fan of any of those changes. Really, I think there are only four cases that need to be fixed, concerning letting 0d arrays decay to string scalars (s = 'test'; u = 'café'):

  • In python 3:
    • str(np.array(u, np.unicode_)) == u
  • In python 2:
    • str(np.array(s, np.str_)) == s
    • str(np.array(u, np.unicode_)) == str(u) - incur the broken python 2 unicode-to-string coercion
    • unicode(np.array(u, np.unicode_)) == u

These should be unaffected by set_string_function.

The last one is already broken in 1.12

@pv
Copy link
Member

pv commented May 19, 2017

The "fix" on the scipy end is more of a code cleanup, which also happens to be workaround for this issue. But it is a backward-incompatible change and probably regression in numpy.

@ahaldane
Copy link
Member Author

All right, got this into a version that hopefully passes tests (including some of the behavior we discussed which was not previously tested),

This PR brings the state of things back to something closer to 1.13, while avoiding the cluncky str(a.item() behavior 0d arrays used to have. Instead of having scalars, 0d arrays, and 1+d arrays all print their elements differently like in 1.13, now 0d arrays essentially print the same as scalars.

The solutions were 1. to reintroduce the style argument to arrays2string, as it seems to be essential to correctly account for the unicode/string cases @eric-wieser menioned, and 2. special-case structured datatypes.

As @eric-wieser wieser noted, there is still a bunch of broken behavior involving unicode in python2, but those are very old problems. I also noticed another very old problem, with repr(np.zeros(2,np.dtype(('i8', 'i4,i4')))) which raises TypeError: zip argument #1 must support iteration.

@ahaldane ahaldane force-pushed the alternate_0d_repr branch 2 times, most recently from 6baad40 to 8081501 Compare May 20, 2017 20:05
ahaldane added 2 commits May 20, 2017 16:38
…g_s"

This reverts commit 692655e, reversing
changes made to d4eaa2c.
Instead of printing `style(a.item())` for 0d array a,
we now print `str(a[()])`.
@mhvk
Copy link
Contributor

mhvk commented May 21, 2017

I think I prefer the unification in #9139, but can see the argument for not causing needless code churn as doctests fail, etc. But perhaps we can have a bit of a "best of both worlds" by allowing style=None to mean "typeset array scalar as an array, not as a scalar"?

I should add that I've gotten a bit confused about what behaviour to expect for the various PRs.... At the risk of raising a topic that is ideal for bikeshedding, is this something for the mailing list?

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One likely bug, otherwise mostly just comments.

@@ -521,7 +513,10 @@ def array2string(a, max_line_width=None, precision=None,
if formatter is None:
formatter = _formatter

if a.size == 0:
if a.shape == () and a.dtype.names is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my suggestion might be as easy as adding if style and a.shape == () ...

Copy link
Member Author

@ahaldane ahaldane May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though I understood your suggestion but now I'm not sure. How does it change the behavior of this PR?

Also, note that array_str in numeric.py calls array2string(..., style=str), while array_repr calls array2string(...) (ie, default value of style). That is the main use/effect of the style argument in numpy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be for other uses of array2string, i.e., to allow people to say that they just want the same style for arrays and scalars (it could also be assigning the same dict to style as for formatter, but I think this is less clean). This is handy for astropy at least, although it would have been even better if one could also set style with set_printoptions.

Indeed, it is somewhat surprising that one cannot:

In [2]: np.set_printoptions(formatter={'all': lambda x: 'haha'})

In [3]: np.array([1, 2])
Out[3]: array([haha, haha])

In [4]: np.array(1)
Out[4]: array(1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, looking better I see you explicitly comment on the lack of ability to format scalars. Anyway, the style=None option to array2string would at least allow that to be remedied in explicit calls.

.. deprecated:: 1.14.0
style : function, optional
A function that accepts an ndarray and returns a string. Used only
when the shape of `a` is equal to ``()``, i.e. for 0-D arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and not for structured arrays, correct?

Copy link
Member Author

@ahaldane ahaldane May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. This is actually the original docstring (from before any of my changes).

By the way, I structured this PR in two commits: A revert commit of my recent 0d changes, then a new commit for this PR.

The diff of the second commit show the changes of this PR relative to numpy 1.13, showing the docstring is unchanged.

PyObject *arr, *ret = NULL;

if (!PyDataType_HASFIELDS(((PyVoidScalarObject*)self)->descr)) {
return gentype_str(self);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here the use of the same function for repr and str for void will fail: wouldn't you want to call gentype_repr if this was called as repr(void_scalar)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was to maintain the old behavior. In numpy 1.13 the void scalars went though gentype_repr for repr and gentype_str for str, and both of these returned the output of PyObject_String (and not PyObject_Repr). You can see this in the diff of the 2nd commit.

assert_equal(repr(np.datetime64('2005-02-25')[...]),
"array('2005-02-25', dtype='datetime64[D]')")
"array(numpy.datetime64('2005-02-25'), dtype='datetime64[D]')")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think I understand why this happens, it does not seem very nice. And it is still a regression compared to 1.12:

np.datetime64('2005-02-25')
# numpy.datetime64('2005-02-25')
np.datetime64('2005-02-25')[...]
# array(datetime.date(2005, 2, 25), dtype='datetime64[D]')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. As far as I can see, this is an unavoidable effect of this PR.

@eric-wieser
Copy link
Member

What exactly are we trying to fix here?

Can we enumerate exactly which parts are regressions, and which parts are just changes?

I particular, I don't think that repr(arr0d) needs any fixes at all - I think just str needs the str(arr0d[()]) treatment

if sys.version_info[0] >= 3:
assert_equal(str(np.array('café', np.unicode_)), 'café')
assert_equal(repr(np.array('café', np.unicode_)),
"array('café',\n dtype='<U4')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we insert a newline here? (We do so in 1.12 too, but it seems pointless)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newline appears for all flexible types. Probably this is because they can be very long. There is a special clause if issubclass(arr.dtype.type, flexible): in array_repr in numeric.py controlling it.

Copy link
Member

@eric-wieser eric-wieser May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always assumed that was just intended to handle the subset which is structured dtypes (which "flexible" is often errantly used to mean exclusively, eg in comments np.ma). Certainly for scalars, it seems overkill. Either way, out of scope to change here.


item_str = PyObject_Str(item);
Py_DECREF(item);
return item_str;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I approve of these fixes - they remove the cyclic call-graph from arrayprint, and this seems like a pretty sane default.

Are user-defined dtypes required to implement item? And is it allowed to return self?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the docs for user-defined-types say In addition, the required functions in the ”.f” member must be defined: nonzero, copyswap, copyswapn, setitem, getitem, and cast..

They don't say if self is a valid return value, but I think it isn't since none of the numpy types do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, isn't it completely reasonable for getitem to return a subclass of np.generic? I think that's what rational does.

But yes, it won't return self, because it gets there via an array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed:

>>> from numpy.core.test_rational import rational
>>> x = rational(1, 2); x
rational(1,2)
>>> x.item()
rational(1,2)

More alarmingly

>>> str(x)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    str(x)
TypeError: __str__ returned non-string (type bytes)

Which makes it impossible to test np.generic.__repr__(x)

Copy link
Member Author

@ahaldane ahaldane May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! A few days ago I was struggling to remember where I had seen an example user-defined type. Thanks for reminding me it is rational.

I might add some tests involving it.

@ahaldane
Copy link
Member Author

ahaldane commented May 22, 2017

@eric-wieser , @mhvk Here is the history and chain of thought behind this PR:

In 1.13 scalars, 0d arrays and 1+d arrays all had different ways of printing their elements, and 0d arrays depend on a wonky style(a.item()) behavior.

An attempted fix in #8983 was to print 0d arrays as format_function(a[()]), ie to print like 1+d arrays, but this had two problems:

  1. It added ugly spaces in the repr of 0d arrays because format_function for floats has extra spaces for alignment
  2. people depend on the str of unicode and string 0d arrays to behave how numpy scalars behave.

A potential fix to issue 1 was #9139 which removes unneeded spaces in float arrays much more generally. However, I perceived some pushback on that PR because it will break a lot of unit tests in other projects. It also does not solve issue 2.

Therefore, in this PR I instead made 0d arrays always print like numpy scalars, which fixed the space issue. This PR also fixes the str of unicode issue by reintroducing the style argument to array2string. I think this is a less invasive change, though I still have to investigate the issues you both raised above.

An alternative which I think you are both suggesting is to keep #9139, but reintroduce some version of the style argument (eg using style=None) so that for 0d arrays repr(a) will return 'array(' + format_function(a[()]) while str(a) will return str(a[()]). I kind of like that but we should be aware of what will happen: First, it means many doctests will need to be updated for spaces, and second it means that for 0d arrays, repr prints like an array while str prints like a scalar, which is perhaps a little weird.

Also, In the particular case of datetimes, I think this PR is definitely worse than #9139 (see @mhvk's note above), and I don't see a good fix for that.

I think I prefer #9139 plus a style=None argument, but then again I personally am not annoyed by larger changes to numpy that upstream projects might be more sensitive to.

Do you all think this is worth a post to the list, or are we all agreed to go forward with #9139?

@eric-wieser
Copy link
Member

I instead made 0d arrays always print like numpy scalars

If you mean that array(1) now prints as 1, then I think this is a really bad idea. Numpy arrays are still different objects to scalars, and this should be made very obvious in their repr.

@ahaldane
Copy link
Member Author

ahaldane commented May 22, 2017

No, I should have been more precise:

I meant that, in this PR, the inner element of the repr of a 0d array is printed like a scalar. In other words with 0d array a = array(1) such that a[()] is a scalar np.int64(1) then in this PR

  • repr(a) evaluates to 'array(' + repr(a[()]) + ')' , finally giving 'array(1)'.
  • str(a) evaluates to str(a[()]), finally giving '1'

That is unlike #9139 which evaluates as follows:

  • repr(a) evaluates to to 'array(' + format_function(a[()]) + ')' , finally giving 'array(1)'.
  • str(a) evaluates to format_function(a[()]), finally giving '1'

In this example the two PRs look identical, but actually they will differ for float and str types. The last str is bad because for str/unicode format_function prints the repr of the string which includes the quote marks. That's why in my last comment I talked about how we would need to modify #9139 so that

By the way, for float types like a = np.array(1.), in this PR str(a) gives '1.0', while #9139 gives '1.', and in general the significant digits will be different.

return item_str;
}

static PyObject *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is identical to gentype_str, no? So, one could just use the latter below.

@mhvk
Copy link
Contributor

mhvk commented May 24, 2017

@ahaldane - I think I still prefer #9139, as it seems just a better direction to take, formatting array scalars just like regular arrays. However, the annoyance with the string/unicode array scalar is real, but perhaps it is just the str case (of array scalar str/unicode objects only?) that should be special-cased.

On the other hand, if it is possible to set style=None in array2string, then I think at least my use cases are mostly addressed, so perhaps that is the simpler route to go for (and we can still introduce a sign option as well, in both array2string and set_printoptions, which could use #9139 -- probably it would be substantially easier to just allow formats to be passed in through formatter...).

(Sorry, this is not being very helpful! I clearly have no good answer...)

@ahaldane
Copy link
Member Author

ahaldane commented May 24, 2017

@mhvk I've improved on #9139 along these lines, see there.

I added a style=None option, and copied the gentype_repr, gentype_str and voidtype_str functions from this PR over to there, and copied the new tests over too. Now #9139 implements the last behavior in my last comment.

The datetime repr is much nicer over there!

@charris
Copy link
Member

charris commented Sep 26, 2017

Closing on account of #9139 being merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants