-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
WIP: MAINT: print 0d arrays using scalar str/repr #9143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Using click to view
(note that the third test is not too important since it was introduced in #8983) |
Using click to view``` ====================================================================== FAIL: test_structure_format (test_arrayprint.TestArray2String) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/core/tests/test_arrayprint.py", line 183, in test_structure_format assert_equal(np.array2string(array_scalar), "( 1., 2.12345679, 3.)") File "/home/allan/nmpy_dev/build/testenv/lib/python3.6/site-packages/numpy/testing/utils.py", line 416, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '(1.0, 2.1234567890123457, 3.0)' DESIRED: '( 1., 2.12345679, 3.)'======================================================================
|
8a40e46
to
4bb8b7b
Compare
For reference, this did fix the scipy problem when I tested it. |
I see that |
Note that is is also being fixed on the scipy end scipy/scipy#7419, but that won't help in the near future as we need to maintain backwards compatibility. |
Not a fan of any of those changes. Really, I think there are only four cases that need to be fixed, concerning letting 0d arrays decay to string scalars (
These should be unaffected by The last one is already broken in 1.12 |
The "fix" on the scipy end is more of a code cleanup, which also happens to be workaround for this issue. But it is a backward-incompatible change and probably regression in numpy. |
All right, got this into a version that hopefully passes tests (including some of the behavior we discussed which was not previously tested), This PR brings the state of things back to something closer to 1.13, while avoiding the cluncky The solutions were 1. to reintroduce the As @eric-wieser wieser noted, there is still a bunch of broken behavior involving unicode in python2, but those are very old problems. I also noticed another very old problem, with |
6baad40
to
8081501
Compare
Instead of printing `style(a.item())` for 0d array a, we now print `str(a[()])`.
8081501
to
9bdc531
Compare
I think I prefer the unification in #9139, but can see the argument for not causing needless code churn as doctests fail, etc. But perhaps we can have a bit of a "best of both worlds" by allowing I should add that I've gotten a bit confused about what behaviour to expect for the various PRs.... At the risk of raising a topic that is ideal for bikeshedding, is this something for the mailing list? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One likely bug, otherwise mostly just comments.
@@ -521,7 +513,10 @@ def array2string(a, max_line_width=None, precision=None, | |||
if formatter is None: | |||
formatter = _formatter | |||
|
|||
if a.size == 0: | |||
if a.shape == () and a.dtype.names is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my suggestion might be as easy as adding if style and a.shape == () ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though I understood your suggestion but now I'm not sure. How does it change the behavior of this PR?
Also, note that array_str
in numeric.py
calls array2string(..., style=str)
, while array_repr
calls array2string(...)
(ie, default value of style). That is the main use/effect of the style argument in numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be for other uses of array2string
, i.e., to allow people to say that they just want the same style for arrays and scalars (it could also be assigning the same dict
to style
as for formatter
, but I think this is less clean). This is handy for astropy at least, although it would have been even better if one could also set style
with set_printoptions
.
Indeed, it is somewhat surprising that one cannot:
In [2]: np.set_printoptions(formatter={'all': lambda x: 'haha'})
In [3]: np.array([1, 2])
Out[3]: array([haha, haha])
In [4]: np.array(1)
Out[4]: array(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, looking better I see you explicitly comment on the lack of ability to format scalars. Anyway, the style=None
option to array2string
would at least allow that to be remedied in explicit calls.
.. deprecated:: 1.14.0 | ||
style : function, optional | ||
A function that accepts an ndarray and returns a string. Used only | ||
when the shape of `a` is equal to ``()``, i.e. for 0-D arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and not for structured arrays, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. This is actually the original docstring (from before any of my changes).
By the way, I structured this PR in two commits: A revert commit of my recent 0d changes, then a new commit for this PR.
The diff of the second commit show the changes of this PR relative to numpy 1.13, showing the docstring is unchanged.
PyObject *arr, *ret = NULL; | ||
|
||
if (!PyDataType_HASFIELDS(((PyVoidScalarObject*)self)->descr)) { | ||
return gentype_str(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here the use of the same function for repr
and str
for void will fail: wouldn't you want to call gentype_repr
if this was called as repr(void_scalar)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was to maintain the old behavior. In numpy 1.13 the void scalars went though gentype_repr
for repr and gentype_str
for str, and both of these returned the output of PyObject_String
(and not PyObject_Repr
). You can see this in the diff of the 2nd commit.
assert_equal(repr(np.datetime64('2005-02-25')[...]), | ||
"array('2005-02-25', dtype='datetime64[D]')") | ||
"array(numpy.datetime64('2005-02-25'), dtype='datetime64[D]')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I think I understand why this happens, it does not seem very nice. And it is still a regression compared to 1.12:
np.datetime64('2005-02-25')
# numpy.datetime64('2005-02-25')
np.datetime64('2005-02-25')[...]
# array(datetime.date(2005, 2, 25), dtype='datetime64[D]')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. As far as I can see, this is an unavoidable effect of this PR.
What exactly are we trying to fix here? Can we enumerate exactly which parts are regressions, and which parts are just changes? I particular, I don't think that |
if sys.version_info[0] >= 3: | ||
assert_equal(str(np.array('café', np.unicode_)), 'café') | ||
assert_equal(repr(np.array('café', np.unicode_)), | ||
"array('café',\n dtype='<U4')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we insert a newline here? (We do so in 1.12 too, but it seems pointless)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newline appears for all flexible types. Probably this is because they can be very long. There is a special clause if issubclass(arr.dtype.type, flexible):
in array_repr
in numeric.py
controlling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always assumed that was just intended to handle the subset which is structured dtypes (which "flexible" is often errantly used to mean exclusively, eg in comments np.ma
). Certainly for scalars, it seems overkill. Either way, out of scope to change here.
|
||
item_str = PyObject_Str(item); | ||
Py_DECREF(item); | ||
return item_str; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I approve of these fixes - they remove the cyclic call-graph from arrayprint
, and this seems like a pretty sane default.
Are user-defined dtypes required to implement item
? And is it allowed to return self
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the docs for user-defined-types say In addition, the required functions in the ”.f” member must be defined: nonzero, copyswap, copyswapn, setitem, getitem, and cast.
.
They don't say if self
is a valid return value, but I think it isn't since none of the numpy types do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, isn't it completely reasonable for getitem
to return a subclass of np.generic
? I think that's what rational
does.
But yes, it won't return self, because it gets there via an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed:
>>> from numpy.core.test_rational import rational
>>> x = rational(1, 2); x
rational(1,2)
>>> x.item()
rational(1,2)
More alarmingly
>>> str(x)
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
str(x)
TypeError: __str__ returned non-string (type bytes)
Which makes it impossible to test np.generic.__repr__(x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! A few days ago I was struggling to remember where I had seen an example user-defined type. Thanks for reminding me it is rational
.
I might add some tests involving it.
@eric-wieser , @mhvk Here is the history and chain of thought behind this PR: In 1.13 scalars, 0d arrays and 1+d arrays all had different ways of printing their elements, and 0d arrays depend on a wonky An attempted fix in #8983 was to print 0d arrays as
A potential fix to issue 1 was #9139 which removes unneeded spaces in float arrays much more generally. However, I perceived some pushback on that PR because it will break a lot of unit tests in other projects. It also does not solve issue 2. Therefore, in this PR I instead made 0d arrays always print like numpy scalars, which fixed the space issue. This PR also fixes the An alternative which I think you are both suggesting is to keep #9139, but reintroduce some version of the style argument (eg using Also, In the particular case of datetimes, I think this PR is definitely worse than #9139 (see @mhvk's note above), and I don't see a good fix for that. I think I prefer #9139 plus a Do you all think this is worth a post to the list, or are we all agreed to go forward with #9139? |
If you mean that |
No, I should have been more precise: I meant that, in this PR, the inner element of the
That is unlike #9139 which evaluates as follows:
In this example the two PRs look identical, but actually they will differ for float and str types. The last
By the way, for float types like |
return item_str; | ||
} | ||
|
||
static PyObject * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is identical to gentype_str
, no? So, one could just use the latter below.
@ahaldane - I think I still prefer #9139, as it seems just a better direction to take, formatting array scalars just like regular arrays. However, the annoyance with the string/unicode array scalar is real, but perhaps it is just the On the other hand, if it is possible to set (Sorry, this is not being very helpful! I clearly have no good answer...) |
@mhvk I've improved on #9139 along these lines, see there. I added a The datetime repr is much nicer over there! |
Closing on account of #9139 being merged. |
Continued from discussion at #8983, to investigate option 3:
str(x.item())
.arrayprint.formatter(x)
.str(x)
, ie rely on the string functions inscalartypes.c.src
.Two further comments on option 3:
For option 3 to work will need to modify the scalar fallbacks
gentype_repr
andgentype_str
to not promote to 0d arrays, to avoid recursion issues noted in that PR. Here I made them printstr(x.item())
. After the proposed definitions ofgenint_type_repr
andvoidtype_repr
in #8981 and #8983,gentype_repr
and str will be unused by numpy itself so this shouldn't cause problems.I am also having trouble deciding between
str(x[()])
orrepr(x[()])
for 0d arrays (see the line changed in_formatArray
). Both of them break current behavior. I'm still working on this but I'll paste the current failed tests in comments below.