Skip to content

BUG: New ragged array message has false positives #14138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seberg opened this issue Jul 27, 2019 · 13 comments · Fixed by #14341
Closed

BUG: New ragged array message has false positives #14138

seberg opened this issue Jul 27, 2019 · 13 comments · Fixed by #14341
Labels
00 - Bug 06 - Regression Priority: high High priority, also add milestones for urgent issues

Comments

@seberg
Copy link
Member

seberg commented Jul 27, 2019

A related effect of this change is that assigning a ragged list (but still of correct number of values) to a slice no longer works, as shown in this example, extracted from Matplotlib's tests:

points = np.array([[1, 2], [3, 6]])

xlim = points[:, 0]
ylim = points[:, 1]
edge_size = max(np.diff(xlim), np.diff(ylim))

interval = [xlim[0], xlim[0] + edge_size]

points[:, 0] = interval
# print(repr(points)) formerly gave:
# array([[1, 2],
#        [5, 6]])

fails with a ValueError because the first element is a scalar and the second element is an array of length one.

I wrote a fix for this, but a question did come up for whether this was an intended change?

Originally posted by @QuLogic in #13913 (comment)

@seberg seberg added 00 - Bug 06 - Regression Priority: high High priority, also add milestones for urgent issues labels Jul 27, 2019
@seberg
Copy link
Member Author

seberg commented Aug 22, 2019

@mattip ping, can you take a look at this issue? If it means we need to revert that other PR, than so be it.

@mattip
Copy link
Member

mattip commented Aug 22, 2019

Yeah, we should revert or fix PyArray_GetArrayParamsFromObject. That function is told to use a numeric dtype, but gives up and returns an object dtype, the construction proceeds with the numeric dtype anyway, and then counts on the actual assignment to raise if it cannot be done.

We should add somewhere a test that the following succeeds (it is the corner case from #14138 since PyArray_GetArrayParamsFromObject returns a object dtype)

b = np.array([1])
a = np.array([1, b], dtype=int)

@seberg seberg added this to the 1.17.1 release milestone Aug 22, 2019
@seberg
Copy link
Member Author

seberg commented Aug 22, 2019

@mattip I am good with reverting. Unfortunately this is in 1.17, so we should get it fixed for the 1.17.1 release probably

@WarrenWeckesser
Copy link
Member

This is probably a pointless comment, coming so late, but it seems the example given should result in an error. np.array([5]) is a sequence of length 1. Why is this OK

x = np.empty(2)
x[:] = [1, np.array([5])]

but this is not

x[:] = [1, [5]]

???

Was the argument for reversion purely for backward compatibility (i.e. it is not what we want, but the existing behavior has been entrenched for too long to change)?

@mattip
Copy link
Member

mattip commented Dec 16, 2019

Yes, the PR inadvertently caused existing code to error without any deprecation or other kind of notice. We could change the behaviour, but should do it in a controlled way.

@eric-wieser
Copy link
Member

I agree with @WarrenWeckesser that the behavior should be changed, but also with @mattip that it must go through a deprecation cycle.

@WarrenWeckesser
Copy link
Member

@mattip, OK, thanks, that makes sense.

@mattip
Copy link
Member

mattip commented Dec 16, 2019

We would have to decide which way we should go: whether to disallow np.array([1, np.array([2])], dtype=int) or to allow np.array([1, [2]], dtype=int). I think elsewhere we decided both should fail.

@seberg
Copy link
Member Author

seberg commented Dec 16, 2019

@WarrenWeckesser there is another dimension to this, because float(np.array([1.])) works. So the method we use to do the conversion to the scalar will not fail here.

@eric-wieser
Copy link
Member

eric-wieser commented Dec 16, 2019

Although @nschloe and I would argue that that case shouldn't work either, see #10615

@seberg
Copy link
Member Author

seberg commented Dec 16, 2019

Of course, and I 100% agree with him :). Although we do have some "reverse broadcasting" rules, similar to this, which work probably only because of this behaviour... OTOH, we did pull it off for bools and __index__ and the world did only half break...

@nschloe
Copy link
Contributor

nschloe commented Dec 16, 2019

We would have to decide which way we should go: whether to disallow `np.array([1, np.array([2])], dtype=int) or to allow np.array([1, [2]], dtype=int). I think elsewhere we decided both should fail.

Indeed, I'm very much in favor of disallowing it .

np.array([1, np.array([2])]) gives array([1, array([2])], dtype=object), and np.array([1, np.array([2])]) gives array([1, list([2])], dtype=object) -- this makes perfect sense, it's exactly what you wrote down. If you allow the conversion to int, you're going down a deep dark hole: What about

np.array([1, np.array([2]), dtype=int)
np.array([np.array([1]), np.array([2]), dtype=int)
np.array([1, np.array([[[[[[2]]]]]]), dtype=int)
np.array([1, np.array([np.array([2])]), dtype=int)

etc.?

Also, note that this conversion is implicit, yikes.

Explicit is better than implicit.

@seberg
Copy link
Member Author

seberg commented Dec 16, 2019

It should all be errors, but I proposed changing __float__, etc. 7 years ago probably and in the end only did __index__ ;). But... since then we got better at keeping the project moving!

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jul 19, 2020
@charris charris removed this from the 1.17.1 release milestone Jul 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 06 - Regression Priority: high High priority, also add milestones for urgent issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants