the error message for accidentally irregular arrays is confusing #5303
Comments
Wtf we seriously autodetect that as being an object array? I feel like the
|
I agree this "feature" is awful but I doubt we can change this, it's most likely baked into lots of existing code that may happen to even work alright (just slower than the user probably anticipated) |
The generation of object arrays is kind of screwed up anyway, probably they should only be produced with an explicit object dtype, and a maxdepth keyword would also be helpful. I don't know how much use of this feature is in the wild, usually we recommend producing an empty array of object type and initializing it in a separate operation. Might try making this change with a deprecation and see what breaks. |
It's not clear to me how much use object arrays even get in the wild -- I'm I wish we had more data :-/. Too bad we can't reasonably ship an On Fri, Nov 21, 2014 at 12:19 AM, Charles Harris notifications@github.com
Nathaniel J. Smith |
we do get plenty of bugs related to object arrays which indicates they are used more than we might expect, though the large number of reports could also be caused by them being more broken than other stuff. I think object arrays are especially common in pandas space. |
However, having a
Object arrays can be useful, for instance if the elements are polynomials, Decimals, matrices, or some other such custom type where all one needs is arithmetic and maybe dot. |
Yeah, pandas uses object arrays for strings, because numpy's fixed-length None of the mentioned use cases would ever involve passing mismatched lists
|
interesting |
I don't know, but it sounds to me like a painful and long deprecation. Less because larger packages do that, but more because of loads of scripts just doing it for not very good reasons. Though I agree that the cleanest method for creating an object array which may hold sequences (and possibly the only clean one, aside from a new depth/ndim) is to create an empty array first and then fill it. |
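For readers following along, here is a minimal sketch of the "create an empty array first and then fill it" approach mentioned above. The variable names are illustrative only; the point is that assigning element by element never triggers the shape autodetection, so the result is 1-D whether or not the inner lists happen to have equal lengths.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5], [6]]   # inner lists of different lengths
regular = [[1, 2], [3, 4]]          # inner lists that happen to match

def fill_object_array(rows):
    # Allocate a 1-D object array, then store each row as a single element.
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = row
    return out

arr = fill_object_array(ragged)
grid = fill_object_array(regular)

print(arr.shape, arr.dtype)    # (3,) object
print(grid.shape, grid.dtype)  # (2,) object, not (2, 2)
```

Filling in a loop is slower than a single np.array call, but the resulting shape does not depend on whether the input happens to be conformable.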
If it's logistically impractical to backwards-incompatibly change the details of object array creation, would it make sense to instead add some backwards-compatible code for the purpose of giving nicer error messages? |
We do actually, in a few places that handle arrays of variable-length strings. We don't use them a lot, but there are a few yet-to-be-implemented features that could use them (esp. kernels on structured objects). |
@njsmith wrote:
There are a couple uses of object arrays in http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.invhilbert.html These don't use the autodetect "feature". |
Adding a better error message may be a bit hard, though, since we cannot really know beforehand whether the argument to setitem must be a scalar or not (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L528). Any suggestions here? |
A situation similar to this (unexpected behavior caused by numpy's magic regarding conformability vs. jaggedness of elements during array creation) has just been raised on the mailing list. |
So to summarize, we've found a bunch of cases where well-written code uses object arrays but none that rely on the "clever" handling of non-conformable lists. And there are multiple examples (#5394, http://mail.scipy.org/pipermail/numpy-discussion/2014-December/071775.html) where our willingness to mix these together is creating serious confusion for specific users. I think we ought to move (via deprecation etc.) to the world where I also think we should seriously consider implementing #5353 (don't create object arrays unless |
I agree with @njsmith's summary and his two suggestions. By the way here's a pandas issue that was just opened which also complains about the same obscure error message and involves similar underlying issues pandas-dev/pandas#9156. |
If you are looking at instances of "well-written code that uses object arrays" |
Or she. (Or they.)
|
oops, sorry for the unintentional sexism. |
Here's a similar report on the mailing list, where someone is getting weird bug reports from users who accidentally make irregular arrays. |
Anyone want to put together a patch?
|
A similar question again on the mailing list http://mail.scipy.org/pipermail/numpy-discussion/2015-February/072240.html, regarding over-coercion of arrays of tuples into high dimensional arrays. |
@argriffing: I think we have a pretty good pile of evidence that there's a On Mon, Feb 9, 2015 at 8:38 AM, argriffing notifications@github.com wrote:
Nathaniel J. Smith -- http://vorpus.org |
As an alternative (or supplement), might it be possible to have a function that explicitly creates object dtype arrays without converting subsequences? There could be an optional parameter to determine how many dimensions the array should have (which defaults to 1), with it automatically drilling down into subsequences until it reaches that level (and will raise an exception if the shapes don't match up to that level). |
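A rough sketch of what such a function might look like, written as a plain Python helper on top of NumPy. The names make_object_array and ndim are hypothetical and not part of NumPy; this only illustrates the idea of drilling down to a requested depth and raising if the shapes disagree up to that level.

```python
import numpy as np

def _shape_to_depth(seq, ndim):
    # Shape of `seq` over its first `ndim` nesting levels (hypothetical helper).
    if ndim == 0:
        return ()
    if not isinstance(seq, (list, tuple)):
        raise ValueError("expected a list or tuple at this depth")
    if len(seq) == 0:
        return (0,) * ndim
    sub_shapes = {_shape_to_depth(item, ndim - 1) for item in seq}
    if len(sub_shapes) > 1:
        raise ValueError("subsequence shapes do not match up to the requested depth")
    return (len(seq),) + sub_shapes.pop()

def make_object_array(seq, ndim=1):
    # Build an object array with exactly `ndim` dimensions; anything nested
    # deeper than that is stored as-is in the elements.
    if ndim < 1:
        raise ValueError("ndim must be at least 1")
    shape = _shape_to_depth(seq, ndim)
    out = np.empty(shape, dtype=object)
    for index in np.ndindex(*shape):
        item = seq
        for i in index:
            item = item[i]
        out[index] = item
    return out

a = make_object_array([[1, 2], [3]])             # 1-D, elements are lists
b = make_object_array([[1, 2], [3, 4]], ndim=2)  # 2-D, elements are ints
print(a.shape, b.shape)
```

With ndim=2, passing [[1, 2], [3]] raises ValueError instead of silently falling back to a 1-D array of lists.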
One more use-case: |
From glancing at the docs I'm 99% sure ad will be fine if we fix this bug. It might need a little more tweaking to live in a world where we fixed #5353 (the no object arrays without explicit request bug), but I'm not sure: generally an object can always opt in to being converted to an array in whatever way it likes, so it would only be objects that aren't designed to be used with numpy that would be affected. |
@njsmith Yes, I completely agree; thanks for checking on that too. I just thought it wouldn't hurt to add another example. |
Stupid question, but the handling of object arrays is somewhat unrelated to the error message business, right? I expect fixing the explicit passing of |
Not that any more evidence of the potential confusion of the automatic conversion is needed at this point, but some of y'all might find it interesting. The somewhat surprising behavior described in the stackoverflow question http://stackoverflow.com/questions/38817357/randomly-select-item-from-list-of-lists-gives-valueerror#38817508 is a consequence of the "automagic" conversion to an object array. |
How to fix this?
|
@Light-- please open a new issue, and describe why you think this is a problem with NumPy. Maybe it is an issue with theano, or some other package you are using? What is |
The remaining corner case(s) are now also deprecated in master, so closing. The original case currently gives:
which is probably not great, but much better than no information. Ideas for improvements welcome. |
@seberg - thanks for the followup; indeed, that error message is much clearer! |
Someone using my code reported this error message, and I would have been able to track down the problem more quickly if the message had been more informative, maybe like
ValueError: setting an irregularly shaped array with a non-object dtype
instead of ValueError: setting an array element with a sequence.
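A minimal way to reproduce the situation described here, assuming the irregular input is passed together with an explicit non-object dtype. The exact wording of the message depends on the NumPy version, so the sketch just prints whatever is raised.

```python
import numpy as np

ragged = [[1.0, 2.0], [3.0]]  # second row is accidentally shorter

message = None
try:
    np.array(ragged, dtype=float)
except ValueError as exc:
    # The text of the message varies between NumPy releases.
    message = str(exc)

print("ValueError:", message)
```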