the error message for accidentally irregular arrays is confusing #5303

Closed
argriffing opened this issue Nov 20, 2014 · 34 comments · Fixed by #13913

Comments

@argriffing
Contributor

Someone using my code reported this error message, and I would have been able to track down the problem more quickly if the message had been more informative, maybe like ValueError: setting an irregularly shaped array with a non-object dtype instead of ValueError: setting an array element with a sequence.

>>> import numpy as np
>>> np.array([[0, 1], [2]]).dtype
dtype('O')
>>> np.array([[0, 1], [2]], dtype=int).dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
@njsmith
Member

njsmith commented Nov 20, 2014

Wtf we seriously autodetect that as being an object array? I feel like the
real fix is that asarray should unconditionally treat list objects as
defining slices of the output array, and if they're irregular then just
throw an error, don't try to back up and make an object array. (I.e., I'm
saying both of those should error out.) I don't know if we can get away
with such a change though. But the current behaviour is both surprising and
dangerous (what if one day your lists just happen to have the same number
of elements? you'll suddenly get something totally unexpected).
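
To make the hazard concrete, a minimal illustration of the behaviour being discussed (as it stood when this issue was filed; exact dtypes are platform dependent):

>>> import numpy as np
>>> np.array([[0, 1], [2]]).shape       # ragged input: silently becomes a 1-D object array
(2,)
>>> np.array([[0, 1], [2, 3]]).shape    # conformable input: the same call gives a 2-D int array
(2, 2)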

@juliantaylor
Contributor

I agree this "feature" is awful, but I doubt we can change it; it's most likely baked into lots of existing code that may even happen to work alright (just slower than the user probably anticipated).

@charris
Member

charris commented Nov 21, 2014

The generation of object arrays is kind of screwed up anyway; probably they should only be produced with an explicit object dtype, and a maxdepth keyword would also be helpful. I don't know how much this feature is used in the wild; usually we recommend producing an empty array of object type and initializing it in a separate operation. Might try making this change with a deprecation and see what breaks.
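
For reference, a minimal sketch of the recommended pattern (explicit object dtype and shape, then fill in a separate step), which sidesteps the autodetection entirely:

>>> import numpy as np
>>> a = np.empty(2, dtype=object)   # dtype and shape stated up front
>>> a[0] = [0, 1]
>>> a[1] = [2]
>>> a.dtype, a.shape
(dtype('O'), (2,))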

@njsmith
Member

njsmith commented Nov 21, 2014

It's not clear to me how much use object arrays even get in the wild -- I'm
pretty confident that all the big projects like scikit-learn etc. basically
never use them, and certainly won't be depending on such a quirky
autodetection feature. The main users are probably people who are somewhat
confused and maybe even using them by accident... there's an argument that
these are the users who would most benefit from our cleaning things up so
they stop being confused in general, even if there's a short-term pain as
their old scripts break. Or maybe I'm underestimating our users, I dunno.

I wish we had more data :-/. Too bad we can't reasonably ship an
instrumented version of numpy that phones home and tells us what features
people actually use :-) (For Science!)


@juliantaylor
Contributor

We do get plenty of bug reports related to object arrays, which suggests they are used more than we might expect, though the large number of reports could also be caused by them being more broken than other things. I think object arrays are especially common in pandas space.

@charris
Member

charris commented Nov 21, 2014

However, having a None element to force an object array might be more common.

In [1]: array([1,2,3,None]).dtype
Out[1]: dtype('O')

Object arrays can be useful, for instance if the elements are polynomials, Decimals, matrices, or some other such custom type where all one needs is arithmetic and maybe dot.
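
A small illustration of that kind of use (a sketch; exact reprs vary between NumPy versions):

>>> import numpy as np
>>> from decimal import Decimal
>>> d = np.array([Decimal('1.1'), Decimal('2.2')], dtype=object)
>>> d + d                      # arithmetic dispatches to Decimal.__add__
array([Decimal('2.2'), Decimal('4.4')], dtype=object)
>>> np.dot(d, d)
Decimal('6.05')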

@njsmith
Member

njsmith commented Nov 21, 2014

Yeah, pandas uses object arrays for strings, because numpy's fixed-length
string types are pretty... special-purpose.

None of the mentioned use cases would ever involve passing mismatched lists
to asarray and expecting it to turn them into array entries, though.

@argriffing
Contributor Author

Object arrays can be useful, for instance if the elements are ... matrices.

interesting

@seberg
Member

seberg commented Nov 21, 2014

I don't know, but it sounds to me like a painful and long deprecation. Less because larger packages do that than because of loads of scripts just doing it for not very good reasons. Though I agree that the cleanest method for creating an object array which may hold sequences (and possibly the only clean one, aside from a new depth/ndim keyword) is to create an empty array first and then fill it.

@argriffing
Contributor Author

If it's logistically impractical to backwards-incompatibly change the details of object array creation, would it make sense to instead add some backwards-compatible code for the purpose of giving nicer error messages?

@larsmans
Contributor

I'm pretty confident that all the big projects like scikit-learn etc. basically never use them

We do actually, in a few places that handle arrays of variable-length strings. We don't use them a lot, but there are a few yet-to-be-implemented features that could use them (esp. kernels on structured objects).

@cournape cournape modified the milestone: 1.10 blockers Nov 29, 2014
@WarrenWeckesser
Member

@njsmith wrote:

It's not clear to me how much use object arrays even get in the wild

There are a couple of uses of object arrays in scipy.linalg, where the objects are Python integers that would overflow a 64-bit integer:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.invhilbert.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pascal.html
scipy/scipy#4012

These don't use the autodetect "feature".
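
A hedged sketch of how that looks in practice, based on the linked documentation (the exact flag requests exact Python-integer arithmetic, which for large n gives an object array):

>>> from scipy.linalg import invhilbert
>>> invhilbert(20, exact=True).dtype   # entries overflow int64, so exact results fall back to Python ints
dtype('O')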

@cournape
Member

Adding a better error message may be a bit hard, though, since we cannot really know beforehand whether the argument to setitem must be a scalar or not (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L528).

Any suggestions here?

@argriffing
Contributor Author

(what if one day your lists just happen to have the same number
of elements? you'll suddenly get something totally unexpected).

A situation similar to this (unexpected behavior caused by numpy's magic regarding conformability vs. jaggedness of elements during array creation) has just been raised on the mailing list.
http://mail.scipy.org/pipermail/numpy-discussion/2014-December/071775.html

@njsmith
Member

njsmith commented Dec 26, 2014

So to summarize, we've found a bunch of cases where well-written code uses object arrays but none that rely on the "clever" handling of non-conformable lists. And there are multiple examples (#5394, http://mail.scipy.org/pipermail/numpy-discussion/2014-December/071775.html) where our willingness to mix these together is creating serious confusion for specific users.

I think we ought to move (via deprecation etc.) to the world where np.array always treats list objects as indicating array slices, and if these are non-conformable then we raise an error. Yes, it'll be painful for some existing confusingly/sloppily written scripts, but this will be a one-time source of pain; the current situation is creating new pain on an ongoing basis.

I also think we should seriously consider implementing #5353 (don't create object arrays unless dtype=object is explicitly specified) and making it so np.ndarray only treats list objects specially, with tuples and other sequence types being treated as array elements instead. These would both obviously need extended deprecation periods, but it would eliminate a lot of weird and confusing inconsistencies. (E.g., tuples are the item type for structured dtypes, making np.array behaviour very hard to predict for such types.)
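
To illustrate the tuple inconsistency mentioned above (a minimal sketch; tuples act as a nesting level in one call and as record items in the other):

>>> import numpy as np
>>> np.array([(1, 2), (3, 4)]).shape                  # tuples treated as a nesting level
(2, 2)
>>> dt = np.dtype([('x', np.int64), ('y', np.int64)])
>>> np.array([(1, 2), (3, 4)], dtype=dt).shape        # tuples treated as structured items
(2,)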

@argriffing
Contributor Author

I agree with @njsmith's summary and his two suggestions. By the way, here's a pandas issue that was just opened that also complains about the same obscure error message and involves similar underlying issues: pandas-dev/pandas#9156.

@MechCoder
Contributor

If you are looking at instances of "well-written code that uses object arrays", scipy.lil_matrix stores its rows as an object array (which is used quite a bit in scikit-learn). It would be great to implement #5353 to let the user know that he is creating an array of dtype=object explicitly.

@njsmith
Member

njsmith commented Dec 27, 2014

he

Or she. (Or they.)

@MechCoder
Contributor

oops, sorry for the unintentional sexism.

@argriffing
Contributor Author

Here's a similar report on the mailing list, where someone is getting weird bug reports from users who accidentally make irregular arrays.

@njsmith
Member

njsmith commented Jan 6, 2015

Anyone want to put together a patch?

@argriffing
Contributor Author

I also think we should seriously consider implementing #5353 (don't create object arrays unless dtype=object is explicitly specified) and making it so np.ndarray only treats list objects specially, with tuples and other sequence types being treated as array elements instead.

A similar question again on the mailing list http://mail.scipy.org/pipermail/numpy-discussion/2015-February/072240.html, regarding over-coercion of arrays of tuples into high dimensional arrays.

@njsmith
Member

njsmith commented Feb 10, 2015

@argriffing: I think we have a pretty good pile of evidence that there's a problem here now :-). Any interest in having a go at making a solution?


@toddrjen

and making it so np.ndarray only treats list objects specially, with tuples and other sequence types being treated as array elements instead.

As an alternative (or supplement), might it be possible to have a function that explicitly creates object-dtype arrays without converting subsequences? There could be an optional parameter to determine how many dimensions the array should have (defaulting to 1), with it automatically drilling down into subsequences until it reaches that level (and raising an exception if the shapes don't match up to that level).
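
A rough sketch of what such a helper might look like; the name object_array and its ndim parameter are hypothetical, not an existing or proposed NumPy API, and this sketch only catches sublists that are too short, not too long:

import numpy as np

def object_array(data, ndim=1):
    # Build an object array by descending exactly `ndim` levels of nesting;
    # whatever sits below that level is stored as-is, with no further coercion.
    shape = []
    probe = data
    for _ in range(ndim):
        shape.append(len(probe))
        probe = probe[0]
    out = np.empty(shape, dtype=object)
    for idx in np.ndindex(*shape):
        item = data
        for i in idx:
            item = item[i]       # an IndexError here means the input is ragged
        out[idx] = item
    return out

# object_array([[0, 1], [2]])            -> shape (2,), each element a list
# object_array([[0, 1], [2, 3]], ndim=2) -> shape (2, 2), each element an int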

@wackywendell
Contributor

One more use-case: ad, for use with automatic differentiation. I don't think it requires any "clever handling" of mismatched lists, though.

@njsmith
Member

njsmith commented May 27, 2015

From glancing at the docs I'm 99% sure ad will be fine if we fix this bug. It might need a little more tweaking to live in a world where we fixed #5353 (the no object arrays without explicit request bug), but I'm not sure: generally an object can always opt in to being converted to an array in whatever way it likes, so it would only be objects that aren't designed to be used with numpy that would be affected.

@wackywendell
Contributor

@njsmith Yes, I completely agree; thanks for checking on that too. I just thought it wouldn't hurt to add another example.

@amueller

Stupid question, but the handling of object arrays is somewhat unrelated to the error message business, right? I expect fixing the explicit passing of dtype=object doesn't do anything for the np.array([[0, 1], [2]], dtype=int) case. I just opened and closed #6584; not sure if it deserves its own issue. Maybe close this and have #5353 and #6584? Not sure.

@WarrenWeckesser
Member

Not that any more evidence of the potential confusion of the automatic conversion is needed at this point, but some of y'all might find it interesting. The somewhat surprising behavior described in the stackoverflow question http://stackoverflow.com/questions/38817357/randomly-select-item-from-list-of-lists-gives-valueerror#38817508 is a consequence of the "automagic" conversion to an object array.
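
For the record, the gist of that question (behaviour under the coercion rules at the time; the first call picks one of the two lists at random):

>>> import numpy as np
>>> np.random.choice([[1, 2], [3, 4, 5]])   # ragged input coerces to a 1-D object array, so this "works"
[3, 4, 5]
>>> np.random.choice([[1, 2], [3, 4]])      # conformable input coerces to a 2-D array and fails
Traceback (most recent call last):
  ...
ValueError: a must be 1-dimensional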

@mattip
Member

mattip commented Aug 23, 2019

Reopening, since #13913 did not correctly handle the corner case np.array([1, np.array([5])], dtype=int). A correct fix would add that case to a test, fix discover_dimensions so it does not set is_object in this case, and then redo #13913.

@Light--

Light-- commented Dec 18, 2019

How to fix this?

Traceback (most recent call last):
...
    test_set_x.set_value(framesArr, borrow=True)
  File "/home/user/.local/lib/python2.7/site-packages/theano/gpuarray/type.py", line 672, in set_value
    self.container.value = value
  File "/home/user/.local/lib/python2.7/site-packages/theano/gof/link.py", line 477, in __set__
    **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/theano/gpuarray/type.py", line 266, in filter_inplace
    converted_data = theano._asarray(data, self.dtype)
  File "/home/user/.local/lib/python2.7/site-packages/theano/misc/safe_asarray.py", line 34, in _asarray
    rval = np.asarray(a, dtype=dtype, order=order)
  File "/usr/local/lib/python2.7/dist-packages/numpy-1.11.2-py2.7-linux-x86_64.egg/numpy/core/numeric.py", line 482, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: ('setting an array element with a sequence.', 'Container name "None"')

@mattip
Member

mattip commented Dec 18, 2019

@Light-- please open a new issue, and describe why you think this is a problem with NumPy. Maybe it is an issue with theano, or some other package you are using? What are a, dtype, and order in the call to np.asarray(a, dtype=dtype, order=order)?

@seberg
Member

seberg commented Aug 22, 2020

The remaining corner case(s) are now also deprecated in master, so closing. The original case currently gives:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

which is probably not great, but much better than no information. Ideas for improvements welcome.

@seberg seberg closed this as completed Aug 22, 2020
@wackywendell
Contributor

@seberg - thanks for the followup; indeed, that error message is much clearer!
