-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Docs are unclear about behavior of nonzero
on zero-dim arrays.
#13522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
cc @mwiebe who (as far as I can tell from git blame) implemented nonzero |
You're right that the current behavior is wrong, but I don't think there's a sensible choice to make here. >>> np.array([[True]]).nonzero()
(array([0], dtype=int64), array([0], dtype=int64))
>>> np.array([[False]]).nonzero()
(array([], dtype=int64), array([], dtype=int64)) >>> np.array([True]).nonzero()
(array([0], dtype=int64),)
>>> np.array([False]).nonzero()
(array([], dtype=int64),) >>> np.array(True).nonzero()
(array([0], dtype=int64),)
>>> np.array(False).nonzero()
(array([], dtype=int64),) Perhaps |
One thing to think about is what should |
@umanwizard: The simple answer is "never use |
@eric-wieser Should we change the docs informing the above as part of NOTE ? |
That sounds like a good idea, although you could probably use more helpful phrasing than I used above. |
Got it. @umanwizard Can you please tell the docs location so that I could proceed |
Sorry, I don't work on Numpy, I am not sure where the docs are in the repository. |
It lives in the docstring for |
Thanks |
Sorry to be coming late to this, but might it be good to think whether this should not be classified as a bug that we can fix? In particular, as @umanwizard wrote (#13522 (comment)), the docstring specifically states that you can extract non-zero elements by doing Though admittedly the return values are not that great. For nonzero scalar, it could also be an empty tuple, but that would seem to give the wrong impression. Unfortunately, just indexing with
For the scalar-zero case, it would be nicer to index with an empty array, but one cannot create a zero-dimensional empty array... |
The first extra dimension is actually correct, since nonzero returns a tuple of 1D arrays. But, neither of those fit with returning a tuple, so I am not sure we should do that (at least without some kwarg?). I guess the function is arguably wrong for 0-D currently, so a deprecation would make sense in that case. |
@seberg - The problem with any index array is that will not actually index a scalar, so I think it is not in any way more useful than the current state! Hence, my wondering whether we can make it useful, even if slightly ugly. Specifically, the start of the docstring is:
So, reading this strictly for a zero-dimensional array, I should get a tuple with zero arrays, i.e., EDIT: might as well add the explicit example:
|
If the semantics of the function are reduced to just "something that can be indexed with", it can be reduced greatly to just: def nonzero(a):
return a.astype(bool, copy=False) |
Well, |
Well, that would mean returning True and False, but that seems strange. I frankly am starting to think if we really want to move this, maybe an error is best, with some information of what is going on. If someone really wants to use it mostly for indexing, they probably should not be doing that to begin with ;) (although it can speed up things for repeated indexing operations).
|
I think we should deprecate this, with a message saying:
|
@eric-wieser - am also in favour of changing the behaviour for scalars - any preferences for what it would become? |
I'm leaning towards The something better would be returning something like: def broadcast_shapes(shapes):
# hack - we ought to expose this more directly somewhere
return np.lib.stride_tricks._broadcast_shape(*[np.empty(shape, dtype=[]) for shape in shapes])
class VIndex(tuple):
shape = ... #type: Tuple[int]
def __new__(cls, iter, *, shape=()):
self = super().__new__(cls, iter)
self.shape = broadcast_shapes([np.shape(i) for i in iter] + [shape])
return self
def __repr__(self):
return "VIndex({}, shape={})".format(tuple.__repr__(self), self.shape)
def __add__(self, other):
if isinstance(other, tuple):
other = VIndex(other)
if isinstance(other, VIndex):
ret = tuple.__new__(type(self), tuple.__add__(self, other))
ret.shape = broadcast_shapes([self.shape, other.shape])
return ret
return NotImplemented Used as: def fixed_nonzero(x):
old = nonzero(x)
shape = old[0].shape
if x.ndim == 0:
old = ()
return VIndex(old, shape=shape) Of course, now |
In my examples above, I've been a bit silly with respect to the indexing: To me, that is still slightly better than starting to error, but I feel either is better than sticking with the status quo. |
Deprecation PR is at #13708 |
From the docs, I would expect
nonzero
on a zero-dim array to return an empty tuple:Reproducing code example:
The text was updated successfully, but these errors were encountered: