Skip to content

REGR: np.argwhere on pd.Series raises ValueError #35334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins commented Jul 18, 2020

not sure what behaviour we want (will add tests after discussion). We should be thinking of removing __array_wrap__ in favour of __array_ufunc__ and __array_function__ anyhow.

from https://numpy.org/doc/stable/user/basics.subclassing.html?highlight=__array_wrap__#array-wrap-for-ufuncs-and-other-functions

It is hoped to eventually deprecate these, but array_wrap is also used by other numpy functions and methods, such as squeeze, so at the present time is still needed for full functionality.

in 0.25.3 we returned the NumPy array (albeit with a warning)

>>> pd.__version__
'0.25.3'
>>>
>>> s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
>>> s
a    1.640847
b    0.316918
c   -0.193584
d    1.473205
e    0.357223
dtype: float64
>>>
>>> np.argwhere(s < 0)
C:\Users\simon\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py:61: FutureWarning: Series.nonzero() is deprecated and will be removed in a
 future version.Use Series.to_numpy().nonzero() instead
  return bound(*args, **kwds)
array([[2]], dtype=int64)
>>>

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Compat pandas objects compatability with Numpy or Python functions labels Jul 18, 2020
@jreback
Copy link
Contributor

jreback commented Jul 18, 2020

the 0.25.x behavior was wrong
as it returns a 2d ndarray

@simonjayhawkins
Copy link
Member Author

simonjayhawkins commented Jul 18, 2020

the 0.25.x behavior was wrong
as it returns a 2d ndarray

that's whats returned by numpy on the underlying arr, see numpy/numpy#13610 (numpy/numpy#15555 (comment))

>>> arr = s.__array__()
>>> arr
array([-2.12345154,  0.50973858,  0.11203232,  0.30739767, -0.78566872])
>>>
>>> np.argwhere(arr < 0)
array([[0],
       [4]], dtype=int64)
>>>

@simonjayhawkins
Copy link
Member Author

not broken anything so marking as ready for review. do we want an atomic PR to just address the np.argwhere or do we want to look at the other __array_wrap__ on Indexes as well and include more NumPy functions.

@simonjayhawkins simonjayhawkins marked this pull request as ready for review July 18, 2020 11:27
@simonjayhawkins
Copy link
Member Author

simonjayhawkins commented Jul 18, 2020

that's whats returned by numpy on the underlying arr, see numpy/numpy#13610 (numpy/numpy#15555 (comment))

I don't think those links are relevant.

np.argwhere returns an array with different dimensions see, https://numpy.org/doc/stable/reference/generated/numpy.argwhere.html and so we cannot box the result in a Series. (unless we reshape the data and break consistency with NumPy)

@simonjayhawkins
Copy link
Member Author

There is a problem (stemming from numpy only passing context for ufuncs) if the result dimensions and shape match the input, boxing can produce confusing results

>>> df = pd.DataFrame([[0, 1], [2, 0]], index=["a", "b"], columns=["foo", "bar"])
>>> df
   foo  bar
a    0    1
b    2    0
>>>
>>> np.argwhere(df > 0)
     a  b
foo  0  1
bar  1  0
>>>

here boxing in a DataFrame with the same labels makes no sense. without the context we don't know which function is being called.

maybe we just raise if context is None instead or always return a Numpy array and push on with implementing __array_function__ where we know the calling function.

@simonjayhawkins
Copy link
Member Author

@jreback closing this. There are issues with boxing the return of numpy operations eg. #35334 (comment) I think the best course of action is leave the wrapping as is to avoid breaking api changes. Any issues, such as #35331 will need to wait for array_function to be implemented. (even though that issue was a regression)

The fundamental issues are that the semantics of an array and a DataFrame are significantly different (and hence the return of np.argwhere shouldn't be boxed) and numpy only passing context for ufuncs (hence we don't know np.argwhere is the function calling array_wrap)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.argwhere on pandas series
2 participants