-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: change object-array comparisons to prefer OO->O unfuncs #14800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I wasn't sure of the proper labels |
a = np.array([1, np.array([1,2,3])], dtype=object) | ||
b = np.array([1, np.array([1,2,3])], dtype=object) | ||
self.assert_deprecated(op, args=(a, b), num=None) | ||
res = op(a, b) | ||
assert res.dtype == 'object' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now returns an object array instead of deprecating.
if isinstance(r, bool): | ||
d = type(self)(r) | ||
else: | ||
d = r.view(type(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since all
is np.logical_and.reduce
, the OO->O
mapping reduces the output to a boolean scalar, which has no view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, could pass keepdims=True
, and then throw away the dimension at the end
numpy/lib/nanfunctions.py
Outdated
@@ -99,7 +99,7 @@ def _replace_nan(a, val): | |||
|
|||
if a.dtype == np.object_: | |||
# object arrays do not support `isnan` (gh-9009), so make a guess | |||
mask = a != a | |||
mask = (a != a).astype(bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref gh-14802
a = np.array([1, np.array([1,2,3])], dtype=object) | ||
b = np.array([1, np.array([1,2,3])], dtype=object) | ||
self.assert_deprecated(op, args=(a, b), num=None) | ||
res = op(a, b) | ||
assert res.dtype == 'object' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert res.dtype == 'object' | |
assert res.dtype == object |
Let's put this in as a first step to finishing the deprecations in richcompare. It cleans up some of the potential blockers to the deeper changes. i changed the PR title appropriately |
Let's try, thanks Matti. I do expect that there may be some fallout downstream. Perhaps the docstrings of the affected ufuncs should have a |
This seems to have caused downstream failures in pandas. pandas-dev/pandas#29432 Essentially, pandas has a datetime subclass, Timestamp, which we'd like to compare with a datetime64[ns] ndarray and have it return an boolean ndarray. The same behavior occurs with # Before this PR
In [6]: import numpy as np
In [7]: import datetime
In [8]: np.array(['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000001'],
...: dtype='datetime64[ns]') == datetime.datetime(1970, 1, 1)
Out[8]: array([False, False]) And with this PR # With #14800
In [6]: import numpy as np
In [7]: import datetime
In [8]: np.array(['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000001'],
...: dtype='datetime64[ns]') == datetime.datetime(1970, 1, 1)
Out[8]: array([False, False], dtype=object) This caused failures when we tried to mask with that (now object-dtype) array. Should pandas update, or would that be considered a regression? |
@TomAugspurger That's unfortunate, but not unexpected. I don't think we can afford to break Pandas just like that. What do you think is the best way forward here? How long do we need to maintain compatibility with Pandas? I'd like to leave this in for a bit to see who else has problems, but will revert it for 1.18, which I plan to branch in a week or so. |
It's easy for us to workaround (pandas-dev/pandas#29433), so not at all a problem to leave it in master (though see my note below about performance). And just to be clear, this only failed a single test, so it's not world-ending from pandas' point of view. I worry somewhat that the pattern of mask = ndarray == <scalar object>
other_ndarray[mask] is somewhat common and will cause issues for other projects. Second concern: will this operation be slower now that we're getting an object-dtype ndarray of bools? Or were we always getting object-dtype, and NumPy converted them to |
@TomAugspurger Does using |
Is |
Oh, nevermind, it broke a lot more tests. Just didn't see it since the first one errored while pytest was collecting tests :) For example, this is object-dtype now, where it was previously bool In [4]: np.array(['a', 'b', 'c'], dtype="object") == 'c'
Out[4]: array([False, False, True], dtype=object) which does break the world for pandas & pandas users :) |
So I guess if we want to pursue this, it needs to go through a deprecation cycle. It does sound like we should revert the change for now. |
In my opinion, the breakage of |
MAINT: revert gh-14800, which gave precedence to OO->O over OO->?
We have many FutureDeprecationWarnings, DeprecationWarnings, and inconsistent behaviour in
array_richcompare
functions. xref gh-577. In discussion there, it was suggested to prioritizeOO->O
overOO->?
for object array richcompare, then to work through the rest of the warnings inarray_richcompare
and_failed_comparison_workaround
. For now Icompletely removedonly needed to addOO->?
andresult.astype(bool)
in a few code paths.The change did not adversely affect the sympy test suite.