ENH: change object-array comparisons to prefer OO->O ufuncs #14800

mattip · 2019-10-29T21:18:10Z

We have many FutureDeprecationWarnings, DeprecationWarnings, and inconsistent behaviour in array_richcompare functions. xref gh-577. In discussion there, it was suggested to prioritize OO->O over OO->? for object array richcompare, then to work through the rest of the warnings in array_richcompare and _failed_comparison_workaround. For now I ~~completely removed OO->? and~~ only needed to add result.astype(bool) in a few code paths.

The change did not adversely affect the sympy test suite.

mattip · 2019-10-29T21:18:43Z

I wasn't sure of the proper labels

mattip · 2019-10-29T21:20:46Z

numpy/core/tests/test_deprecations.py

            a = np.array([1, np.array([1,2,3])], dtype=object)
            b = np.array([1, np.array([1,2,3])], dtype=object)
-            self.assert_deprecated(op, args=(a, b), num=None)
+            res = op(a, b)
+            assert res.dtype == 'object'


This now returns an object array instead of deprecating.
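A rough illustration of that result, assuming the OO->O loop is selected explicitly with dtype=object (so the sketch does not depend on which loop a given NumPy version prefers by default):

```python
import numpy as np

a = np.array([1, np.array([1, 2, 3])], dtype=object)
b = np.array([1, np.array([1, 2, 3])], dtype=object)

# Requesting an object output keeps each element's own comparison result,
# so the nested array compares elementwise instead of being forced to a bool.
res = np.equal(a, b, dtype=object)
print(res.dtype, res.shape)
print(res[0], res[1])
```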

mattip · 2019-10-29T21:27:01Z

numpy/ma/core.py

+            if isinstance(r, bool):
+                d = type(self)(r)
+            else:
+                d = r.view(type(self))


Since all is np.logical_and.reduce, the OO->O mapping reduces the output to a Python bool scalar, which has no view method.

Alternatively, we could pass keepdims=True and then drop the extra dimension at the end.
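Both options can be sketched as follows. This is an illustration only: it requests the object loop explicitly with dtype=object, and the variable names are made up for the example.

```python
import numpy as np

flags = np.array([True, True, False], dtype=object)

# A full reduction through the object loop returns a plain Python object,
# which has no .view() method.
scalar = np.logical_and.reduce(flags, dtype=object)
print(type(scalar), hasattr(scalar, "view"))

# keepdims=True keeps a length-1 ndarray, which does support .view().
kept = np.logical_and.reduce(flags, dtype=object, keepdims=True)
masked = kept.view(np.ma.MaskedArray)
print(kept.shape, type(masked).__name__)
```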

mattip · 2019-10-30T08:50:18Z

numpy/lib/nanfunctions.py

@@ -99,7 +99,7 @@ def _replace_nan(a, val):

    if a.dtype == np.object_:
        # object arrays do not support `isnan` (gh-9009), so make a guess
-        mask = a != a
+        mask = (a != a).astype(bool)


xref gh-14802
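A small sketch of why the cast is needed, emulating the OO->O result by passing dtype=object explicitly (an assumption made so the example behaves the same regardless of the default loop order):

```python
import numpy as np

a = np.array([1.0, float("nan"), 2.0], dtype=object)

# Object-dtype output: an array of Python bools rather than a bool array.
raw = np.not_equal(a, a, dtype=object)

# NaN is the only float that compares unequal to itself, so True marks NaN.
mask = raw.astype(bool)
print(raw.dtype, mask.dtype, mask)
```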

eric-wieser · 2019-11-01T00:34:46Z

numpy/core/tests/test_deprecations.py

            a = np.array([1, np.array([1,2,3])], dtype=object)
            b = np.array([1, np.array([1,2,3])], dtype=object)
-            self.assert_deprecated(op, args=(a, b), num=None)
+            res = op(a, b)
+            assert res.dtype == 'object'


Suggested change

-            assert res.dtype == 'object'
+            assert res.dtype == object

mattip · 2019-11-02T19:52:25Z

Let's put this in as a first step to finishing the deprecations in richcompare. It cleans up some of the potential blockers to the deeper changes. I changed the PR title accordingly.

charris · 2019-11-05T00:36:31Z

Let's try, thanks Matti. I do expect that there may be some fallout downstream. Perhaps the docstrings of the affected ufuncs should have a .. versionchanged:: somewhere, because the version dependence of the results may be confusing.

TomAugspurger · 2019-11-06T16:39:55Z

This seems to have caused downstream failures in pandas. pandas-dev/pandas#29432

Essentially, pandas has a datetime subclass, Timestamp, which we'd like to compare with a datetime64[ns] ndarray and have it return a boolean ndarray. The same behavior occurs with datetime.datetime.

# Before this PR
In [6]: import numpy as np

In [7]: import datetime

In [8]: np.array(['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000001'],
   ...:       dtype='datetime64[ns]') == datetime.datetime(1970, 1, 1)
Out[8]: array([False, False])

And with this PR

# With #14800
In [6]: import numpy as np

In [7]: import datetime

In [8]: np.array(['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000001'],
   ...:       dtype='datetime64[ns]') == datetime.datetime(1970, 1, 1)
Out[8]: array([False, False], dtype=object)

This caused failures when we tried to mask with that (now object-dtype) array.

Should pandas update, or would that be considered a regression?

charris · 2019-11-06T16:49:49Z

@TomAugspurger That's unfortunate, but not unexpected. I don't think we can afford to break Pandas just like that. What do you think is the best way forward here? How long do we need to maintain compatibility with Pandas?

I'd like to leave this in for a bit to see who else has problems, but will revert it for 1.18, which I plan to branch in a week or so.

TomAugspurger · 2019-11-06T16:54:25Z

It's easy for us to work around (pandas-dev/pandas#29433), so not at all a problem to leave it in master (though see my note below about performance).

And just to be clear, this only failed a single test, so it's not world-ending from pandas' point of view.

I worry somewhat that the pattern of

mask = ndarray == <scalar object>
other_ndarray[mask]

is somewhat common and will cause issues for other projects.
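A minimal reproduction of that failure mode, with the object-dtype mask built explicitly via dtype=object so it does not depend on the default loop order (names here are illustrative):

```python
import numpy as np

values = np.array([10, 20, 30])
labels = np.array(["a", "b", "c"], dtype=object)
target = np.array("c", dtype=object)  # 0-d object array standing in for a scalar

# An object-dtype array holding Python bools, as the OO->O loop returns.
object_mask = np.equal(labels, target, dtype=object)

try:
    values[object_mask]
    indexing_failed = False
except IndexError as exc:
    indexing_failed = True
    print("IndexError:", exc)

# Casting to bool restores ordinary boolean-mask indexing.
selected = values[object_mask.astype(bool)]
print(selected)
```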

Second concern: will this operation be slower now that we're getting an object-dtype ndarray of bools? Or were we always getting object-dtype, and NumPy converted them to bool dtype before returning?

charris · 2019-11-06T17:08:42Z

@TomAugspurger Does using np.equal(a, b, dtype=bool) work for pandas? I think it uses the old version of the ufunc loop, so should not change the performance. @mattip should be back in a day or two to comment.
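As a sketch of that suggestion on plain object arrays (not the pandas Timestamp case, which is not reproduced here), requesting dtype=bool yields a mask that can be used directly for indexing:

```python
import numpy as np

a = np.array(["a", "b", "c"], dtype=object)
b = np.array("c", dtype=object)  # 0-d object array standing in for a scalar

# dtype=bool asks for a bool output, i.e. the OO->? loop.
mask = np.equal(a, b, dtype=bool)
print(mask.dtype, mask)

picked = a[mask]
print(picked)
```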

jbrockmendel · 2019-11-06T17:09:16Z

Is ndarray[datetime64].__eq__(Timestamp) no longer returning NotImplemented? If it still defers to Timestamp, then this is something we can address in Timestamp.__richcmp__.
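The deferral mechanism in question can be sketched with a hypothetical class. Stamp below is invented for illustration and is not how pandas implements Timestamp; it relies on the documented convention that setting __array_ufunc__ = None makes ndarray operators return NotImplemented.

```python
import numpy as np


class Stamp:
    """Hypothetical scalar-like class; not the pandas implementation."""

    # Asks ndarray binary operators to return NotImplemented, so Python
    # falls back to this class's reflected method.
    __array_ufunc__ = None

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, np.ndarray):
            # Compare against the wrapped value and force a bool result.
            return np.asarray(other == self.value, dtype=bool)
        if isinstance(other, Stamp):
            return self.value == other.value
        return NotImplemented


arr = np.array([1, 2, 3])
direct = arr.__eq__(Stamp(2))  # ndarray defers
result = arr == Stamp(2)       # Python then calls Stamp.__eq__(arr)
print(direct, result)
```

With this arrangement the result dtype is decided entirely by the class's own comparison method, independent of which object loop NumPy prefers.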

TomAugspurger · 2019-11-06T17:10:01Z

Oh, never mind, it broke a lot more tests. I just didn't see it since the first one errored while pytest was collecting tests :)

For example, this is object-dtype now, where it was previously bool

In [4]: np.array(['a', 'b', 'c'], dtype="object") == 'c'
Out[4]: array([False, False, True], dtype=object)

which does break the world for pandas & pandas users :)

mattip · 2019-11-06T20:41:50Z

So I guess if we want to pursue this, it needs to go through a deprecation cycle. It does sound like we should revert the change for now.

eric-wieser · 2019-11-06T21:59:14Z

In my opinion, the breakage of a[a == x] for well-behaved objects is a pretty strong argument for this change being a bad idea.

MAINT: revert gh-14800, which gave precedence to OO->O over OO->?

WIP, DEP, ENH: finish richcompare changes from 1.10

647ea19

mattip added 01 - Enhancement component: numpy._core 07 - Deprecation labels Oct 29, 2019

mattip commented Oct 29, 2019

numpy/linalg/tests/test_regression.py (outdated, resolved)

mattip commented Oct 29, 2019

eric-wieser reviewed Oct 29, 2019

numpy/core/code_generators/generate_umath.py (resolved)

mattip mentioned this pull request Oct 30, 2019

ENH: Add object loops to isnan, isinf, and isfinite #14802

Closed

mattip commented Oct 30, 2019

mattip added 2 commits October 30, 2019 10:54

ENH: add OO->? loops, use np.compare(a, b, dtype=bool), add comments

5c77895

DOC: add release note

9aa8c47

eric-wieser reviewed Nov 1, 2019

eric-wieser approved these changes Nov 1, 2019

mattip changed the title ~~WIP, DEP, ENH: finish richcompare changes from 1.10~~ ENH: change object-array comparisons to prefer OO->O ufuncs Nov 2, 2019

charris merged commit 4393e0c into numpy:master Nov 5, 2019

TomAugspurger mentioned this pull request Nov 6, 2019

CI: Numpydev failing pandas-dev/pandas#29432

Closed

TomAugspurger mentioned this pull request Nov 6, 2019

CI: workaround numpydev bug pandas-dev/pandas#29433

Merged

charris mentioned this pull request Nov 6, 2019

REL: Revert #14800 for 1.18. #14839

Closed

mattip mentioned this pull request Nov 6, 2019

ENH: add isinf, isnan, fmin, fmax loops for datetime64, timedelta64 #14841

Merged

mattip added a commit to mattip/numpy that referenced this pull request Nov 6, 2019

MAINT: revert gh-14800, which gave precedence to OO->O over OO->?

e6a9c11

mattip mentioned this pull request Nov 6, 2019

MAINT: revert gh-14800, which gave precedence to OO->O over OO->? #14845

Merged

charris added a commit that referenced this pull request Nov 6, 2019

Merge pull request #14845 from mattip/revert-14800

718c63f

MAINT: revert gh-14800, which gave precedence to OO->O over OO->?

mattip mentioned this pull request Dec 4, 2019

Possible problem with ragged-array as object deprecation #15041

Closed

tcztzy mentioned this pull request Feb 12, 2020

Numpy dependency problem multiply-org/atmospheric_correction#15

Closed

mattip deleted the reorder-obj-comparison-loop branch November 2, 2020 08:29

mattip mentioned this pull request Aug 11, 2021

BUG: Remove logical object ufuncs with bool output #19640

Merged
