Remove NotImplemented handling from the ufunc machinery (almost) #5864

njsmith · 2015-05-10T20:16:17Z

We've long had consensus that returning NotImplemented from ufuncs was a bad and ugly hack, since it violates layering -- NotImplemented is the responsibility of __op__ methods, not ufuncs. And one of the things that came out of the discussion in gh-5844 is that the NotImplemented handling inside the ufunc machinery was actually mostly useless and broken (see the first post on that report for analysis).

Of course, it turns out it's not quite that simple -- there were some places, esp. in the trainwreck that is array_richcompare, which were depending on the current behavior.

This patch series fixes the main pieces of code that used to rely on this behavior, and then removes as much of the NotImplemented handling as it can. And for what isn't removed, it adds deprecation or future warnings.

In particular, there are a bunch of cases where array_richcompare returns bad and wrong results, like where you can compare two arrays (which should produce a boolean array), some failure occurs, and then the error gets thrown away and we end up returning a boolean scalar. Some of these were deprecated in 9b8f6c7, but this deprecates the rest. Probably the most controversial part of this is that this patch deprecates scalar ordering comparisons between unlike types, like np.float64(1) < "foo", which are already an error on py3 but have traditionally returned some semi-arbitrary value on py2. To convince you that this is broken and we should stop supporting it, check this out:

# Note: the name of this class affects the results below
In [1]: class quux(object):
   ...:     pass
   ...: 

In [2]: q = quux()

In [3]: np.string_("foo") < q
Out[3]: True

In [4]: np.asarray(np.string_("foo")) < q
Out[4]: False

For a full analysis of how we handle all the legacy comparison cases, see the long comment that starts here:

numpy/numpy/core/src/umath/ufunc_object.c

Line 867 in 19b829a

if (any_flexible && !any_flexible_userloops && !any_object) {

NB this PR is probably easier to review if you read the commits one at a time in order, instead of jumping straight to the big diff.

njsmith · 2015-05-11T01:23:32Z

Added release notes.

mhvk · 2015-05-11T16:01:35Z

@njsmith - I looked at most commits (though void comparison is beyond me), and like the general idea. One small worry: with your pre-ufunc testing, I think the regular numeric cases are slowed down more than they are now. Though from a quick look, e.g., PyArray_ISSTRING would seem to be a pretty fast function, so probably this is not an issue. However, I don't understand the C code well enough to be sure, so thought I would mention it.

mhvk · 2015-05-11T16:12:48Z

More importantly: Are you sure about what == and != should do? I thought the python rule was that == always returns False if the objects cannot be compared with each other. Implicit in returning a full array of False would seem to be that an actual comparison is done for each element. In this indeed intended? It implies things that seem odd to me; e.g., should one expect that:

np.arange(3) == [0., 'bla', 2.]
# -> array([ True, False,  True], dtype=bool)

(i.e., just as what would happen if I compared with [0., 3., 2.]).

Of course, while this is different from current behaviour and thus breaks backward compatibility, it is internally consistent, and the present deprecation warnings suggest it was discussed before. If so, no problem. But if the above would not be expected future behaviour, then maybe fcous this PR just on the removal of improper usage of NotImplemented inside ufuncs, leaving the updated warnings/deprecations for a different PR?

… case The ndarray richcompare function has a special case for handling string dtypes (which currently cannot be handled by ufuncs). Traditionally this was handled by the ufuncs returning NotImplemented, and then falling through to a special case. By moving the special case to the top of the richcompare function, it becomes unnecessary for the ufuncs to return NotImplemented in this case.

The ndarray richcompare function has special case code for handling void dtypes (esp. structured dtypes), since there are no ufuncs for this. Previously, we would attempt to call the relevant ufunc (e.g. np.equal), and then when this failed (as signaled by the ufunc returning NotImplemented), we would fall back on the special case code. This commit moves the special case code to before the regular code, so that it no longer requires ufuncs to return NotImplemented. Technically, it is possible to define ufunc loops for void dtypes using PyUFunc_RegisterLoopForDescr, so technically I think this commit changes behaviour: if someone had registered a ufunc loop for one of these operations, then previously it might have been found and pre-empted the special case fallback code; now, we use the special-case code without even checking for any ufunc. But the only possible use of this functionality would have been if someone wanted to redefine what == or != meant for a particular structured dtype -- like, they decided that equality for 2-tuples of float32's should be different from the obvious thing. This does not seem like an important capability to preserve. There were also several cases here where on error, an array comparison would return a scalar instead of raising. This is supposedly deprecated, but there were call paths that did this that had no deprecation warning. I added those warnings.

njsmith · 2015-05-12T05:56:45Z

@mhvk: Thanks for the comment (that now seems to have disappeared into the aether somewhere) about the reversed operators in np.ma -- it of course was left over from an earlier version that tried to duplicate all the details of the old ufunc special case, before I said screw it, any __array_priority__ special case that's good enough for ndarray is surely also good enough for np.ma. Fixed.

Regarding the speed issue: yeah, there are only two extra checks that are done in the common case: one is if (PyArray_ISSTRING(self)) ..., which is a macro that expands to if (self->descr->type_num == NPY_STRING || self->descr->type_num == NPY_UNICODE), so incredibly cheap, and the other is if (PyArray_TYPE(self) == NPY_VOID), which has a similar expansion. So I don't think we need to worry about this.

Regarding the proper behavior of == and != for arrays: I think it's pretty well-established that the proper behavior for these is to perform an elementwise comparison. (This is indeed different from Python's default assumptions, but Python doesn't have arrays, and this is a place where we intentionally deviate.) My feeling is that the equivalent of Python's rule that all objects are comparable for equality is that all array elements should be comparable for equality, i.e. if you have two uncomparable dtypes then you should get an array of Falses. I was actually surprised by your example, because I thought that it would in fact return [True, False, True]... consider this, which works right now:

In [3]: np.arange(3) == np.asarray([0.0, "blah", 2.0], dtype=object)
Out[3]: array([ True, False,  True], dtype=bool)

The reason you get a different result there is because it turns out that asarray applied to that particular list does not return an object array the way I expected, but instead produces this terribleness:

In [4]: np.asarray([0.0, "blah", 2.0])
Out[4]: 
array(['0.0', 'blah', '2.0'], 
      dtype='|S32')

(Not only is doing string coercion here broken, but... why 32 byte strings? Though at least that's a power of two -- OTOH if you make the numbers integers instead of floats, you get 21 byte strings. WTF? It's just fractally weird.)

The reason the deprecations etc. are here in this PR is b/c there's no reasonable way to produce similar behaviour to what we have now without having that one NotImplemented deep inside the ufunc machinery, so the only way to get rid of it eventually is to add warnings around it while leaving it there for now. The alternative would be to just make all these cases into errors immediately, but that might make it harder to get this merged and I'd rather make some forward progress :-).

ndarray special methods like __add__ have a special case where if the right argument is not an ndarray or subclass, and it has higher __array_priority__ than the left argument, then we return NotImplemented and let the right argument handle the operation. ufuncs have traditionally had a similar but different special case, where if it's a 2 input - 1 output ufunc, and the right argument is not an ndarray (exactly, subclasses don't count), and when converted to an ndarray ends up as an object array (presumably b/c it doesn't have a meaningful coercion route, though who knows), and it has a higher __array_priority__ than the left argument AND it has a __r<operation>__ attribute, then they return NotImplemented. In practice this latter special case is not used by regular ndarrays, b/c anytime it would need to be triggered, the former special case triggers first and the ufunc is never called. However, numpy.ma did not have the former special case, and was thus relying on the ufunc special case. This commit adds the special case to the numpy.ma special methods directly, so that they no longer depend on the quirky ufunc behaviour. It also cleans up the relevant test to things that actually should be true in general, instead of just testing some implementation details.

This was redundant/broken anyway. See numpygh-5844 for discussion. See the massive comment added to ufunc_object.c:get_ufunc_arguments for a discussion of the deprecation strategy here -- it turns out that array_richcompare is such a disaster zone that we can't quite wholly eliminate NotImplemented quite yet. But this removes most NotImplementeds, and lays the groundwork for eliminating the rest in a release or two.

njsmith · 2015-05-12T08:38:10Z

Fixed py3 test failures (let's see if Travis agrees).

pv · 2015-05-12T08:46:42Z

The getpriority checks inside ufunc (which are removed here) are
responsible for making array_priority be respected for inplace ops.
Some copypasting of the getpriority check in number.c is required to
make it work again.
.
Note also that the check in number.c has PyArray_Check(other), but the
check in the ufunc has PyArray_CheckExact(other), so without me testing
it it looks like there's a behavior change.

pv · 2015-05-12T08:51:16Z

To be able to do this refactoring safely, it's probably necessary to
exercise the various branches in the code, and to have additional tests.
It seems as if to cover the different branches, checks for operations
involving (i) ndarray & ndarray subclass, (ii) ndarray & non-ndarray
object, (iii) ndarray subclass & different ndarray subclass, (iv)
ndarray subclass & other object.

njsmith · 2015-05-12T09:09:45Z

I guess I don't know what it even means to respect __array_priority__ for inplace ops. There is no __iradd__ (nor would it make a lot of sense to have one), so... can you elaborate? What does happen if you return NotImplemented from __iadd__? Which code exactly are you saying needs updating?

Re: PyArray_Check versus PyArray_CheckExact in the number.c and ufunc __array_priority__ special cases respectively: PyArray_CheckExact is a stricter check, so removing it from the ufunc code has no effect on any code that goes through the number.c call path. (Anything that might possibly hit the ufunc check will have already hit the number.c check and bailed out before calling the ufunc.) AFAICT the only code that cared about the ufunc check but didn't go through the number.c check is the np.ma special methods, so it's the only code affected by this, and I fixed that by copying the number.c check into np.ma. So this is a behavioural change -- it makes np.ma become more consistent with ndarray. If you're worried, we could instead copy the old ufunc check into np.ma... I suspect this would be a little more broken, but I don't care that much either way, because np.ma is always fragile and a bit broken in corner cases...

If there are other paths besides np.ma that depended on the ufunc __array_priority__ check that I missed, then those indeed might be broken. Do you see any? (I guess you're saying the in-place ops are such a path?)

I believe the np.ma and np.matrix tests cover at least (i) and (iv) on your list. (FYI your email originally said "(vi)" instead of "(iv)" but I went ahead and edited this on github for clarity -- hopefully this is okay.) Basically you just want checks that show that dispatch does get delegated given an appropriate __array_priority__ value, that should pass identically on both 1.9 and trunk?

pv · 2015-05-12T10:01:58Z

If you return NotImplemented from iadd, Python does x = x + y.
GenericInplace in number.c is probably a good place to insert the
array priority check.
.
The check in number.c is weaker due to negation, but indeed the one in
ufunc is only applies to object dtype (after cast). So the behavior
indeed is fairly broken in master, array_priority doesn't affect
ordering between different ndarray subclasses :(
.
Yes, I'd prefer to have explicit tests for the different cases. They are
perhaps partly covered by matrix/maskedarray tests, but it is easier to
see if a test coverage is complete if the tests are in a single place
targeted at a single things.
.
Apart from the inplace op issues, I don't immediately (don't have time
to take a good look right now, however) see other backward incompat
issues. I would guess that because array_priority works so
inconsistently, it's not very common to have code that explicitly relies
on this. For backward compat, testing against known modules dealing with
ndarray binops, scipy.sparse, cvxopt, etc. can probably smoke up
remaining issues, if any.

pv · 2015-05-12T11:31:52Z

Ok, I see that the inplace op override only applies when lhs is an
object array. Fairly unlikely anyone depends on this then. It appears I wrote
the above comments under the mis-assumption that the behavior was sane in
some respects.

mhvk · 2015-05-12T13:48:33Z

@njsmith - OK, so it is the present that is broken! Doing the comparison by array element is indeed fine, if done consistently. Still am somewhat amused by this, but it is consistent:

np.arange(2, 5) * np.array([0., 'bla', 2.], dtype='O')
#-> array([0.0, 'blablabla', 8.0], dtype=object)

mhvk · 2015-05-12T13:54:29Z

I checked your branch on astropy and there is one test that fails (but passes on current master). The equivalent non-astropy test is below; this suggests __index__ is no longer properly used (which in turn suggests list.__rmul__ is not tried -- NotImplemented hitting again?):

np.array([4]) * ['a', 'b', 'c']
# on master: -> ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
# on your branch:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-99d078c7927a> in <module>()
----> 1 np.array([4]) * ['a', 'b', 'c']

TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')

charris · 2015-05-14T18:01:09Z

Tagging this for 1.10 release. This functionality needs to be finalized so that it can be used in development.

mhvk · 2015-05-14T18:06:32Z

Modulo the odd failure I mention above, which may just need a "we don't care" decision, this looks good.

charris · 2015-05-14T20:34:15Z

While we are at this, how should we handle dot, __matmul__ and __rmatmul__. None of those are currently ufuncs. Has anyone tried overriding dot?

njsmith · 2015-05-14T23:28:54Z

@charris: __numpy_ufunc__ has a dirty secret: np.dot pretends to be a ufunc and calls __numpy_ufunc__ if present, even though dot is not actually a ufunc. This is a sin, of course, which we can expiate by eventually turning dot into a ufunc.

charris · 2015-05-15T00:27:16Z

Yeah, I'm aware of that. I have a current version of matmul (for the @ operator) that does the same thing. I'm a bit more concerned with the methods as they have both left and right versions like the other builtin operators.

I've also done work for matmul as a ufunc, but am not pursueing that at the moment.

charris · 2015-06-03T00:46:21Z

numpy/core/src/umath/ufunc_object.c

            Py_INCREF(Py_NotImplemented);
            return Py_NotImplemented;
        }
        else {
            PyErr_SetString(PyExc_TypeError,
-                                        "Not implemented for this type");
+                            "XX can't happen, please report a bug XX");


Don't forget the XX.

charris · 2015-06-03T00:57:44Z

OK, this doesn't bother me. I'll update the @ stuff to match when this goes in.

charris · 2015-06-03T00:58:11Z

Might want to make a quick review of the C style guide.

charris · 2015-06-03T02:24:25Z

numpy/ma/core.py

@@ -3678,6 +3678,16 @@ def __repr__(self):
            return _print_templates['short_std'] % parameters
        return _print_templates['long_std'] % parameters

+    def _delegate_binop(self, other):


This is an improvement, but the fact that it is needed worries me. What other applications are going to break because of this change?

Will masked array still work without these changes, maybe with some warnings?

Might want to remove this bit up above also

# check it worked if result is NotImplemented: return NotImplemented

charris · 2015-06-09T17:24:44Z

@njsmith Needs finishing.

mhvk · 2015-06-11T00:47:47Z

@njsmith - can I help in some way? I like the PR both generally and would like to rebase some ma stuff after this gets in (notably #3907).

charris · 2015-06-12T03:36:41Z

@njsmith I will need to leave this out of the 1.10 release if it isn't finished up this weekend. It is blocking progress on other PRs.

charris · 2015-06-13T20:16:03Z

Cleaned up in #5964, so closing this.

Also added back some extended error messages that were in original PR numpy#5864.

Fixup gh 5864

In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.

njsmith added 2 commits May 11, 2015 14:50

njsmith force-pushed the binop-dispatch branch 2 times, most recently from 8224fb0 to cbd544e Compare May 12, 2015 05:36

njsmith added 2 commits May 12, 2015 01:35

njsmith force-pushed the binop-dispatch branch from cbd544e to acd37c8 Compare May 12, 2015 08:37

njsmith mentioned this pull request May 13, 2015

BUG: MaskedArray ignores __array_priority__ #5230

Closed

charris added 01 - Enhancement component: numpy._core labels May 13, 2015

njsmith mentioned this pull request May 14, 2015

Decide on what the resolution rules for __op__/__rop__/__numpy_ufunc__ actually are #5844

Closed

charris added this to the 1.10.0 release milestone May 14, 2015

mhvk mentioned this pull request May 14, 2015

BUG allow subclasses in MaskedArray ufuncs -- for non-ndarray _data #3907

Merged

charris reviewed Jun 3, 2015
View reviewed changes

ahaldane mentioned this pull request Jun 12, 2015

BUG: segfault in array_richcompare #5960

Closed

charris mentioned this pull request Jun 13, 2015

Fixup gh 5864 #5964

Merged

charris closed this Jun 13, 2015

charris added a commit to charris/numpy that referenced this pull request Jun 13, 2015

STY: Minor style cleanups in tests and C code.

0aa3260

Also added back some extended error messages that were in original PR numpy#5864.

charris referenced this pull request Jun 14, 2015

Merge pull request #5964 from charris/fixup-gh-5864

899a2a2

Fixup gh 5864

mhvk mentioned this pull request Jun 16, 2015

Bug when multiplying a masked column with a unit astropy/astropy#2701

Closed

This was referenced Jun 17, 2015

Quantity updates for numpy dev astropy/astropy#3860

Merged

BUG Ensure masked object arrays can always return single items. #5962

Merged

njsmith mentioned this pull request Jun 22, 2015

functions return NotImplemented instead of raising error (Trac #1494) #2091

Closed

njsmith mentioned this pull request Jun 26, 2015

In-place operators in 1.9 "respected" __array_priority__; in current master they don't #6020

Closed

shoyer mentioned this pull request Oct 15, 2015

np.equal can still return NotImplemented #6479

Closed

mhvk mentioned this pull request Jan 26, 2016

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove NotImplemented handling from the ufunc machinery (almost) #5864

Remove NotImplemented handling from the ufunc machinery (almost) #5864

njsmith commented May 10, 2015

njsmith commented May 11, 2015

mhvk commented May 11, 2015

mhvk commented May 11, 2015

njsmith commented May 12, 2015

njsmith commented May 12, 2015

pv commented May 12, 2015

pv commented May 12, 2015

njsmith commented May 12, 2015

pv commented May 12, 2015

pv commented May 12, 2015

mhvk commented May 12, 2015

mhvk commented May 12, 2015

charris commented May 14, 2015

mhvk commented May 14, 2015

charris commented May 14, 2015

njsmith commented May 14, 2015

charris commented May 15, 2015

charris Jun 3, 2015

charris commented Jun 3, 2015

charris commented Jun 3, 2015

charris Jun 3, 2015

charris Jun 3, 2015

charris Jun 3, 2015

charris commented Jun 9, 2015

mhvk commented Jun 11, 2015

charris commented Jun 12, 2015

charris commented Jun 13, 2015

Remove NotImplemented handling from the ufunc machinery (almost) #5864

Remove NotImplemented handling from the ufunc machinery (almost) #5864

Conversation

njsmith commented May 10, 2015

njsmith commented May 11, 2015

mhvk commented May 11, 2015

mhvk commented May 11, 2015

njsmith commented May 12, 2015

njsmith commented May 12, 2015

pv commented May 12, 2015

pv commented May 12, 2015

njsmith commented May 12, 2015

pv commented May 12, 2015

pv commented May 12, 2015

mhvk commented May 12, 2015

mhvk commented May 12, 2015

charris commented May 14, 2015

mhvk commented May 14, 2015

charris commented May 14, 2015

njsmith commented May 14, 2015

charris commented May 15, 2015

charris Jun 3, 2015

Choose a reason for hiding this comment

charris commented Jun 3, 2015

charris commented Jun 3, 2015

charris Jun 3, 2015

Choose a reason for hiding this comment

charris Jun 3, 2015

Choose a reason for hiding this comment

charris Jun 3, 2015

Choose a reason for hiding this comment

charris commented Jun 9, 2015

mhvk commented Jun 11, 2015

charris commented Jun 12, 2015

charris commented Jun 13, 2015