-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Remove NotImplemented handling from the ufunc machinery (almost) #5864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added release notes. |
@njsmith - I looked at most commits (though void comparison is beyond me), and like the general idea. One small worry: with your pre-ufunc testing, I think the regular numeric cases are slowed down more than they are now. Though from a quick look, e.g., |
More importantly: Are you sure about what
(i.e., just as what would happen if I compared with Of course, while this is different from current behaviour and thus breaks backward compatibility, it is internally consistent, and the present deprecation warnings suggest it was discussed before. If so, no problem. But if the above would not be expected future behaviour, then maybe fcous this PR just on the removal of improper usage of |
… case The ndarray richcompare function has a special case for handling string dtypes (which currently cannot be handled by ufuncs). Traditionally this was handled by the ufuncs returning NotImplemented, and then falling through to a special case. By moving the special case to the top of the richcompare function, it becomes unnecessary for the ufuncs to return NotImplemented in this case.
The ndarray richcompare function has special case code for handling void dtypes (esp. structured dtypes), since there are no ufuncs for this. Previously, we would attempt to call the relevant ufunc (e.g. np.equal), and then when this failed (as signaled by the ufunc returning NotImplemented), we would fall back on the special case code. This commit moves the special case code to before the regular code, so that it no longer requires ufuncs to return NotImplemented. Technically, it is possible to define ufunc loops for void dtypes using PyUFunc_RegisterLoopForDescr, so technically I think this commit changes behaviour: if someone had registered a ufunc loop for one of these operations, then previously it might have been found and pre-empted the special case fallback code; now, we use the special-case code without even checking for any ufunc. But the only possible use of this functionality would have been if someone wanted to redefine what == or != meant for a particular structured dtype -- like, they decided that equality for 2-tuples of float32's should be different from the obvious thing. This does not seem like an important capability to preserve. There were also several cases here where on error, an array comparison would return a scalar instead of raising. This is supposedly deprecated, but there were call paths that did this that had no deprecation warning. I added those warnings.
8224fb0
to
cbd544e
Compare
@mhvk: Thanks for the comment (that now seems to have disappeared into the aether somewhere) about the reversed operators in np.ma -- it of course was left over from an earlier version that tried to duplicate all the details of the old ufunc special case, before I said screw it, any Regarding the speed issue: yeah, there are only two extra checks that are done in the common case: one is Regarding the proper behavior of
The reason you get a different result there is because it turns out that
(Not only is doing string coercion here broken, but... why 32 byte strings? Though at least that's a power of two -- OTOH if you make the numbers integers instead of floats, you get 21 byte strings. WTF? It's just fractally weird.) The reason the deprecations etc. are here in this PR is b/c there's no reasonable way to produce similar behaviour to what we have now without having that one |
ndarray special methods like __add__ have a special case where if the right argument is not an ndarray or subclass, and it has higher __array_priority__ than the left argument, then we return NotImplemented and let the right argument handle the operation. ufuncs have traditionally had a similar but different special case, where if it's a 2 input - 1 output ufunc, and the right argument is not an ndarray (exactly, subclasses don't count), and when converted to an ndarray ends up as an object array (presumably b/c it doesn't have a meaningful coercion route, though who knows), and it has a higher __array_priority__ than the left argument AND it has a __r<operation>__ attribute, then they return NotImplemented. In practice this latter special case is not used by regular ndarrays, b/c anytime it would need to be triggered, the former special case triggers first and the ufunc is never called. However, numpy.ma did not have the former special case, and was thus relying on the ufunc special case. This commit adds the special case to the numpy.ma special methods directly, so that they no longer depend on the quirky ufunc behaviour. It also cleans up the relevant test to things that actually should be true in general, instead of just testing some implementation details.
This was redundant/broken anyway. See numpygh-5844 for discussion. See the massive comment added to ufunc_object.c:get_ufunc_arguments for a discussion of the deprecation strategy here -- it turns out that array_richcompare is such a disaster zone that we can't quite wholly eliminate NotImplemented quite yet. But this removes most NotImplementeds, and lays the groundwork for eliminating the rest in a release or two.
Fixed py3 test failures (let's see if Travis agrees). |
The getpriority checks inside ufunc (which are removed here) are |
To be able to do this refactoring safely, it's probably necessary to |
I guess I don't know what it even means to respect Re: If there are other paths besides np.ma that depended on the ufunc I believe the np.ma and np.matrix tests cover at least (i) and (iv) on your list. (FYI your email originally said "(vi)" instead of "(iv)" but I went ahead and edited this on github for clarity -- hopefully this is okay.) Basically you just want checks that show that dispatch does get delegated given an appropriate |
If you return NotImplemented from iadd, Python does x = x + y. |
Ok, I see that the inplace op override only applies when lhs is an |
@njsmith - OK, so it is the present that is broken! Doing the comparison by array element is indeed fine, if done consistently. Still am somewhat amused by this, but it is consistent:
|
I checked your branch on astropy and there is one test that fails (but passes on current master). The equivalent non-astropy test is below; this suggests
|
Tagging this for 1.10 release. This functionality needs to be finalized so that it can be used in development. |
Modulo the odd failure I mention above, which may just need a "we don't care" decision, this looks good. |
While we are at this, how should we handle |
@charris: |
Yeah, I'm aware of that. I have a current version of matmul (for the I've also done work for matmul as a ufunc, but am not pursueing that at the moment. |
Py_INCREF(Py_NotImplemented); | ||
return Py_NotImplemented; | ||
} | ||
else { | ||
PyErr_SetString(PyExc_TypeError, | ||
"Not implemented for this type"); | ||
"XX can't happen, please report a bug XX"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget the XX
.
OK, this doesn't bother me. I'll update the |
Might want to make a quick review of the C style guide. |
@@ -3678,6 +3678,16 @@ def __repr__(self): | |||
return _print_templates['short_std'] % parameters | |||
return _print_templates['long_std'] % parameters | |||
|
|||
def _delegate_binop(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement, but the fact that it is needed worries me. What other applications are going to break because of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will masked array still work without these changes, maybe with some warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to remove this bit up above also
# check it worked
if result is NotImplemented:
return NotImplemented
@njsmith Needs finishing. |
@njsmith I will need to leave this out of the 1.10 release if it isn't finished up this weekend. It is blocking progress on other PRs. |
Cleaned up in #5964, so closing this. |
Also added back some extended error messages that were in original PR numpy#5864.
In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.
In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.
In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.
In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.
In numpy-dev, the __index__ can no longer be used to multiply lists; for discussion about why, really, this was always broken & wrong, see numpy/numpy#5864.
We've long had consensus that returning
NotImplemented
from ufuncs was a bad and ugly hack, since it violates layering --NotImplemented
is the responsibility of__op__
methods, not ufuncs. And one of the things that came out of the discussion in gh-5844 is that theNotImplemented
handling inside the ufunc machinery was actually mostly useless and broken (see the first post on that report for analysis).Of course, it turns out it's not quite that simple -- there were some places, esp. in the trainwreck that is
array_richcompare
, which were depending on the current behavior.This patch series fixes the main pieces of code that used to rely on this behavior, and then removes as much of the
NotImplemented
handling as it can. And for what isn't removed, it adds deprecation or future warnings.In particular, there are a bunch of cases where
array_richcompare
returns bad and wrong results, like where you can compare two arrays (which should produce a boolean array), some failure occurs, and then the error gets thrown away and we end up returning a boolean scalar. Some of these were deprecated in 9b8f6c7, but this deprecates the rest. Probably the most controversial part of this is that this patch deprecates scalar ordering comparisons between unlike types, likenp.float64(1) < "foo"
, which are already an error on py3 but have traditionally returned some semi-arbitrary value on py2. To convince you that this is broken and we should stop supporting it, check this out:For a full analysis of how we handle all the legacy comparison cases, see the long comment that starts here:
numpy/numpy/core/src/umath/ufunc_object.c
Line 867 in 19b829a
NB this PR is probably easier to review if you read the commits one at a time in order, instead of jumping straight to the big diff.