Skip to content

Fixup gh 5864 #5964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 14, 2015
Merged

Fixup gh 5864 #5964

merged 9 commits into from
Jun 14, 2015

Conversation

charris
Copy link
Member

@charris charris commented Jun 13, 2015

Fix failing test in #5864 and make some style cleanups. Original message from Nathaniel follows.

We've long had consensus that returning NotImplemented from ufuncs was a bad and ugly hack, since it violates layering -- NotImplemented is the responsibility of op methods, not ufuncs. And one of the things that came out of the discussion in gh-5844 is that the NotImplemented handling inside the ufunc machinery was actually mostly useless and broken (see the first post on that report for analysis).

Of course, it turns out it's not quite that simple -- there were some places, esp. in the trainwreck that is array_richcompare, which were depending on the current behavior.

This patch series fixes the main pieces of code that used to rely on this behavior, and then removes as much of the NotImplemented handling as it can. And for what isn't removed, it adds deprecation or future warnings.

In particular, there are a bunch of cases where array_richcompare returns bad and wrong results, like where you can compare two arrays (which should produce a boolean array), some failure occurs, and then the error gets thrown away and we end up returning a boolean scalar. Some of these were deprecated in 9b8f6c7, but this deprecates the rest. Probably the most controversial part of this is that this patch deprecates scalar ordering comparisons between unlike types, like np.float64(1) < "foo", which are already an error on py3 but have traditionally returned some semi-arbitrary value on py2. To convince you that this is broken and we should stop supporting it, check this out:

Note: the name of this class affects the results below

In [1]: class quux(object):
...: pass
...:

In [2]: q = quux()

In [3]: np.string_("foo") < q
Out[3]: True

In [4]: np.asarray(np.string_("foo")) < q
Out[4]: False

For a full analysis of how we handle all the legacy comparison cases, see the long comment that starts here:

if (any_flexible && !any_flexible_userloops && !any_object) {

NB this PR is probably easier to review if you read the commits one at a time in order, instead of jumping straight to the big diff.

njsmith and others added 6 commits June 13, 2015 12:32
… case

The ndarray richcompare function has a special case for handling
string dtypes (which currently cannot be handled by
ufuncs). Traditionally this was handled by the ufuncs returning
NotImplemented, and then falling through to a special case. By moving
the special case to the top of the richcompare function, it becomes
unnecessary for the ufuncs to return NotImplemented in this case.
The ndarray richcompare function has special case code for handling
void dtypes (esp. structured dtypes), since there are no ufuncs for
this. Previously, we would attempt to call the relevant
ufunc (e.g. np.equal), and then when this failed (as signaled by the
ufunc returning NotImplemented), we would fall back on the special
case code. This commit moves the special case code to before the
regular code, so that it no longer requires ufuncs to return
NotImplemented.

Technically, it is possible to define ufunc loops for void dtypes
using PyUFunc_RegisterLoopForDescr, so technically I think this commit
changes behaviour: if someone had registered a ufunc loop for one of
these operations, then previously it might have been found and
pre-empted the special case fallback code; now, we use the
special-case code without even checking for any ufunc. But the only
possible use of this functionality would have been if someone wanted
to redefine what == or != meant for a particular structured dtype --
like, they decided that equality for 2-tuples of float32's should be
different from the obvious thing. This does not seem like an important
capability to preserve.

There were also several cases here where on error, an array comparison
would return a scalar instead of raising. This is supposedly
deprecated, but there were call paths that did this that had no
deprecation warning. I added those warnings.
ndarray special methods like __add__ have a special case where if the
right argument is not an ndarray or subclass, and it has higher
__array_priority__ than the left argument, then we return
NotImplemented and let the right argument handle the operation.

ufuncs have traditionally had a similar but different special case,
where if it's a 2 input - 1 output ufunc, and the right argument is
not an ndarray (exactly, subclasses don't count), and when converted
to an ndarray ends up as an object array (presumably b/c it doesn't
have a meaningful coercion route, though who knows), and it has a
higher __array_priority__ than the left argument AND it has a
__r<operation>__ attribute, then they return NotImplemented.

In practice this latter special case is not used by regular ndarrays,
b/c anytime it would need to be triggered, the former special case
triggers first and the ufunc is never called. However, numpy.ma did
not have the former special case, and was thus relying on the ufunc
special case. This commit adds the special case to the numpy.ma
special methods directly, so that they no longer depend on the quirky
ufunc behaviour.

It also cleans up the relevant test to things that actually should be
true in general, instead of just testing some implementation details.
This was redundant/broken anyway.

See numpygh-5844 for discussion. See the massive comment added to
ufunc_object.c:get_ufunc_arguments for a discussion of the deprecation
strategy here -- it turns out that array_richcompare is such a
disaster zone that we can't quite wholly eliminate NotImplemented
quite yet. But this removes most NotImplementeds, and lays the
groundwork for eliminating the rest in a release or two.
This is probably a good change in any case, and in particular might
help with debugging whatever bizarro thing is going wrong on Travis.
The NotArray test class needs to define a `__ne__` method, otherwise the
inherited Python 3 method will call `__eq__`, resulting in two rather
than one DeprecationWarning.
@charris
Copy link
Member Author

charris commented Jun 13, 2015

Another fixup in ma/core.py needs to be made before committing.

@charris
Copy link
Member Author

charris commented Jun 13, 2015

See also #2091.

Also added back some extended error messages that were in original
PR numpy#5864.
@charris
Copy link
Member Author

charris commented Jun 13, 2015

Additional tests are still needed before this can go in.@pv suggests the following

The getpriority checks inside ufunc (which are removed here) are responsible for making array_priority be respected for inplace ops. Some copypasting of the getpriority check in number.c is required to make it work again. . Note also that the check in number.c has PyArray_Check(other), but the check in the ufunc has PyArray_CheckExact(other), so without me testing it it looks like there's a behavior change.

To be able to do this refactoring safely, it's probably necessary to exercise the various branches in the code, and to have additional tests. It seems as if to cover the different branches, checks for operations involving (i) ndarray & ndarray subclass, (ii) ndarray & non-ndarray object, (iii) ndarray subclass & different ndarray subclass, (iv) ndarray subclass & other object.

@pv I assume these tests involve the various affected operators. Could you expand a bit on the first comment?

charris added 2 commits June 13, 2015 18:52
The following functions now raise a TypeError instead of returning
NotImplemented:

    np.power, np.add, np.subtract, np.multiply, np.divide,
    np.bitwise_and, np.bitwise_or, np.bitwise_xor, np.floor_divide,
    np.fmax, np.fmin, np.fmod, np.hypot, np.logaddexp,
    np.logaddexp2, np.logical_and, np.logical_or, np.logical_xor,
    np.maximum, np.minimum, np.mod

Functions that remain to be fixed:

    np.greater, np.greater_equal, np.less,
    np.less_equal, np.not_equal
This checks the four classes with different __array_priority__ values
against each other for the operators

    __add__, __radd__
    __sub__, __rsub__
    __mul__, __rmul__
    __pow__, __rpow__
    __div__, __rdiv__
    __mod__, __rmod__
    __truediv__, __rtruediv__
    __floordiv__, __rfloordiv__
    __and__, __rand__
    __xor__, __rxor__
    __or__, __ror__
    __lshift__, __rlshift__
    __rshift__, __rrshift__
    __eq__
    __ne__
    __gt__
    __ge__
    __lt__
    __le__

The classes are ndarray, two subclasses of ndarray, and a class unrelated
to ndarray. The combinations checked are

    ndarray -- ndarray subclass
    ndarray -- unrelated class
    ndarray subclass -- ndarray subclass
    ndarray subclass -- unrelated class
@charris
Copy link
Member Author

charris commented Jun 14, 2015

Tests for binary operators in classes of different types with differing values of __array_priority__ added. I think this PR is ready unless someone can recommend inplace operation tests.

charris added a commit that referenced this pull request Jun 14, 2015
@charris charris merged commit 899a2a2 into numpy:master Jun 14, 2015
@charris charris deleted the fixup-gh-5864 branch June 14, 2015 15:54
charris added a commit to charris/numpy that referenced this pull request Jun 14, 2015
@njsmith
Copy link
Member

njsmith commented Jun 15, 2015

Thanks, Chuck, for picking this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants