Skip to content

BUG Numpy ufunc binop fix for subclasses with __numpy_ufunc__ (closes #4815) #5748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 6, 2015

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Apr 5, 2015

This fix for #4815 includes the solution by @pv and a test case to guard against regression.

pv and others added 3 commits April 4, 2015 21:47
These changes only affect objects defining __numpy_ufunc__. Other
objects use the __array_priority__ mechanism to decide whether
NotImplemented should be returned or not.

The binops previously returned NotImplemented even if other._r<op>__ is
ndarray.__r<op>__, rather than trying to evaluate the result via ufuncs.
This causes problems in binops of two objects that both subclass from
ndarray and define __numpy_ufunc__.

The solution added here makes the total logic as follows:

    def __binop__(self, other):
        if (hasattr(other, '__numpy_ufunc__') and
            not isinstance(other, self.__class__) and
            hasattr(other, '__rop__') and
            other.__class__.__rop__ is not self.__class__.__rop__):
            return NotImplemented
        return np.binop(self, other)
In particular, for a class S3(S2), where S2 defines __numpy_ufunc__, ensure
that, e.g., S3() + S2() does *not* pass control to S2 if S2 and S3 share the
same __radd__.
@njsmith
Copy link
Member

njsmith commented Apr 5, 2015

@pv, @mhvk, @cowlicks

def __binop__(self, other):
    try:
        return np.binop(self, other)
    except <NotImplementedError or something>:
        return NotImplemented

Can someone remind me what exactly is wrong with this?

One possiblity is that the plain old ufunc fallback behaviour is just broken in general, like maybe we don't actually have a way to tell when the np.binop() call actually failed in a way that should trigger a NotImplemented. I certainly don't understand how this works right now (ufuncs sometimes return NotImplemented themselves, which can't be the right thing?). I'm sure this could be cleaned up (and probably should be), but maybe there are nastier dragons lurking that I haven't thought of, and that mean we need to check some more things before dispatching to the ufunc. Any example dragons?

I re-read the NEP, and it has (a) a made-up-example that doesn't mean much on its own, and (b) a argument that implementing __numpy_ufunc__ is like subclassing, so it should get the special subclassing priority (ref). I don't find this very convincing on its own. One could just as well say that ndarray is like a subclass of whichever other array class its interacting with. They're peers. And anyway, the object defining __numpy_ufunc__ will get a chance to handle the operation, because it has a __numpy_ufunc__ method. The only reason to want to override __mul__ and __numpy_ufunc__ separately is if you want to violate the Liskov substitution principle, and while sometimes it makes sense to break the rules, I don't feel like putting in extra complexity just to make it easier to accomplish a particular type of "bad" design is justified.

That said, there is one very important specific place where we have to enable "bad" design, which is scipy.sparse's * = dot behavior. If we could do things over from scratch then I think we all agree that scipy.sparse would use * for elementwise multiplication, and dot/@ for matrix multiply, and AFAIK the plan is to somehow move to that model eventually, but first we need __numpy_ufunc__. So I think the question here isn't "what's the general principle for enabling array-like objects with incompatible semantics", the question is "how do we do a narrow, ideally time-limited hack to get scipy.sparse past this compatibility hump, ideally without encouraging bad behaviour on the part of other libraries". (There exist a few other libraries that do the matmul-* thing, but AFAICT they are few, not widely used, and are generally planning to switch to elementwise * in any case. And np.matrix is no problem b/c is subclasses ndarray so already gets priority on binop resolution.)

So I propose:

def __mul__(self, other):
    if other->tp_class->tp_name.startswith("scipy.sparse."):
        return NotImplemented
    else:
        <generic binop logic>

Ugly, isn't it :-). But simple, unabusable, and does the job!

AFAICT the logic above ought to fix #4815, as well as (obviously) the scipy.sparse issues, and is at least possible for me to understand, which the current code / this PR doesn't seem to be.

Even more obviously I'm surely missing something. What is it?

@mhvk
Copy link
Contributor Author

mhvk commented Apr 5, 2015

@njsmith - I have to agree the rationale right now is opaque. Rereading your e-mail and the NEP example you linked to, I also think that in that context my original PR (#4815) actually has some merit, in that if one argues that having __numpy_ufuc__ should be treated similarly as being an ndarray subclass, then if self has __numpy_ufunc__ as well, it is at the same level as other and thus there is no reason any more for special treatment of other.

Still, from a larger perspective, it would seem up to the implementers of __numpy_ufunc__ to deal with reverse operators. In this respect, as long as sparse can ensure that a normal multiplcation by a left-hand side returns NotImplemented, one should be fine, correct? Unfortunately, from an earlier trial I did, that seems not possible right now (see #5074). Maybe it is worth it to revisit that? Or would this wake one of those dragons?

p.s. But don't let this detract from the fact that the introduction of __numpy_ufunc__ is great! I very much hope it can be formally exposed in 1.10.

@charris charris added this to the 1.10.0 release milestone Apr 24, 2015
charris added a commit that referenced this pull request May 6, 2015
BUG Numpy ufunc binop fix for subclasses with __numpy_ufunc__ (closes #4815)
@charris charris merged commit eecb2e3 into numpy:master May 6, 2015
@charris
Copy link
Member

charris commented May 6, 2015

Merged. Thanks @pv @mhvk .

@njsmith
Copy link
Member

njsmith commented May 6, 2015

Uh. I am unsure whether this patch is the correct solution, but strongly suspect it is not.

@charris
Copy link
Member

charris commented May 6, 2015

@mhvk This is a fairly substantive addition, so it needs some documentation, both in doc/release/1.10.0-notes.rst and in the official documention, probably in doc/source/reference/arrays.classes.rst and numpy/doc/subclassing.py.

@charris
Copy link
Member

charris commented May 6, 2015

@njsmith Alternatives welcome ;) I vaguely recall a proposal to add a dictionary to ndarray, which I think might allow us to define real left and right operators for ndarray. Longer term, I'd like to see both masks and units as part fo the basic array, but I think that means funding some people to work on dynd. Both extended types and priorites will alway suffer from the promotion/precedence problem.

What I like about this solution is that it is isolated to the operators and is not very intrusive.

@njsmith
Copy link
Member

njsmith commented May 6, 2015

Units as part of the basic array makes no sense architecturally, units
should be part of the dtype. I could write more here but instead I will
nudge you to reply to my email about the "numpy summit" :-).

Regarding the actual patch: What I dislike about the solution is that it
leaves us with operator dispatch that is even more complicated than it used
to be, and it was already so complicated that no-one understood it :-). I
would really like to move towards the point where it is possible for actual
mortals to implement classes that interoperate with ndarray.

I am really uncomfortable with releasing 1.10 with new numpy_ufunc
complications, when we don't even know what the dispatch rules are. (In the
sense that they are not written down anywhere other than the code, and the
code is extremely convoluted.)

On Tue, May 5, 2015 at 8:13 PM, Charles Harris notifications@github.com
wrote:

@njsmith https://github.com/njsmith Alternatives welcome ;) I vaguely
recall a proposal to add a dictionary to ndarray, which I think might allow
us to define real left and right operators for ndarray. Longer term, I'd
like to see both masks and units as part fo the basic array, but I think
that means funding some people to work on dynd. Both extended types and
priorites will alway suffer from the promotion/precedence problem.

What I like about this solution is that it is isolated to the operators
and is not very intrusive.


Reply to this email directly or view it on GitHub
#5748 (comment).

Nathaniel J. Smith -- http://vorpus.org

@charris
Copy link
Member

charris commented May 6, 2015

I think the complexity here is comes from subclassing ndarray. Essentially, we are redefining subclassing ndarray as private inheritance (c++) (component or implementation) rather than "real" subclassing/polymorphism. We could make that official with another attribute __numpy_private__ ;)

@charris
Copy link
Member

charris commented May 6, 2015

Mind, I'm willing to revert this if we can come to a better solution.

@njsmith
Copy link
Member

njsmith commented May 6, 2015

My comment above is perhaps a better solution? I am also trying to write up an actual description of how we do dispatch for these methods currently in another window right now. (Surprise discovery so far: all of those weird cases where ufunc calls return NotImplemented? They are AFAICT totally useless, they can already be ripped out without causing any trouble.)

@pv
Copy link
Member

pv commented May 6, 2015

@njsmith: let's recap: (i) numpy_ufunc is logically independent of
the binop dispatch, (ii) Numpy's current binop dispatch is wonky.
.
As far as scipy.sparse is concerned, just ripping out all the changes
made to the binop dispatch, and returning to the legacy mechanism of
array_priority would be sufficient. Modulo the fact that inplace ops
should also respect array_priority, ie., the workaround in
PyArray_GenericBinaryFunction in number.c needs to be copied also to
PyArray_GenericInplaceBinaryFunction in the same file. A 'fun' exercise:
suppose a is np.ndarray, and b is np.matrix. What does a *= b do,
and what does b *= a do? Unfortunately fixing this causes a semantic
change :( (It is possible to preserve strict backward compat if the new
behavior is guarded with a check for numpy_ufunc.)
.
Comparison ops do not deal with array_priority either, which is also
wrong, but scipy.sparse doesn't care. Perhaps array_priority can be
stuffed also there.
.
The problem with your suggested def __mul__(self, other): return np.multiply(self, other) is that if other is an ndarray subclass, its
__rmul__ is never called in any situation. If it is not an ndarray
subclass, it would also never be called, were it not for the hack in
ufunc_object.c:2530 that basically does if hasattr(other, '__rmul__'): return NotImplemented. So in reality the simple solution, which was the
old behavior, has somewhat complex dispatch logic tucked inside the
ufunc implementation.
.
The motivation for all the recent changes in the binop logic is this:
suppose we did not have numpy_ufunc. Let's also suppose a static
linear order specified by array_priority is not a good solution. How
do we change the binop dispatch logic so that it works sensibly for
ndarray subclasses and other objects that have overridden binops? The
place where numpy_ufunc comes into play here is as an if hasattr(obj, '__numpy_ufunc__'): new_logic(); else old_logic(); guard,
to preserve backward compat.
.
So the point here is this: the binop dispatch mechanism with its
inconsistently implemented array_priority et al. is already fairly
complex, and as shown by experience and the bug reports, does not
actually work that well in practice.
.
numpy_ufunc is useful in making things work, but using it to fix the
issues in the current binops is, in my view, just a crutch. Yes, you can
do it, but should all this not be doable without having a second layer
of overridable behavior below the first one?
.
In the bigger picture, I think this is in the end a shortcoming in
Python's binop mechanism; doing it correctly needs some co-operative
best-practice approach that needs to be adhered to at all levels. Such
co-operative behavior already sneaked into Numpy's binop dispatch, but
it is not consistently implemented.

@mhvk
Copy link
Contributor Author

mhvk commented May 6, 2015

I think discussion is probably best done in #5844. I'm with @charris that at least this is the least intrusive change, but I can see the logic of trying to get things in a less rather than a more complicated direction!

@mhvk mhvk deleted the numpyufunc-binop-fix branch February 19, 2017 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants