Fix "numpy scalar * array-like == performance horror" #3501

Merged
1 commit merged into numpy:master on Jul 9, 2013

Conversation

inducer
Contributor

@inducer inducer commented Jul 3, 2013

This pull request fixes #3375. All tests that pass before this change also pass after. (There's a failure that seems unrelated.) The PR also adds a test that asserts that __array_priority__ gets looked at early enough to make sure no copy is attempted.
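
A minimal sketch of the pattern such a test protects, assuming illustrative names and priority values (this is not the test from the PR itself):

```python
import numpy as np

class MyThing(object):
    # Anything above ndarray's default priority of 0.0 works; the exact value
    # is arbitrary here.
    __array_priority__ = 1000.0

    def __rmul__(self, other):
        # If __array_priority__ is consulted early enough, numpy returns
        # NotImplemented and Python dispatches here, without numpy ever
        # attempting to copy MyThing into an ndarray.
        return ("rmul called with", other)

print(np.arange(3) * MyThing())
```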

@seberg
Member

seberg commented Jul 4, 2013

Seems not quite that simple. The tests fail. It seems that this change, as is, breaks compatibility for array subclasses (i.e. matrix and masked arrays)? I don't have time to look into it; my best guess right now is that the reflected op, i.e. __radd__, also calls np.add and then fails again, because whatever mechanism made sure that doesn't happen is somehow not active? Unfortunately, we have to be careful about what might be changed; there are probably no tests defining the current behaviour well.

@inducer
Contributor Author

inducer commented Jul 4, 2013

OK, second attempt. This stays completely out of the ufunc machinery and only applies to binary operators (as it should). Also see the lengthy comment in the code. Tests pass for me (I had only run numpy.test() the first time around).

@seberg
Member

seberg commented Jul 4, 2013

Can you explain to me why the array is converted currently? As far as I understand, the ufunc machinery expects that objects such as yours get converted to a 0-d object array (or some such). Where and why does this happen? If we cannot fix it, can we hook in there to make your type become a 0-d object array instead of doing a full array conversion? Just thinking, but hooking your fix in further up doesn't really sound right to me right now, though I will not look into it deeply now.

@inducer
Contributor Author

inducer commented Jul 4, 2013

First, to answer your question: get_ufunc_arguments is called from two sites.

That routine calls PyArray_FromAny on the ufunc arguments, which converts MyThing into a 3x3 (in other words, large) object array of "scalar" MyThings. That's the core of the issue, because it is O(N) in the number of array elements. Nothing produces a 0-d array, and I'm not sure that would be correct or would fix anything.
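
A rough sketch, in pure Python, of the cost being described; FakeVector is a made-up stand-in for the array-like from #3375, not code from numpy:

```python
import numpy as np

class FakeVector(object):
    """Sequence-like object that numpy can, but should not, copy element-wise."""
    __array_priority__ = 1000.0

    def __init__(self, n):
        self.n = n
        self.getitem_calls = 0

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.getitem_calls += 1
        if 0 <= i < self.n:
            return object()   # opaque "scalar" element
        raise IndexError(i)

v = FakeVector(10000)
a = np.asarray(v)             # roughly what PyArray_FromAny does on this path
print(a.dtype, a.shape)       # object dtype, shape (10000,)
print(v.getitem_calls)        # at least one Python-level call per element: O(N)
```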

Second, in my opinion, having __array_priority__ logic in umath is just broken. It should be ripped out. A tell-tale sign of this is that the logic is activated only for two-argument ufuncs with one output. What's so special about them? Why should they behave differently from three-argument ufuncs?

Another thing that's broken in the current code, and would have to be fixed along with the issue above, is that lots of places use PyArray_Type->tp_as_number->nb_multiply to call the multiply ufunc (and so on for the other operators). Because of __array_priority__, the semantics of nb_multiply are no longer the same as those of the ufunc. Specifically, nb_multiply is required to return NotImplemented in cases of priority inversion, whereas the ufunc has a job to do and just needs to do it, without asking questions about priorities.

My patch above is a valid band-aid for #3375. It does not claim to fix the issues I've mentioned. I do claim that it doesn't break anything that was previously working. Moreover, having priority logic "further up" (as you called it) is actually less risky, because it only touches dispatch for binary operators. In other words, fewer code paths go through there than through the ufunc.

The issues I mentioned above should probably be filed as bugs of their own.

@seberg
Member

seberg commented Jul 4, 2013

Good points; it is true that it doesn't even make sense that ufuncs can possibly return NotImplemented, etc. I agree that this whole thing isn't ideal, and fixing it where you fixed it is likely the only reasonable option. It is basically only because of how Python handles subclassing with operators (i.e. Python knows that the subclass should be asked to handle the operator first) that this code works at all. To be honest, I am not quite certain that array priority is necessary at all :).
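
For reference, the plain-Python subclass rule being relied on here (no numpy involved):

```python
class Base(object):
    def __mul__(self, other):
        return "Base.__mul__"

class Sub(Base):
    def __rmul__(self, other):
        return "Sub.__rmul__"

# When the right operand's type is a subclass of the left operand's type and
# overrides the reflected method, Python tries the reflected method first.
print(Base() * Sub())    # Sub.__rmul__
print(Base() * Base())   # Base.__mul__
```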

I think the whole thing needs more thought. I would dig a bit into other options before deciding that this is perfect, but I don't have the time right now. This does create one behaviour change, I think, which I don't really mind too much, but which should maybe be deprecated: if YourThing does not implement the __rop__, it would seem that things break?

@inducer
Contributor Author

inducer commented Jul 4, 2013

Array priority itself is a reasonable idea; it makes my example possible at all. Without it, numpy would not know that it's not supposed to try to convert MyThing into an array (which is possible, but a terrible idea).

If some object declares an __array_priority__ but does not implement __rop__, an error will result, as you point out. To me, that's fine. I would argue that this is correct behavior on numpy's part, and a mistake on the part of the priority-declaring object, which claimed that it wanted to handle reverse arithmetic but then didn't. One could argue that numpy should check whether the corresponding __rop__ is implemented before returning NotImplemented, but no other part of Python works like that, and it would insert more expensive (and possibly brittle) checks into what is a performance-relevant code path. That's why I'm hesitant.
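
A sketch of that failure mode (Lopsided is a hypothetical example class):

```python
import numpy as np

class Lopsided(object):
    # Declares a higher priority than ndarray, but never implements __rmul__.
    __array_priority__ = 1000.0

try:
    np.arange(3) * Lopsided()
except TypeError as exc:
    # As discussed above: numpy declines to handle the operation on behalf of
    # the higher-priority object, and with no __rmul__ to fall back to, the
    # multiplication ends in a TypeError.
    print("failed as expected:", exc)
```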

@@ -209,6 +209,49 @@
Py_INCREF(Py_NotImplemented);
return Py_NotImplemented;
}

if (!PyArray_Check(m2)) {
/* Catch priority inversion and punt, but only if it's guaranteed
Member

Blank first line in comment.

Member

The comment is very large; I think you should omit the extended argument and just keep a short explanation of what happens. The argument would look better as part of the pull request explanation, but it's late for that ;)

Member

Basically just the first paragraph.

@charris
Member

charris commented Jul 8, 2013

@seberg Are you happy with this?

@seberg
Member

seberg commented Jul 8, 2013

It seems like the right spot for such a check. I am still a little worried that breaking the case of people who define array priority but do not support all the operators might actually affect code out there. I don't mind that as such, so maybe just try and see... What I am thinking of is mostly objects defining __array__ and, for some reason, also __array_priority__, i.e. objects in the general direction of PIL images or so? Though those shouldn't need array priority defined, so probably it's fine.

* Catch priority inversion and punt, but only if it's guaranteed
* that we were called through m1 and the other guy is not an array
* at all. Note that some arrays need to pass through here even
* with priorities inverted, for example: float(17) * np.matrix(...)
Member

float(17) isn't an array, so I guess there's a mistake in the comment?

Contributor Author

Nope, that's correct. float(17) * np.matrix(...) first ends up in np.matrix.__rmul__, which calls np.dot, which calls PyArray_MatrixProduct2, which casts float(17) into an array scalar and then calls this place. :)
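
The end result of that chain is visible from Python (the intermediate steps named above are C-level, so only the final behavior shows here):

```python
import numpy as np

m = np.matrix([[1., 2.], [3., 4.]])

# float.__mul__ returns NotImplemented for a matrix, so Python invokes
# np.matrix.__rmul__; that goes through np.dot (PyArray_MatrixProduct2 in C),
# which wraps float(17) as a 0-d array before re-entering the array multiply
# slot that the comment above is describing.
print(float(17) * m)   # scales every entry of the matrix
```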

@inducer
Contributor Author

inducer commented Jul 8, 2013

Btw, I think I've addressed @charris's concerns in the commit I pushed earlier.

@charris
Member

charris commented Jul 8, 2013

OK, one more thing. Because this changes behavior, it should be mentioned in doc/release/1.8.0-notes.rst, probably under Changes, and maybe also under Compatibility notes if you think this might affect current code, as in combining __array_priority__ and __array__.

For next time, note the standard prefixes for commit messages in doc/source/dev/gitwash/development_workflow.rst.

@inducer
Contributor Author

inducer commented Jul 9, 2013

Revised, with change notes and a better commit message, as requested.

charris added a commit that referenced this pull request Jul 9, 2013
Fix "numpy scalar * array-like == performance horror"
charris merged commit 6eeb612 into numpy:master on Jul 9, 2013
@inducer
Contributor Author

inducer commented Jul 9, 2013

Thanks for your help getting this merged!
