Skip to content

ENH: preserve subclasses in ufunc.outer #8662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 19, 2019

Conversation

eric-wieser
Copy link
Member

Previously this was forcing to ndarray. Fixes #8661

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 22, 2017

Good ol' ma and matrixlib conspiring to ruin my day once again:

======================================================================
ERROR: test_masked_binary_operations (test_subclassing.TestSubclassing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Py\lib\site-packages\numpy\ma\tests\test_subclassing.py", line 214, in test_masked_binary_operations
    self.assertTrue(isinstance(add.outer(mx, mx), mmatrix))
  File "C:\Py\lib\site-packages\numpy\ma\core.py", line 1102, in outer
    np.copyto(d, da, where=m)
ValueError: could not broadcast where mask from shape (1,5,1,5) into shape (1,5)
----------------------------------------------------------------------

Seems we need a PyArray_FROM_O_BUT_NOT_EVIL that never returns np.matrix objects

@mhvk
Copy link
Contributor

mhvk commented Feb 22, 2017

This is great! I was worried this might not go through __array_prepare__ and __array_wrap__ but I just checked and it works with Quantity!

It would probably good to test the subclass-with-method explicitly, but sadly the current state of testing is not all that great (there are some tests with subclasses in test_umath, but most of them don't really test the methods get run, as they just implement what the default would do anyway...).

p.s. If you have any ideas for ufunc.reduce, accumulate, at, reduceat....; for now, I've just been waiting for __[numpy|array]_ufunc__ (for which this all works very well).

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 22, 2017

Ok, what should this code do?

>>> x = np.matrix([[1, 2, 3]])
>>> np.add.outer(x, x)
array([[[[2, 3, 4]],

        [[3, 4, 5]],

        [[4, 5, 6]]]])

Right now, it returns an object of shape (1, 3, 1, 3). Of course, np.matrix can't be that shape. With this patch:

>>> np.add.outer(x, x)
matrix([[2, 4, 6]])

Do we really need to special case a pure-python subclass in the C code to make this work?

Or can we kill this class of bugs forever by allowing matrix.reshape to return an ndarray rather than squashing dimensions?

@mhvk
Copy link
Contributor

mhvk commented Feb 22, 2017

I just looked at the failures and noticed the same thing -- and, no, one doesn't need to patch the C code, but I fear this means matrix needs to acquire __array_prepare__ and __array_wrap__ methods, which do something when an outer method is called (perhaps just error out, since matrices cannot be 4-D)...
...
Though: checking, I don't think those methods actually know what specific way the ufunc is called:

class A(np.ndarray):
    def __array_prepare__(self, obj, context=None):
        print(context)
        return obj

a = np.arange(3).view(A)
np.add(a,a)
# (<ufunc 'add'>, (A([0, 1, 2]), A([0, 1, 2])), 0)

So, I think in __array_prepare__ one would have to check if obj has a shape that is inconsistent with a matrix (which means outer was called), and then either error or cast down to ndarray.

(Definitely needs more thought...).

@mhvk
Copy link
Contributor

mhvk commented Feb 22, 2017

p.s. It would be nice for context to contain the relevant information, but since we're presumably moving to __array_ufunc__ that may be a bit of a waste of effort.

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 22, 2017

p.s. It would be nice for context to contain the relevant information

It already does. All outer does is call reshape on its arguments, and then call the base ufunc.

this means matrix needs to acquire array_prepare and array_wrap

I don't think that'd help - the problem is that np.empty((1, 3)).view(matrix).reshape(1, 3, 1, 1).shape == (1, 3)), which fails before we even enter the ufunc. But reshape doesn't call either of those magic methods - all it calls is __array_finalize__

@mhvk
Copy link
Contributor

mhvk commented Feb 22, 2017

OK, I understand the path now. I guess there are only two real solutions:

  1. Ignore the matrix problem;
  2. Don't worry about subclasses in this path; soon __array_ufunc__ can deal with it.

In favour of (1) is that matrices are semi-deprecated anyway, now thought to have been a bad idea (and, really, to not have stacks of matrices is itself quite deadly). And it is nice to have other subclasses just work as expected.

In favour of (2) is that somebody out there may count on outer always returning ndarray, plus it would really be cool to get __array_ufunc__ in! (There's still a bit of a hang-up on what, if any, opt-out mechanism to provide; see #8247 and the numpy github discussion with the most comments ever, #5844)

@eric-wieser
Copy link
Member Author

I don't think __array_ufunc__ would even apply to outer?

outer is just a convenience method that does broadcasting, before actually invoking the ufunc, and that's exactly what it should be. The problem is that matrix objects don't broadcast

@mhvk
Copy link
Contributor

mhvk commented Feb 22, 2017

@eric-wieser - PyUFunc_CheckOverride is the one that defers to __array_ufunc__ if needed: https://github.com/numpy/numpy/pull/8662/files#diff-ded4bbd920a163719d3850ce89bbcfa9R5094; it passes method outer (indeed, it is partially because I implemented this for Quantity that I assumed, wrongly, that things would go through __array_prepare__ and __array_wrap__; note that subclasses that cannot be reshaped are explicitly mentioned as a reason... #8247 (comment)).

@eric-wieser
Copy link
Member Author

eric-wieser commented Apr 13, 2017

@mhvk: So, interesting question about outer - what does this do:

class A(ndarray):
    def __array_ufunc__(self, ufunc, method, *args, **kwargs):
        if method != '__call__':
            print("Not handled")
            return NotImplemented
        
        print('Handling')
        return super().__array_ufunc__(self, ufunc, method, *args, **kwargs)

a = np.zeros(2).view(A)
np.add.outer(a)

Does this execute both print statements, or just the first?

To me, it seems desirable that both are executed. Maybe merging this pr after the __array_ufunc__ one will achieve this

@mhvk
Copy link
Contributor

mhvk commented Apr 13, 2017

I just checked and currently it will just go through the "not handled" clause. This is probably the best behaviour -- for ndarray subclasses, by passing on to ndarray.__array_ufunc__, they still get the reshaping implementation, while others can do something else if need be.

@eric-wieser
Copy link
Member Author

by passing on to ndarray.array_ufunc, they still get the reshaping implementation

Not without this patch they don't, as outer discards subclasses before reshaping

@mhvk
Copy link
Contributor

mhvk commented Apr 13, 2017

But presumably subclasses have already discarded their subclass nature in their __array_ufunc__ implementation (otherwise, ndarray.__array_ufunc__ returns NotImplemented), and will put the subclass nature back on the result themselves.

@eric-wieser
Copy link
Member Author

I guess I'm envisaging an array with a custom reshape implementation that handles moving metadata,

Right now, the user is forced to reimplement the entirety of outer, even if all their changes can be described with just reshape and __call__

@mhvk
Copy link
Contributor

mhvk commented Apr 13, 2017

I don't quite see the problem: with __array_ufunc__, for ndarray subclasses, the implementation can be identical to that of __call__, as the super call will take care of the reshaping. For non-subclasses, indeed, the __array_ufunc__ will have to call the class's reshape method, but that doesn't seem to be such a burden. And, in any case, the PR here will not help non-subclasses, since these presumably should not be turned into arrays anyway.

@eric-wieser eric-wieser reopened this Jul 31, 2018
@eric-wieser
Copy link
Member Author

Restarting tests against the latest head, since they've moved around a bunch in the last year

@tylerjereddy tylerjereddy force-pushed the ufunc-outer-subclass branch from 6d3eaf7 to 126bd86 Compare April 17, 2019 01:35
@tylerjereddy
Copy link
Contributor

I got the tests to pass locally by preserving the old behavior for np.matrix, and using the new behavior Eric added otherwise. Not sure there was really a consensus solution in the discussion above, but the linked issue does persist in current master. I did rebase here because there were almost 5000 missing commits and the original commit was pretty simple.

Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would need to add unit tests even if reviewers could stomach this approach.

@@ -5396,6 +5396,8 @@ ufunc_outer(PyUFuncObject *ufunc, PyObject *args, PyObject *kwds)
PyArrayObject *ap1 = NULL, *ap2 = NULL, *ap_new = NULL;
PyObject *new_args, *tmp;
PyObject *shape1, *shape2, *newshape;
PyObject *numpy = PyImport_ImportModule("numpy");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine this will help performance any for ufunc method--though these kinds of imports are littered throughout NumPy source

@@ -5428,7 +5430,12 @@ ufunc_outer(PyUFuncObject *ufunc, PyObject *args, PyObject *kwds)
if (tmp == NULL) {
return NULL;
}
ap1 = (PyArrayObject *) PyArray_FromObject(tmp, NPY_NOTYPE, 0, 0);
if (PyObject_IsInstance(tmp, numpy_matrix)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also spent some time looking at ways to probe for attributes unique to np.matrix, in the hopes of keeping this check confined to the C level as much as possible.

Those with more matrix experience may have a better idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care enough about matrix subclasses to do the IsInstance rather than a direct type check? (Assuming the latter is fast also for pure python classes.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong. Why are we special-casing matrix, which we want to deprecate anyway, deep inside the ufunc machinery? Matrix is no more special than any other ndarray subclass, and should be using a protocol to avoid getting here in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if matrix isn't handled specially somehow, it will create a behavior change here. Trying to implement a correct __array_ufunc__ for a deprecated type is probably not worth the time vs just hacking around it until it can be removed.

@mattip
Copy link
Member

mattip commented Apr 17, 2019

Does this actually fix the linked issue?

>>> class foo(np.ndarray): pass
>>> np.multiply(np.arange(2).view(foo), np.arange(2).view(foo))
foo([0, 1])
>>> np.multiply.outer(np.arange(2).view(foo), np.arange(2).view(foo))
array([[0, 0],
       [0, 1]])

@charris
Copy link
Member

charris commented Apr 17, 2019

Just to note that the black hole imaging software used matrices in places.

@tylerjereddy
Copy link
Contributor

Does this actually fix the linked issue?

I don't understand--do you have a specific question? Yes, I ran the 3 lines of code in the linked issue to verify the fix, and I suspect Eric did on the original change too.

Adding tests eventually can solidify things.

* use an instance check to avoid
complications with the matrix subclass

* add unit test for allowing subclass
passthrough in ufunc.outer
@tylerjereddy tylerjereddy force-pushed the ufunc-outer-subclass branch from 126bd86 to e043bb9 Compare April 17, 2019 18:54
@tylerjereddy
Copy link
Contributor

I revised with an attempt at Eric's caching suggestion & a first draft of a possible unit test.

I didn't switch to a stricter matrix type check as per Marten's comment yet, is that the direction the majority wants?

@mattip
Copy link
Member

mattip commented Apr 18, 2019

I think this is good enough. Will merge soon unless someone objects

@mattip mattip merged commit 31e71d7 into numpy:master Apr 19, 2019
@mattip
Copy link
Member

mattip commented Apr 19, 2019

Thanks Eric

npy_cache_import(
"numpy",
"matrix",
&_numpy_matrix);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing null check after this runs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ufunc.outer does not preserve subclasses
5 participants