Skip to content

NEP: Update NEP-18 to include the __skip_array_function__ attribute. #13305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 13, 2019

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Apr 11, 2019

Fixes #12974

This includes two minor changes to NEP-18 that will need to be implemented in NumPy:

  • The __numpy_implementation__ attribute is documented and officially
    supported as a way to defer to NumPy's own implementation.
  • I have noted that dispatcher functions should be given the same name as
    implementations, which should result in more interprettable error messages.

Fix GH12974

This includes two minor changes to NEP-18 that will need to be implemented in
NumPy:
- The ``__numpy_implementation__`` attribute is documented and officially
  supported as a way to defer to NumPy's own implementation.
- I have noted that dispatcher functions should be given the same name as
  implementations, which should result in more interprettable error messages.
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. But there is the bike shed: I propose to just call it __implementation__ - partially because our other methods do not refer to numpy either, and partially because I think we would like to encourage eventually, e.g., scipy to wrap its functions similarly (already works for all its special functions with __array_ufunc__).

@shoyer
Copy link
Member Author

shoyer commented Apr 11, 2019

Mostly looks good. But there is the bike shed: I propose to just call it __implementation__ - partially because our other methods do not refer to numpy either, and partially because I think we would like to encourage eventually, e.g., scipy to wrap its functions similarly (already works for all its special functions with __array_ufunc__).

What I wanted to invoke with the name __numpy_implementation__ was a notion of "the implementation for NumPy's arrays", not "the implementation by NumPy the library". These mean the same thing within NumPy, but this is also compatible overrides for outside functions, e.g., scipy.linalg.solve.__numpy_implementation__. You could even imagine SciPy defining its own first class implementations for other array backends, e.g., scipy.linalg.solve.__cupy_implementation__.

(Either way, this is probably worth clarifying in the text.)

@mhvk
Copy link
Contributor

mhvk commented Apr 11, 2019

I'm not 100% sure that is right: suppose DASK wants to make its functions more general, is its default implementation a __numpy_implementation__?

On the other hand, I guess that is what you are getting add: with __numpy_implementation__ it is immediately obvious that one is trying to get the one that is relevant for arrays, not just for whatever DASK array. I think I'd still prefer __ndarray_implementation__ then, since that makes clear it is an implementation for the class, not for the project.

I do wonder if for this case we should recommend to cast oneself to ndarray explicitly and use ndarray.__array_function__(...).

@shoyer
Copy link
Member Author

shoyer commented Apr 11, 2019

I'm not 100% sure that is right: suppose DASK wants to make its functions more general, is its default implementation a __numpy_implementation__?

Dask defines it's own data structures, so it's generic implementations would be something like __dask_implementation__.

On the other hand, I guess that is what you are getting add: with __numpy_implementation__ it is immediately obvious that one is trying to get the one that is relevant for arrays, not just for whatever DASK array. I think I'd still prefer __ndarray_implementation__ then, since that makes clear it is an implementation for the class, not for the project.

Maybe __numpy_array_implementation__? Or the slightly abbreviated __numpy_array_impl__? I think including the name "numpy" is important. Informally, people think of numpy.ndarray as a "numpy array".t __ndarray_implementation__ on its own is too generic, especially when it's an attribute on a function that isn't defined by NumPy, e.g., what does scipy.linalg.solve.__ndarray_implementation__ mean?

Let me try adding some clarifying language about the intent of __numpy_implementation__. But certainly my intention is that this should not preclude adding more generic implementations to NumPy functions, e.g., we can imagine also defining __generic_implementation__ as something that works with duck typing.

I do wonder if for this case we should recommend to cast oneself to ndarray explicitly and use ndarray.__array_function__(...).

We've discussed this before, but this doesn't work as a fall-back in in __array_function__ because we don't have an easy way to find "this array" in *args and **kwargs.

@mhvk
Copy link
Contributor

mhvk commented Apr 11, 2019

OK, yes, should have remembered the problem with ndarray.__array_function__.

If we're abbreviating, probably __np_array_implementation__, but I think I prefer original suggestion over that.

@hameerabbasi
Copy link
Contributor

I prefer __array_implementation__... NumPy uses all of the __array_*__ anyway...

@hameerabbasi
Copy link
Contributor

This should probably be accompanied by consensus on the mailing list, as per governance rules.

@shoyer
Copy link
Member Author

shoyer commented Apr 12, 2019

I prefer __array_implementation__... NumPy uses all of the __array_*__ anyway...

I'm not sure this comparison is entirely apt. So far (until this PR) the __array_function__ protocol has been entirely agnostic as to the underlying array library -- you could just as easily use it with arbitrary functions defined on any arbitrary array type.

I guess the critical question is whether it makes sense to distinguish between array function implementations intended for different types. For example, consider a scipy function that is implemented by coercing its arguments to sparse arrays, e.g., scipy.sparse.linalg.splu. What should the implementation be?

  1. splu.__array_implementation__
  2. splu.__numpy_implementation__
  3. splu.__sparse_implementation__

(1) would preserve existing behavior for splu() on NumPy arrays, assuming we use __array_implementation__ inside np.ndarray.__array_function__.

(2) feels a little strange -- this implementation is not NumPy array specific. (Though it is what should be used on NumPy arrays.)

(3) could make sense, but there's no way NumPy is going to know to call a __sparse_implementation__ method. So at best, using a NumPy array would result in an error, unless we defined both __sparse_implementation__ and __numpy_implementation__.

The advantage of __sparse_implementation__ is that we can imagine it being used inside SparseArray.__array_function__, e.g.,

class SparseArray:
    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, (SparseArray, np.ndarray) for t in types):
            return NotImplemented
        sparse_func = getattr(func, '__sparse_implementation__', None)
        if sparse_func is not None:
            return sparse_func(*args, **kwargs)
        # look elsewhere for implementations, or return NotImplemented

In contrast, SparseArray.__array_function__ would definitely not call __array_function__ or __numpy_implementation__ internally in a generic way, because densifying a sparse array is often catastrophic. I don't know if this is a real use-case, though.

@shoyer
Copy link
Member Author

shoyer commented Apr 14, 2019

Thinking more about this, I'm pretty firmly of the opinion that we should stick with __numpy_implementation__ rather than __array_implementation__.

This attribute is current used for three things:

  • providing the hook that allows np.ndarray.__array_function__ to call back out to an external implementation
  • providing a hook that other __array_function__ methods can use to call the implemented intended for NumPy arrays.
  • fast access to a non-dispatching version of NumPy functions, e.g., np.concatenate.__numpy_implementation__(list_of_all_numpy_arrays).

For all these use cases, the name "numpy" is more descriptive than "array." The situation reminds me of the mistake that is the name of the __array__ method. __array_function__ and __array_ufunc__ are also not great names, but least they could conceivably refer to functions/ufuncs defined by libraries that aren't NumPy. Also, unlike these __array* methods, __numpy_implementation__ will be on functions, not array types, so the consistently argument is weaker.

Yes, assuming the existence of a __numpy_implementation__ attribute means that dispatching with __array_function__ will bake in special treatment for NumPy arrays, but I think that is basically inevitable with our current design approach. __array_function__ is a dispatching protocol for the NumPy ecosystem.

The generic alternative would be to guard access to __numpy_implementation__ even inside ndarray.__array_function__, e.g.,

class ndarray:
    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, ndarray) for t in types):
            # Defer to any non-subclasses that implement __array_function__
            return NotImplemented
        implementation = getattr(func, '__numpy_implementation__', None)
        if implementation is None:
            return NotImplemented
        return implementation(*args, **kwargs)

But I think this would be a little much. It is not too onerous to expect that libraries that reuse NumPy's dispatching protocol could also at least implement a dummy version of __numpy_implementation__.

@hameerabbasi
Copy link
Contributor

Thinking a bit more, I agree with @shoyer. It does make sense to "separate the interface from implementation" in a sense.

@mhvk
Copy link
Contributor

mhvk commented Apr 15, 2019

My original suggestion was plain __implementation__ or __ndarray_implementation__, but I see the disadvantages of the former in not being clear what implementation it is, and of the latter of most people not thinking of ndarray as the class, but rather as array (since that's the constructor). But __array_implementation__ is worse for the reasons given, so let's stick with __numpy_implementation__.

@shoyer
Copy link
Member Author

shoyer commented Apr 15, 2019

I emailed the mailing list: https://mail.python.org/pipermail/numpy-discussion/2019-April/079317.html

return NotImplemented
if not all(issubclass(t, ndarray) for t in types):
# Defer to any non-subclasses that implement __array_function__
return NotImplemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all the other types are python builtins?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't implement __array_function__ so the old behavior still applies... But I believe that's lacking in this code example, but is implemented in the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer Does the example need updating?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's all okay... It'll fall through, so the example is fine.

@charris charris changed the title NEP: Update NEP-18 to include the __numpy_implementation__ attribute. NEP: Update NEP-18 to include the __skip_array_function__ attribute. May 11, 2019
@charris
Copy link
Member

charris commented May 13, 2019

I think it was agreed that this is ready after the name change. Comments?

@mattip
Copy link
Member

mattip commented May 13, 2019

LGTM

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mattip mattip merged commit ff894ae into numpy:master May 13, 2019
@shoyer
Copy link
Member Author

shoyer commented May 20, 2019

Are NEPs still being automatically updated on numpy.org?

The old version of NEP-18 does not seem to have been updated yet: http://www.numpy.org/neps/nep-0018-array-function-protocol.html

@mattip
Copy link
Member

mattip commented May 21, 2019

Very strange. The built documents are pushed to a repo, and the nep there is the updated version.

shoyer added a commit to shoyer/numpy that referenced this pull request May 25, 2019
This reverts most of the changes from numpyGH-13305, and adds a brief discussion
of ``__skip_array_function__`` into the "Alternatives" section.

We still use NumPy's implementation of the function internally inside
``ndarray.__array_function__``, but I've given it a new name in the NEP
(``_implementation``) to indicate that it's a private API.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NEP-18: Handling aliased NumPy functions
6 participants