NEP: Array function protocol #11189

Merged: 10 commits, Jun 1, 2018

@mrocklin
Contributor

mrocklin commented May 29, 2018

Written with Stephan Hoyer
after discussion with Stefan Van der Walt, Charles Harris, Matti Picus, Jaime Frio and Nathaniel Smith

Written with Stephan Hoyer after discussion with Stefan Van der Walt,
Charles Harris, Matti Picus, and Jaime Frio
@mrocklin changed the title from "Add first draft of NEP 0016: array function protocol" to "NEP: Array function protocol" on May 29, 2018
Member

@shoyer left a comment

Can we save NEP-16 for Nathaniel's "An abstract base class for identifying 'duck arrays'" in #10706? That would make this NEP-17.

@mrocklin
Contributor Author

mrocklin commented May 29, 2018 via email


.. code-block:: python

    def __array_function__(self, func, types, *args, **kwargs)

Member

What if {'func', 'types'} & kwargs.keys() is non-empty?

Member

Agreed, I think we need to switch this to not use * and ** unpacking.
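The collision raised above can be reproduced in a few lines. This is only a sketch: `Duck`, `reshape_like`, and the method names `unpacked` and `packed` are hypothetical, and the "packed" form (passing `args` and `kwargs` as plain containers) is one possible reading of "not use * and ** unpacking".

```python
class Duck:
    # Signature from the draft: caller arguments are unpacked into the call.
    def unpacked(self, func, types, *args, **kwargs):
        return args, kwargs

    # Alternative: positional and keyword arguments travel as containers.
    def packed(self, func, types, args, kwargs):
        return args, kwargs


def reshape_like(array, func=None):
    """Hypothetical function that happens to have a keyword named ``func``."""


duck = Duck()
user_kwargs = {"func": "user value"}

try:
    duck.unpacked(reshape_like, (Duck,), duck, **user_kwargs)
    collided = False
except TypeError:
    # 'func' was supplied both positionally and by keyword.
    collided = True

# With containers, the user's keyword cannot clash with protocol parameters.
args, kwargs = duck.packed(reshape_like, (Duck,), (duck,), user_kwargs)
```

With the unpacked signature, any wrapped function that has a parameter called `func` or `types` makes the call fail; the packed form passes the same data through untouched.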

@mattip
Member

mattip commented May 29, 2018

This should be merged as nep-0018 once the initial comments are handled. Deeper discussion, as I understand the process, should take place on the mailing list.

.. code:: python

    def broadcast_to(array, shape, subok=False):
        success, value = do_array_function_dance(

Member

Would it be better to handle this as a decorator? Perhaps we could use

ba = inspect.signature(func).bind(*args, **kwargs)
ba.apply_defaults()
array_function = ...
array_function(func, *ba.args, **ba.kwargs)
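A fuller version of the decorator idea might look like the sketch below. It is Python 3 only (`inspect.signature` and `BoundArguments.apply_defaults` do not exist in Python 2). The decorator name `dispatchable`, the `Tagged` class, and the `(func, types, args, kwargs)` calling convention are assumptions made for illustration, not something this thread has settled.

```python
import functools
import inspect


def dispatchable(func):
    """Hypothetical decorator: normalize arguments, then try overrides."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ba = sig.bind(*args, **kwargs)
        ba.apply_defaults()
        # Collect every argument whose type defines the protocol method.
        overloaded = [value for value in ba.arguments.values()
                      if hasattr(type(value), "__array_function__")]
        types = tuple(type(value) for value in overloaded)
        for value in overloaded:
            result = value.__array_function__(wrapper, types,
                                              ba.args, ba.kwargs)
            if result is not NotImplemented:
                return result
        # Nobody claimed the call: run the default implementation.
        return func(*ba.args, **ba.kwargs)

    return wrapper


class Tagged:
    """Hypothetical duck array that claims every call."""

    def __array_function__(self, func, types, args, kwargs):
        return ("Tagged handled", func.__name__, args, kwargs)


@dispatchable
def broadcast_to(array, shape, subok=False):
    return ("default", array, shape, subok)
```

Because the bound arguments are normalized first, an override always sees `subok` in the same position regardless of how the caller spelled the call.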

Member

Yes, if we only supported Python 3. But in Python 2, we would lose introspection of function arguments.

Nonetheless this should indeed be mentioned.

Contributor

This example really does not look nice!! If this cannot be done more elegantly in python2, then I think we should aim for python3 only - this process is unlikely to be that fast and even if it were, skipping one release would be well worth having a much simpler interface. In particular, in python3 one could use a decorator that looks at annotations (which we'd like anyway!) - much more elegant.

Contributor

@mhvk left a comment

The nature of types is really rather unclear. Should it perhaps be a list of types the ndarray implementation does not know what to do with? (This would include ndarray subclasses if the current function would do asarray)? More comments on mailing list.

2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return
implementation for ``func(*args, **kwargs)``. Otherwise, it should

Contributor

Given the example below I think you mean "the result from calling its implementation of ...".
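The distinction between returning an implementation and returning its result is easier to see in code. In this sketch `MyArray`, `total`, `other`, and `HANDLED` are hypothetical, the `(func, types, args, kwargs)` signature is an assumption, and returning `NotImplemented` in the fallback case is also an assumption because the quoted sentence is cut off.

```python
class MyArray:
    """Hypothetical duck array that overrides a single function."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # Condition 1: is this a function we have an implementation for?
        if func not in HANDLED:
            return NotImplemented
        # Condition 2: are all participating types ones we can handle?
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        # Return the result of calling our implementation, not the callable.
        return HANDLED[func](*args, **kwargs)


def total(array):
    """Stand-in for a public library function."""
    raise TypeError("no default implementation in this sketch")


def other(array):
    """A function that MyArray does not override."""


HANDLED = {total: lambda array: sum(array.data)}

a = MyArray([1, 2, 3])
handled = a.__array_function__(total, (MyArray,), (a,), {})
declined = a.__array_function__(other, (MyArray,), (a,), {})
foreign = a.__array_function__(total, (MyArray, dict), (a,), {})
```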

``__array_function__`` attribute on those inputs, and call those
methods appropriately until one succeeds.

This is one additional function of moderate complexity.

Contributor

Maybe comment here that speed will be of the essence.

Contributor

In particular, the speed for regular arrays should not be impacted much even for small arrays.
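One way to keep the common case cheap is to test for the exact base type before doing any attribute lookup. The sketch below is an illustration only: `BaseArray` stands in for `ndarray`, `overloaded_args` is a hypothetical helper name, and no timing claim is made.

```python
class BaseArray:
    """Stand-in for ndarray; instances take the fast path below."""


class Duck:
    """Hypothetical type that participates in the protocol."""

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


def overloaded_args(relevant_args, base=BaseArray):
    """Return the arguments whose types define ``__array_function__``."""
    found = []
    for arg in relevant_args:
        # Exact-type check first, so plain arrays skip the attribute lookup.
        if type(arg) is base:
            continue
        if hasattr(type(arg), "__array_function__"):
            found.append(arg)
    return found
```

When the returned list is empty the caller can go straight to the default implementation without building `types` or calling any override.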

(Autograd, Tangent), higher order array factorizations (TensorLy), etc.
that add additional functionality on top of the Numpy API.

We would like to be able to use these libraries together, for example we

Contributor

Add:

Finally, some ``ndarray`` subclasses add new behaviour (e.g., MaskedArray and astropy's Quantity),
which gives different meaning to functions. Indeed, MaskedArray reimplements some of the core
numpy functions, and Quantity has long-standing issues about compatibility with stacking functions.

Contributor Author

I'm not sure how best to weave this text into the NEP. To me this seems like orthogonal functionality.

Contributor

My text may have been confusing, but I think __array_function__ is definitely very relevant for subclasses.

For instance, currently ma.extras is full of re-definitions of functions like stack - i.e., it follows the make-your-own-namespace "solution" that this NEP would like to address. This could be completely replaced by the __array_function__ proposed here!

Similarly, for Quantity we are currently relying on overrides or hope that a function uses ufuncs and does not do asarray - and we have no good solution for functions like concatenate (not being willing to provide our own namespace...). With __array_function__, this would be solved. For instance, for concatenate we can ensure that all inputs are converted to the same unit before doing the actual concatenation.

In other words, I probably should have written what looks much like this NEP for MaskedArray and Quantity alone - hence the request to mention them!

Aside: For both the above, I see possible implementations very much in terms of "1. prepare input arrays; 2. call original function on those; 3. set some properties on outputs" - step (2) is most logically implemented with a super() call, which is why I strongly suggest there be an ndarray.__array_function__.
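The three-step pattern described in the aside can be sketched with toy classes. Everything here is hypothetical: `BaseArray` stands in for `ndarray` with the suggested base method, `Quantity` is a toy rather than astropy's class, `concatenate` is a plain-Python stand-in, and the `(func, types, args, kwargs)` signature is an assumption.

```python
class BaseArray:
    """Stand-in for ndarray with the base method suggested above."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        # Base behaviour: simply run the original function.
        return func(*args, **kwargs)


def concatenate(arrays):
    """Stand-in for the library's plain implementation."""
    out = []
    for a in arrays:
        out.extend(a.values)
    return BaseArray(out)


TO_METRES = {"m": 1.0, "km": 1000.0}


class Quantity(BaseArray):
    def __init__(self, values, unit):
        super().__init__(values)
        self.unit = unit

    def to(self, unit):
        return Quantity(
            [v * TO_METRES[self.unit] / TO_METRES[unit] for v in self.values],
            unit)

    def __array_function__(self, func, types, args, kwargs):
        if func is not concatenate:
            return NotImplemented
        (arrays,) = args
        unit = arrays[0].unit
        # 1. Prepare inputs: convert everything to a common unit.
        prepared = [a.to(unit) for a in arrays]
        # 2. Call the original function through the base class.
        plain = super().__array_function__(func, types, (prepared,), kwargs)
        # 3. Set properties on the output.
        return Quantity(plain.values, unit)


km = Quantity([1.0, 2.0], "km")
m = Quantity([500.0], "m")
joined = km.__array_function__(concatenate, (Quantity,), ([km, m],), {})
```

Step 2 is the part that needs a base-class method: without it the subclass would have to know how to reach the undecorated implementation itself.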

@mhvk
Contributor

mhvk commented May 30, 2018

I cannot find this mentioned on the mailing list, so I'll comment here:

  1. I'm rather unclear about the use of types. It can help me decide what to do, but I would still have to find the argument in question. If we pass anything at all, should it just be the argument itself that does not get immediately recognized by ndarray? Or, better still, perhaps a tuple of all arguments that were inspected?
  2. For subclasses, it would be very handy to have ndarray.__array_function__, so one can call super after changing arguments. (For __array_ufunc__, there were many questions about whether this was useful, but it really is!!).
  3. This ndarray.__array_function__ might also help solve the problem of cases where coercion is fine: it could have an extra keyword argument (say coerce) that would call the function with coercion in place.
  4. Indeed, the ndarray.__array_function__ could just be used inside the "dance" function, and then the actual implementation of a given function would just be a separate, private one.

@shoyer
Member

shoyer commented May 30, 2018 via email

@mattip
Member

mattip commented May 30, 2018

The authors should send it out to the mailing list.

Still needed for merge (as per the NEP workflow, the merge should be sooner rather than later):

  • rebase against master to remove the failing test
  • rename the file to doc/neps/nep-0018-array-function-protocol.rst

@mhvk
Contributor

mhvk commented May 31, 2018

Another general comment, since this is still not on the mailing list, yet I'm thinking about it now: Since speed for normal operation should be impacted as little as possible, there should be obvious ways to ensure no type-checking dance is done. Some possible solutions (which I think should be in the NEP, even if as discounted options):

  • Two namespaces, one for the undecorated base functions, and one for the decorated ones. The idea would be that if one knows one is dealing with arrays only, one would do import numpy.array_only as np. The latter namespace would be the one automatically used for ndarray.__array_function__.
  • Automatic insertion by the decorator of an array_only=np._NoValue (or coerce and perhaps subok=... if not present) in the function signature, so that users who know that they have arrays only could pass array_only=True (name to be decided). This would be most useful if there were also some type of configuration parameter that could set the default of array_only.
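Both options can be sketched with one decorator. The names `dispatch_with_opt_out`, `Loud`, and `size` are hypothetical, the `.base` attribute is a stand-in for the separate undecorated namespace, and the default is simplified to `False` rather than a sentinel such as `np._NoValue`.

```python
import functools


def dispatch_with_opt_out(func):
    """Hypothetical decorator adding an ``array_only`` escape hatch."""

    @functools.wraps(func)
    def wrapper(*args, array_only=False, **kwargs):
        if not array_only:
            for arg in args:
                override = getattr(type(arg), "__array_function__", None)
                if override is not None:
                    result = override(arg, wrapper, (type(arg),),
                                      args, kwargs)
                    if result is not NotImplemented:
                        return result
        # Either the caller opted out or nothing overrode the call.
        return func(*args, **kwargs)

    # Keep the undecorated function reachable, as in the two-namespace idea.
    wrapper.base = func
    return wrapper


class Loud:
    """Hypothetical type that overrides every function."""

    def __len__(self):
        return 3

    def __array_function__(self, func, types, args, kwargs):
        return "override"


@dispatch_with_opt_out
def size(x):
    return len(x)
```

Here `size(Loud())` goes through the override, while `size(Loud(), array_only=True)` and `size.base(Loud())` both skip the type-checking loop entirely.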

@mrocklin
Contributor Author

mrocklin commented Jun 1, 2018

rebase against master to remove the failing test
rename the file to doc/neps/nep-0018-array-function-protocol.rst

Done! My apologies for the delays on this @mattip .

@mattip merged commit 1c50f24 into numpy:master on Jun 1, 2018

@mattip
Member

mattip commented Jun 1, 2018

Thanks @mrocklin

@shoyer
Member

shoyer commented Jun 2, 2018

I've now posted the NEP to the mailing list for discussion
https://mail.python.org/pipermail/numpy-discussion/2018-June/078127.html

@mhvk with regards to ndarray.__array_function__ in particular, I don't have strong feelings here and am happy to defer to your judgment (you know I don't like subclassing). Certainly there is virtue in using the same approach as __array_ufunc__. I'll add a brief note on this in the next revision.

With regards to namespaces, please note the section on "Separate namespace" that is already in the NEP text.

Contributor

@mhvk left a comment

Two smaller comments; more general ones on the mailing list.

``__array_function__``.

This protocol is intended to be a catch-all for NumPy functionality that
is not covered by existing protocols, like reductions (like ``np.sum``)

Contributor

This particular reduction is an example of a ufunc - maybe stick to the more complicated example of np.mean?

would only include these as keyword arguments when they have changed
from default values. This is similar to `what NumPy already has
done <https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/fromnumeric.py#L1865-L1867>`__,
e.g., for the optional ``keepdims`` argument in ``sum``:
Contributor

Might be worth mentioning that the dance decorator could do this part as well!
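A decorator could strip default-valued keywords generically by consulting the wrapped function's signature. This is a sketch under assumptions: `non_default_kwargs` and `total` are hypothetical names, and the comparison uses identity so that array-valued arguments are never compared element-wise.

```python
import inspect


def non_default_kwargs(func, kwargs):
    """Keep only keyword arguments that differ from ``func``'s defaults."""
    params = inspect.signature(func).parameters
    kept = {}
    for name, value in kwargs.items():
        default = params[name].default
        # Parameters without a default are always forwarded.
        if default is inspect.Parameter.empty or value is not default:
            kept[name] = value
    return kept


def total(a, axis=None, keepdims=False):
    """Hypothetical function with optional keyword arguments."""
```

For `total`, passing `{"axis": None, "keepdims": True}` forwards only `keepdims`, so an override that has never heard of `axis` keeps working.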

.. code:: python

    def broadcast_to(array, shape, subok=False):
        success, value = do_array_function_dance(

Contributor

It's a mini-thing, but one might as well have the dance return either a result or NotImplemented.

Contributor

I’d be in favor of this. No need to complicate things with multiple out args, remember the order and so on.
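With a single return value, the call site shrinks to one comparison against `NotImplemented`. In this sketch the helper name `try_array_function_override`, the classes `Duck` and `Shy`, and the `(func, types, args, kwargs)` signature are all hypothetical; the default branch is a placeholder, not NumPy's implementation.

```python
def try_array_function_override(func, relevant_args, args, kwargs):
    """Return the first override's result, or NotImplemented if none apply."""
    for arg in relevant_args:
        override = getattr(type(arg), "__array_function__", None)
        if override is None:
            continue
        result = override(arg, func, (type(arg),), args, kwargs)
        if result is not NotImplemented:
            return result
    return NotImplemented


def broadcast_to(array, shape, subok=False):
    result = try_array_function_override(
        broadcast_to, [array], (array, shape), {"subok": subok})
    if result is not NotImplemented:
        return result
    # Placeholder for the default implementation.
    return ("default implementation", shape)


class Duck:
    def __array_function__(self, func, types, args, kwargs):
        return "from Duck"


class Shy:
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented
```

This mirrors how Python's binary operators signal "not handled", so there is no tuple to unpack and no argument order to remember.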

@mattip
Member

mattip commented Jul 23, 2018

Are we ready to start the approval process for NEP 18?

@shoyer
Member

shoyer commented Jul 23, 2018

@mattip almost -- I'd like to add short paragraphs on class methods and dtypes to "Alternatives" to make it clear that we aren't ruling out such options in the future.

@okuta
Contributor

okuta commented Sep 17, 2018

Hello, I am a CuPy developer.
This feature is important for NumPy-compatible libraries.
I have created an experimental PR for CuPy, cupy/cupy#1650. Please use it if you want.
Thanks.
