-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
NEP: Proposal for array creation dispatching with __array_function__
#14715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This NEP proposes the introduction of a `like=` argument. The argument is to be added to all array creation functions. This allows the user to pass a reference array that will use `__array_function__` protocol to dispatch such functions to downstream libraries.
doc/neps/nep-0033-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0033-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
This newly proposed keyword shall be removed by the ``__array_function__`` | ||
mechanism from the keyword dictionary before dispatching. The purpose for this | ||
is twofold: | ||
|
||
1. The object will have no use in the downstream library's implementation; and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part feels questionable to me. I don't think it's a good idea to add special cases to a protocol, so I think it's a good idea to include like
in argument even if it wouldn't be useful.
I can also definitely imagine a library with multiple duck array types (e.g., pydata/sparse) that uses a single implementation that relies upon the like
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we could just have a separate __array_function__
on each of our types. But in any case, I would rather the like=
kwarg was removed, it'd drive adaptability up quickly... But only for array creation functions, not others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also definitely imagine a library with multiple duck array types (e.g., pydata/sparse) that uses a single implementation that relies upon the like argument.
But that would still rely on the NumPy dispatching system, no? In that case, I would expect the implementation to find out internally what is the actual type it needs to create and call NumPy's API to create a new object, much like what we do now with _meta
in Dask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we could just have a separate array_function on each of our types.
I'm not sure I understand your comment, but perhaps this is what I also tried to say in my comment just above?
But in any case, I would rather the like= kwarg was removed, it'd drive adaptability up quickly... But only for array creation functions, not others.
This is indeed my intent, I think the removal would be incredibly helpful for downstream libraries, even though I see why @shoyer is concerned about it. Perhaps would be nice to have more opinions here and see if they eventually converge in some direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your comment, but perhaps this is what I also tried to say in my comment just above?
I mean that PyData/Sparse has different types for the different kinds of arrays it uses. Each of those types could have its own __array_function__
if we wanted to specialize array creation functions at all, instead of using one overarching __array_function__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However I could think of da.Array
using such a keyword to copy e.g. chunk sizes and inner array types from the source array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that PyData/Sparse has different types for the different kinds of arrays it uses. Each of those types could have its own array_function if we wanted to specialize array creation functions at all, instead of using one overarching array_function.
Oh, I see now. Yeah, that would be an alternative, but that could be a big burden too. So maybe in that case just not removing like=
kwarg would make more sense?
However I could think of da.Array using such a keyword to copy e.g. chunk sizes and inner array types from the source array.
In the case of Dask, we can now use _meta
to identify the inner array and use it as the like
object. So there I think this wouldn't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, you would get the like=
array in the form of self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that could be a big burden too
Not really, I consider it a separation of concerns.
Edit: I should clarify: If I wanted separate implementations, adding the protocol on each type of array isn't a big burden. If I wanted a unified one, I could just have one on the ABC, as I do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this makes sense. I was mostly saying this from the quick/seamless adaptability point-of-view.
How about let the default value of |
I don't know if I understand your comment correctly, but I think what you're suggesting is that the global value be a module name, something like |
@pentschev def asarray(a, dtype=None, order='C', like=np.ndarray/None):
pass then have an API to set the default ndarray type, then we can make all the following ndarray be the expected type import cupy as cp
import numpy as np
np.set_default_ndarray_to(cp.ndarray) # example method
np.ones((2, 2)) # -> a cupy ndarray |
Modifying global state sounds like a recipe for disaster. Any random package could call that function and cause the whole application to behave differently. |
Besides @mattip's comment (which I agree with, btw), the purpose of this NEP or |
into the C implementation itself. This is not the primary suggestion here due | ||
to its inherent complexity which would be difficult too long to describe in its | ||
entirety here, and too tedious for the reader. However, we leave that as an | ||
option open for discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to go the C route (or get out the cython idea again). (The main issue in python is that there is no simple fast kwarg parsing solution available right now; python has the clinic internally, but that does not really help us).
The thing is that if this is in python, it will add at least one more function overhead call, which is fairly significant for a no-op. For many library functions it will not matter, but for the simple numpy functions it may.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspected people would lean this way, that's why I suggested this, from my perspective, I'm fine with either way. Maybe would be good to wait and see what others prefer too?
If we go the C path, IMO it doesn't make sense to extend the NEP quite significantly to point out all the adaptations needed to allow this. Unless people disagree with my point, I would only move the suggestion for Python implementation to a "rejected implementation proposal" section and mention that we are going the C way, but not describe all its details, as those details will likely not aggregate any important information to non-NumPy developers using that as a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only part that is currently not implemented in C is the public_api
helper function from insidenumpy.core.overrides.array_function_dispatch
:
def public_api(*args, **kwargs):
relevant_args = dispatcher(*args, **kwargs)
return implement_array_function(
implementation, public_api, relevant_args, args, kwargs)
Maybe we could save the overhead of parsing args
and kwargs
in public_api
if we wrote it in C? Honestly I'm not entirely sure.
This uses a closure over implementation
and public_api
, which is probably the main reason why it wasn't written in C in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being unresponsive. I have been thinking about this, what I suppose could work is to add the like
kwarg to the C implementation of each function, and then do the inverse process:
- If
like
is valid, call the C implementation of override; - The C implementation of override would have a callback to the Python implementation which would dispatch to the downstream library.
Admittedly, I don't know much about C Python and the current C implementation of the __array_function__
override, so I don't know whether this is something that is possible to achieve. Besides this, I also don't see any other alternatives to doing it the Python way, as per my proposal in the NEP itself. Thoughts on this @shoyer ?
This NEP should be designated NEP 35. NEP 33 is the dtype NEP, and NEP 34 was introduced 15 days ago by Matti. |
Thanks @stefanv for catching this. I've renamed it in the latest commit. |
Comments seem to have been addressed and discussion has stopped, so merging this NEP with status Draft. Discussion can still continue if needed of course. |
Thanks @rgommers for merging! |
Is there a need for additional discussion? If so, where should we have that? If not, what should be the next step here? |
Yep, @seberg took the lead on organizing a call - forwarded you the invite. |
This NEP proposes the introduction of a
like=
argument. The argument is to beadded to all array creation functions. This allows the user to pass a reference
array that will use
__array_function__
protocol to dispatch such functions todownstream libraries.