Skip to content

NEP: Proposal for array creation dispatching with __array_function__ #14715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 11, 2020

Conversation

pentschev
Copy link
Contributor

This NEP proposes the introduction of a like= argument. The argument is to be
added to all array creation functions. This allows the user to pass a reference
array that will use __array_function__ protocol to dispatch such functions to
downstream libraries.

This NEP proposes the introduction of a `like=` argument. The argument is to be
added to all array creation functions. This allows the user to pass a reference
array that will use `__array_function__` protocol to dispatch such functions to
downstream libraries.
@pentschev
Copy link
Contributor Author

@shoyer @rgommers @seberg after being sidetracked for some time, I finally got the chance to finish the proposal that we discussed in the context of #14441.

cc @mrocklin @jakirkham

Comment on lines 63 to 67
This newly proposed keyword shall be removed by the ``__array_function__``
mechanism from the keyword dictionary before dispatching. The purpose for this
is twofold:

1. The object will have no use in the downstream library's implementation; and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part feels questionable to me. I don't think it's a good idea to add special cases to a protocol, so I think it's a good idea to include like in argument even if it wouldn't be useful.

I can also definitely imagine a library with multiple duck array types (e.g., pydata/sparse) that uses a single implementation that relies upon the like argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we could just have a separate __array_function__ on each of our types. But in any case, I would rather the like= kwarg was removed, it'd drive adaptability up quickly... But only for array creation functions, not others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also definitely imagine a library with multiple duck array types (e.g., pydata/sparse) that uses a single implementation that relies upon the like argument.

But that would still rely on the NumPy dispatching system, no? In that case, I would expect the implementation to find out internally what is the actual type it needs to create and call NumPy's API to create a new object, much like what we do now with _meta in Dask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we could just have a separate array_function on each of our types.

I'm not sure I understand your comment, but perhaps this is what I also tried to say in my comment just above?

But in any case, I would rather the like= kwarg was removed, it'd drive adaptability up quickly... But only for array creation functions, not others.

This is indeed my intent, I think the removal would be incredibly helpful for downstream libraries, even though I see why @shoyer is concerned about it. Perhaps would be nice to have more opinions here and see if they eventually converge in some direction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment, but perhaps this is what I also tried to say in my comment just above?

I mean that PyData/Sparse has different types for the different kinds of arrays it uses. Each of those types could have its own __array_function__ if we wanted to specialize array creation functions at all, instead of using one overarching __array_function__.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I could think of da.Array using such a keyword to copy e.g. chunk sizes and inner array types from the source array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that PyData/Sparse has different types for the different kinds of arrays it uses. Each of those types could have its own array_function if we wanted to specialize array creation functions at all, instead of using one overarching array_function.

Oh, I see now. Yeah, that would be an alternative, but that could be a big burden too. So maybe in that case just not removing like= kwarg would make more sense?

However I could think of da.Array using such a keyword to copy e.g. chunk sizes and inner array types from the source array.

In the case of Dask, we can now use _meta to identify the inner array and use it as the like object. So there I think this wouldn't be a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, you would get the like= array in the form of self.

Copy link
Contributor

@hameerabbasi hameerabbasi Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that could be a big burden too

Not really, I consider it a separation of concerns.

Edit: I should clarify: If I wanted separate implementations, adding the protocol on each type of array isn't a big burden. If I wanted a unified one, I could just have one on the ABC, as I do now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this makes sense. I was mostly saying this from the quick/seamless adaptability point-of-view.

@hgt312
Copy link
Contributor

hgt312 commented Oct 18, 2019

How about let the default value of like to be a global editable value?

@pentschev
Copy link
Contributor Author

How about let the default value of like to be a global editable value?

I don't know if I understand your comment correctly, but I think what you're suggesting is that the global value be a module name, something like numpy or cupy. The __array_function__ protocol dispatches based on the function name and array type, so that's really not how it's expected to work. Maybe what you're looking for is #14389.

@hgt312
Copy link
Contributor

hgt312 commented Oct 18, 2019

@pentschev
I think that a creation function's implementation may like this:

def asarray(a, dtype=None, order='C', like=np.ndarray/None):
    pass

then have an API to set the default ndarray type, then we can make all the following ndarray be the expected type

import cupy as cp
import numpy as np

np.set_default_ndarray_to(cp.ndarray)  # example method
np.ones((2, 2))  # -> a cupy ndarray

@mattip
Copy link
Member

mattip commented Oct 18, 2019

Modifying global state sounds like a recipe for disaster. Any random package could call that function and cause the whole application to behave differently.

@pentschev
Copy link
Contributor Author

Besides @mattip's comment (which I agree with, btw), the purpose of this NEP or __array_function__ for that matter isn't this. What you're suggesting is more or less what uarray suggests. The like= parameter here takes a reference array, not a type, this reference array can be used by __array_function__ to dispatch. A library such as Dask could for example get a CuPy array passed by the user and dispatch a creation of a new array internally to CuPy via the protocol, all this with Dask internally never knowing about CuPy's existence directly.

into the C implementation itself. This is not the primary suggestion here due
to its inherent complexity which would be difficult too long to describe in its
entirety here, and too tedious for the reader. However, we leave that as an
option open for discussion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to go the C route (or get out the cython idea again). (The main issue in python is that there is no simple fast kwarg parsing solution available right now; python has the clinic internally, but that does not really help us).

The thing is that if this is in python, it will add at least one more function overhead call, which is fairly significant for a no-op. For many library functions it will not matter, but for the simple numpy functions it may.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspected people would lean this way, that's why I suggested this, from my perspective, I'm fine with either way. Maybe would be good to wait and see what others prefer too?

If we go the C path, IMO it doesn't make sense to extend the NEP quite significantly to point out all the adaptations needed to allow this. Unless people disagree with my point, I would only move the suggestion for Python implementation to a "rejected implementation proposal" section and mention that we are going the C way, but not describe all its details, as those details will likely not aggregate any important information to non-NumPy developers using that as a reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only part that is currently not implemented in C is the public_api helper function from insidenumpy.core.overrides.array_function_dispatch:

        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            return implement_array_function(
                implementation, public_api, relevant_args, args, kwargs)

Maybe we could save the overhead of parsing args and kwargs in public_api if we wrote it in C? Honestly I'm not entirely sure.

This uses a closure over implementation and public_api, which is probably the main reason why it wasn't written in C in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being unresponsive. I have been thinking about this, what I suppose could work is to add the like kwarg to the C implementation of each function, and then do the inverse process:

  1. If like is valid, call the C implementation of override;
  2. The C implementation of override would have a callback to the Python implementation which would dispatch to the downstream library.

Admittedly, I don't know much about C Python and the current C implementation of the __array_function__ override, so I don't know whether this is something that is possible to achieve. Besides this, I also don't see any other alternatives to doing it the Python way, as per my proposal in the NEP itself. Thoughts on this @shoyer ?

@stefanv
Copy link
Contributor

stefanv commented Oct 25, 2019

This NEP should be designated NEP 35. NEP 33 is the dtype NEP, and NEP 34 was introduced 15 days ago by Matti.

@pentschev
Copy link
Contributor Author

Thanks @stefanv for catching this. I've renamed it in the latest commit.

@rgommers
Copy link
Member

Comments seem to have been addressed and discussion has stopped, so merging this NEP with status Draft. Discussion can still continue if needed of course.

@rgommers rgommers merged commit e44d0ec into numpy:master Apr 11, 2020
@pentschev
Copy link
Contributor Author

Thanks @rgommers for merging!

@jakirkham
Copy link
Contributor

Is there a need for additional discussion? If so, where should we have that? If not, what should be the next step here?

@rgommers
Copy link
Member

Is there a need for additional discussion? If so, where should we have that? If not, what should be the next step here?

Yep, @seberg took the lead on organizing a call - forwarded you the invite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants