-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
NEP 35: Finalize like=
argument behaviour before 1.21 release
#17075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
A few comments on the 4 points from the description:
|
I agree, a lot of effort was spent into making this work with weird corner cases, but that it was a good idea to do wasn't spelled out clearly anywhere. There may be a use case where one would like to mix something like a Pint array with a numpy array (not sure if that's a subclass), but things like mixing |
That's a little deceiving rather than helpful maybe. One takes over |
It's very different though, The |
I'm not sure if you're saying |
Yes, but taking |
But saying Note that if we error when def my_pad(arr, padding):
padding = np.asarray(padding, like=arr)
return np.concatenate([padding, arr, padding])
my_pad([1, 2, 3], [0, 0]) # Errors because `arr` is a `list` |
That's exactly why @seberg talks about |
For more context: for creating new libraries, it would be great if there was a clean way of accepting multiple array types, but not accept any kind of sequence or generator. So separating the ultra-flexible (and error-prone) |
I'm saying "I agree with you, it's unclear". Would be nice to go back and retro-actively do for NEP 18 what you just did for your NEP - add motivation, scope, use cases. That would sort this out. |
I think the The difference in the Now, you may be arguing: That doesn't make sense, both versions pick the output explicitly here! That is true, because there is only a single argument, making implicit and explicit pretty much the same thing. But, realizing that differences should show up for multiple arguments, we can remember
very much looks like it should work, but doesn't, because we force you to spell it:
Adding a
looks a lot like "Try to create a NumPy array, or fail" (although I am sure you can give meaning to returning a cupy array here). But:
is probably something we can agree on being unclear as to what should be expected. |
But, that in large parts circles back to the NEP 37 "We need something explicit" argument. So I don't know if it helps with the specific issue... With respect to NEP 37, I think a problem is that we do not know how much we need explicit dispatching. E.g. we could also allow to write the above as:
instead of adding |
Let me be more explicit on this example:
This Also think of the situation where one would like to write a new strict library like that - if
Now, for existing libraries that already use
When you're modifying existing code to add
It doesn't, it's completely unrelated. I suggest to keep the discussion to Saying that
Yes, and if it's ambiguous like that it should error out. |
I've been thinking about this the past few days, and I see your point @rgommers in dealing with existing vs future libraries. Generally speaking, I understand the point and have no direct objections to requiring something like the following, as per your example: arr = np.asduckarray(arr) # convert array_like's to numpy arrays
padding = np.asarray(padding, like=arr) However, I don't really like that within Dask, for example, we would need probably everywhere to call A suboptimal solution would be to have two arguments, |
My intuition says you're not having to add an extra |
Certainly, for example https://github.com/dask/dask/blob/51509339c1cc476f5f96ae78e61daf125037ca20/dask/array/slicing.py#L878 . Note that it explicitly checks if it's an instance of |
That code is pretty weird, I hope there's not too many explicit
and avoid turning a duckarray that's not a subclass into a There's no reason to use
It likely doesn't matter much, but that way it's also faster - avoids the |
Sorry @rgommers for the late reply, I'm still OOO until the end of this week and have been forcing myself on reducing screen time. I agree that the example above is weird, and perhaps a better one would be in https://github.com/dask/dask/blob/51509339c1cc476f5f96ae78e61daf125037ca20/dask/array/slicing.py#L539-L542 , where we have index = np.asanyarray(index)
cum_chunks = cached_cumsum(chunks)
cum_chunks = np.asarray(cum_chunks, like=index)
chunk_locations = np.searchsorted(cum_chunks, index, side="right") |
@pentschev no worries at all, reduced screen time sounds like an excellent idea.
That will still convert to
or
The first option is very weird, so I'd say the Also there is no |
Sorry @rgommers , indeed, what I meant was
I wasn't focusing on the downstream functionality either, but in this case it is implemented, see cupy/cupy#2726, documentation was missing and got fixed in cupy/cupy#3908 though.
I agree that wasn't the best example, I would have to look more carefully at all use cases in Dask to quantify how much of a burden that would be. Doing another quick search, one other example I can find that I think would match that is https://github.com/dask/dask/blob/377965addc6be168ad1c697999ded6f9abdf6cce/dask/array/core.py#L3964 . Consider for instance we decide to follow the same ordering as of
If that's how we should handle this case, we would then rewrite that similar to the following: args = [np.asarray(a, like=args[0]) if isinstance(a, (list, tuple)) else a for a in args]` The leftmost argument in that case would dominate the array type, and if it's |
Sorry for the silence here @rgommers and @seberg , but I think I'm now confident on picking this backup to finalize it for 1.20. I did quite some testing of NEP-35 in Dask (see dask/dask#6738) and it was very positive overall. I was able to cover pretty much everything, with a couple of exceptions ( I also wasn't able to find any cases where Apart from that, I just opened #17678 . The PR suggests allowing the passage of Finally, I have no relevant updates to my comment in #17075 (comment) -- and follow-up discussion -- that address the original description of this issue. I see only that a new item (number 5) was added since my reply, addressing whether we should dispatch only |
We have to make a final decision soon for the 1.20 release. From my perspective we were tending towards the strict My problem is that I think we need to do something, but I am not sure if this can be a final right solution. I don't really see the Dispatching aliases seems fine to me. Did we already have a decision on whether the tuple syntax is useful here? We could plan a meeting dedicated to these decisions end of next week or so, if it helps. We definitely have to call this experimental, and I am not quite sure what it would mean for dask if we need to modify behaviour. I am more than happy to do the code fixups, getting the decisions out of the way is the tricky part here. |
A meeting with all interested parties seems like the best way to go about this. |
I think that seems to be a preference for you and @rgommers at least. From my perspective, either way will work, so I'll defer the final solution to both of you.
Indeed, this is a major improvement for Dask especially, it will finally allow us to have complete (or at least complete to the extent of what I've seen so far) coverage of
I'm not sure I understand what you mean by tuple syntax, are you referring to things like
I'd be ok with that, if others feel that's necessary.
I can also take care or help with that if we have a decision before December. I'll be moving in early-/mid-December, so my availability may be compromised then. |
My preference here is towards permissive, but I agree it's easy to make things more permissive later and hard to make them more conservative. |
@charris I think we are OK for 1.20, I moved the milestone. I am sure we should revisit it then, maybe even go with the permissive one. |
This may be slightly annoying:
Will not use
although I doubt there was ever any much intention that the above really should work. The reason why is this:
So, the current solution is inconvenient for these C implemented functions. Basically, the NumPy implementation is special, for one it has to avoid actually calling |
Ok, to be honest, the result of dropping the subclass is fine. What bothers me is probably mainly that it only works due to optimizations. And right now the |
Honestly, I don't see why this is a problem or annoying. As per both NEP-18 and NEP-35, we're dispatching to the downstream library, and the downstream library is responsible for deciding what's the appropriate behavior given its capabilities. For instance, neither CuPy nor Dask implement the |
Yeah, behaviour wise it probably doesn't matter much. The way |
ISTR this going by. What is the current status? |
@pentschev do you think there is any usage we have to reconsider here, or must clarify. There is a slippery slope to being de-facto not experimental api anymore unless we revisit this soon enough. (Ideally by accepting the NEP as final.) |
So far I haven't really encountered anything we need to clarify or reconsider. Examples of real usage of NEP-35 can be found in Dask, particularly dask/dask#6738 which introduces support for NEP-35 and several functions that were impossible to implement before that, as well as percentile support (dask/dask#7162), lstsq/cholesky (dask/dask#7563) and bincount slicing (dask/dask#7391). IMO at this time NEP-35 as it stands is already successful. As we approach NumPy 1.21 release I think it would be great to consider accepting it as final, unless something presents itself until then. |
like=
argument behaviour before 1.20 releaselike=
argument behaviour before 1.21 release
I guess the main volatility may have been around whether to relax the |
Apologies for the late reply. I agree we could adjust later, but so far even that I'm not sure would be necessary. Therefore, I'm +1 on just accepting NEP-35 as is at this time, and we can revisit that later if a real use case shows up in the future. |
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
This accepts NEP 35 as final. There has been no discussion about it in a long time. The current mode is strict about type input (`like=` must be an array-like). So that most of the "open" points are OK to remain open. Unless we need to discuss the name `like` or the fact that we pass an array-like itself, the previously noted open points numpygh-17075 all seem not very relevant anymore.
Before we release 1.20, we should finalize the behaviour. Or possibly hide/replace the API. Unless NEP 35 was officially accepted, there are still alternative suggestions such as
np.get_array_module
which may win the race.There are also a few issues which require attention if we decide to keep the API largely untouched:
(This is currently NOT valid anymore, so it is the strict version)
l = [1, 2]; np.asarray(l, like=l)
is currently valid. However, modern API may want to only accept array-object as input and not objects such as lists/tuples, which the user should convert ahead of time. The above spelling could raise an error here, requiring the use of a different function (e.g.np.asduckarray()
) to achieve the no-error version, which most existing API currently uses:the alternative might be to force the use of a different API:
(Note that there might be a use for supporting an
obj.__asduckarray__()
method to mirrorobj.__array__()
, which would requirenp.asduckarray()
)(This seems like an extension, since 1. is strict it is still possible) There may be a use for finding a common array-like for multiple input functions:
it was previously discussed that this could be achieved by making like work on tuples:
np.arange(1000, like=(arr1, arr2)
.Ensure that the name (currently
like=
) is thought out and has been discussed sufficiently.(seberg) Are we sure that some of these function should never dispatch based on the existing (usually scalar) input? (I guess so, but I thought I saw(Linspace is not include, so this seems fine)np.linspace
in there, which made me wonder, but I suppose its not there). In any case, we may want to review each function more individually.(We currently dispatch aliases) Should we dispatch the aliases
np.asarray()
, etc. or rather only use the fullnp.array()
call. The reason for inclusion is currently that CuPy would have issues with the last version, because it doesn't support thesubok
argument (there are no cupy subclasses):The text was updated successfully, but these errors were encountered: