-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
NEP: Adjust NEP-35 to make it more user-accessible #17093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ilayn @jni you have earned the honor of proofreading this! 😄 @rgommers @seberg @hameerabbasi @shoyer @jakirkham @quasiben @mrocklin FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a way sweeter read than the previous version, miles ahead in terms of clarity. Thank you so much for the hard work @pentschev I've left some ideas here and there but please feel free to dismiss them.
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pentschev thank you for this, for your patience, and for the ping!
I've left a few comments/questions but this is a massive improvement on the existing document, and makes it easy for a newcomer like me to ask questions — which I don't think the doc before your changes did. Hopefully the questions are not too off-base. 😂
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
We expect that this functionality will be mostly useful to library developers, | ||
allowing them to create new arrays for internal usage based on arrays passed | ||
by the user, preventing unnecessary creation of NumPy arrays that will | ||
ultimately lead to an additional conversion into a downstream array type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =) The only thing I think would be useful is to have a complete usage example, "eg here is scikit-learn k-means now — without any option to create like, even if it works with a CuPy array as input it will end up with a NumPy array as output. But after this NEP and modifying the code to look like below, it will work completely within CuPy, or dask, or other arrays implementing this protocol."
I was trying to rack my brain for examples in scikit-image that need this, but can't come up with any off the top of my head. Happy to go hunting for one, but I figure there might be existing examples that motivated this NEP in the first place and that would be simple enough to be included?
Update: my_pad below is sufficient. I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =)
To be fair this is not a new addition, it was already in the original text, it just got moved a bit. See https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance .
Update: my_pad below is sufficient.
I'm not sure if you mean that my_pad
already addresses the "complete usage example" you mentioned above or the strikethrough text. Are you confirming the existing example suffices?
I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.
I tried to follow the NEP X -- Template and Instructions. To my understanding, it should belong in the "Motivation and Scope" section, where it is now.
"Motivation and Scope: ... It should describe the existing problem, who it affects, what it is trying to solve, and why."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.
Are we still talking about the paragraph saying "We expect that this functionality will be mostly useful to library developers, ..."? That is in Motivation and Scope already, not in Usage and Impact.
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
Apparently I can't resolve the comments but yes it looks good(better) to me. |
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads much better now.
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
Obviously, having a protocol in-place is useful if the arrays are created | ||
elsewhere and let NumPy handle them. But still these arrays have to be started | ||
in their native library and brought back. Instead if it was possible to create | ||
these objects through NumPy API then there would be an almost complete | ||
experience, all using NumPy syntax. For example, say we have some CuPy array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, having a protocol in-place is useful if the arrays are created | |
elsewhere and let NumPy handle them. But still these arrays have to be started | |
in their native library and brought back. Instead if it was possible to create | |
these objects through NumPy API then there would be an almost complete | |
experience, all using NumPy syntax. For example, say we have some CuPy array | |
The mechanism as described above covers cases where the input is already an array. | |
These arrays have to be created. NumPy provides `array creation routines` like | |
`np.ones` but how to use these to create a CuPy or Dask array? For example, | |
say we have some CuPy array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was one of the reasons why I got confused in the beginning. "Why should I even be able to create other arrays if I am already using those libs? I can create via CuPy or Dask whatever I need".
That detail is being lost with this compact narrative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was suggested in #17093 (comment) . Given that the intent was to make the NEP clearer to users, I agree with @ilayn that the detail is getting lost.
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Chunlin <fangchunlin@huawei.com>
Co-authored-by: Chunlin <fangchunlin@huawei.com>
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
Outdated
Show resolved
Hide resolved
Half-way through PR addressing I realized I should have been changing the commit messages to match NumPy's standard, e.g. "NEP: Description". What's generally the preferred way to fix this, edit commit messages or squash those / the entire PR into a single commit? |
I wouldn't worry - probably the thing to do after review is done will either to be:
But until everyone agrees on the content, I think what you're doing is just fine. |
import dask.array as da | ||
from dask.array.utils import meta_from_array | ||
|
||
def my_dask_pad(arr, padding): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This begs the question "Why can't I just use the my_pad
from earlier on dask arrays?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what is explained in https://github.com/numpy/numpy/pull/17093/files#diff-e62e74367c379dd1a3b21bc549c1eba6R166-R173 .
I think there were a few points being discussed we may not settle fully by editing the NEP, but should not forget to "check off" before the end. Can everyone make sure that points they are concerned about end up in gh-17075? |
Yes also I am a bit scared that @pentschev will run out of steam answering all comments :) so I'll stay silent for a bit until things settle down. |
``__array_function__``, allowing Dask to handle ``_meta`` appropriately. That | ||
function is primarily targeted at the library's internal usage to ensure chunks | ||
are created with correct types. Without the ``like=`` argument, it would be | ||
impossible to ensure ``my_pad`` creates a padding array with a type matching | ||
that of the input array, which would cause a ``TypeError`` exception to | ||
be raised by CuPy, as discussed above would happen to the CuPy case alone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to be clear:
my_pad(dask_array)
should work just fine, thanks to thelike
argument and__array_function__
my_dask_pad(dask_array)
is an example of what code internal to dask might look like.
Mentioning my_pad
without comparing it to my_dask_pad
makes reading this a little confusing to me
At any rate, it seems that meta_from_array
is redundant here. Perhaps a more convincing example would be something like
# disclaimer: I have never used dask, I am guessing the API from your summary!
def pad_chunks(arr: dask.array, padding: ArrayLike):
# we could use `np.array(padding, like=arr).chunks[0]`, but this is more direct
padding = np.array(padding, like=meta_from_array(arr))
return dask.from_chunks([padding] + arr.chunks + [padding])
where the key point is you don't actually need meta_from_array
except:
- Inside
dask.array.__array_function__
- As an optimization inside dask-specific functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my_pad(dask_array) should work just fine, thanks to the like argument and array_function
It would work fine for a "pure" Dask array, that's backed by NumPy. It won't work fine for a CuPy-backed Dask array, which needs to be handled accordingly, thus the meta_from_array
call.
meta_from_array
would be redundant if a Dask Array wasn't and array wrapping another type of array, which is why we need to handle it properly.
Anyway, I tried improving the text in 57f78df , can you check if it makes things clearer?
def my_pad(arr, padding): | ||
padding = np.array(padding, like=arr) | ||
return np.concatenate((padding, arr, padding)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code would work just fine without the second line. To motivate it, perhaps:
def my_pad(arr, padding): | |
padding = np.array(padding, like=arr) | |
return np.concatenate((padding, arr, padding)) | |
def my_pad(arr, padding): | |
# coerce `padding` just once, rather than letting `concatenate` do it twice | |
padding = np.array(padding, like=arr) | |
return np.concatenate((padding, arr, padding)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code would work just fine without the second line.
I don't get this comment, are you saying it works without the np.concatenate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naive reader may not understand why the call to padding = np.array(padding, like=arr)
is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. but then it is explained in the text below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reads very well, thanks a lot @pentschev. I have no substantial comments besides the ones we already agreed to postpone in the implementation PR.
I think we should put this in as-is, and if needed open new issues around the one or two rough spots. |
Thanks @pentschev . |
Thanks everyone for reviews and @charris for merging ! |
This change aims to clarify some difficulties from the original NEP text being too technical and inaccessible for readers without in-depth knowledge of the terminology and internals of
__array_function__
protocol, as raised via the mailing list discussion in https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html .