Skip to content

NEP: Adjust NEP-35 to make it more user-accessible #17093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Sep 7, 2020

Conversation

pentschev
Copy link
Contributor

This change aims to clarify some difficulties from the original NEP text being too technical and inaccessible for readers without in-depth knowledge of the terminology and internals of __array_function__ protocol, as raised via the mailing list discussion in https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html .

@pentschev
Copy link
Contributor Author

@ilayn @jni you have earned the honor of proofreading this! 😄

@rgommers @seberg @hameerabbasi @shoyer @jakirkham @quasiben @mrocklin FYI

Copy link
Contributor

@ilayn ilayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a way sweeter read than the previous version, miles ahead in terms of clarity. Thank you so much for the hard work @pentschev I've left some ideas here and there but please feel free to dismiss them.

Copy link
Contributor

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentschev thank you for this, for your patience, and for the ping!

I've left a few comments/questions but this is a massive improvement on the existing document, and makes it easy for a newcomer like me to ask questions — which I don't think the doc before your changes did. Hopefully the questions are not too off-base. 😂

We expect that this functionality will be mostly useful to library developers,
allowing them to create new arrays for internal usage based on arrays passed
by the user, preventing unnecessary creation of NumPy arrays that will
ultimately lead to an additional conversion into a downstream array type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =) The only thing I think would be useful is to have a complete usage example, "eg here is scikit-learn k-means now — without any option to create like, even if it works with a CuPy array as input it will end up with a NumPy array as output. But after this NEP and modifying the code to look like below, it will work completely within CuPy, or dask, or other arrays implementing this protocol."

I was trying to rack my brain for examples in scikit-image that need this, but can't come up with any off the top of my head. Happy to go hunting for one, but I figure there might be existing examples that motivated this NEP in the first place and that would be simple enough to be included?

Update: my_pad below is sufficient. I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =)

To be fair this is not a new addition, it was already in the original text, it just got moved a bit. See https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance .

Update: my_pad below is sufficient.

I'm not sure if you mean that my_pad already addresses the "complete usage example" you mentioned above or the strikethrough text. Are you confirming the existing example suffices?

I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

I tried to follow the NEP X -- Template and Instructions. To my understanding, it should belong in the "Motivation and Scope" section, where it is now.

"Motivation and Scope: ... It should describe the existing problem, who it affects, what it is trying to solve, and why."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Are we still talking about the paragraph saying "We expect that this functionality will be mostly useful to library developers, ..."? That is in Motivation and Scope already, not in Usage and Impact.

@ilayn
Copy link
Contributor

ilayn commented Aug 19, 2020

Apparently I can't resolve the comments but yes it looks good(better) to me.

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads much better now.

Comment on lines +52 to +56
Obviously, having a protocol in-place is useful if the arrays are created
elsewhere and let NumPy handle them. But still these arrays have to be started
in their native library and brought back. Instead if it was possible to create
these objects through NumPy API then there would be an almost complete
experience, all using NumPy syntax. For example, say we have some CuPy array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Obviously, having a protocol in-place is useful if the arrays are created
elsewhere and let NumPy handle them. But still these arrays have to be started
in their native library and brought back. Instead if it was possible to create
these objects through NumPy API then there would be an almost complete
experience, all using NumPy syntax. For example, say we have some CuPy array
The mechanism as described above covers cases where the input is already an array.
These arrays have to be created. NumPy provides `array creation routines` like
`np.ones` but how to use these to create a CuPy or Dask array? For example,
say we have some CuPy array

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was one of the reasons why I got confused in the beginning. "Why should I even be able to create other arrays if I am already using those libs? I can create via CuPy or Dask whatever I need".

That detail is being lost with this compact narrative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was suggested in #17093 (comment) . Given that the intent was to make the NEP clearer to users, I agree with @ilayn that the detail is getting lost.

pentschev and others added 14 commits August 19, 2020 17:47
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Chunlin <fangchunlin@huawei.com>
Co-authored-by: Chunlin <fangchunlin@huawei.com>
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
@pentschev
Copy link
Contributor Author

Half-way through PR addressing I realized I should have been changing the commit messages to match NumPy's standard, e.g. "NEP: Description". What's generally the preferred way to fix this, edit commit messages or squash those / the entire PR into a single commit?

@eric-wieser
Copy link
Member

I wouldn't worry - probably the thing to do after review is done will either to be:

  • git rebase -i and edit each commit message
  • Squash all the commits

But until everyone agrees on the content, I think what you're doing is just fine.

import dask.array as da
from dask.array.utils import meta_from_array

def my_dask_pad(arr, padding):
Copy link
Member

@eric-wieser eric-wieser Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This begs the question "Why can't I just use the my_pad from earlier on dask arrays?".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg
Copy link
Member

seberg commented Aug 19, 2020

I think there were a few points being discussed we may not settle fully by editing the NEP, but should not forget to "check off" before the end. Can everyone make sure that points they are concerned about end up in gh-17075?

@ilayn
Copy link
Contributor

ilayn commented Aug 19, 2020

Yes also I am a bit scared that @pentschev will run out of steam answering all comments :) so I'll stay silent for a bit until things settle down.

Comment on lines 168 to 173
``__array_function__``, allowing Dask to handle ``_meta`` appropriately. That
function is primarily targeted at the library's internal usage to ensure chunks
are created with correct types. Without the ``like=`` argument, it would be
impossible to ensure ``my_pad`` creates a padding array with a type matching
that of the input array, which would cause a ``TypeError`` exception to
be raised by CuPy, as discussed above would happen to the CuPy case alone.
Copy link
Member

@eric-wieser eric-wieser Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be clear:

  • my_pad(dask_array) should work just fine, thanks to the like argument and __array_function__
  • my_dask_pad(dask_array) is an example of what code internal to dask might look like.

Mentioning my_pad without comparing it to my_dask_pad makes reading this a little confusing to me

At any rate, it seems that meta_from_array is redundant here. Perhaps a more convincing example would be something like

# disclaimer: I have never used dask, I am guessing the API from your summary!
def pad_chunks(arr: dask.array, padding: ArrayLike):
    # we could use `np.array(padding, like=arr).chunks[0]`, but this is more direct
    padding = np.array(padding, like=meta_from_array(arr))
    return dask.from_chunks([padding] + arr.chunks + [padding])

where the key point is you don't actually need meta_from_array except:

  • Inside dask.array.__array_function__
  • As an optimization inside dask-specific functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my_pad(dask_array) should work just fine, thanks to the like argument and array_function

It would work fine for a "pure" Dask array, that's backed by NumPy. It won't work fine for a CuPy-backed Dask array, which needs to be handled accordingly, thus the meta_from_array call.

meta_from_array would be redundant if a Dask Array wasn't and array wrapping another type of array, which is why we need to handle it properly.

Anyway, I tried improving the text in 57f78df , can you check if it makes things clearer?

Comment on lines +105 to +107
def my_pad(arr, padding):
padding = np.array(padding, like=arr)
return np.concatenate((padding, arr, padding))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code would work just fine without the second line. To motivate it, perhaps:

Suggested change
def my_pad(arr, padding):
padding = np.array(padding, like=arr)
return np.concatenate((padding, arr, padding))
def my_pad(arr, padding):
# coerce `padding` just once, rather than letting `concatenate` do it twice
padding = np.array(padding, like=arr)
return np.concatenate((padding, arr, padding))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code would work just fine without the second line.

I don't get this comment, are you saying it works without the np.concatenate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naive reader may not understand why the call to padding = np.array(padding, like=arr) is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. but then it is explained in the text below.

@eric-wieser
Copy link
Member

eric-wieser commented Aug 19, 2020

Is there some way we can disable CI on this? It seems like a massive waste to run travis for almost 3 hours after every typo fix building python and C code when all that changed is documentation....

@mattip, @rgommers, any ideas?

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads very well, thanks a lot @pentschev. I have no substantial comments besides the ones we already agreed to postpone in the implementation PR.

@mattip
Copy link
Member

mattip commented Aug 24, 2020

I think we should put this in as-is, and if needed open new issues around the one or two rough spots.

@charris charris merged commit 274d5d6 into numpy:master Sep 7, 2020
@charris
Copy link
Member

charris commented Sep 7, 2020

Thanks @pentschev .

@pentschev
Copy link
Contributor Author

Thanks everyone for reviews and @charris for merging !

@pentschev pentschev deleted the nep-35-template-rewrite branch February 23, 2021 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants