NEP: Adjust NEP-35 to make it more user-accessible #17093

pentschev · 2020-08-17T19:47:43Z

This change aims to clarify some difficulties from the original NEP text being too technical and inaccessible for readers without in-depth knowledge of the terminology and internals of __array_function__ protocol, as raised via the mailing list discussion in https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html .

pentschev · 2020-08-17T19:49:17Z

@ilayn @jni you have earned the honor of proofreading this! 😄

@rgommers @seberg @hameerabbasi @shoyer @jakirkham @quasiben @mrocklin FYI

ilayn

This was a way sweeter read than the previous version, miles ahead in terms of clarity. Thank you so much for the hard work @pentschev I've left some ideas here and there but please feel free to dismiss them.

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

jni

@pentschev thank you for this, for your patience, and for the ping!

I've left a few comments/questions but this is a massive improvement on the existing document, and makes it easy for a newcomer like me to ask questions — which I don't think the doc before your changes did. Hopefully the questions are not too off-base. 😂

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

jni · 2020-08-19T03:04:52Z

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

+We expect that this functionality will be mostly useful to library developers,
+allowing them to create new arrays for internal usage based on arrays passed
+by the user, preventing unnecessary creation of NumPy arrays that will
+ultimately lead to an additional conversion into a downstream array type.


This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =) The only thing I think would be useful is to have a complete usage example, "eg here is scikit-learn k-means now — without any option to create like, even if it works with a CuPy array as input it will end up with a NumPy array as output. But after this NEP and modifying the code to look like below, it will work completely within CuPy, or dask, or other arrays implementing this protocol."

I was trying to rack my brain for examples in scikit-image that need this, but can't come up with any off the top of my head. Happy to go hunting for one, but I figure there might be existing examples that motivated this NEP in the first place and that would be simple enough to be included?

Update: my_pad below is sufficient. I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =)

To be fair this is not a new addition, it was already in the original text, it just got moved a bit. See https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance .

Update: my_pad below is sufficient.

I'm not sure if you mean that my_pad already addresses the "complete usage example" you mentioned above or the strikethrough text. Are you confirming the existing example suffices?

I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

I tried to follow the NEP X -- Template and Instructions. To my understanding, it should belong in the "Motivation and Scope" section, where it is now.

"Motivation and Scope: ... It should describe the existing problem, who it affects, what it is trying to solve, and why."

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Are we still talking about the paragraph saying "We expect that this functionality will be mostly useful to library developers, ..."? That is in Motivation and Scope already, not in Usage and Impact.

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

ilayn · 2020-08-19T08:52:20Z

Apparently I can't resolve the comments but yes it looks good(better) to me.

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

mattip

This reads much better now.

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

mattip · 2020-08-19T11:26:50Z

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

+Obviously, having a protocol in-place is useful if the arrays are created
+elsewhere and let NumPy handle them. But still these arrays have to be started
+in their native library and brought back. Instead if it was possible to create
+these objects through NumPy API then there would be an almost complete
+experience, all using NumPy syntax. For example, say we have some CuPy array


Suggested change

Obviously, having a protocol in-place is useful if the arrays are created

elsewhere and let NumPy handle them. But still these arrays have to be started

in their native library and brought back. Instead if it was possible to create

these objects through NumPy API then there would be an almost complete

experience, all using NumPy syntax. For example, say we have some CuPy array

The mechanism as described above covers cases where the input is already an array.

These arrays have to be created. NumPy provides `array creation routines` like

`np.ones` but how to use these to create a CuPy or Dask array? For example,

say we have some CuPy array

This was one of the reasons why I got confused in the beginning. "Why should I even be able to create other arrays if I am already using those libs? I can create via CuPy or Dask whatever I need".

That detail is being lost with this compact narrative.

This was suggested in #17093 (comment) . Given that the intent was to make the NEP clearer to users, I agree with @ilayn that the detail is getting lost.

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

Co-authored-by: Chunlin <fangchunlin@huawei.com>

Co-authored-by: Matti Picus <matti.picus@gmail.com>

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

pentschev · 2020-08-19T18:04:38Z

Half-way through PR addressing I realized I should have been changing the commit messages to match NumPy's standard, e.g. "NEP: Description". What's generally the preferred way to fix this, edit commit messages or squash those / the entire PR into a single commit?

eric-wieser · 2020-08-19T18:06:24Z

I wouldn't worry - probably the thing to do after review is done will either to be:

git rebase -i and edit each commit message
Squash all the commits

But until everyone agrees on the content, I think what you're doing is just fine.

eric-wieser · 2020-08-19T18:08:53Z

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

+    import dask.array as da
+    from dask.array.utils import meta_from_array
+
+    def my_dask_pad(arr, padding):


This begs the question "Why can't I just use the my_pad from earlier on dask arrays?".

That is what is explained in https://github.com/numpy/numpy/pull/17093/files#diff-e62e74367c379dd1a3b21bc549c1eba6R166-R173 .

seberg · 2020-08-19T18:39:15Z

I think there were a few points being discussed we may not settle fully by editing the NEP, but should not forget to "check off" before the end. Can everyone make sure that points they are concerned about end up in gh-17075?

ilayn · 2020-08-19T18:46:08Z

Yes also I am a bit scared that @pentschev will run out of steam answering all comments :) so I'll stay silent for a bit until things settle down.

eric-wieser · 2020-08-19T19:10:24Z

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

+``__array_function__``, allowing Dask to handle ``_meta`` appropriately. That
+function is primarily targeted at the library's internal usage to ensure chunks
+are created with correct types. Without the ``like=`` argument, it would be
+impossible to ensure ``my_pad`` creates a padding array with a type matching
+that of the input array, which would cause a ``TypeError`` exception to
+be raised by CuPy, as discussed above would happen to the CuPy case alone.


So to be clear:

my_pad(dask_array) should work just fine, thanks to the like argument and __array_function__

my_dask_pad(dask_array) is an example of what code internal to dask might look like.

Mentioning my_pad without comparing it to my_dask_pad makes reading this a little confusing to me

At any rate, it seems that meta_from_array is redundant here. Perhaps a more convincing example would be something like

# disclaimer: I have never used dask, I am guessing the API from your summary! def pad_chunks(arr: dask.array, padding: ArrayLike): # we could use `np.array(padding, like=arr).chunks[0]`, but this is more direct padding = np.array(padding, like=meta_from_array(arr)) return dask.from_chunks([padding] + arr.chunks + [padding])

where the key point is you don't actually need meta_from_array except:

Inside dask.array.__array_function__

As an optimization inside dask-specific functions.

my_pad(dask_array) should work just fine, thanks to the like argument and array_function

It would work fine for a "pure" Dask array, that's backed by NumPy. It won't work fine for a CuPy-backed Dask array, which needs to be handled accordingly, thus the meta_from_array call.

meta_from_array would be redundant if a Dask Array wasn't and array wrapping another type of array, which is why we need to handle it properly.

Anyway, I tried improving the text in 57f78df , can you check if it makes things clearer?

eric-wieser · 2020-08-19T19:14:21Z

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

+    def my_pad(arr, padding):
+        padding = np.array(padding, like=arr)
+        return np.concatenate((padding, arr, padding))


This code would work just fine without the second line. To motivate it, perhaps:

Suggested change

def my_pad(arr, padding):

padding = np.array(padding, like=arr)

return np.concatenate((padding, arr, padding))

def my_pad(arr, padding):

# coerce `padding` just once, rather than letting `concatenate` do it twice

padding = np.array(padding, like=arr)

return np.concatenate((padding, arr, padding))

This code would work just fine without the second line.

I don't get this comment, are you saying it works without the np.concatenate?

The naive reader may not understand why the call to padding = np.array(padding, like=arr) is needed.

.. but then it is explained in the text below.

eric-wieser · 2020-08-19T21:51:50Z

Is there some way we can disable CI on this? It seems like a massive waste to run travis for almost 3 hours after every typo fix building python and C code when all that changed is documentation....

@mattip, @rgommers, any ideas?

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

rgommers

Reads very well, thanks a lot @pentschev. I have no substantial comments besides the ones we already agreed to postpone in the implementation PR.

mattip · 2020-08-24T11:46:05Z

I think we should put this in as-is, and if needed open new issues around the one or two rough spots.

charris · 2020-09-07T21:30:05Z

Thanks @pentschev .

pentschev · 2020-09-08T13:29:17Z

Thanks everyone for reviews and @charris for merging !

NEP: Adjust NEP-35 to make it more user-accessible

7d7b46c

ilayn reviewed Aug 17, 2020

View reviewed changes

NEP: Simplify NEP-35 further with reviewer's suggestions

9b660e4

jni reviewed Aug 19, 2020

View reviewed changes

Qiyu8 reviewed Aug 19, 2020

View reviewed changes

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst Outdated Show resolved Hide resolved

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst Outdated Show resolved Hide resolved

mattip reviewed Aug 19, 2020

View reviewed changes

eric-wieser reviewed Aug 19, 2020

View reviewed changes

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst Outdated Show resolved Hide resolved

eric-wieser reviewed Aug 19, 2020

View reviewed changes

doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst Show resolved Hide resolved