
ENH: implement NEP-35's like= argument #16935


Merged
68 commits merged into numpy:master on Aug 28, 2020

Conversation

@pentschev (Contributor) commented Jul 23, 2020

This PR adds the implementation of NEP-35's like= argument, allowing dispatch of array creation functions with __array_function__ based on a reference array.

There are two ways to dispatch, details below.

The first is via the Python API (as demonstrated by np.ones and np.full), where array_function_dispatch is used. Unlike the existing dispatch for compute functions, which usually dispatches on the first argument, this dispatches on the like= keyword argument, returning (like,). We also check that like is not None as a performance optimization, see #16935 (comment) .
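
A minimal sketch of this Python-side pattern follows (helper names such as _ones_dispatcher and _ones_with_like are illustrative, and error handling for a like= value that does not implement __array_function__ is omitted; the actual wiring in the PR may differ):

import numpy as np
from numpy.core.overrides import array_function_dispatch

def _ones_dispatcher(shape, dtype=None, order=None, *, like=None):
    # Dispatch only on the reference array passed through like=.
    return (like,)

def ones(shape, dtype=None, order='C', *, like=None):
    # Performance optimization: only enter __array_function__ dispatch
    # when a reference array was actually provided.
    if like is not None:
        return _ones_with_like(shape, dtype=dtype, order=order, like=like)
    a = np.empty(shape, dtype, order)
    a.fill(1)
    return a

# Wrapping the public function once creates the dispatching entry point;
# in the real implementation like= is stripped before downstream libraries
# see the call.
_ones_with_like = array_function_dispatch(_ones_dispatcher)(ones)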

The second dispatch occurs via C, by splitting array_implement_array_function into two functions. The first remains very similar to the original implementation (array_implement_array_function) but adds a step to remove the like= argument before calling downstream libraries -- downstream libraries shall not add like= to their signatures. The second function (array_implement_c_array_function) also removes the like= argument, but additionally extracts the reference array from it and obtains the public_api Python function by importing np.function_name, where function_name is passed in by the calling function.
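
Conceptually, and rendered here in Python purely for illustration (the real dispatcher is written in C, and none of the names below are the actual C symbols), the second dispatcher does something along these lines:

import numpy as np

def implement_c_array_function(function_name, args, kwargs):
    # like= is consumed here; downstream libraries never see it.
    like = kwargs.pop("like", None)
    if like is None:
        return NotImplemented
    # The C code obtains the public NumPy function by name, e.g.
    # "array" resolves to np.array, so downstream code is handed the
    # public API object rather than a private implementation.
    public_api = getattr(np, function_name)
    # Defer to the reference array's __array_function__ implementation.
    return like.__array_function__(public_api, (type(like),), args, kwargs)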

The usage of the C dispatch is very straightforward and optimized for the case where like=None, adding minimal overhead to such functions. The necessary work is thus done only when a reference array is passed, such as importing the NumPy Python function -- this could still be improved with a lookup mechanism to avoid reimporting on each subsequent call, but it was decided not to do that in this PR. The caller function still needs to add a like argument to its keyword list and parse it, even though it is not used anywhere other than the dispatcher function. The caller also needs to call the C dispatcher and check its return value: if it returns Py_NotImplemented, the caller continues with NumPy's implementation; otherwise it returns that value (from the downstream library) immediately.
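
For reference, the user-facing behavior this enables looks like the following (CuPy is used only as an example of a library implementing __array_function__ and is not part of this PR):

import numpy as np
import cupy

ref = cupy.arange(3)

# Without like=, NumPy's own implementation runs and returns a numpy.ndarray.
a = np.asarray([1, 2, 3])

# With like=, creation is dispatched to the library of the reference array,
# so b is a CuPy array rather than a NumPy array.
b = np.asarray([1, 2, 3], like=ref)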

This new function allows dispatching from C directly, while also
implementing the new `like=` argument, requiring only minimal
changes to existing array creation functions that need to add
support for that argument.
The implementation uses array_implement_c_array_function, thus
introducing minimal complexity to the original _array_fromobject
code.
np.asarray can rely on np.array's C dispatcher instead.
Tests cover some of the functions that have already been implemented (a small dispatch sketch follows the list):
* np.array (C dispatcher)
* np.asarray (indirect C dispatcher via np.array)
* np.full (Python dispatcher)
* np.ones (Python dispatcher)
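
The dispatch can also be exercised without any third-party library by a small duck-typed class implementing __array_function__ (a hypothetical helper for illustration only, not the tests added in this PR; requires NumPy >= 1.20):

import numpy as np

class LikeCapture:
    # Minimal duck-typed object: it only reports which public NumPy
    # function the like= dispatch handed over.
    def __array_function__(self, func, types, args, kwargs):
        return func.__name__

ref = LikeCapture()
assert np.ones(3, like=ref) == "ones"
assert np.full(3, 7, like=ref) == "full"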
@pentschev (Contributor Author)

Replace PyUnicode_FromString with PyUnicode_InternFromString to cache
the "like" string.
Check and handle errors when importing NumPy's Python functions.
Use the Python dispatcher for all of them. Using the C dispatcher
directly on the `np.array` call can result in incorrect behavior if the
downstream library's implementation is different or if not all keyword
arguments are supported.
@pentschev (Contributor Author)

Thanks @seberg for bringing this up, I've fixed the conflict now. I'm very excited that we're close to merging this, so more folks can do some testing now! :)

public_api.__doc__ = public_api.__doc__.replace(
    "${ARRAY_FUNCTION_LIKE}",
    array_function_like_doc,
)
Member

Suggested change:
    )
    return public_api  # to allow decorator use

Contributor Author

Thanks @eric-wieser, I tried something much more complicated and failed to achieve this; I'm glad it's so simple! Maybe one day I should take a few Python decorator classes with you! :)

That is now done in 1fc1b60 .

    public_api=ascontiguousarray,
)
def _ascontiguousarray_with_like(a, dtype=None, *, like=None):
    pass
@eric-wieser (Member) commented Aug 26, 2020

This strikes me as really weird. The following spelling would also work, and avoids repeating the signature:

_ascontiguousarray_with_like = array_function_dispatch(
    _asarray_contiguous_fortran_dispatcher
)(ascontiguousarray)

As written right now, the body and signatures of these functions are completely ignored - a recipe for confusion and stale signatures

Contributor Author

Thanks, I definitely missed we could do something simple like this. Done now in d83cf9d .

@jakirkham (Contributor)

@eric-wieser, would you be ok just pushing the minor changes proposed or rolling them into a follow-up PR? Peter is out-of-the-office (though I see he kindly stopped by to tidy things briefly; thanks Peter! 🙂). So I think anything else we are wanting addressed in the near term would be better handled directly. Does that sound ok?

@pentschev (Contributor Author)

Sorry @jakirkham, I only saw your message just now, but the changes are in. Thanks for being on top of this! 😄

    possible_flags = {'C': 'C', 'C_CONTIGUOUS': 'C', 'CONTIGUOUS': 'C',
                      'F': 'F', 'F_CONTIGUOUS': 'F', 'FORTRAN': 'F',
                      'A': 'A', 'ALIGNED': 'A',
                      'W': 'W', 'WRITEABLE': 'W',
                      'O': 'O', 'OWNDATA': 'O',
                      'E': 'E', 'ENSUREARRAY': 'E'}
    if not requirements:
        return asanyarray(a, dtype=dtype)
Member

Why pass like here given we know it is None? Do we actually want to dispatch on like in these functions, or should we just forward the argument to np.array?

Contributor Author

Good catch, this was probably a leftover from some early attempts; I've now removed those in c1e8754.

@eric-wieser (Member) commented Aug 26, 2020

I think this pattern is present in lots of places. I'd perhaps argue that, for instance, np.array should have like dispatching, but np.asarray and np.asanyarray should remain aliases for array(subok=..., copy=..., like=like). Thoughts @seberg?

Contributor Author

Downstream libraries may have completely different implementations; that's why we need each function to have like=. For instance, CuPy doesn't support subok, as per the note in https://docs.cupy.dev/en/stable/reference/generated/cupy.array.html, so we have to dispatch directly to CuPy so that it handles that appropriately.

Member

Hmmmpf, I thought I had brought that up but forgot about it again.

I am not sure why we should have more than one entry point for the as.*array() functions. In principle it seems like only ever using np.array() with the full signature during __array_function__ dispatching should be good enough?

@pentschev did you have a reason for wanting np.asarray, np.ascontiguousarray, ... instead of only np.array() as the public_api dispatching "target"? Dask should not really have to know that the call was np.asarray(arr) as opposed to np.array(arr, copy=False), right? And that would mean fewer functions to wrap (although I guess you probably just wrap everything anyway).

We might still want an if like is None: branch for micro-optimization, but I am not sure whether it will make a real difference in that case.

Member

Yeah, I was typing when you posted. That seems like a reason, although I didn't quite grok the implications yet...

Contributor Author

Note we also discussed this previously in #16935 (comment):

Or, we admit that these are just aliases around np.array, and only defer for np.array itself. But I suppose we do not want that, since the error messages would be confusing (np.asarray saying that it did not find an implementation for np.array).

I did that in the beginning, but I realized that's not necessarily correct and removed it in 2c54820; downstream libraries may implement these functions in a completely different manner than NumPy.

Contributor Author

Yeah, I was typing when you posted. That seems like a reason, although I didn't quite grok the implications yet...

The implication is that calling np.array(..., subok=True, like=cupy_array) will fail with NotImplemented because cupy.array doesn't implement subok, whereas cupy.asanyarray is implemented differently and works with what we currently do, i.e., dispatching np.asanyarray(..., like=cupy_array).

Member

Hmm, yeah, I don't mind these aliases too much (they are a bit tedious, but only a bit), but I don't really like it much either :). While cupy.array() probably shouldn't support subok=True, I wonder if the function called by its __array_function__ could just as well support (and ignore) it...

Although the whole subok parameter is slightly dubious in combination with like=?

PS: I added it to the tracking issue just in case we don't quite find a resolution.

Contributor Author

As far as I understand it, implementing the NumPy API doesn't imply that the underlying implementation has to behave exactly the same. With that said, some downstream library implementing asanyarray could, for example, implement it to always return np.array(1) for whatever reason, and that could be totally valid. Therefore, from the NumPy perspective, we shouldn't assume that downstream libraries are supposed to implement those aliases exactly the same way as NumPy does. My example is obviously a bit extreme, but I'm just trying to make the point that with __array_function__ we're using the API only; the underlying implementation is entirely the responsibility of the downstream library, and thus we should only dispatch on the API.

This was there for a bit, but the same thing can already be
achieved without the parameter, so it is now unused.
@seberg seberg merged commit 4cd6e4b into numpy:master Aug 28, 2020
@seberg (Member) commented Aug 28, 2020

Thanks everyone for the review and the last fixups! @eric-wieser (and others), if you do continue to comment here, I will make sure to fix things up (I had added one more commit to undo now-unnecessary changes in override.py). One reason I liked merging it now is that it should end up in the weekly build.

Everyone should be aware that very likely this will not be the final version, and it may not land unmodified in NumPy 1.20 (at least not without the NEP being accepted)!

@pentschev can you make sure we schedule a meeting in a month or so to discuss this? At least assuming you or sklearn had some chance of trying this out? @thomasjpfan just a ping in case you were not in the loop here.

Comment on lines +2170 to +2174
     if like is not None:
         return _identity_with_like(n, dtype=dtype, like=like)

     from numpy import eye
-    return eye(n, dtype=dtype)
+    return eye(n, dtype=dtype, like=like)
Member

This one is a really strong argument for not having a like override - it shouldn't be up to dask or cupy to decide whether eye and identity are aliases.

Member

(also, this like=like is always like=None)

Contributor Author

I'll fix that in a separate PR, thanks @eric-wieser for spotting this.

@jakirkham (Contributor)

Really excited to see this land! Thanks for the hard work all 😄

@pentschev (Contributor Author)

Thanks everyone for reviews and @seberg for merging this!

@pentschev can you make sure we schedule a meeting in a month or so to discuss this? At least assuming you or sklearn had some chance of trying this out? @thomasjpfan just a ping in case you were not in the loop here.

We will try this out in Dask and other potential use cases, and then I'll update those who were in our previous meeting discussing NEPs 35 and 37 to discuss next steps; we still have an email thread with those involved, so I'll write there.

@paugier commented Mar 15, 2021

I detected a performance regression using a Python accelerator (Pythran) between NumPy 1.19.5 and 1.20 (see serge-sans-paille/pythran#1735). The regression is in a loop which does not involve Python (only pure C++ without CPython API calls). This indicates that it is about the raw data of the arrays. I didn't try to investigate whether it can also impact other tools that accelerate Python or create Python extensions (like Cython or Numba), but I don't see why it would be specific to Pythran.

By preparing the arrays used in the computation with arr.copy() (where arr is a NumPy ndarray produced from a Pandas dataframe with df.values), I get good results with NumPy 1.19.5 and less good results with NumPy 1.20. I'm posting this here because the problem appears with commit 4cd6e4b, which seems to correspond to this PR.

@pentschev or @seberg, could you shed some light on what is happening? I see that arr.copy() no longer produces the same arrays. It's quite subtle because the values, the flags and the strides are the same; the arrays are just such that native computations on them are a bit less efficient.

@pentschev (Contributor Author)

@paugier would it be possible for you to open a new GH issue with a minimal reproducer for that, hopefully detached from Pandas and other 3rd-party libraries? I read and responded a few minutes back on the discussion list, but it's still not exactly clear to me what the differences are that you're referring to, plus a reproducer would allow us to verify the codepath that is potentially causing this.

@pentschev (Contributor Author)

Also, for completeness on this thread: curiously, I see NumPy 1.20.1 being considerably faster for small arrays and mildly faster with large arrays (results here).

@seberg (Member) commented Mar 15, 2021

@pentschev yeah, the life-cycle of small arrays was improved when we cleaned up the buffer handling (creating a view and deleting it should be about 30% faster, e.g. for array[...], IIRC).

I will distinctly not worry about this PR having anything to do with @paugier's observation. As said multiple times on the mailing list, these types of fluctuations are as common as they are puzzling. As much as I would like to know how to avoid them, I would be shocked if looking for a code change in NumPy got you anywhere closer to understanding this.

@pentschev (Contributor Author)

Thanks for the insight @seberg. I agree with you: unless there's a consistent minimal reproducer that can show this or any other commit to be responsible, there isn't much that can be done.

Labels: 01 - Enhancement, component: __array_function__, Priority: high