Skip to content

ENH: Propagate like= downstream #17678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

pentschev
Copy link
Contributor

After doing some testing with NEP-35 in Dask (see dask/dask#6738), I noticed something I grossly overlooked in the original proposal, which is passing the like= argument downstream iff supported. The benefit of doing that is to achieve the following:

In [1]: import numpy as np, cupy as cp, dask.array as da

In [2]: np.asarray([1, 2, 3], like=da.array(cp.array(())))
Out[2]: dask.array<array, shape=(3,), dtype=int64, chunksize=(3,), chunktype=cupy.ndarray>

Note how passing a Dask array backed by a CuPy array to like= will make it possible for Dask to also identify that we want chunktype=cupy.ndarray. Without this change, we would only be able to achieve Dask's default, which is chunktype=np.ndarray.

This is a change that is unnecessary (or at least I think so) for CuPy, as a CuPy array is purely CuPy-backed, therefore implementing like= there would have no value. Due to that, this change will attempt passing like= downstream, and if a TypeError is raised, then it will fallback to the previous behavior of simply removing like= before calling the downstream implementation.

It's also good to point out that such change isn't necessary for Dask internally, we can completely get away without that, as is currently the case in dask/dask#6738 . However, this may be useful for other library developers in the future, so I think it's worth expanding the scope of NEP-35 to allow this. Of course, if we agreed on this change, I will update NEP-35's text to reflect this, but I preferred to have the code functional for some review before doing so.

cc @seberg @rgommers @shoyer @hameerabbasi @quasiben @jakirkham @leofang

method, argument, public_api, types, args, kwargs, NULL);
PyObject *err = PyErr_Occurred();
if (err && PyErr_GivenExceptionMatches(err, PyExc_TypeError)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would match every TypeError, which could be very slow, and is hacky. I'd suggest something along the lines of if getattr(type, __array_function_like__, False).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this would require an update to the NEP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also work, the downside to such an approach is that we would need to then follow an all-or-nothing approach, so if the downstream library implements the __array_function_like__ attribute, then all creation functions would need to add like=. That's not necessarily bad, but it increases effort for adoptability and forces us to maintain a list of all functions that add like= in NEP-35, so maintainers know which functions need to add that argument.

I would like to hear more opinions and potentially some other drawbacks from folks before we decide on what path to take.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like idea of an additional flag much. I guess this is a bit special due to the assumption that you do not want to force projects to add like= if it is just ignored anyway.

Right now, adding kwargs.pop("like") in your __array_function__ implementation is also possible (using the fact that like isn't used for anything else at the moment). I guess there is a transition problem possible where projects do generalize to these functions, it just never happened yet. And forcing to add like= means they have to catch up. But, in a sense I guess the whole setup is prone to small transition issues like that already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, you'd only force projects to add like= to functions iff they also choose to add __array_function_like__ attribute. For example, CuPy wouldn't implement either one, but Dask would probably add __array_function_like__, and only then it would be forced to add like= to all the functions that NEP-35 covers. IOW, it would be opt-in only and once the library decides to add __array_function_like__, they would know already of the associated cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm overly concerned about things, I'm not sure yet. I believe you're right on the situations where this may show up, I guess I'm also considering the case where CuPy is older than NumPy 1.20, for example, if one were to upgrade to NumPy master now and remain with CuPy in release version, when using a Dask function that does some implicit like=-related calls to CuPy, this would be a hassle.

It would only be necessary to do that change in CuPy, Dask would be fine as long as it does add like= to all functions where NumPy does too, or remove like= on a per-function basis.

As for backward compatibility, what worries me is that for a reader of NEP-18 there was nothing as like=, but now we're introducing a change that requires an amendment stating they need to explicitly remove like=, I'm not sure if that's ideal, maybe we can rely solely on the DeprecationWarning should we follow that path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, NEP-18 explicitly does not include np.array, if you support it you are beyond what it covers? Also I expect NEP-18 says that signatures should match, which might suggest you have to add the like= kwarg as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentschev I talked about this with @seberg at today's triage meeting, we decided that this PR isn't needed. Any function with like= implies that if like= is passed in, it's the only thing being dispatched on. Therefore, inside __array_function__, self is equivalent to the like that was passed in. So anytime we need like=, we could use self instead, and copy chunk-types from that. Does that work for your use-case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hameerabbasi wow, that's indeed a great idea! I'm now embarrassed I didn't think of it before, but thankfully we have you here to come up with that! I agree, this seems to entirely cover that situation, we can then do one of two things in the __array_function__ definition of those libraries (e.g., Dask) that may eventually have an use for it:

  1. Inspect the function to check whether it implements the like= argument and do func(*args, **kwargs, like=self); or
  2. Special case functions that will use it it and pass func(*args, **kwargs, like=self), in case we it's not possible to do introspection or that's too detrimental for performance.

I'll try to do a bit more testing tomorrow just to ensure I'm not overlooking anything. After that, I'll then close this PR and I'm thinking of making a change to NEP-35 only to add the information on how downstream libraries can achieve this result that we're trying to do for Dask, should someone else need this in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me longer than I expected to come back to this. I did some more checking, and I think @hameerabbasi 's solution indeed covers the cases I had in mind. With that said, I just opened #17725 to add instructions for downstream library maintainers on how to do that, and I'm closing this PR. Thanks @hameerabbasi @seberg and @eric-wieser for feedback here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants