-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Add __array_ufunc__
#8247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add __array_ufunc__
#8247
Conversation
I'm all for getting this into 1.13, but I don't get why we would want to On Nov 6, 2016 9:58 AM, "Charles Harris" notifications@github.com wrote:
|
My thought is to have it there so we can work on it and have it tested against. If things breaks, tests will break and will need fixing anyway. I think trying to work this out in one big commit would be difficult and I would like to get an early start. However, there are a couple of things we might want to put in here before merging this:
Other changes, such as adding an override, have less effect downstream and can come in later. Currently priority still plays a role and will need a deprecation cycle. I suppose we need to finalize what the function looks like, then argue about the binops. |
I would agree with @njsmith in principle, but with @charris in practice. If we don't re-enable it on master, there's a good chance it will never make it in. However, at the moment it's very hard to judge how far off we still are. Would be good to have a concise summary of the open points and a plan about who/how to approach those in broad lines before hitting the green button here. |
IIRC, the plan we sort-of converged to try was to (i) rename
numpy_ufunc, (ii) remove the index argument, (iii) in Python binops,
make it so that ndarray and its subclasses always defer to numpy_ufunc,
if present. If numpy_ufunc ends up with NotImplemented, this results to
a TypeError, never falling back to Python resolution.
.
The last point takes a similar (and backward-compatible) view as Numpy
does currently vs. treating all non-array Python objects similarly as
shape=() object arrays --- in the Python binop resolution hierarchy,
arrays always win over non-arrays. I think this is not the only possible
choice, but it was preferred over the alternatives by @njsmith and @shoyer.
.
I think the question of numpy scalars did not come up so far in the
discussion, so that's an open point.
.
I think the points (i-iii) were discussed to death last time, so unless
there's something new to discuss, I think we shouldn't reopen that.
|
@pv I think the sticking point was the last. My own preference would be to have an opt out mechanism with the usual NotImplemented return. The reason is twofold, first, at the present time, many Objects opt out by using |
Yes, that is the point where we failed to find consensus on what is the optimum. It's quite possible that if we start again, the issue will not be clarified, so the decision needs to be made without convergence (and with an escape route in mind). I think there were some responses to your specific point above in the old huge discussion (rare use case to handle numpy stuff without knowing about numpy, provide dummy implementation in numpy?), plus alternative proposals for the opt-out. |
5cf3706
to
88845c0
Compare
I've left the NEP alone as an historical document, even though names and arguments have changed. |
Agreed with @pv's summary. And that the remaining unresolved issue was with treating items that become object arrays. If I recall correctly, plans are now afoot to disallow automatic creation of object arrays with |
I've removed the index. For Note on documentation, there looks to be very little any apart from the NEP, which is now outdated. EDIT: Looks like the tuple needs to contain exactly as many entries, some possibly None, as the function has, so I guess this means always having |
50b585f
to
07b0e5c
Compare
It looks to me like the |
OK, there are complications with
Note that all of these can be passed by position or by keyword, there is no simple division between the two except that everything after the last positional argument must be passed by keyword. ISTM that from the user end that we should have a definite division into required (positional) and optional (keyword) arguments. That is not so difficult to achieve, but is likely to add call overhead. I think that the simplest way to implement this is to use Thoughts? |
I agree that all optional arguments should always end up in One question: can even an array passed in via |
It would be useful to allow where= to be a sparse array. On Nov 14, 2016 2:09 PM, "Marten van Kerkwijk" notifications@github.com
|
@mhvk I didn't say there needs to be an |
What exactly do you mean by that? Are you suggesting position-only arguments, keyword-only arguments, or both? Is there any precedent elsewhere in numpy for keyword-only arguments (when not combined with vararg functions)? |
This is for the arguments passed into |
Checking, I see that the override function already takes care of the argument normalization. |
@pv The EDIT: OK, I think we need to preserve the call as the array reshape might fail for some inputs, so the arguments need a proper normalization. |
The input may contain objects for which reshaping may not be even
defined, so preferably it should dispatch once, immediately at the
beginning of the method.
|
Hmm, there is no error checking in the normalization functions, not even if there are sufficient positional arguments. All of the python C-API functions may error, even EDIT: And the positional arguments could be passed as keywords. Should we error on that? |
The default values of optional arguments are also not set, should we handle those? |
I'd say we should not set anything unless it is explicitly passed in, if only to ensure that if |
c33d9c4
to
32221df
Compare
Merged. Thanks to all for pulling this off, and a special thanks to Marten for sheparding the process to completion. Further modifications and enhancements should go through the normal PR process. |
Thanks @charris. Just to be 100% clear: I think we've very close, but before releasing NumPy we need to change the ufunc override rules so that encountering any argument with |
I am very glad to see this. Thank you for pushing this through. I'm now looking forward to the 1.13 release more than any other release that I can recall :) |
Do we want to get |
@eric-wieser I think it is best to leave applications to future releases as they are likely to take a fair amount of time to settle as people experiment and gain experience. |
I think this is a definitely worth trying. Certainly implementing |
🎆 🎆 🎆 I'd actually not make further changes right now -- we do note that the thing is provisional after all. Definitely, redoing |
To be clear, I think it's worth trying to redo |
@shoyer: Was the use of the word redo intentional? It's tempting to make a new |
@eric-wieser ask @mhvk, he used "redo" first :) |
I'm definitely tempted to redo -- that way we also don't have to worry about, e.g., introducing views of masks instead of copies. And the code could become substantially more agnostic about what the items being masked actually are. Anyway, for another issue/PR... |
See #9014 for a follow-up PR, changing the behavior of |
If re-doing It definitely would be an interesting experiment. |
* ENH: NDArrayOperatorsMixin calls ufuncs directly, like ndarray Per our discussion in numpy#8247 (comment) * add back in accidentally dropped __repr__
I have a naive implementation of this up for dask.array here: dask/dask#2438 In [1]: import dask.array as da, numpy as np
In [2]: x = np.arange(24).reshape((4, 6))
In [3]: np.sum(np.sin(x), axis=0)
Out[3]:
array([-1.56697566, 2.06850183, 3.80220828, 2.04018197, -1.59757823,
-3.76653238])
In [4]: d = da.arange(24, chunks=(4,)).reshape((4, 6))
In [5]: np.sum(np.sin(d), axis=0)
Out[5]: dask.array<sum-aggregate, shape=(6,), dtype=float64, chunksize=(6,)>
In [6]: _.compute()
Out[6]:
array([-1.56697566, 2.06850183, 3.80220828, 2.04018197, -1.59757823,
-3.76653238]) |
@mrocklin Have you been happy with the implementation so far? |
@charris very. My first attempt took about ten lines to reroute dask.array's existing functions to be called when the user uses the proper numpy equivalent. I suspect that with a bit more work I can remove dask's functions entirely so that |
@mrocklin -- Great that it seemed relatively easy! Do ping if there seem to be things that do not work quite as expected. |
This reverts commit bac094c and enables
__numpy_ufunc__
for development in the NumPy 1.13.0 development cycle.EDIT: Note that the name has been changed to
__array_ufunc__
together with various changes to the function signature and implementation.