ENH: __array_function__ support for most of numpy.core #12115
Conversation
With the notable exceptions of np.einsum and np.block. xref GH12028
Looks like I also missed …
Could you write up a section in the release notes for all these related PRs?
Also apply labels for the PRs.
I made a new label for …. I'll add the release notes in another PR. I can do that now if you prefer, but note that it's on the checklist in #12028, and that checklist really should be entirely complete before we release this in any form. If we're not ready in time for the next major release, we should replace …
Inline review on this diff context:

```python
    return (a,)

@array_function_dispatch(_array2string_dispatcher)
```
Could the dispatcher be written as

```python
def default1_dispatcher(*args, **kwargs):
    return args[:1]
```

Then maybe a few of those could be put into overrides, imported, and reused. Of course a different dispatcher may be needed in special cases.
This is a good idea, but even though most numpy functions have a meaningless first argument name, technically you can still use them as a keyword argument, e.g., `numpy.sum(a=array)`, in which case `args` would be empty. So we need an exactly matching function signature, including the same argument names.
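A minimal illustration of the problem (the `_sum_dispatcher` signature below is a simplified sketch, not NumPy's exact definition):

```python
import numpy as np

def default1_dispatcher(*args, **kwargs):
    return args[:1]

# Passing the array positionally works, but passing it by keyword
# leaves args empty, so nothing is returned for dispatch:
default1_dispatcher(np.arange(3))    # -> (array([0, 1, 2]),)
default1_dispatcher(a=np.arange(3))  # -> ()

# A dispatcher with an exactly matching signature handles both styles:
def _sum_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
    return (a, out)

_sum_dispatcher(a=np.arange(3))      # -> (array([0, 1, 2]), None)
```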
At least for Python 3.3+, we could use `inspect.signature` and set the dispatcher's `__signature__`, as in this Stack Overflow answer. Or is that too magical?

Edit: formatting
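A rough sketch of that idea; note that setting `__signature__` only changes what introspection reports, not how arguments are actually bound:

```python
import inspect

def sum(a, axis=None, dtype=None, out=None):  # stand-in for the public function
    ...

def _sum_dispatcher(*args, **kwargs):
    return args[:1]

# Borrow the public function's signature for introspection purposes:
_sum_dispatcher.__signature__ = inspect.signature(sum)

print(inspect.signature(_sum_dispatcher))  # (a, axis=None, dtype=None, out=None)

# But a call like _sum_dispatcher(a=...) still binds via *args/**kwargs,
# so this alone does not fix the keyword-argument problem above.
```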
I wonder if signature rewriting would let us restore the original traceback when the wrong arguments are supplied. With the current version of NumPy:
```
In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

TypeError: argmax() got an unexpected keyword argument 'x'
```
but after this PR:
```
In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

~/dev/numpy/numpy/core/overrides.py in public_api(*args, **kwargs)
    147     @functools.wraps(implementation)
    148     def public_api(*args, **kwargs):
--> 149         relevant_args = dispatcher(*args, **kwargs)
    150         return array_function_implementation_or_override(
    151             implementation, public_api, relevant_args, args, kwargs)

TypeError: _argmax_dispatcher() got an unexpected keyword argument 'x'
```
Replacing `public_api.__signature__` doesn't affect the error message, but it looks like this is something that the decorator library would fix. So maybe that's worth doing after all...
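For reference, the third-party `decorator` package generates wrappers whose real calling signature matches the wrapped function, so the failure surfaces under the public function's name. A hypothetical sketch, not the PR's actual code:

```python
from decorator import decorator

@decorator
def dispatched(func, *args, **kwargs):
    # Dispatch logic would go here; the generated wrapper has
    # func's exact signature, unlike a plain *args/**kwargs wrapper.
    return func(*args, **kwargs)

@dispatched
def argmax(a, axis=None, out=None):
    ...

argmax(x=[1, 2, 3])
# TypeError: argmax() got an unexpected keyword argument 'x'
```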
I would prefer it if we shared dispatchers across functions with identical signatures, such as `argmin`/`argmax` and `atleast_?d`; it'd be less code and less maintenance. But otherwise this looks okay to me.
I thought about this, but I leaned against it only because I think it's actually clearer to repeat one-line dispatcher functions right above the functions they decorate, rather than hunting for a dispatcher function defined very far away. But you're probably right that this could make sense for very closely related functions like argmin/argmax.
Depends who uses what IDE and how used they are to navigating code I guess...
I actually meant this for families of functions that are highly likely to track the same signature over time.
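For a closely related pair, a shared dispatcher might look like this (signatures simplified from NumPy's actual definitions):

```python
def _argminmax_dispatcher(a, axis=None, out=None):
    return (a, out)

@array_function_dispatch(_argminmax_dispatcher)
def argmin(a, axis=None, out=None):
    ...

@array_function_dispatch(_argminmax_dispatcher)
def argmax(a, axis=None, out=None):
    ...
```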
What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.
In that case, we would have to add the deprecation warning to the wrapper generated by `array_function_dispatch`.
To give a little more context: when I generated this PR by copying and pasting each last dispatcher function to write the next one, I was surprised by how often function signatures changed, at least in some subtle way. For example, if you look at NumPy's "statistical functions" (https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.statistics.html), most of them have similar but subtly different signatures. Of those with the same signature, some are defined across multiple files (e.g., …).

The most common class of functions with the same signature are ufuncs, but they already have their own dispatch mechanism.

I like keeping the dispatcher functions very near the function definitions, like docstrings or type annotations. It is indeed a little more verbose, but the logic is clearer and it makes it easier to verify that everything is in sync. The tradeoff is that we can't share implementations for dispatchers, but there are very few cases where that would save more than a few lines of code (and I'm happy to make exceptions for those).
Actually, as @hameerabbasi points out in #12119, we can do this by putting the deprecation warning in the dispatcher function, which is always called.
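So a deprecation could live in the dispatcher, roughly like this (hypothetical `old_func`):

```python
import warnings

def _old_func_dispatcher(a, out=None):
    # The dispatcher runs on every call, even when __array_function__
    # overrides the implementation, so the warning is always issued.
    warnings.warn("old_func is deprecated", DeprecationWarning, stacklevel=3)
    return (a, out)

@array_function_dispatch(_old_func_dispatcher)
def old_func(a, out=None):
    ...
```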
Is everyone happy with this set of PRs as a first cut? If so, I'll go ahead and put them in. @shoyer @hameerabbasi @mhvk, et al.: ready?
I think everything except #12119 is ready to merge now.
I think this looks good, and merging it will make it much easier to try it.
Let's start with this one then. Thanks Stephan, and all the reviewers.
With this implementation, how do I get …?

**Example**

For example, I was testing a NEP-18 ducktype, and I want to implement its repr as

```python
def __repr__(self):
    return np.array_repr(self)
```

since the original numpy implementation should work fine. I tried adding (using the NEP-18 boilerplate):

```python
@implements(np.array_repr)
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    return np.array_repr(arr, max_line_width, precision, suppress_small)
```

but this causes infinite recursion between the two.

**Modified API?**

I may be missing something, but it seems like what we did in this PR mixes up the API and the implementation: there is no way to refer to the core numpy implementation of `array_str`. In … we have:

```python
def array_str(a):
    ... (code) ...
```

in …:

```python
from numpy.core.arrayprint import array_str as core_array_str

def _array_str_dispatcher(a):
    return (a,)

@array_function_dispatch(_array_str_dispatcher)
def array_str(a):
    return core_array_str(a)
```
@ahaldane Is it a duck-type or a subclass? If it's a duck-type, that makes sense: you're just calling back to the original object, so it'll try the dispatch again, and so on ad infinitum. If you've composed or wrapped an array, you might want to call it on the wrapped array instead. If you're doing this for a subclass, you might want to use …
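For the composed/wrapped case, that might look like the following sketch (`Wrapped` is a hypothetical class, only positional arguments are unwrapped for brevity, and the protocol is assumed to be enabled):

```python
import numpy as np

class Wrapped:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap our own instances so the call dispatches on plain
        # ndarrays instead of recursing back into this method.
        unwrapped = tuple(a.data if isinstance(a, Wrapped) else a
                          for a in args)
        return func(*unwrapped, **kwargs)

    def __repr__(self):
        # Terminates because the dispatch above runs on self.data.
        return np.array_repr(self)
```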
@ahaldane - your suggestion is one I made as well, to have separate …. Anyway, the current idea is that if you want to use the …
We can also do something like the following:

```python
def magic_func(func):
    # Replace func with a different implementation...
    def wrapped(a, b):
        return a * b
    # ...but keep the original reachable as an attribute.
    wrapped.original = func
    return wrapped

@magic_func
def func(a, b):
    return a + b

func.original(5, 3)  # 8
```
It is a ducktype, and it cannot be easily cast to an ndarray without losing important properties. (To test array_function, I am playing with an "ArrayCollection" which behaves almost identically to a structured array, but stores each field as a separate array. So it supports concatenate, reshape, nd-indexing, etc.) I just got it working by going into …
Then the proper course of action would be to call the NumPy function on the wrapped/constituent arrays.
We certainly could put the NumPy implementation into the …. But how do you use …?
@ahaldane By the way, big thanks for actually trying this out!
@hameerabbasi: Not possible to call on the constituent arrays; the point is for the arrayprint code to treat the ArrayCollection just like a structured array, and iterate over elements together.

Here is my working test code: https://github.com/ahaldane/NDducktype_tests

Here are the modifications I needed in numpy to get it to work: ahaldane@db612bc
@ahaldane OK, so you had to remove the …. I'm not sure this counts as a real issue from a NumPy perspective :). I'm pretty sure we wouldn't want to remove …
@ahaldane - so, your case uses the nice duck typing in …. I can definitely see how one would like to access that, but note that the obvious disadvantage is the one @shoyer notes just above: it forces the implementation to continue to be that good at duck typing. One of the advantages of the ….

Overall, my feeling would be that the implementation should be accessible, but that we shouldn't promise anything about it. In fact, of course, it is accessible, via …
Anyway, the bottom-line suggestion from me would be to happily let this be the case, not advertise it, and if people use it and things stop working, it is their problem (and, yes, I fear I might well use it, at least for initial trials for …).
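One concrete route, as a sketch: the wrapper shown in the traceback above is created with `functools.wraps(implementation)`, and on Python 3 `functools.wraps` attaches a `__wrapped__` attribute pointing at the original function, so the undecorated implementation should already be reachable:

```python
import numpy as np

# functools.wraps stores the original function on the wrapper, so the
# pre-dispatch implementation is accessible, though unadvertised:
impl = np.array_repr.__wrapped__
print(impl(np.arange(3)))  # array([0, 1, 2]), bypassing __array_function__
```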
This is very clever, I like this and agree.
I think the arrayprint machinery is one part of numpy we should aim to have work with ducktypes. Printing will be one of the first obstacles for any ducktype creator, and the arrayprint code is already meant to generalize easily for subclasses and new dtypes. As demonstrated here, it already pretty much works!

In fact, last time we looked at overhauling MaskedArray, the printing machinery was one of the first things we looked at, and @eric-wieser has a branch somewhere which starts to generalize arrayprint.
I should add, I worked out some of the ducktype requirements of arrayprint (with my modification). You need to implement appropriate …
OK, I will trust you on the interface here; I don't disagree with you :). In that case, let me refer you to this section of the NEP. Would your duck array be a NEP 22 "full" or "partial" duck array?
Reading those sections, I can see there are a lot of complicated decisions to make! (I'm not sure if ArrayCollection would aim to be a full or partial duckarray.)

Whether or not we allow partial ducktypes, and whether or not we define a "core" API, it seems that we could at least document for particular subcomponents (like arrayprint) what the ducktype requirements or guarantees are. Different components of numpy might have different ducktype requirements. So, for arrayprint, I am imagining we would explain somewhere that you need the ….

Anyway, I will continue playing with ArrayCollection, and maybe also a MaskedArray ducktype if anyone is interested. The MaskedArray ducktype will want to use the same changes to arrayprint.py as I needed for ArrayCollection.