
ENH: __array_function__ support for most of numpy.core #12115


Merged
1 commit merged into numpy:master on Oct 11, 2018

Conversation

@shoyer (Member) commented Oct 8, 2018

With the notable exceptions of np.einsum and np.block.

xref #12028

@shoyer (Member, Author) commented Oct 8, 2018

Looks like I also missed defchararray.py, which I'll leave for a follow-on PR.

@charris (Member) commented Oct 8, 2018

Could you write up a section in the release notes for all these related PRs?

@charris (Member) commented Oct 8, 2018

Also apply labels for the PRs.

@shoyer (Member, Author) commented Oct 8, 2018

I made a new label for __array_function__ PRs.

I'll add the release notes in another PR. I can do that now if you prefer, but note that it's on the checklist in #12028, and that checklist really should be entirely complete before we release this in any form. If we're not ready in time for the next major release, we should replace array_function_dispatch with a no-op that returns the original function to disable all of this.
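As a minimal sketch of that fallback (hypothetical, not code from this PR), the no-op version would keep the decorator's call pattern but hand back the implementation unchanged:

def array_function_dispatch(dispatcher):
    # Hypothetical no-op fallback: ignore the dispatcher and return the
    # undecorated implementation unchanged, disabling __array_function__
    # dispatch entirely.
    def decorator(implementation):
        return implementation
    return decorator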

return (a,)


@array_function_dispatch(_array2string_dispatcher)
A member commented:

Could the dispatcher be written as

def default1_dispatcher(*args, **kwargs):
    return args[:1]

Then maybe a few of those could be put into overrides, imported, and reused. Of course, special cases may call for a different dispatcher.

@shoyer (Member, Author) commented Oct 9, 2018:

This is a good idea, but even though most numpy functions have a meaningless first argument name, technically you can still use them as a keyword argument, e.g., numpy.sum(a=array), in which case args would be empty. So we need to have an exactly matching function signature, including the same argument names.
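To illustrate (a sketch in the spirit of this PR; the exact parameters shown are an assumption), the dispatcher repeats the public function's parameter names so that a keyword call like np.argmax(a=arr) binds the same way in both functions:

def _argmax_dispatcher(a, axis=None, out=None):
    # same parameter names as argmax, so keyword-only calls still bind
    return (a, out)

@array_function_dispatch(_argmax_dispatcher)
def argmax(a, axis=None, out=None):
    ...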

@mattip (Member) commented Oct 9, 2018:

At least for Python 3.3+, we could use inspect.signature and set dispatcher.__signature__ as in this Stack Overflow answer, or is that too magical?

Edit: formatting
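A rough sketch of that idea (the helper below is hypothetical, not from this PR): build a generic first-argument dispatcher per function and copy the public function's signature onto it for introspection.

import inspect

def make_default_dispatcher(public_func):
    # Hypothetical helper: a shared-style dispatcher that forwards the first
    # positional argument and advertises public_func's signature.
    def dispatcher(*args, **kwargs):
        return args[:1]
    dispatcher.__signature__ = inspect.signature(public_func)
    return dispatcher

Note that __signature__ only changes what inspect reports; the dispatcher body above would still miss a keyword-only call such as np.sum(a=arr), which is exactly the concern raised above.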

@shoyer (Member, Author) commented:

I wonder if signature rewriting would let us restore the original traceback when the wrong arguments are supplied. With the current version of NumPy:

In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

TypeError: argmax() got an unexpected keyword argument 'x'

but after this PR:

In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

~/dev/numpy/numpy/core/overrides.py in public_api(*args, **kwargs)
    147         @functools.wraps(implementation)
    148         def public_api(*args, **kwargs):
--> 149             relevant_args = dispatcher(*args, **kwargs)
    150             return array_function_implementation_or_override(
    151                 implementation, public_api, relevant_args, args, kwargs)

TypeError: _argmax_dispatcher() got an unexpected keyword argument 'x'

@shoyer (Member, Author) commented:

Replacing public_api.__signature__ doesn't affect the error message, but it looks like this is something that the decorator library would fix. So maybe that's worth doing after all...
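A rough sketch of what that could look like (assuming the third-party decorator package; not code from this PR): decorator generates a wrapper with the implementation's exact signature, so a bad keyword argument raises in the public function rather than inside the dispatcher.

from decorator import decorator  # third-party `decorator` package

def array_function_dispatch(dispatcher):
    # caller(func, *args, **kwargs) is the calling convention the decorator
    # package expects for the wrapper body
    def caller(implementation, *args, **kwargs):
        relevant_args = dispatcher(*args, **kwargs)
        # ...in NumPy this would hand off to
        # array_function_implementation_or_override(...); elided here
        return implementation(*args, **kwargs)
    return decorator(caller)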

@hameerabbasi (Contributor) left a comment:

I would prefer it if we shared dispatchers across functions with identical signatures, such as argmin/argmax and atleast_?d; it would be less code and less maintenance. Otherwise this looks okay to me.

@shoyer (Member, Author) commented Oct 9, 2018

I would prefer it if we shared dispatchers across functions with identical signatures, such as argmin/argmax and atleast_?d; it would be less code and less maintenance

I thought about this, but I leaned against it only because I think it's actually clearer to repeat one-line dispatcher functions right above the functions they decorate, rather than hunting for a dispatcher function defined very far away. But you're probably right that this could make sense for very closely related functions like argmin/argmax.

@hameerabbasi (Contributor) commented:

hunting for a dispatcher function

Depends on who uses what IDE and how used they are to navigating code, I guess...

this could make sense for very closely related functions like argmin/argmax

I actually meant this for families of functions that are highly likely to track the same signature over time.

@charris (Member) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

@shoyer (Member, Author) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

In that case, we would have to add the deprecation warning to the wrapper generated by array_function_dispatch, e.g., by adding an optional argument to array_function_dispatch for a function that is called before dispatch. Fortunately, this is quite rare for NumPy itself.

@shoyer (Member, Author) commented Oct 9, 2018

this could make sense for very closely related functions like argmin/argmax

I actually meant this for families of functions that are highly likely to track the same signature over time.

To give a little more context: when I generated this PR by copying and pasting each previous dispatcher function to write the next one, I was surprised by how often function signatures changed, at least in some subtle way.

For example, if you look at NumPy's "statistical functions" (https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.statistics.html), most of them have similar but subtly different signatures. Of those with the same signature, some are defined across multiple files (e.g., percentile/nanpercentile).

The most common class of functions with the same signature are ufuncs, but they already have their own dispatch mechanism.

I like keeping the dispatcher functions very near the function definitions, like docstrings or type annotations. It is indeed a little more verbose, but the logic is clearer and it makes it easier to verify that everything is in sync. The tradeoff is that we can't share implementations for dispatchers, but there are very few cases where that would save more than a few lines of code (and I'm happy to make exceptions for those).

@shoyer (Member, Author) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

In that case, we would have to add the deprecation warning to the wrapper generated by array_function_dispatch, e.g., by adding an optional argument to array_function_dispatch for a function that is called before dispatch. Fortunately, this is quite rare for NumPy itself.

Actually, as @hameerabbasi points out in #12119, we can do this by putting the deprecation warning in the dispatcher function, which is always called.
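A minimal sketch of that pattern (the function name below is hypothetical, not from this PR):

import warnings

def _old_function_dispatcher(a, axis=None):
    # The dispatcher runs on every call, before and regardless of dispatch,
    # so the deprecation warning is always emitted.  The stacklevel would
    # need tuning so the warning points at the user's call site.
    warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=3)
    return (a,)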

@shoyer (Member, Author) commented Oct 11, 2018

@charris see #12148 for draft release notes

@charris (Member) commented Oct 11, 2018

Is everyone happy with this set of PRs as a first cut? If so, I'll go ahead and put them in.

@shoyer @hameerabbasi @mhvk, et al. Ready?

@shoyer (Member, Author) commented Oct 11, 2018

I think everything except #12119 is ready to merge now.

@mhvk (Contributor) left a comment:

I think this looks good, and merging it will make it much easier to try it out.

@charris (Member) commented Oct 11, 2018

Let's start with this one then. Thanks Stephan, and all the reviewers.

@charris merged commit eb2bd11 into numpy:master on Oct 11, 2018
@ahaldane (Member) commented Oct 12, 2018

With this implementation, how do I get __array_function__ to simply call the original numpy implementation of the function? (sorry if this is obvious or off topic, not sure where to ask)

Example

For example, I was testing a NEP-18 ducktype, and I wanted to implement its repr as

    def __repr__(self):
        return np.array_repr(self)

since the original numpy implementation should work fine. I tried adding (using the NEP-18 boilerplate):

@implements(np.array_repr)
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    return np.array_repr(arr, max_line_width, precision, suppress_small)

but this causes infinite recursion between np.array_repr and my new array_repr. Without that definition, I get TypeError: no implementation found for <function array_repr .... Before this PR, a call to repr(myducktypeobj) worked (edit: with a one-line change in arrayprint).

Modified API?

I may be missing something, but it seems like what we did in this PR mixes up the API and the implementation: there is no way to refer to the core numpy implementation of np.core.arrayprint.array_repr separately from the np.array_repr API function. Would it make sense to define things as follows?

In arrayprint.py:

def array_str(a):
    ... (code) ...

In api.py:

from numpy.core.arrayprint import array_str as core_array_str

def _array_str_dispatcher(a):
    return (a,)

@array_function_dispatch(_array_str_dispatcher)
def array_str(a):
    return core_array_str(a)

@hameerabbasi (Contributor) commented:

@ahaldane Is it a duck type or a subclass? If it's a duck type, that makes sense: you're calling back with the original object, so it'll try the dispatch again, and so on ad infinitum. If you've composed or wrapped an array, you might want to call it on the wrapped array instead.

If you're doing this for a subclass, you might want to use super.

@mhvk (Contributor) commented Oct 12, 2018

@ahaldane - your suggestion is one I made as well, to have separate API and implementation namespaces; it is explicitly excluded in the NEP. Arguably correctly, as it is too much work to keep both consistent. Though perhaps one should revisit the ability to access the implementation, either using @hameerabbasi's NotImplementedButCoercible or through exposing the implementation somewhere on the API function (or in the __array_function__ arguments).

Anyway, the current idea is that if you want to use the ndarray implementation, you have to turn yourself into a regular array and call again, so inside your implementation it would be:

# for a duck type
return function(np.asarray(duckarray))
# alternative for a subclass (which eventually should be faster);
# types, args, kwargs here are what __array_function__ received, with the
# array assumed to be the first argument
arr = subclass.view(np.ndarray)
return super().__array_function__(function, types, (arr,) + args[1:], kwargs)

@hameerabbasi (Contributor) commented:

We can also do something like the following:

def magic_func(func):
    def wrapped(a, b):
        return a * b
    
    wrapped.original = func
    
    return wrapped

@magic_func
def func(a, b):
    return a + b

func.original(5, 3) # 8

@ahaldane (Member) commented Oct 12, 2018

It is a ducktype, and it cannot easily be cast to an ndarray without losing important properties. (To test __array_function__, I am playing with an "ArrayCollection" which behaves almost identically to a structured array, but stores each field as a separate array. So it supports concatenate, reshape, nd-indexing, etc.)

I just got it working by going into arrayprint.py and defining array_repr_impl, array_str_impl, and array2string_impl and then wrapping them with array_repr and so on. Maybe we can expose the numpy implementations on a case-by-case basis in that way. I expect that most numpy functions cannot be reused for ducktypes, but the arrayprint ones are special because they are pure Python.

@shoyer deleted the array-function-numpy-core branch on October 12, 2018, 20:54
@hameerabbasi (Contributor) commented:

Then the proper course of action would be to call the NumPy function on the wrapped/constituent arrays.
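As a sketch of that advice (reusing the @implements boilerplate from above; the ._wrapped attribute is hypothetical), a wrapper-style duck type would dispatch back onto the array it holds rather than onto itself, which avoids the recursion:

@implements(np.array_repr)
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    # call NumPy on the wrapped ndarray, not on the duck type itself
    return np.array_repr(arr._wrapped, max_line_width, precision, suppress_small)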

@shoyer (Member, Author) commented Oct 12, 2018

We certainly could put the NumPy implementation into the __array_function__ interface, either as a separate argument or as an attribute of the public_api function. There are definitely use cases for this, one of which was discussed in the NEP: http://www.numpy.org/neps/nep-0018-array-function-protocol.html#other-possible-choices-for-the-protocol

But how do you use np.array_repr() directly on non-NumPy arrays? I still don't entirely understand your use-case here.

@shoyer (Member, Author) commented Oct 12, 2018

@ahaldane By the way, big thanks for actually trying this out!

@ahaldane (Member) commented:

@hameerabbasi: It's not possible to call it on the constituent arrays; the point is for the arrayprint code to treat the ArrayCollection just like a structured array and iterate over the elements together.

Here is my working test code: https://github.com/ahaldane/NDducktype_tests

Here are the modifications I needed in numpy to get it to work: ahaldane@db612bc

@shoyer (Member, Author) commented Oct 12, 2018

@ahaldane OK, so you had to remove the asarray() call at the start of _array2string() to get this to work. And then it just happens that array2string() works properly on your duck array, even though of course this isn't officially supported or documented.

I'm not sure this counts as a real issue from a NumPy perspective :). I'm pretty sure we wouldn't want to remove np.asarray() internally anyway, to avoid getting stuck supporting this behavior.

@mhvk (Contributor) commented Oct 12, 2018

@ahaldane - so, your case uses the nice duck typing in arrayprint... At least one piece of code written well! (Just goes to show one should remove asarray everywhere... ;-)

I can definitely see how one would like to access that, but note the obvious disadvantage that @shoyer points out just above: it forces the implementation to keep duck typing this well. One of the advantages of the __array_function__ approach is that if we find an implementation that is much better for ndarray but doesn't duck type, we can just use it. On the other hand, I guess your example serves as a case for a NotImplementedButCoercible return value (or, more appropriately here, JustPassMeThrough).

Overall, my feeling would be that the implementation should be accessible but that we shouldn't promise anything about it. In fact, of course it is accessible, via np.array2string.__wrapped__:

import numpy as np

class A(np.ndarray):
    def __array_function__(self, *args, **kwargs):
        return NotImplemented

a = np.arange(10.).view(A)
np.array2string(a)
# TypeError, as expected
np.array2string.__wrapped__(a)
# '[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]'

Anyway, bottom-line suggestion from me would be to happily let this be the case, not advertise it, and if people use it and things stop working, it is their problem (and, yes, I fear I might well use it, at least for initial trials for Quantity, as a fall-back for the case where a function is not yet known...).

@shoyer (Member, Author) commented Oct 12, 2018

Anyway, bottom-line suggestion from me would be to happily let this be the case, not advertise it, and if people use it and things stop working, it is their problem (and, yes, I fear I might well use it, at least for initial trials for Quantity, as a fall-back for the case where a function is not yet known...).

This is very clever; I like it and agree.

@ahaldane (Member) commented Oct 12, 2018

I think the arrayprint machinery is one part of numpy we should aim to have work with ducktypes. Printing will be one of the first obstacles for any ducktype-creator, and the arrayprint code is already meant to generalize easily for subclasses and new dtypes. As demonstrated here, it already pretty much works!

In fact, the last time we looked at overhauling MaskedArray, the printing machinery was one of the first things we looked at, and @eric-wieser has a branch somewhere which starts to generalize arrayprint.

@ahaldane (Member) commented:

I should add that I worked out some of the ducktype requirements of arrayprint (with my modification).

You need to implement appropriate .shape, .dtype, and .size attributes, and then you need to support numpy-style nd-indexing (with some tricky parts related to 0d arrays).
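A bare-bones sketch of what that interface amounts to (illustrative only, not code from the PR or the linked repo):

import numpy as np

class MinimalDuck:
    # Purely illustrative: just enough surface for arrayprint-style code,
    # i.e. dtype/shape/size plus numpy-style indexing.
    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    @property
    def size(self):
        return self._data.size

    def __getitem__(self, index):
        result = self._data[index]
        # keep returning the duck type for array results (including 0-d ones,
        # where the tricky cases mentioned above live)
        return MinimalDuck(result) if isinstance(result, np.ndarray) else result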

@shoyer (Member, Author) commented Oct 12, 2018

OK, I will trust you on the interface; I don't disagree with you here :). In that case, let me refer you to this section of the NEP.

Would your duck array be a NEP 22 "full" or "partial" duck array?

@ahaldane (Member) commented:

Reading those sections, I can see there are a lot of complicated decisions to make! (I'm not sure if ArrayCollection would aim to be a full or partial duckarray.)

Whether or not we allow partial ducktypes, and whether or not we define a "core" API, it seems that we could at least document, for particular subcomponents (like arrayprint), what the ducktype requirements or guarantees are. Different components of numpy might have different ducktype requirements.

So, for arrayprint, I am imagining we would explain somewhere that you need the dtype/size/shape attributes and the corresponding nd-indexing behavior, and no more. That list is short enough that this dream seems feasible.

Anyway, I will continue playing with ArrayCollection, and maybe also a MaskedArray ducktype if anyone is interested. The MaskedArray ducktype will want to use the same changes to arrayprint.py as I needed for ArrayCollection.

@charris changed the title to "ENH: __array_function__ support for most of numpy.core" on Nov 10, 2018