
ENH: __array_function__ support for most of numpy.core #12115


Merged
1 commit merged into numpy:master on Oct 11, 2018

Conversation

@shoyer (Member) commented Oct 8, 2018

With the notable exceptions of np.einsum and np.block.

xref #12028

@shoyer (Member, Author) commented Oct 8, 2018

Looks like I also missed defchararray.py, which I'll leave for a follow-on PR.

@charris (Member) commented Oct 8, 2018

Could you write up a section in the release notes for all these related PRs?

@charris (Member) commented Oct 8, 2018

Also apply labels for the PRs.

@shoyer (Member, Author) commented Oct 8, 2018

I made a new label for __array_function__ PRs.

I'll add the release notes in another PR. I can do that now if you prefer, but note that it's on the checklist in #12028, and that checklist really should be entirely complete before we release this in any form. If we're not ready in time for the next major release, we should replace array_function_dispatch with a no-op that returns the original function to disable all of this.
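As a minimal sketch of that fallback (hypothetical, not code from this PR), the no-op version would keep the decorator's call pattern but hand back the implementation unchanged:

def array_function_dispatch(dispatcher):
    # Hypothetical no-op fallback: ignore the dispatcher and return the
    # undecorated implementation unchanged, disabling __array_function__
    # dispatch entirely.
    def decorator(implementation):
        return implementation
    return decorator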

return (a,)


@array_function_dispatch(_array2string_dispatcher)
A member commented:

Could the dispatcher be written as

def default1_dispatcher(*args, **kwargs):
    return args[:1]

Then maybe a few of those could be put into overrides, imported, and reused. Of course, special cases may call for a different dispatcher.

@shoyer (Member, Author) commented Oct 9, 2018:

This is a good idea, but even though most numpy functions have a meaningless first argument name, technically you can still use them as a keyword argument, e.g., numpy.sum(a=array), in which case args would be empty. So we need to have an exactly matching function signature, including the same argument names.
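To illustrate (a sketch in the spirit of this PR; the exact parameters shown are an assumption), the dispatcher repeats the public function's parameter names so that a keyword call like np.argmax(a=arr) binds the same way in both functions:

def _argmax_dispatcher(a, axis=None, out=None):
    # same parameter names as argmax, so keyword-only calls still bind
    return (a, out)

@array_function_dispatch(_argmax_dispatcher)
def argmax(a, axis=None, out=None):
    ...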

@mattip (Member) commented Oct 9, 2018:

At least for Python 3.3+, we could use inspect.signature and set dispatcher.__signature__ as in this Stack Overflow answer, or is that too magical?

Edit: formatting
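A rough sketch of that idea (the helper below is hypothetical, not from this PR): build a generic first-argument dispatcher per function and copy the public function's signature onto it for introspection.

import inspect

def make_default_dispatcher(public_func):
    # Hypothetical helper: a shared-style dispatcher that forwards the first
    # positional argument and advertises public_func's signature.
    def dispatcher(*args, **kwargs):
        return args[:1]
    dispatcher.__signature__ = inspect.signature(public_func)
    return dispatcher

Note that __signature__ only changes what inspect reports; the dispatcher body above would still miss a keyword-only call such as np.sum(a=arr), which is exactly the concern raised above.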

@shoyer (Member, Author) commented:

I wonder if signature rewriting would let us restore the original traceback when the wrong arguments are supplied. With the current version of NumPy:

In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

TypeError: argmax() got an unexpected keyword argument 'x'

but after this PR:

In [2]: np.argmax(x=[1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-d25c18244167> in <module>()
----> 1 np.argmax(x=[1, 2, 3])

~/dev/numpy/numpy/core/overrides.py in public_api(*args, **kwargs)
    147         @functools.wraps(implementation)
    148         def public_api(*args, **kwargs):
--> 149             relevant_args = dispatcher(*args, **kwargs)
    150             return array_function_implementation_or_override(
    151                 implementation, public_api, relevant_args, args, kwargs)

TypeError: _argmax_dispatcher() got an unexpected keyword argument 'x'

@shoyer (Member, Author) commented:

Replacing public_api.__signature__ doesn't affect the error message, but it looks like this is something that the decorator library would fix. So maybe that's worth doing after all...
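A rough sketch of what that could look like (assuming the third-party decorator package; not code from this PR): decorator generates a wrapper with the implementation's exact signature, so a bad keyword argument raises in the public function rather than inside the dispatcher.

from decorator import decorator  # third-party `decorator` package

def array_function_dispatch(dispatcher):
    # caller(func, *args, **kwargs) is the calling convention the decorator
    # package expects for the wrapper body
    def caller(implementation, *args, **kwargs):
        relevant_args = dispatcher(*args, **kwargs)
        # ...in NumPy this would hand off to
        # array_function_implementation_or_override(...); elided here
        return implementation(*args, **kwargs)
    return decorator(caller)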

@hameerabbasi (Contributor) left a comment:

I would prefer it if we shared dispatchers across functions with identical signatures, such as argmin/argmax and atleast_?d; it would be less code and less maintenance. Otherwise this looks okay to me.

@shoyer (Member, Author) commented Oct 9, 2018

I would prefer it if we shared dispatchers across functions with identical signatures, such as argmin/argmax and atleast_?d; it would be less code and less maintenance

I thought about this, but I leaned against it only because I think it's actually clearer to repeat one-line dispatcher functions right above the functions they decorate, rather than hunting for a dispatcher function defined very far away. But you're probably right that this could make sense for very closely related functions like argmin/argmax.

@hameerabbasi (Contributor) commented:

hunting for a dispatcher function

Depends on who uses what IDE and how used they are to navigating code, I guess...

this could make sense for very closely related functions like argmin/argmax

I actually meant this for families of functions that are highly likely to track the same signature over time.

@charris (Member) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

@shoyer (Member, Author) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

In that case, we would have to add the deprecation warning to the wrapper generated by array_function_dispatch, e.g., by adding an optional argument to array_function_dispatch for a function that is called before dispatch. Fortunately, this is quite rare for NumPy itself.

@shoyer (Member, Author) commented Oct 9, 2018

this could make sense for very closely related functions like argmin/argmax

I actually meant this for families of functions that are highly likely to track the same signature over time.

To give a little more context: when I generated this PR by copying and pasting each previous dispatcher function to write the next one, I was surprised by how often function signatures changed, at least in some subtle way.

For example, if you look at NumPy's "statistical functions" (https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.statistics.html), most of them have similar but subtly different signatures. Of those with the same signature, some are defined across multiple files (e.g., percentile/nanpercentile).

The most common class of functions with the same signature are ufuncs, but they already have their own dispatch mechanism.

I like keeping the dispatcher functions very near the function definitions, like docstrings or type annotations. It is indeed a little more verbose, but the logic is clearer and it makes it easier to verify that everything is in sync. The tradeoff is that we can't share implementations for dispatchers, but there are very few cases where that would save more than a few lines of code (and I'm happy to make exceptions for those).

@shoyer (Member, Author) commented Oct 9, 2018

What if the decorated function is deprecated? If the function itself is never called, that warning will never be issued.

In that case, we would have to add the deprecation warning to the wrapper generated by array_function_dispatch, e.g., by adding an optional argument to array_function_dispatch for a function that is called before dispatch. Fortunately, this is quite rare for NumPy itself.

Actually, as @hameerabbasi points out in #12119, we can do this by putting the deprecation warning in the dispatcher function, which is always called.
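A minimal sketch of that pattern (the function name below is hypothetical, not from this PR):

import warnings

def _old_function_dispatcher(a, axis=None):
    # The dispatcher runs on every call, before and regardless of dispatch,
    # so the deprecation warning is always emitted.  The stacklevel would
    # need tuning so the warning points at the user's call site.
    warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=3)
    return (a,)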

@shoyer (Member, Author) commented Oct 11, 2018

@charris see #12148 for draft release notes

@charris (Member) commented Oct 11, 2018

Is everyone happy with this set of PRs as a first cut? If so, I'll go ahead and put them in.

@shoyer @hameerabbasi @mhvk, et al. Ready?

@shoyer (Member, Author) commented Oct 11, 2018

I think everything except #12119 is ready to merge now.

@mhvk (Contributor) left a comment:

I think this looks good, and merging it will make it much easier to try it out.

@charris (Member) commented Oct 11, 2018

Let's start with this one then. Thanks Stephan, and all the reviewers.

@charris merged commit eb2bd11 into numpy:master on Oct 11, 2018
@ahaldane (Member) commented Oct 12, 2018

With this implementation, how do I get __array_function__ to simply call the original numpy implementation of the function? (sorry if this is obvious or off topic, not sure where to ask)

Example

For example, I was testing a NEP-18 ducktype, and I wanted to implement its repr as

    def __repr__(self):
        return np.array_repr(self)

since the original numpy implementation should work fine. I tried adding (using the NEP-18 boilerplate):

@implements(np.array_repr)
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    return np.array_repr(arr, max_line_width, precision, suppress_small)

but this causes infinite recursion between np.array_repr and my new array_repr. Without that definition, I get TypeError: no implementation found for <function array_repr .... Before this PR, a call to repr(myducktypeobj) worked (edit: with a one-line change in arrayprint).

Modified API?

I may be missing something, but it seems like what we did in this PR mixes up the API and the implementation: there is no way to refer to the core numpy implementation of np.core.arrayprint.array_repr separately from the np.array_repr API function. Would it make sense to define things as follows?

In arrayprint.py:

def array_str(a):
    ... (code) ...

In api.py:

from numpy.core.arrayprint import array_str as core_array_str

def _array_str_dispatcher(a):
    return (a,)

@array_function_dispatch(_array_str_dispatcher)
def array_str(a):
    return core_array_str(a)

@hameerabbasi (Contributor) commented:

@ahaldane Is it a duck type or a subclass? If it's a duck type, that makes sense: you're calling back with the original object, so it'll try the dispatch again, and so on ad infinitum. If you've composed or wrapped an array, you might want to call it on the wrapped array instead.

If you're doing this for a subclass, you might want to use super.

@mhvk (Contributor) commented Oct 12, 2018

@ahaldane - your suggestion is one I made as well, to have separate API and implementation namespaces; it is explicitly excluded in the NEP. Arguably correctly, as it is too much work to keep both consistent. Though perhaps one should revisit the ability to access the implementation, either using @hameerabbasi's NotImplementedButCoercible or through exposing the implementation somewhere on the API function (or in the __array_function__ arguments).

Anyway, the current idea is that if you want to use the ndarray implementation, you have to turn yourself into a regular array and call again, so inside your implementation it would be:

# for a duck type
return function(np.asarray(duckarray))
# alternative for a subclass (which eventually should be faster);
# types, args, kwargs here are what __array_function__ received, with the
# array assumed to be the first argument
arr = subclass.view(np.ndarray)
return super().__array_function__(function, types, (arr,) + args[1:], kwargs)

@hameerabbasi (Contributor) commented:

We can also do something like the following:

def magic_func(func):
    def wrapped(a, b):
        return a * b
    
    wrapped.original = func
    
    return wrapped

@magic_func
def func(a, b):
    return a + b

func.original(5, 3) # 8

@ahaldane (Member) commented Oct 12, 2018

It is a ducktype, and it cannot easily be cast to an ndarray without losing important properties. (To test __array_function__, I am playing with an "ArrayCollection" which behaves almost identically to a structured array, but stores each field as a separate array. So it supports concatenate, reshape, nd-indexing, etc.)

I just got it working by going into arrayprint.py and defining array_repr_impl, array_str_impl, and array2string_impl and then wrapping them with array_repr and so on. Maybe we can expose the numpy implementations on a case-by-case basis in that way. I expect that most numpy functions cannot be reused for ducktypes, but the arrayprint ones are special because they are pure Python.

@shoyer deleted the array-function-numpy-core branch on October 12, 2018, 20:54
@hameerabbasi (Contributor) commented:

Then the proper course of action would be to call the NumPy function on the wrapped/constituent arrays.
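As a sketch of that advice (reusing the @implements boilerplate from above; the ._wrapped attribute is hypothetical), a wrapper-style duck type would dispatch back onto the array it holds rather than onto itself, which avoids the recursion:

@implements(np.array_repr)
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    # call NumPy on the wrapped ndarray, not on the duck type itself
    return np.array_repr(arr._wrapped, max_line_width, precision, suppress_small)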

@shoyer (Member, Author) commented Oct 12, 2018

We certainly could put the NumPy implementation into the __array_function__ interface, either as a separate argument or as an attribute of the public_api function. There are definitely use cases for this, one of which was discussed in the NEP: http://www.numpy.org/neps/nep-0018-array-function-protocol.html#other-possible-choices-for-the-protocol

But how do you use np.array_repr() directly on non-NumPy arrays? I still don't entirely understand your use-case here.

@shoyer (Member, Author) commented Oct 12, 2018

@ahaldane By the way, big thanks for actually trying this out!

@ahaldane (Member) commented:

@hameerabbasi: It's not possible to call it on the constituent arrays; the point is for the arrayprint code to treat the ArrayCollection just like a structured array and iterate over the elements together.

Here is my working test code: https://github.com/ahaldane/NDducktype_tests

Here are the modifications I needed in numpy to get it to work: ahaldane@db612bc

@shoyer (Member, Author) commented Oct 12, 2018

@ahaldane OK, so you had to remove the asarray() call at the start of _array2string() to get this to work. And then it just happens that array2string() works properly on your duck array, even though of course this isn't officially supported or documented.

I'm not sure this counts as a real issue from a NumPy perspective :). I'm pretty sure we wouldn't want to remove np.asarray() internally anyway, to avoid getting stuck supporting this behavior.

@mhvk (Contributor) commented Oct 12, 2018

@ahaldane - so, your case uses the nice duck typing in arrayprint... At least one piece of code written well! (Just goes to show one should remove asarray everywhere... ;-)

I can definitely see how one would like to access that, but note the obvious disadvantage that @shoyer points out just above: it forces the implementation to keep duck typing this well. One of the advantages of the __array_function__ approach is that if we find an implementation that is much better for ndarray but doesn't duck type, we can just use it. On the other hand, I guess your example serves as a case for a NotImplementedButCoercible return value (or, more appropriately here, JustPassMeThrough).

Overall, my feeling would be that the implementation should be accessible but that we shouldn't promise anything about it. In fact, of course it is accessible, via np.array2string.__wrapped__:

import numpy as np

class A(np.ndarray):
    def __array_function__(self, *args, **kwargs):
        return NotImplemented

a = np.arange(10.).view(A)
np.array2string(a)
# TypeError, as expected
np.array2string.__wrapped__(a)
# '[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]'

Anyway, bottom-line suggestion from me would be to happily let this be the case, not advertise it, and if people use it and things stop working, it is their problem (and, yes, I fear I might well use it, at least for initial trials for Quantity, as a fall-back for the case where a function is not yet known...).

@shoyer (Member, Author) commented Oct 12, 2018

Anyway, bottom-line suggestion from me would be to happily let this be the case, not advertise it, and if people use it and things stop working, it is their problem (and, yes, I fear I might well use it, at least for initial trials for Quantity, as a fall-back for the case where a function is not yet known...).

This is very clever; I like it and agree.

@ahaldane (Member) commented Oct 12, 2018

I think the arrayprint machinery is one part of numpy we should aim to have work with ducktypes. Printing will be one of the first obstacles for any ducktype-creator, and the arrayprint code is already meant to generalize easily for subclasses and new dtypes. As demonstrated here, it already pretty much works!

In fact, the last time we looked at overhauling MaskedArray, the printing machinery was one of the first things we looked at, and @eric-wieser has a branch somewhere which starts to generalize arrayprint.

@ahaldane (Member) commented:

I should add that I worked out some of the ducktype requirements of arrayprint (with my modification).

You need to implement appropriate .shape, .dtype, and .size attributes, and then you need to support numpy-style nd-indexing (with some tricky parts related to 0d arrays).
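A bare-bones sketch of what that interface amounts to (illustrative only, not code from the PR or the linked repo):

import numpy as np

class MinimalDuck:
    # Purely illustrative: just enough surface for arrayprint-style code,
    # i.e. dtype/shape/size plus numpy-style indexing.
    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    @property
    def size(self):
        return self._data.size

    def __getitem__(self, index):
        result = self._data[index]
        # keep returning the duck type for array results (including 0-d ones,
        # where the tricky cases mentioned above live)
        return MinimalDuck(result) if isinstance(result, np.ndarray) else result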

@shoyer (Member, Author) commented Oct 12, 2018

OK, I will trust you on the interface; I don't disagree with you here :). In that case, let me refer you to this section of the NEP.

Would your duck array be a NEP 22 "full" or "partial" duck array?

@ahaldane (Member) commented:

Reading those sections, I can see there are a lot of complicated decisions to make! (I'm not sure if ArrayCollection would aim to be a full or partial duckarray.)

Whether or not we allow partial ducktypes, and whether or not we define a "core" API, it seems that we could at least document, for particular subcomponents (like arrayprint), what the ducktype requirements or guarantees are. Different components of numpy might have different ducktype requirements.

So, for arrayprint, I am imagining we would explain somewhere that you need the dtype/size/shape attributes and the corresponding nd-indexing behavior, and no more. That list is short enough that this dream seems feasible.

Anyway, I will continue playing with ArrayCollection, and maybe also a MaskedArray ducktype if anyone is interested. The MaskedArray ducktype will want to use the same changes to arrayprint.py as I needed for ArrayCollection.

@charris changed the title to "ENH: __array_function__ support for most of numpy.core" on Nov 10, 2018