Tracking issue for implementation of NEP-18 (__array_function__) #12028

shoyer · 2018-09-24T22:32:58Z

The text was updated successfully, but these errors were encountered:

mattip · 2018-10-07T14:12:47Z

It might be good to merge a preliminary "Decorate all public NumPy functions with @array_function_dispatch" for some high-profile functions and request downstream consumers of the protocol to try it out

shoyer · 2018-10-07T17:13:21Z

Once we merge #12099 I have another PR ready that will add dispatch decorators for most of numpy.core. It will be pretty easy to finish things up -- this one took less than an hour to put together.

shoyer · 2018-10-08T16:05:48Z

cc @eric-wieser @mrocklin @mhvk @hameerabbasi

shoyer · 2018-10-08T16:20:24Z

See https://github.com/shoyer/numpy/tree/array-function-easy-impl for my branch implementing all the "easy" overrides on functions with Python wrappers. The leftover parts are np.block, np.einsum and a handful of multiarray functions written entirely in C (e.g., np.concatenate). I'll split this into a bunch of PRs once we're done with #12099.

Note that I haven't written tests for overrides on each individual function. I'd like to add a few integration tests when we're done (e.g., a duck array that logs all applied operations), but I don't think it would be productive to write dispatching tests for each individual function. The checks in #12099 should catch the most common errors on dispatchers, and every line of code in dispatcher functions should get executed by existing tests.

mhvk · 2018-10-08T17:13:31Z

@shoyer - on the tests, I agree that it is not particularly useful to write tests for each one; instead, within numpy, it may make most sense to start using the overrides relatively quickly in MaskedArray.

shoyer · 2018-10-08T17:17:11Z

@mhvk sounds good to me, though I'll let someone else who uses/knows MaskedArray take the lead on that.

shoyer · 2018-10-08T19:38:20Z

See #12115, #12116, #12119 and #12117 for PRs implementing __array_function__ support on functions defined in Python.

mhvk · 2018-10-11T19:19:12Z

@shoyer - seeing some of the implementations, I have two worries:

For some functions, like reshape, the original functionality already provided a way to override it, by defining a reshape method. We are effectively deprecating that for any class that defines __array_function__.
For other functions, like np.median, careful use of np.asanyarray and ufuncs ensured that subclasses could already use them. But that functionality can no longer be accessed directly.

I think overall these two things are probably benefits, since we simplify the interface and can make the implementations optimized for pure ndarray - though the latter suggests that ndarray.__array_function__ should take over converting lists, etc., to ndarray, so that the implementations can skip that part). Still, I thought I'd note it since it makes me dread implementing this for Quantity a bit more than I thought -- in terms of both the amount of work and the possible hit in performance.

shoyer · 2018-10-12T01:36:54Z

though the latter suggests that ndarray.array_function should take over converting lists, etc., to ndarray, so that the implementations can skip that part).

I'm not sure I follow here.

We are indeed effectively deprecating the old way of overriding functions like reshape and mean, though the old way does still support incomplete implementations of NumPy's API.

hameerabbasi · 2018-10-12T11:10:38Z

I'm not sure I follow here.

I think the issue is that if we implement __array_function__ for even a single function, the previous mechanisms break completely and there is no way to fail-over. Which is why I propose we revisit my NotImplementedButCoercible proposal.

mhvk · 2018-10-12T15:13:58Z

@hameerabbasi - yes, that is the problem. Though we need to be careful here how easy we make it to rely on duct-tape solutions that we would really rather get rid of... (which is why I wrote above that my "problems" may actually be benefits...). Maybe there is a case for trying as is in 1.16 and then deciding on actual experience whether we want to provide fall-back of "ignore my __array_function__ for this case".

hameerabbasi · 2018-10-12T15:51:06Z

Re: dispatcher styling: My preferences on style are based on memory/import time considerations, and verbosity. Quite simply, merge the dispatchers where the signature is likely to remain the same. This way, we create the least amount of objects and the cache hits will be higher too.

That said, I'm not too opposed to the lambda style.

shoyer · 2018-10-12T16:06:55Z

The style for writing dispatcher functions has now come up in a few PRs. It would good to make a consistent choice across NumPy.

We have a few options:

Option 1: Write a separate dispatcher for each function, e.g.,

def _sin_dispatcher(a):
    return (a,)


@array_function_dispatch(_sin_dispatcher)
def sin(a):
     ...


def _cos_dispatcher(a):
    return (a,)


@array_function_dispatch(_cos_dispatcher)
def cos(a):
    ...

Advantages:

Very readable
Easy to find definitions of dispatcher functions
Clear error message when you supply the wrong arguments, e.g., sin(x=1) -> TypeError: _sin_dispatcher() got an unexpected keyword argument 'x'.

Disadvantages:

Lots of repetition, even when many functions in a module have the exact same signature.

Option 2: Reuse dispatcher functions within a module, e.g.,

def _unary_dispatcher(a):
    return (a,)


@array_function_dispatch(_unary_dispatcher)
def sin(a):
     ...


@array_function_dispatch(_unary_dispatcher)
def cos(a):
    ...

Advantages:

Less repetition
Readable

Disadvantages:

Can be a little harder to find definitions of dispatcher functions
Slightly less clear error messages for bad arguments, e.g., sin(x=1) -> TypeError: _unary_dispatcher() got an unexpected keyword argument 'x'

Option 3: Use lambda functions when the dispatcher definition would fit on one line, e.g.,

# inline style (shorter)
@array_function_dispatch(lambda a: (a,))
def sin(a):
     ...


@array_function_dispatch(lambda a, n=None, axis=None, norm=None: (a,))
def fft(a, n=None, axis=-1, norm=None):
     ...

# multiline style (more readable?)
@array_function_dispatch(
    lambda a: (a,)
)
def sin(a):
     ...


@array_function_dispatch(
    lambda a, n=None, axis=None, norm=None: (a,)
)
def fft(a, n=None, axis=-1, norm=None):
     ...

Advantages:

No need to hunt for dispatcher definitions, they are right there.
Fewer number of characters and lines of codes.
Looks very nice for short cases (e.g., one argument), especially when the lambda is shorter than the function name.

Disadvantages:

More repeated code than option 2.
Looks pretty cluttered if there are more than a few arguments
Also has less clear error messages (TypeError: <lambda>() got an unexpected keyword argument 'x')

eric-wieser · 2018-10-12T16:13:20Z

Note that the error message issues can be fixed by reconstructing the code object, although that will come with some import-time cost. Perhaps worth investigating, and breaking out @nschloe's tuna to compare some options.

shoyer · 2018-10-12T16:19:04Z

Yep, the decorator module could also be used for generating function definition (it uses a slightly different approach for code generation, a little more like namedtuple in that it uses exec()).

mhvk · 2018-10-12T16:36:56Z

As long as the error is not solved, I think we need to stick to the options with a dispatcher which has a clear name. I'd slightly to bundle dispatchers together (2) for memory reasons, though would then keep the error message very much in mind, so would suggest calling the dispatcher something like _dispatch_on_x.

Though if we can change the error, things change. E.g., it might be as simple as catching exceptions, replacing <lambda> with the function name in the exception text, and then re-raising. (Or does that chain thing these days?)

hameerabbasi · 2018-10-12T16:40:57Z

I agree that the error message needs to be clear, ideally shouldn't change at all.

hameerabbasi · 2019-02-16T01:53:57Z

I think in order to keep current code working, the actual function must come after the dispatcher.

shoyer · 2019-02-16T02:17:45Z

Right, but we can give it the same name as the dispatcher. The dispatcher name will get overwritten.

saulshanabrook · 2019-04-03T16:45:36Z

It would great to be able to define custom dispatching for function like np.arange or np.empty.

I guess one option would be for to NumPy to dispatch on scalars as well as arrays. Is this incompatible with the NEP? Would anything break with this change?

shoyer · 2019-04-03T17:02:14Z

For discussion about np.arange, see #12379.

I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype. But certainly np.empty_like() could do dispatching with an overwritten shape -- that's exactly what #13046 is about supporting.

shoyer · 2019-04-03T17:06:19Z

Option 4: Write a separate dispatcher for each function, with the same name as the function:

Any objections to adopting this option? I think it's probably the friendliest choice from a user perspective.

saulshanabrook · 2019-04-03T17:09:58Z

I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype

You might want to dispatch on either of those. For example, here is a custom shape object, that we might want to dispatch differently on.

This example isn't very useful, but the idea is that I have a lazy object that behaves like shape, but doesn't return integers, it returns expressions. For example, it would nice to be able to do something like this:

class ExprShape:
    def __getitem__(self, i):
        return ('getitem', self, i)
    def __len__(self):
        return ('len', self)

numpy.empty(ExprShape())

Which would I would like to override to return something like ExprArray('empty', ExprShape()).

shoyer · 2019-04-03T18:34:00Z

Yes, in principle we could dispatch on shape, too. That would add additional complexity/overhead to the protocol. Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?

The other cases I can think of is the size argument to np.random.RandomState methods, but note that we currently don't support those at all -- see http://www.numpy.org/neps/nep-0018-array-function-protocol.html#callable-objects-generated-at-runtime

saulshanabrook · 2019-04-03T18:42:27Z

Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?

If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.

For example, let's say we were attempting to call scipy.optimize.differential_evolution with NP like arrays, that build up a call graph instead of executing immediately.

You can see here it would be helpful if we could change np.full to create a symbolic array instead of a default numpy array, if the input passed into it was also symbolic.

shoyer · 2019-04-03T18:50:05Z

If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.

This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.

In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.

saulshanabrook · 2019-04-03T19:10:14Z

This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.

Is there a proposal around making this sort of change or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?

In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.

Definitely. But there are many cases where either you don't control the source code, or you want to support users in using the functions they know and love as much as possible.

There are other ways to provide users with that experience like:

Providing a new module that acts like numpy but behaves how you would like. Requires users to change their imports
Inspect the source to understand the behavior. ala numba or tangent.

Both of those options might be required in certain cases (like letting users call np.full and return a symbolic result currently), but if I understand correctly, the goal of NEP-18 is to try to limit when those are needed and let people use the original NumPy in more cases.

I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.

shoyer · 2019-04-03T21:02:07Z

Is there a proposal around making this sort of change or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?

NEP 22 has some discussion of the options here. I don't think we can safely change the semantics of np.asarray() to return anything other than a numpy.ndarray object -- we will need a new protocol for this.

The problem is that np.asarray() is currently the idiomatic way of casting to a numpy array object, which uses can and do expect to exactly match numpy.ndarray, e.g., down to the memory layout.

There are certainly plenty of use-cases where this isn't the case, but switching this behavior would break lots of downstream code, so it's a non-starter. Downstream projects will need to opt-in to at least this aspect of array duck typing.

I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.

Yes. NEP 18 is not intended to be a complete solution for drop-in NumPy alternatives, but its a step in that direction.

shoyer · 2019-04-11T18:24:34Z

I've drafted a revision to NEP-18 for adding a __numpy_implementation__ attribute:
#13305

xref numpyGH-12028 Current behavior: >>> np.dot(None, None) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/shoyer/dev/numpy/numpy/core/overrides.py", line 175, in public_api implementation, public_api, relevant_args, args, kwargs) TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType' >>> np.stack([], invalid=True) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/shoyer/dev/numpy/numpy/core/overrides.py", line 148, in public_api relevant_args = dispatcher(*args, **kwargs) TypeError: _stack_dispatcher() got an unexpected keyword argument 'invalid' With this change: >>> np.dot(None, None) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 6, in dot TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType' >>> np.stack([], invalid=True) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 4, in stack TypeError: _stack_dispatcher() got an unexpected keyword argument 'invalid'

shoyer · 2019-05-19T21:20:06Z

It occurs to me that we forget to warp the functions in numpy.testing: #13588

I'm going to do that shortly...

xref numpyGH-13624, numpyGH-12028 TODO: update tests/CI for NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0

shoyer · 2019-05-26T23:24:29Z

There's one revision that I'd like to see to the NEP, specifically to clarify what guarantees NEP-18 offers to subclass authors: #13633

mattip · 2019-06-11T19:28:37Z

I marked the usability tasks complete since gh-13329 was fixed. We decidedgh-#13588 can wait till after the release of 1.17. That leaves documentation improvements and arange gh-12379 still open for inclusion in 1.17.

mhvk · 2019-06-12T00:29:00Z

There's also #13728 - a bug in the dispatcher for histogram[2d]d

rgommers · 2019-06-27T00:09:26Z

That leaves documentation improvements and arange gh-12379 still open for inclusion in 1.17.

An issue for documentation was missing, so I opened gh-13844. I think docs are a lot more important than the arange open issue.

mattip · 2019-08-18T18:03:10Z

@shoyer can we close this?

This was referenced Sep 24, 2018

ENH: initial implementation of core __array_function__ machinery #12005

Merged

MAINT: cleanup and better document core/overrides.py #12042

Merged

mattip mentioned this issue Oct 3, 2018

A catch-all protocol for numpy-like duck arrays #11129

Closed

This was referenced Oct 8, 2018

ENH: __array_function__ support for most of numpy.core #12115

Merged

ENH: __array_function__ support for np.lib, part 1/2 #12116

Merged

ENH: __array_function__ support for np.fft and np.linalg #12117

Merged

shoyer mentioned this issue Oct 8, 2018

ENH: __array_function__ support for np.lib, part 2/2 #12119

Merged

shoyer added the component: __array_function__ label Oct 8, 2018

shoyer mentioned this issue Oct 11, 2018

DOC: document NEP-18 overrides in release notes #12148

Merged

shoyer mentioned this issue Oct 12, 2018

ENH: __array_function__ for np.core.defchararray #12154

Merged

This comment has been minimized.

Sign in to view

This was referenced Oct 13, 2018

ENH: __array_function__ for np.einsum and np.block #12163

Merged

ENH: __array_function__ for multiarray functions #12175

Merged

dgasmith mentioned this issue Feb 20, 2019

NumPy Dispatch Mechanisms dgasmith/opt_einsum#46

Closed

shoyer mentioned this issue May 11, 2019

MAINT: Use exec() instead array_function_dispatch to improve tracebacks #13529

Merged

shoyer mentioned this issue May 19, 2019

__array_function__ overrides for functions in numpy.testing? #13588

Closed

shoyer mentioned this issue May 25, 2019

DOC: revert __skip_array_function__ from NEP-18 #13624

Merged

shoyer added a commit to shoyer/numpy that referenced this issue May 25, 2019

MAINT: revert __skip_array_function__ from NEP-18

766281d

xref numpyGH-13624, numpyGH-12028 TODO: update tests/CI for NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0

This was referenced May 25, 2019

MAINT: revert __skip_array_function__ from NEP-18 #13627

Merged

DOC: caution against relying upon NumPy's implementation in subclasses #13633

Merged

rgommers mentioned this issue Jun 27, 2019

high-level documentation for __array_function__ missing #13844

Closed

2 tasks

shoyer closed this as completed Aug 18, 2019

jpivarski mentioned this issue Sep 27, 2019

Can't convert ChunkedArray from a lazy array with a selection to a numpy array scikit-hep/uproot3#323

Closed

Tracking issue for implementation of NEP-18 (__array_function__) #12028

Tracking issue for implementation of NEP-18 (__array_function__) #12028

Comments

shoyer commented Sep 24, 2018 • edited by mattip Loading

mattip commented Oct 7, 2018

shoyer commented Oct 7, 2018

shoyer commented Oct 8, 2018

shoyer commented Oct 8, 2018 • edited Loading

mhvk commented Oct 8, 2018

shoyer commented Oct 8, 2018

shoyer commented Oct 8, 2018 • edited Loading

mhvk commented Oct 11, 2018

shoyer commented Oct 12, 2018

hameerabbasi commented Oct 12, 2018

mhvk commented Oct 12, 2018

hameerabbasi commented Oct 12, 2018

shoyer commented Oct 12, 2018 • edited by eric-wieser Loading

This comment has been minimized.

eric-wieser commented Oct 12, 2018

shoyer commented Oct 12, 2018

mhvk commented Oct 12, 2018

hameerabbasi commented Oct 12, 2018

hameerabbasi commented Feb 16, 2019

shoyer commented Feb 16, 2019

saulshanabrook commented Apr 3, 2019

shoyer commented Apr 3, 2019

shoyer commented Apr 3, 2019

saulshanabrook commented Apr 3, 2019 • edited Loading

shoyer commented Apr 3, 2019

saulshanabrook commented Apr 3, 2019

shoyer commented Apr 3, 2019

saulshanabrook commented Apr 3, 2019 • edited Loading

shoyer commented Apr 3, 2019 • edited Loading

shoyer commented Apr 11, 2019

shoyer commented May 19, 2019

shoyer commented May 26, 2019

mattip commented Jun 11, 2019

mhvk commented Jun 12, 2019

rgommers commented Jun 27, 2019

mattip commented Aug 18, 2019

shoyer commented Sep 24, 2018 •

edited by mattip

Loading

shoyer commented Oct 8, 2018 •

edited

Loading

shoyer commented Oct 8, 2018 •

edited

Loading

shoyer commented Oct 12, 2018 •

edited by eric-wieser

Loading

saulshanabrook commented Apr 3, 2019 •

edited

Loading

saulshanabrook commented Apr 3, 2019 •

edited

Loading

shoyer commented Apr 3, 2019 •

edited

Loading