
ENH: port np.core.overrides to C for speed #12317


Merged: 2 commits merged into numpy:master on Dec 20, 2018

Conversation

shoyer
Member

@shoyer shoyer commented Nov 3, 2018

TL;DR: This significantly speeds up dispatch for __array_function__, reducing the overhead to about 0.7 microseconds in typical cases (vs. 2-3 microseconds currently with a pure Python implementation). For functions that handle many arguments (like concatenate), this reduces the dispatch overhead by something like 100x, i.e., from "makes concatenate 4x slower" to "barely noticeable".

Original post:


Still needs:

  • documentation for the C functions
  • actually using the C functions in np.core.overrides
  • some way to test get_overloaded_types_and_args directly (a Python wrapper for testing?)

Currently NumPy doesn't even import properly after I build it with this change. Hopefully I'm doing something obviously wrong!

Original error was: dlopen(/Users/shoyer/dev/numpy/build/testenv/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so, 2): Symbol not found: _array_function_implementation_or_override
  Referenced from: /Users/shoyer/dev/numpy/build/testenv/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so
  Expected in: flat namespace
 in /Users/shoyer/dev/numpy/build/testenv/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-darwin.so

Also: it's probably obvious, but I'm pretty new to C and Python's C-API. I expect there are lots of ways this PR could be improved.

@shoyer
Member Author

shoyer commented Nov 3, 2018

Oops... I forgot to add arrayfunction_override.c to setup.py!

@shoyer
Member Author

shoyer commented Nov 3, 2018

I've made some progress -- NumPy builds and imports, but running the tests in numpy/core/tests/test_overrides.py segfaults.

I'm trying to get gdb and Python configured so I can get a stack trace. So far I see:

(gdb) backtrace
#0  0x000000010485aa8d in ?? ()
#1  0x0000000108bbfe58 in ?? ()
#2  0x0000000000000030 in ?? ()
#3  0x00007ffeefbf7540 in ?? ()
#4  0x0000000100143251 in ?? ()
#5  0x0000000000000000 in ?? ()

@charris
Member

charris commented Nov 3, 2018

This seems a bit premature. Is there an actual performance benefit from doing this?

@shoyer
Member Author

shoyer commented Nov 3, 2018

This seems a bit premature. Is there an actual performance benefit from doing this?

Well, it doesn't work yet, so I haven't been able to benchmark how much it helps. But checking for overrides definitely adds some overhead, and it now happens on every single NumPy function call, e.g.,

In [3]: x = np.arange(10)

In [4]: %timeit np.sum(x)
5.53 µs ± 56.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit x.sum()
1.83 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit np.sum.__wrapped__(x)
2.68 µs ± 40.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %prun [np.sum(x) for _ in range(100000)]
         1500004 function calls in 0.907 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.197    0.000    0.339    0.000 overrides.py:15(get_overloaded_types_and_args)
   100000    0.175    0.000    0.175    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   100000    0.095    0.000    0.394    0.000 fromnumeric.py:1968(sum)
   100000    0.094    0.000    0.282    0.000 fromnumeric.py:69(_wrapreduction)
   200000    0.073    0.000    0.073    0.000 {built-in method builtins.hasattr}
   100000    0.057    0.000    0.858    0.000 overrides.py:148(public_api)
   100000    0.056    0.000    0.789    0.000 overrides.py:64(array_function_implementation_or_override)
        1    0.048    0.048    0.906    0.906 <string>:1(<listcomp>)
   100000    0.026    0.000    0.026    0.000 overrides.py:57(<listcomp>)
   100000    0.021    0.000    0.021    0.000 {method 'insert' of 'list' objects}
   100000    0.017    0.000    0.017    0.000 {built-in method builtins.isinstance}
   100000    0.013    0.000    0.013    0.000 {method 'append' of 'list' objects}
   100000    0.012    0.000    0.012    0.000 {method 'items' of 'dict' objects}
   100000    0.011    0.000    0.011    0.000 fromnumeric.py:1963(_sum_dispatcher)
   100000    0.010    0.000    0.010    0.000 {built-in method builtins.len}
        1    0.001    0.001    0.907    0.907 <string>:1(<module>)
        1    0.000    0.000    0.907    0.907 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I'll check the ASV suite, but I suspect it would be a meaningful performance win if we shave off 1-2 µs of overhead on every NumPy operation. Typical NumPy operations take ~10 µs on small arrays.

@pv
Member

pv commented Nov 3, 2018

asv shows regressions on array_ufunc in a number of places: https://pv.github.io/numpy-bench/#regressions

@shoyer
Member Author

shoyer commented Nov 3, 2018

Here's a spreadsheet with all the ASV changes (from @pv's site) that I identified as due to __array_function__:
https://docs.google.com/spreadsheets/d/15-AFI_cmZqfkU6mo2p1znsQF2E52PEXpF68QqYqEar4/edit#gid=0

@shoyer
Member Author

shoyer commented Nov 5, 2018

OK, I managed to debug this. For future reference, I wasn't able to get gdb working on OS X, but lldb worked pretty well.

It still needs docs and a bit of cleanup, but it's in good enough shape to run benchmarks and for anyone interested to take a look at the implementation. Here are some highlights.

The general override benchmarks are only about 10% faster for functions that handle a single argument, but 3x faster for functions given a long list of arrays:

## on master
bench_overrides.ArrayFunction.time_mock_broadcast_to_duck    3.47±0.07μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy   2.46±0.03μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck     4.24±0.08μs
bench_overrides.ArrayFunction.time_mock_concatenate_many         271±9μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed     6.52±0.1μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy    3.09±0.01μs
## with this PR
bench_overrides.ArrayFunction.time_mock_broadcast_to_duck    2.78±0.02μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy   2.22±0.05μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck     3.53±0.09μs
bench_overrides.ArrayFunction.time_mock_concatenate_many        74.1±1μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed     4.37±0.2μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy    2.71±0.02μs

For the hstack example in particular, I see:

l = [np.arange(1000), np.arange(1000)]
%timeit np.hstack(l)
# no overrides: 4.04 µs ± 85.2 ns per loop
# master: 10.6 µs ± 108 ns per loop
# this PR: 8.94 µs ± 97.7 ns per loop
# without nested dispatch: 5.95 µs ± 241 ns per loop

"With nested dispatch" refers to manually adjusting hstack() to call atleast_1d.__wrapped__ and concatenate.__wrapped__, and thus avoid re-invoking __array_function__ dispatching. This might be a good option for speeding up NumPy functions that invoke other NumPy functions -- but if so, we should also consider exposing __wrapped__ (or maybe another attribute, such as implementation) as part of NumPy's public interface for the benefit of third-party libraries that also want the speed improvement.

@shoyer
Member Author

shoyer commented Dec 3, 2018

Thanks to some performance tips from @eric-wieser, this latest version is working much faster.

Pure Python implementation:

$ asv run -b bench_override -E existing -a sample_time=0.5
bench_overrides.ArrayFunction.time_mock_broadcast_to_duck    3.57±0.04μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy   1.96±0.03μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck     3.82±0.08μs
bench_overrides.ArrayFunction.time_mock_concatenate_many         183±3μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed     7.25±0.2μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy    2.32±0.01μs

C implementation:

$ asv run -b bench_override -E existing -a sample_time=0.5
bench_overrides.ArrayFunction.time_mock_broadcast_to_duck    2.64±0.05μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy   1.45±0.03μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck     2.65±0.03μs
bench_overrides.ArrayFunction.time_mock_concatenate_many     2.72±0.03μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed     4.83±0.2μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy    1.49±0.01μs

These should be compared to my previous results in #12317 (comment)

And here are the results for the hstack(l) example:

l = [np.arange(1000), np.arange(1000)]
%timeit np.hstack(l)
# no overrides: 4.04 µs ± 85.2 ns per loop
# master: 10.6 µs ± 108 ns per loop
# this PR, Python version: 8.05 µs ± 223 ns per loop
# this PR, C version: 6.74 µs ± 181 ns per loop
# without nested dispatch: 4.69 µs ± 211 ns per loop

The bottom line is that the overhead for the typical case of checking a single argument that doesn't have an override is now in the range of ~0.7 µs, about the cost of a single function call. For the typical case of no overrides with only a few arguments, the C version is about 30% faster than the Python version. For functions like concatenate() with a large number of array arguments (e.g., 1000), the overhead is something like 100x smaller, to the point where it doesn't have a noticeable performance impact (previously it was a 4x slow-down).

@shoyer
Member Author

shoyer commented Dec 3, 2018

Rewriting ndarray.__array_function__ in C also helps significantly for cases where we call ndarray.__array_function__ because other overrides exist (i.e., time_mock_concatenate_mixed):

bench_overrides.ArrayFunction.time_mock_broadcast_to_duck     2.59±0.07μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy    1.48±0.06μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck      2.63±0.02μs
bench_overrides.ArrayFunction.time_mock_concatenate_many      2.84±0.04μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed     3.42±0.01μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy     1.55±0.01μs

@shoyer
Member Author

shoyer commented Dec 4, 2018

This is ready for someone else to weigh in (maybe @mhvk ?).

I think this is in a reasonable state to review, but there's at least one major design decision to make: Should we put an upper limit (e.g., NPY_MAXARGS) on the number of __array_function__ overrides that can be called by a single NumPy function?

  • Pros:
    • We could improve performance by a small but measurable amount when overrides are used, since we could allocate overrides on the stack instead of the heap (currently in a list).
    • We could definitely store __array_function__ methods without any overhead (currently we look up __array_function__ attributes multiple times, which is a little expensive since it involves a dict lookup on the class object).
    • The implementation of array function overrides could more closely mirror that of ufunc overrides -- and maybe even share some utility functions.
  • Cons:
    • This would limit functions like np.concatenate() to handling a maximum of 32 distinct types that implement __array_function__. (Would this ever actually come up?)
    • We would still need to build a Python list of arguments in some cases (e.g., for error messages), so our code gets a bit messier.

@mhvk
Contributor

mhvk commented Dec 4, 2018

I had a bit of a look earlier today and the code looked ... "oddly familiar". Which of course is great! I had noted the many times we are getting __array_function__ -- this is expensive (at least it was for ufuncs), so carrying this information around would seem a good idea.

On the number of arguments, I think using NPY_MAXARGS is reasonable, especially if it helps share some of the code between __array_function__ and __array_ufunc__.

Will try to look in more detail soon.

@shoyer
Member Author

shoyer commented Dec 4, 2018

OK, I rewrote in terms of static arrays. That does show some improvement for cases where there are duck arrays:

bench_overrides.ArrayFunction.time_mock_broadcast_to_duck      2.14±0.1μs
bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy    1.37±0.05μs
bench_overrides.ArrayFunction.time_mock_concatenate_duck      2.25±0.02μs
bench_overrides.ArrayFunction.time_mock_concatenate_many      2.88±0.07μs
bench_overrides.ArrayFunction.time_mock_concatenate_mixed      2.68±0.1μs
bench_overrides.ArrayFunction.time_mock_concatenate_numpy     1.59±0.05μs

@shoyer
Member Author

shoyer commented Dec 4, 2018

Tests are passing, except for Linux_Python_36_32bit_full on Azure, which fails with "Bash exited with code '139'" (i.e., a segfault). I'm guessing I have a bug somewhere...

@shoyer
Member Author

shoyer commented Dec 5, 2018

I think I fixed my bug(s) -- tests are passing now (no more out of bounds memory access).

@shoyer
Member Author

shoyer commented Dec 5, 2018

One issue worth discussing: I've implemented a different approach here for handling the order in which to call subclasses than what is currently used in ufunc overrides:

  • For ufuncs, we collect __array_ufunc__ methods for all unique types in order of appearance. Then we use a triply nested loop (while/for/for) for calling methods that iterates through arguments in order, discarding arguments if they have a subclass to the right.
  • Here I collect __array_function__ arguments in order, by checking for super classes when we collect the types and inserting arguments/methods in the correct order. Then the methods can simply be called in order.
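
As a rough Python sketch of that collection strategy (simplified from this PR's pure Python implementation; the handling of plain ndarray is reduced to a single type check, and error handling is omitted):

import numpy as np

def get_implementing_args(relevant_args):
    # Collect arguments whose types provide __array_function__,
    # inserting each new argument ahead of the first already-collected
    # argument whose type is a superclass of it, so that subclasses are
    # always called first. This is O(N^2) in the number of unique
    # implementing types.
    implementing_types = []
    implementing_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if (arg_type is not np.ndarray
                and arg_type not in implementing_types
                and hasattr(arg_type, '__array_function__')):
            index = len(implementing_args)
            for i, old_arg in enumerate(implementing_args):
                if issubclass(arg_type, type(old_arg)):
                    index = i
                    break
            implementing_types.append(arg_type)
            implementing_args.insert(index, arg)
    return implementing_args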

I like my approach for two reasons:

  1. I'm more confident in its asymptotic runtime performance. I know that it runs in time O(N^2), where N is the number of unique types. In contrast, I think the while loop means ufunc method calling might take time O(N^3) in the worst case. (But to be honest, I don't understand why the outer while loop is necessary at all -- couldn't you call __array_ufunc__ methods inside the "Choose an overriding argument" loop?)
  2. It's a little easier to test, because I can write a wrapper function (_get_implementing_args()) which returns arguments to call in order.

Of course, asymptotic complexity isn't really relevant here -- the typical scenario is having only 1-2 arguments that implement overrides. I don't know which approach is faster in practice -- this should probably be benchmarked. Either way, we should probably use the same approach in both places.

It would also be ideal to consolidate actual logic between these overrides, but in practice there are enough minor differences (e.g., in the form of the input arguments) that this may be tricky to do while preserving maximum performance.

@mhvk
Contributor

mhvk commented Dec 5, 2018

@shoyer - at least in the abstract, I agree that your approach is better, and I think it makes sense to do the same for __array_ufunc__; the collection stage at least is very similar, so it can be shared (indeed, your "wrapper" function might even be something we would eventually want to expose).

@shoyer changed the title from "WIP: port np.core.overrides to C" to "ENH: port np.core.overrides to C for speed" on Dec 5, 2018
@shoyer
Member Author

shoyer commented Dec 5, 2018

at least the collection is very similar so can be shared

The differences I noticed:

  • __array_ufunc__ needs to collect arguments from both inputs and out vs. only from relevant_args (but perhaps __array_ufunc__ can simply call the collection function twice, passing in a non-zero value for num_implementing_args?)
  • __array_ufunc__ currently checks for __array_ufunc__ = None during collection (but this could be moved into the calling loop).
  • __array_function__ needs to check for exceeding the max number of arguments (but the overhead here is pretty negligible).

Of course, they also collect different methods (__array_ufunc__ vs __array_function__), which I guess we could handle with function pointers.
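
In Python terms, the function-pointer idea might look like parameterizing a shared collection routine by the dunder name, with a pre-seeded list standing in for a non-zero num_implementing_args (a hypothetical sketch, not actual NumPy code; the subclass ordering from the earlier sketch is omitted for brevity):

def collect_implementing_args(relevant_args, method_name,
                              implementing_args=None):
    # Shared collection loop usable for either protocol.
    if implementing_args is None:
        implementing_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if (hasattr(arg_type, method_name)
                and not any(type(old) is arg_type
                            for old in implementing_args)):
            implementing_args.append(arg)
    return implementing_args

# __array_ufunc__ would then call the collector twice:
# args = collect_implementing_args(inputs, '__array_ufunc__')
# args = collect_implementing_args(out, '__array_ufunc__', args)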

Contributor

@mhvk mhvk left a comment


This looks great! My only slightly larger comment is about the need to first check whether there are any overrides at all, iterating over all arguments, before doing essentially the same iteration again to actually build the list of overrides. Indeed, since the latter pass checks a given class only once, I think it may be faster to just use it directly.

@mhvk
Contributor

mhvk commented Dec 5, 2018

@shoyer - my attempts to make __array_ufunc__ faster included merging inputs and outputs into a single list - which would make the collection stage (even) more similar. As for checking for None, that is indeed rare enough that it might as well be done in the execution loop (though the overhead is also so small that one could consider giving it meaning for __array_function__ as well).

@shoyer
Member Author

shoyer commented Dec 5, 2018

my attempts to make array_ufunc faster included a merger of inputs and outputs into a single list - which would make the collection stage (even) more similar

Interesting -- did this actually speed things up? I guess it could be equivalent in the typical case of no outputs, but if you have to allocate another list, my intuition is that this would be slower.

@mhvk
Contributor

mhvk commented Dec 5, 2018

Looking back at #11372, I see I only allocated a new tuple if output arguments were actually present. The cost was offset by avoiding the need to search for the out argument in kwds multiple times.

@shoyer
Member Author

shoyer commented Dec 5, 2018

As for checking for None, that is indeed rare enough that it might as well be done in the execution loop (though the overhead is also so small that one could consider giving it meaning for __array_function__ as well).

I suppose this could indeed be a reasonable shortcut to support.

Some questions to think about:

  1. Are there use-cases for introspecting arguments to see if they support __array_function__?
    • Yes: A library like xarray or dask that consumes NumPy's high level API could raise an error or coerce inputs that don't define __array_function__.
  2. Are there use-cases for distinguishing between __array_function__ = None and simply omitting __array_function__?
    • Maybe: This lets a class opt-out of NumPy's high level functions while retaining the ability to be explicitly coerced to a NumPy array.
  3. Should __array_function__ = None be defined as equivalent to defining an __array_function__ method that always returns NotImplemented, or a method that always raises TypeError? (Note that in practice, these are very rarely different)
    • TypeError would be most similar to __array_ufunc__.
    • NotImplemented would be most similar to omitting the __array_function__ method entirely.

I'm struggling to think of real use cases for (2), especially given the current state of affairs in the broader ecosystem where casting function inputs to NumPy arrays with asarray() is ubiquitous. For ufuncs, we needed it because the default behavior (omitting __array_ufunc__) is to "vectorize" arithmetic over scalars, and sometimes you don't want that.
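
To make the distinction in (3) concrete, here is a hypothetical sketch of the two interpretations (neither class is real NumPy code). In a call where one of these is the only overriding type, both paths end in a TypeError, which is why they so rarely differ in practice:

class RaisesLikeArrayUfunc:
    # Interpretation 1: __array_function__ = None behaves like a method
    # that always raises TypeError.
    def __array_function__(self, func, types, args, kwargs):
        raise TypeError('%s not supported for %s'
                        % (func.__name__, type(self).__name__))

class ActsLikeNotImplemented:
    # Interpretation 2: __array_function__ = None behaves like a method
    # that always returns NotImplemented; when every method returns
    # NotImplemented, the dispatcher raises TypeError itself.
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented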

@mhvk
Contributor

mhvk commented Dec 5, 2018

__array_ufunc__ = None could also be changed to imply returning NotImplemented; I actually think that makes more sense.

Here, maybe good to remember that what was merged was my adaptation of the earlier __numpy_ufunc__ work and at the time, to have any hope of not rekindling an ultra-long discussion, it was important to change as little as possible. But we explicitly decided and stated that the implementation was experimental, so we can and should feel free to change details like these.

@mattip
Member

mattip commented Dec 19, 2018

Maybe this would work better as two PRs - one with the documentation and pure Python changes, including the benchmarks, and another that only replaces the Python implementation with an equivalent C-based one.

@mattip
Member

mattip commented Dec 19, 2018

In any case, I would prefer the PR not remove the pure Python implementation.

@mhvk
Contributor

mhvk commented Dec 19, 2018

@mattip - I'm not sure there is much point in keeping the python implementation around - it will only get out of sync with the C implementation. If we really want to keep it for inspection in a way that is easier than digging through git, perhaps it should be as an addendum to the NEP? That makes it more obvious that it is not something that is necessarily exactly up to date with the real implementation. But that can be done as a separate PR (though it may suggest not to make any changes to the NEP here).

For the present code, I think we should merge it rather than worry about further small changes. This is in part because 1.16 is waiting for this, but also because it would be really nice to next factor out the common parts with __array_ufunc__ (which has a less nice and readable implementation), and that will likely inform further small optimizations.

@mattip
Member

mattip commented Dec 19, 2018

@mhvk understood. It would be nice if, when squashing this down, it became three commits: the documentation and tests, the changes to the Python implementation, and the new C implementation / Python implementation removal. That would make it easier for a future refactor if we ever decide to revive the pure Python version. I can do that once the code is ready to be merged.

@shoyer
Member Author

shoyer commented Dec 19, 2018

OK, assuming the review process is done, let me see if I can rewrite history here.

@shoyer
Member Author

shoyer commented Dec 19, 2018

I can't easily split up the tests/refactor of the pure Python version, so I'm going to stick with two commits: (1) pure Python fixes and (2) C implementation / Python removal.

@shoyer force-pushed the array-function-c branch 2 times, most recently from 0696e66 to db6e223, on December 19, 2018 18:44
@shoyer
Member Author

shoyer commented Dec 19, 2018

Note that the pure Python implementation from the first commit is correct, but it does not exactly match the C version, and there may very well be a performance regression compared to what we currently have on master. But it may still be a good starting point for PyPy.

@shoyer
Member Author

shoyer commented Dec 20, 2018

OK, I think this is ready to merge? There are two commits, one for Python/doc/test changes and one for switching to the C implementation. Both pass our test suite.

@mhvk
Contributor

mhvk commented Dec 20, 2018

Had another look and agree this should now just go in, so merging! cc @charris, since I think this was one of the main items 1.16 was waiting for.

@mhvk mhvk merged commit f4c497c into numpy:master Dec 20, 2018
@mhvk
Contributor

mhvk commented Dec 20, 2018

p.s. Thanks, @shoyer, this is really nice!

@charris
Member

charris commented Dec 20, 2018

since I think this was one of the main items 1.16 was waiting for.

Last time I asked, Stephan said that the Python version was good for 1.16, so I wasn't waiting on the C version. Now that the first 1.16.0 rc is out, I would prefer to leave it with the Python version. We can backport later if that seems reasonable.

@shoyer shoyer deleted the array-function-c branch December 20, 2018 17:51
@mhvk
Contributor

mhvk commented Dec 20, 2018

@charris - I think the python version is indeed fine, but would recommend backporting so that the C version also gets some testing by those interested in trying; it also means we'll get more useful complaints about possible performance regressions. Whether to backport for rc2 and/or 1.16.0 or for 1.16.1, I'll leave up to you.

@mhvk
Contributor

mhvk commented Dec 20, 2018

p.s. Actually, people will presumably test on master before complaining too much, so really it doesn't matter that much.

@charris
Member

charris commented Dec 20, 2018

I don't think there would be much risk in backporting, the feature is new and currently unused. However, I'd like to keep the number of potential problems from getting any bigger and I'm already expecting trouble (knock, knock). Let's give it a micro release and go from there.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Dec 20, 2018
@charris charris added this to the 1.16.1 release milestone Dec 20, 2018
@charris
Member

charris commented Dec 21, 2018

@shoyer How do you propose to handle functions that transition to ufuncs? I ask here because it is convenient, and because it looks like we will need to deal with that. If there is much discussion of the topic we may want to open an issue and update the NEP.

EDIT: @eric-wieser.

@shoyer
Member Author

shoyer commented Dec 21, 2018

@charris We did actually address this explicitly in the NEP:

See the warning at the top: http://www.numpy.org/neps/nep-0018-array-function-protocol.html#implementation

Warning: The __array_function__ protocol, and its use on particular functions, is experimental. We plan to retain an interface that makes it possible to override NumPy functions, but the way to do so for particular functions can and will change with little warning. If such reduced backwards compatibility guarantees are not acceptable to you, do not rely upon overrides of NumPy functions for non-NumPy arrays. See “Non-goals” below for more details.

And below under "non-goals": http://www.numpy.org/neps/nep-0018-array-function-protocol.html#non-goals

We also expect that the mechanism for overriding specific functions that will initially use the __array_function__ protocol can and will change in the future. As a concrete example of how we expect to break behavior in the future, some functions such as np.where are currently not NumPy universal functions, but conceivably could become universal functions in the future. When/if this happens, we will change such overloads from using __array_function__ to the more specialized __array_ufunc__.

More concretely, we can do this with a (possibly abbreviated) deprecation cycle:

  1. Look for an __array_ufunc__ implementation.
  2. Look for an __array_function__ implementation. If one is found, use it but issue a FutureWarning.
  3. Fall back to NumPy's implementation of the ufunc.

This way, a third-party library can retain their use of __array_function__ for compatibility with old versions of NumPy while switching to __array_ufunc__ for new versions of NumPy. After the deprecation cycle is complete, we remove step 2 to eliminate the overhead of a second round of override checks.
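
A minimal sketch of that lookup order (a hypothetical helper, simplified to call the first matching override directly; none of these names are actual NumPy API):

import warnings

def transitional_dispatch(ufunc_impl, relevant_args, args, kwargs):
    # 1. Look for an __array_ufunc__ implementation first.
    for arg in relevant_args:
        method = getattr(type(arg), '__array_ufunc__', None)
        if method is not None:
            return method(arg, ufunc_impl, '__call__', *args, **kwargs)
    # 2. Otherwise use __array_function__, but warn so that libraries
    #    know to migrate to __array_ufunc__.
    for arg in relevant_args:
        method = getattr(type(arg), '__array_function__', None)
        if method is not None:
            warnings.warn('this function is becoming a ufunc; override '
                          'it via __array_ufunc__ instead',
                          FutureWarning, stacklevel=3)
            return method(arg, ufunc_impl, {type(arg)}, args, kwargs)
    # 3. Fall back to NumPy's own implementation of the ufunc.
    return ufunc_impl(*args, **kwargs)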

Actually, now that I write this down, maybe it is also worth putting in the NEP, just so people know what will happen.

This is probably worth implementing in the near-ish term as part of the __array_ufunc__ refactor to make it work more similarly to what we have here for __array_function__.

@charris charris added 03 - Maintenance and removed 09 - Backport-Candidate PRs tagged should be backported 25 - WIP labels Jan 16, 2019
@charris charris removed this from the 1.16.1 release milestone Jan 16, 2019