Skip to content

MAINT: Ufunc parsing super fast (work in progress) #11372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 29 commits into from

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 18, 2018

It is probably useful to see where I hope all my PRs lead to. Combining #11338, #11351, #11282 plus follow-up commits, the reduction is becoming quite impressive (speeds are: current master -> #11351 -> this PR.

import numpy as np
a = np.array(1)
b = np.empty_like(a)
%timeit np.positive(a)                                 # 352->348->294 ns
%timeit np.positive(a, subok=True)                     # 606->501->377 ns
%timeit np.positive(a, where=True)                     # 586->503->366 ns
%timeit np.positive(a, where=True, subok=True)         # 695->531->394 ns
%timeit np.positive(a, b)                              # 354->352->263 ns
%timeit np.positive(a, out=b)                          # 557->480->316 ns
%timeit np.positive(a, out=b, subok=True)              # 668->506->420 ns
%timeit np.positive(a, out=b, where=True, subok=True)  # 752->536->451 ns

The main plan I followed was to use interned strings instead of char, and to ensure things get done only once, in particular with output arguments: out is now resolved right at the start and treated as a positional argument thereafter. In the process, ufunc_full_args disappeared again, since the code can now count on all arguments being in one place.

Note: no need to try to comment on all of this at once - this is mostly here to show that the work is worth it, and that I have something to refer to when discussing particular bits.

p.s. To my dismay, just adding subok=True still costs about 85 ns: 40ns in python for creating kwargs, 20ns for looking for out when it isn't there, and about 25 ns in parsing it.

mhvk added 29 commits June 16, 2018 17:44
Overall, it likely doesn't matter much for performance, but it is
more logical and more consistent with what python does: reverse
operators are not called if the forward one of a given class
already returned NotImplemented.
Code was getting too convoluted and both can be optimized
in different ways.
Using the realization from inspecting similar CPython code that as we
intern string names anyway, comparing keys with possible names by
pointer will generally just work.

Also rather drastically rewrote, ot use simple conversion functions
instead of the enormous switch/case loop (except for "out").

Reduces the overhead as follows:
```
import numpy as np
a = np.array(1)
b = np.empty_like(a)
%timeit np.positive(a)                                 # 352->348 ns
%timeit np.positive(a, subok=True)                     # 606->501 ns
%timeit np.positive(a, where=True)                     # 586->503 ns
%timeit np.positive(a, where=True, subok=True)         # 695->531 ns
%timeit np.positive(a, b)                              # 354->352 ns
%timeit np.positive(a, out=b)                          # 557->480 ns
%timeit np.positive(a, out=b, subok=True)              # 668->506 ns
%timeit np.positive(a, out=b, where=True, subok=True)  # 752->536 ns
```
The argument parsing for ufuncs contains special logic to enable returning
NotImplemented for comparison functions.  The ufuncs, however, should
not have to know whether they were called via a python comparison operator,
so this PR moves the logic to the operators, checking after ufunc failed
whether a NotImplemented should be returned.
Could be together with simplifying array_wrap.
@eric-wieser
Copy link
Member

For comparison on your machine, can you show the timing for +a and a_builtin = 1; +a_builtin?

@mhvk
Copy link
Contributor Author

mhvk commented Jun 18, 2018

I think +a does not (yet?) go through np.positive, since it is much faster. But negative does (and np.negative gives the same timings as above). Built-in is of course far faster still (but, hey, I've not done anything yet to actual code, just the parsing of arguments!).

%%timeit
np.negative(a)  # 290 ns
%%timeit
-a  # 267 ns
negative = np.negative
%%timeit
negative(a)  # 260 ns
a_builtin = 1
%%timeit
-a_builtin  # 17.6 ns

@mhvk
Copy link
Contributor Author

mhvk commented Jun 18, 2018

One generic question is the extent to which we're willing to adapt the signature of PyUFunc_GenericFunction. In particular, it might make sense to adopt the METH_FASTCALL signature, i.e., PyObject **stack, Py_ssize_t nargs, PyObject *kwnames, where stack contains the nargs arguments as well as the values of the keywords with the names given in the kwnames tuple.
In principle, this could even be used for the reductions (but sadly not for __call__).

@mattip
Copy link
Member

mattip commented Jun 19, 2018

In particular, it might make sense to adopt the METH_FASTCALL

Since this function is part of the public API I would prefer to keep it simple.

@mhvk
Copy link
Contributor Author

mhvk commented Jun 19, 2018

Since this function is part of the public API I would prefer to keep it simple.

Yes. In fact, that also means that some of what I'm doing here is not quite right; I cannot assume that out is not present in kwargs. I might need to let this call an internal, private function and use that as well for ufunc_general_call. Anyway, clearly a benefit of showing it now! And another reason for proceeding in slow steps.

@shoyer
Copy link
Member

shoyer commented Dec 5, 2018

It looks like all these other PRs are merged -- can we close this?

@mhvk
Copy link
Contributor Author

mhvk commented Dec 5, 2018

This has quite a bit of follow-up commits, but I would definitely have to re-do those now that multiarray and umath are merged. So let me indeed close it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants