-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
MAINT: Ufunc parsing super fast (work in progress) #11372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Overall, it likely doesn't matter much for performance, but it is more logical and more consistent with what python does: reverse operators are not called if the forward one of a given class already returned NotImplemented.
Code was getting too convoluted and both can be optimized in different ways.
Using the realization from inspecting similar CPython code that as we intern string names anyway, comparing keys with possible names by pointer will generally just work. Also rather drastically rewrote, ot use simple conversion functions instead of the enormous switch/case loop (except for "out"). Reduces the overhead as follows: ``` import numpy as np a = np.array(1) b = np.empty_like(a) %timeit np.positive(a) # 352->348 ns %timeit np.positive(a, subok=True) # 606->501 ns %timeit np.positive(a, where=True) # 586->503 ns %timeit np.positive(a, where=True, subok=True) # 695->531 ns %timeit np.positive(a, b) # 354->352 ns %timeit np.positive(a, out=b) # 557->480 ns %timeit np.positive(a, out=b, subok=True) # 668->506 ns %timeit np.positive(a, out=b, where=True, subok=True) # 752->536 ns ```
The argument parsing for ufuncs contains special logic to enable returning NotImplemented for comparison functions. The ufuncs, however, should not have to know whether they were called via a python comparison operator, so this PR moves the logic to the operators, checking after ufunc failed whether a NotImplemented should be returned.
This since "out" can no longer be present.
Could be together with simplifying array_wrap.
For comparison on your machine, can you show the timing for |
I think
|
One generic question is the extent to which we're willing to adapt the signature of |
Since this function is part of the public API I would prefer to keep it simple. |
Yes. In fact, that also means that some of what I'm doing here is not quite right; I cannot assume that |
It looks like all these other PRs are merged -- can we close this? |
This has quite a bit of follow-up commits, but I would definitely have to re-do those now that |
It is probably useful to see where I hope all my PRs lead to. Combining #11338, #11351, #11282 plus follow-up commits, the reduction is becoming quite impressive (speeds are: current master -> #11351 -> this PR.
The main plan I followed was to use interned strings instead of
char
, and to ensure things get done only once, in particular with output arguments:out
is now resolved right at the start and treated as a positional argument thereafter. In the process,ufunc_full_args
disappeared again, since the code can now count on all arguments being in one place.Note: no need to try to comment on all of this at once - this is mostly here to show that the work is worth it, and that I have something to refer to when discussing particular bits.
p.s. To my dismay, just adding
subok=True
still costs about 85 ns: 40ns in python for creating kwargs, 20ns for looking for out when it isn't there, and about 25 ns in parsing it.