-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
MAINT: Improve speed of ufunc kwargs parsing #11351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Improve speed of ufunc kwargs parsing #11351
Conversation
numpy/core/src/umath/ufunc_object.c
Outdated
(++kwarg)->output = out_subok; | ||
(++kwarg)->output = out_typetup; | ||
(++kwarg)->output = &sig; | ||
(++kwarg)->output = out_extobj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block isn't threadsafe (since ufunc_kwargs
is static)
How much does performance change if you make it non-static?
Really though, I'm not sure this gains anything over a parallel array of outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I had not thought of that... I actually had a parallel array before, and at the time it seemed less nice. But reverting, I actually think it is much nicer, so I adopted it.
numpy/core/src/umath/ufunc_object.c
Outdated
{NULL, NULL, NULL} /* sentinel */ | ||
}; | ||
ufunc_kwarg *kwarg = ufunc_kwargs; | ||
if (kwarg->str == NULL) { /* fill on first call */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding that this is because the strings are not interned yet during static initialization.
You could store &npy_um_str_out
etc into the array instead, which would work fine during static initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, better indeed.
numpy/core/src/umath/ufunc_object.c
Outdated
|
||
PyArray_Descr *dtype = NULL; | ||
PyObject *sig = NULL; | ||
typedef int (*converter)(PyObject *, void *); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often like to do my function typedefs as
typedef int converter(PyObject *, void *)
And then use converter *convert
below, which reads as "a pointer to a converter function"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had mine from the CPython implementation, but that definitely looks nicer, so I changed it.
I wonder if you should just make a
I know very little about |
@eric-wieser - I followed your suggestions and think it looks quite a bit better still. Do you agree that this is a better solution than #11333? If so, I'll close that one. On the |
f34960a
to
d41caa9
Compare
I see #113333 is still open, but this is the desired alternative? Can we merge this and close the other? |
I definitely think this is the better solution -- just wasn't sure whether there were further comments. |
d41caa9
to
82e02c6
Compare
numpy/core/src/umath/ufunc_object.c
Outdated
str_key_obj = PyUnicode_AsASCIIString(key); | ||
if (str_key_obj != NULL) { | ||
key = str_key_obj; | ||
while (PyDict_Next(kwds, &pos, &key, &value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could almost become a fast keyword parsing function that we could use also for the reductions. But as is, this part of code is not able to DECREF
items on failure, so I'd rather leave that for another PR (it needs changing the converters such that if passed NULL
, they decref if needed).
numpy/core/src/umath/ufunc_object.c
Outdated
|
||
typedef int converter(PyObject *, void *); | ||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not completely sold on this new struct - its main advantage of keeping the names and converters together perhaps doesn't outweigh adding it -- but for use in just one function it is probably fine (and another reason for not yet aiming to make a general keyword parsing function).
Rebased, and refactored a little more. There are some obvious further improvements possible, but I'd prefer to leave those for a future PR (see in-line comment). |
82e02c6
to
62a842f
Compare
@eric-wieser - sorry to bug for two PRs, but this one would also be nice to get in... |
numpy/core/src/umath/ufunc_object.c
Outdated
{NULL, NULL} /* sentinel */ | ||
}; | ||
/* | ||
* Some converters are from the Array_API, which are not availabe' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be true once umath and multiarray are merged, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
numpy/core/src/umath/ufunc_object.c
Outdated
outputs[9] = out_typetup; | ||
outputs[10] = &sig; | ||
outputs[11] = out_extobj; | ||
outputs[12] = NULL; /* sentinel for key-not-found */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this order duplicated between here and above seems very fragile. I'll see if I can think of another way
numpy/core/src/umath/ufunc_object.c
Outdated
output = outputs[kwarg - ufunc_kwargs]; | ||
if (output) { | ||
if (kwarg->convert) { | ||
if (!(*(kwarg->convert))(value, output)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: No need to dereference the function pointer here - *fp()
and fp()
are synonymous in C. I'm fine with this line as is though
numpy/core/src/umath/ufunc_object.c
Outdated
bad_arg = 0; | ||
} | ||
else { | ||
/* Just copy reference */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is special casing NULL
here really better than just having one more function pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
numpy/core/src/umath/ufunc_object.c
Outdated
else { | ||
#if PY_VERSION_HEX >= 0x03000000 | ||
PyErr_Format(PyExc_TypeError, | ||
"'%S' is an invalid keyword to ufunc '%s'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, CPython3x format strings have a special syntax for unicode? I think there's a lot of places that cleanup could go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'S'
is equivalent to str
, and, yes, I was very happy when I found it (and then sad that it didn't work on python 2... fortunately, not much longer.)
numpy/core/src/umath/ufunc_object.c
Outdated
* And we do indeed intern them. | ||
*/ | ||
void *output; | ||
ufunc_kwarg *kwarg = ufunc_kwargs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you extract this line to line 787 as a helper function, ufunc_kwarg *find_ufunc_kwd(PyObject *key)
, which returns null if nothing is found?
That would make it easier to swap it for a pointer-ordered binary search in future, if necessary.
You'd have to put the struct and static array definition at global scope, but I think that's fine.
numpy/core/src/umath/ufunc_object.h
Outdated
@@ -11,12 +11,22 @@ NPY_NO_EXPORT const char* | |||
ufunc_get_name_cstr(PyUFuncObject *ufunc); | |||
|
|||
/* interned strings (on umath import) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add from umathmodule.c
to this comment?
62a842f
to
b292dcb
Compare
return -1; | ||
} | ||
if (kwnames[index]) { | ||
va_start(va, kwnames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure this way of repeatedly going through the va_list
is OK generically, but I guess the tests will tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're fine:
void va_start (va_list ap, paramN);
If ap has already been passed as first argument to a previous call to va_start or va_copy, it shall be passed to va_end before calling this function.
numpy/core/src/umath/ufunc_object.c
Outdated
if (pos != 0) { | ||
Py_ssize_t repeat = 0; | ||
while (PyDict_Next(kwds, &repeat, &key, &value) && repeat != pos) { | ||
int i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, some repetition here with the above. Not quite sure how to solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could encapsulate the loop::
for (i = 0; i <= index; i++) {
convert = va_arg(va, converter *);
output = va_arg(va, void *);
}
But arguably not worth it.
Maybe I missed it, but did you try replacing this with |
@mattip - the problem is that |
@mhvk: Do we have a benchmark that proves that? |
6ed3bfc
to
6d42d80
Compare
@eric-wieser, @mattip - one can check with the reduction methods (which do use
|
Can you add (in a new PR) a benchmark that compares:
for some small arrays? Preferably in some way that |
6d42d80
to
fc5f537
Compare
I pushed a version that has the benchmarks as the first commit. I tried installing |
benchmarks/benchmarks/bench_ufunc.py
Outdated
@@ -148,3 +148,62 @@ def time_add_scalar_conv(self): | |||
|
|||
def time_add_scalar_conv_complex(self): | |||
(self.y + self.z) | |||
|
|||
|
|||
class ArgParsing(Benchmark): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like these in a separate PR, simply because the hosted asv runs nightly, so would not produce a before / after if they get merged together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, see #11453
numpy/core/src/umath/ufunc_object.c
Outdated
} | ||
va_end(va); | ||
if (output) { | ||
convert(NULL, output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll get a system error here if convert
tries to do any python stuff since there's an exception in flight. Possibly worth Fetch
ing the exception, and checking the error code here, just in case.
numpy/core/src/umath/ufunc_object.c
Outdated
* Note that if any of the out_* is NULL, parse_ufunc_keywords | ||
* will treat the keyword as unwanted. | ||
*/ | ||
if (parse_ufunc_keywords( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the definition of kwnames
to just above here, so that the two things that have to be in the same order are at least adjacent?
static
variables can be declared at any scope, not just function scope, IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes much more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I rewrote as
static *kwnames[13] = {NULL}
if (kwnames[0] == NULL) {
kwnames[0] = npy_um_...
...
}
so that I don't have the triple ***kwnames
any more. Should be just that little bit faster.
numpy/core/src/umath/ufunc_object.c
Outdated
if (_convert_casting == NULL) { | ||
_convert_casting = (converter *)PyArray_CastingConverter; | ||
_convert_order = (converter *)PyArray_OrderConverter; | ||
_convert_dtype = (converter *)PyArray_DescrConverter2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really worth it vs just passing these functions directly into the parse_ufunc_keywords
call below? You're just making copies of the pointer here, which at best might provide marginally improved cache locality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duh, yes, of course that should work! Indeed, much clearer, especially with the keywords there too.
numpy/core/src/umath/ufunc_object.c
Outdated
_convert_dtype = (converter *)PyArray_DescrConverter2; | ||
}; | ||
/* | ||
* Initialize output objects so caller knows when outputs and optional | ||
* arguments are set (also means we can safely XDECREF on failure). | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary now, or does parse_ufunc_keywords
handle that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, still necessary. The parser only touches parts that are actually in kwds
numpy/core/src/umath/ufunc_object.c
Outdated
return NPY_FAIL; | ||
} | ||
} | ||
else if (obj == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is never hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is if the parser is calling the function again in the failure path to undo an earlier initialization, i.e., in this case when, say, the first keyword was where
and the second keyword something that is not allowed.
This behaviour is the same as for PyArg_ParseArgsAndKeywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. Added a note about the converters to make this clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But looking more closely at the array ones, I'm not so sure they are actually written to follow the logic... And indeed, reading the documentation [1] more carefully, a function has to return Py_CLEANUP_SUPPORTED
and this only works in python 3. I'll rewrite accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant here is that you had your ifs the wrong way around.
if (obj != PyTrue) {
}
else if (obj == NULL) {
// but this can never happen, because PyTrue != NULL
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that was silly!
numpy/core/src/umath/ufunc_object.c
Outdated
NPY_NO_EXPORT int | ||
_convert_keepdims(PyObject *obj, int *keepdims) | ||
{ | ||
if (PyBool_Check(obj)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is safe if obj == NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought surely it was but looking at the actual definition, it clearly is not.
Using the realization from inspecting similar CPython code that as we intern string names anyway, comparing keys with possible names by pointer will generally just work. Also rather drastically rewrote, to use a new parser with simple conversion functions to replace part of the enormous switch/case structure, process out separately, and check for duplication after the parsing. Reduces the overhead as follows: ``` import numpy as np a = np.array(1) b = np.empty_like(a) %timeit np.positive(a) # 352->348 ns %timeit np.positive(a, subok=True) # 606->501 ns %timeit np.positive(a, where=True) # 586->503 ns %timeit np.positive(a, where=True, subok=True) # 695->531 ns %timeit np.positive(a, b) # 354->352 ns %timeit np.positive(a, out=b) # 557->480 ns %timeit np.positive(a, out=b, subok=True) # 668->506 ns %timeit np.positive(a, out=b, where=True, subok=True) # 752->536 ns ```
fc5f537
to
f93a496
Compare
@eric-wieser - thanks for the (as usual) very good comments. Now having looked more at the normal parsers, it is clear that the cleanup is better not done inside it; being a bit more judicious with borrowed and new references allowed the code to become substantially simpler. I'm now sufficiently convinced this is the right path that I squashed all commits. In a follow-up commit, I'll use this parser for the reduction methods (which may need a bit of tinkering with the keyword-not-found handler, but should otherwise be very straightforward). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good now - thanks for showing me a few nifty py3 c api features, even if we can't use them yet...
Let's not merge this until the benchmark suite picks up your new benchmarks |
Alright, let's see how the benchmarks improve |
Great! |
Very strange, I just checked and I do still get a speed-up, but it is less dramatic than I had in top, mostly because the just-before commit is faster (new numbers, from just before the commit to current master):
O well, at least the code is clearer now... |
This is an alternative to #11333, which doesn't use the enum and includes a drastic rewrite in terms of conversion functions instead of the enormous switch/case loop (except for "out"). The reductions in overhead are the same as with #11333 to within a few ns (see below).
@eric-wieser - what do you think? My own sense is that this will be much more easily used also for
reduce
, etc., as it follows the python parsing ideas more closely.EDIT: added reductions in overhead from #11333, since I closed that one.