MAINT: Improve speed of ufunc kwargs parsing #11351

mhvk · 2018-06-16T00:29:51Z

This is an alternative to #11333, which doesn't use the enum and includes a drastic rewrite in terms of conversion functions instead of the enormous switch/case loop (except for "out"). The reductions in overhead are the same as with #11333 to within a few ns (see below).

@eric-wieser - what do you think? My own sense is that this will be much more easily used also for reduce, etc., as it follows the python parsing ideas more closely.

EDIT: added reductions in overhead from #11333, since I closed that one.

import numpy as np
a = np.array(1)
b = np.empty_like(a)
%timeit np.positive(a)                                 # 352->348 ns
%timeit np.positive(a, subok=True)                     # 606->501 ns
%timeit np.positive(a, where=True)                     # 586->503 ns
%timeit np.positive(a, where=True, subok=True)         # 695->531 ns
%timeit np.positive(a, b)                              # 354->352 ns
%timeit np.positive(a, out=b)                          # 557->480 ns
%timeit np.positive(a, out=b, subok=True)              # 668->506 ns
%timeit np.positive(a, out=b, where=True, subok=True)  # 752->536 ns

eric-wieser · 2018-06-16T03:55:00Z

numpy/core/src/umath/ufunc_object.c

+        (++kwarg)->output = out_subok;
+        (++kwarg)->output = out_typetup;
+        (++kwarg)->output = &sig;
+        (++kwarg)->output = out_extobj;


This block isn't threadsafe (since ufunc_kwargs is static)

How much does performance change if you make it non-static?

Really though, I'm not sure this gains anything over a parallel array of outputs

Oops, I had not thought of that... I actually had a parallel array before, and at the time it seemed less nice. But reverting, I actually think it is much nicer, so I adopted it.

eric-wieser · 2018-06-16T03:58:22Z

numpy/core/src/umath/ufunc_object.c

+        {NULL, NULL, NULL}  /* sentinel */
+    };
+    ufunc_kwarg *kwarg = ufunc_kwargs;
+    if (kwarg->str == NULL) {  /* fill on first call */


Worth adding that this is because the strings are not interned yet during static initialization.

You could store &npy_um_str_out etc into the array instead, which would work fine during static initialization

OK, better indeed.

eric-wieser · 2018-06-16T04:00:23Z

numpy/core/src/umath/ufunc_object.c


+    PyArray_Descr *dtype = NULL;
+    PyObject *sig = NULL;
+    typedef int (*converter)(PyObject *, void *);


I often like to do my function typedefs as

typedef int converter(PyObject *, void *)

And then use converter *convert below, which reads as "a pointer to a converter function"

I had mine from the CPython implementation, but that definitely looks nicer, so I changed it.

eric-wieser · 2018-06-16T06:39:42Z

I wonder if you should just make a va_list function, which you could call as

parse_kwargs_fast(kwargs,
    interned_name, parser, target_var,
    interned_name, parser, target_var,
    ...
    NULL)

I know very little about va_list, but my understanding is that it's really just a bunch of helpers to walk through the stack

mhvk · 2018-06-16T17:17:23Z

@eric-wieser - I followed your suggestions and think it looks quite a bit better still. Do you agree that this is a better solution than #11333? If so, I'll close that one.

On the va_list, I did think of making effectively a sped-up version of PyArg_ParseArgsAndKeywords, but that may be better done in a separate PR. Also, I think for this routine, we may want to do out before it even gets called (since it is needed even if the ufunc is overridden).

mattip · 2018-06-19T21:47:13Z

I see #113333 is still open, but this is the desired alternative? Can we merge this and close the other?

mhvk · 2018-06-19T22:48:57Z

I definitely think this is the better solution -- just wasn't sure whether there were further comments.

mhvk · 2018-06-21T14:44:28Z

numpy/core/src/umath/ufunc_object.c

-            str_key_obj = PyUnicode_AsASCIIString(key);
-            if (str_key_obj != NULL) {
-                key = str_key_obj;
+        while (PyDict_Next(kwds, &pos, &key, &value)) {


This could almost become a fast keyword parsing function that we could use also for the reductions. But as is, this part of code is not able to DECREF items on failure, so I'd rather leave that for another PR (it needs changing the converters such that if passed NULL, they decref if needed).

mhvk · 2018-06-21T14:47:07Z

numpy/core/src/umath/ufunc_object.c


+    typedef int converter(PyObject *, void *);
+    typedef struct {


I'm still not completely sold on this new struct - its main advantage of keeping the names and converters together perhaps doesn't outweigh adding it -- but for use in just one function it is probably fine (and another reason for not yet aiming to make a general keyword parsing function).

mhvk · 2018-06-21T14:52:52Z

Rebased, and refactored a little more. There are some obvious further improvements possible, but I'd prefer to leave those for a future PR (see in-line comment).

mhvk · 2018-06-27T18:41:46Z

@eric-wieser - sorry to bug for two PRs, but this one would also be nice to get in...

eric-wieser · 2018-06-28T07:31:58Z

numpy/core/src/umath/ufunc_object.c

+        {NULL, NULL}                  /* sentinel */
+    };
+    /*
+     * Some converters are from the Array_API, which are not availabe'


This won't be true once umath and multiarray are merged, right?

eric-wieser · 2018-06-28T07:34:48Z

numpy/core/src/umath/ufunc_object.c

+        outputs[9] = out_typetup;
+        outputs[10] = &sig;
+        outputs[11] = out_extobj;
+        outputs[12] = NULL;  /* sentinel for key-not-found */


Having this order duplicated between here and above seems very fragile. I'll see if I can think of another way

eric-wieser · 2018-06-28T07:38:36Z

numpy/core/src/umath/ufunc_object.c

+            output = outputs[kwarg - ufunc_kwargs];
+            if (output) {
+                if (kwarg->convert) {
+                    if (!(*(kwarg->convert))(value, output)) {


FYI: No need to dereference the function pointer here - *fp() and fp() are synonymous in C. I'm fine with this line as is though

eric-wieser · 2018-06-28T07:39:55Z

numpy/core/src/umath/ufunc_object.c

-                        bad_arg = 0;
+                }
+                else {
+                    /* Just copy reference */


Is special casing NULL here really better than just having one more function pointer?

eric-wieser · 2018-06-28T07:41:29Z

numpy/core/src/umath/ufunc_object.c

+            else {
+#if PY_VERSION_HEX >= 0x03000000
+                PyErr_Format(PyExc_TypeError,
+                             "'%S' is an invalid keyword to ufunc '%s'",


Wait, CPython3x format strings have a special syntax for unicode? I think there's a lot of places that cleanup could go

'S' is equivalent to str, and, yes, I was very happy when I found it (and then sad that it didn't work on python 2... fortunately, not much longer.)

eric-wieser · 2018-06-28T07:51:54Z

numpy/core/src/umath/ufunc_object.c

+             * And we do indeed intern them.
+             */
+            void *output;
+            ufunc_kwarg *kwarg = ufunc_kwargs;


Can you extract this line to line 787 as a helper function, ufunc_kwarg *find_ufunc_kwd(PyObject *key), which returns null if nothing is found?

That would make it easier to swap it for a pointer-ordered binary search in future, if necessary.

You'd have to put the struct and static array definition at global scope, but I think that's fine.

eric-wieser · 2018-06-28T07:55:40Z

numpy/core/src/umath/ufunc_object.h

@@ -11,12 +11,22 @@ NPY_NO_EXPORT const char*
 ufunc_get_name_cstr(PyUFuncObject *ufunc);

 /* interned strings (on umath import) */


Can you add from umathmodule.c to this comment?

mhvk · 2018-06-28T19:57:25Z

numpy/core/src/umath/ufunc_object.c

+            return -1;
+        }
+        if (kwnames[index]) {
+            va_start(va, kwnames);


I'm not 100% sure this way of repeatedly going through the va_list is OK generically, but I guess the tests will tell.

Looks like you're fine:

void va_start (va_list ap, paramN);
If ap has already been passed as first argument to a previous call to va_start or va_copy, it shall be passed to va_end before calling this function.

mhvk · 2018-06-28T19:58:28Z

numpy/core/src/umath/ufunc_object.c

+    if (pos != 0) {
+        Py_ssize_t repeat = 0;
+        while (PyDict_Next(kwds, &repeat, &key, &value) && repeat != pos) {
+            int i;


Obviously, some repetition here with the above. Not quite sure how to solve it.

Could encapsulate the loop::

for (i = 0; i <= index; i++) { convert = va_arg(va, converter *); output = va_arg(va, void *); }

But arguably not worth it.

mattip · 2018-06-28T20:32:07Z

Maybe I missed it, but did you try replacing this with PyArg_ParseArgsAndKeywords ? It seems you are moving in the direction of duplicating much of that code

mhvk · 2018-06-28T20:40:11Z

@mattip - the problem is that PyArg_ParseArgsAndKeywords is very slow, in part because it turns every character keyword into an object, in part (IIRC) because it searches for keywords in the dict instead of scanning over the dict. But, yes, effectively we're emulating CPython code (the parts that deal with function calls); sadly, that is not exposed.

eric-wieser · 2018-06-28T20:41:18Z

@mhvk: Do we have a benchmark that proves that?

mhvk · 2018-06-28T21:25:46Z

@eric-wieser, @mattip - one can check with the reduction methods (which do use PyArg_ParseArgsAndKeywords):

In [17]: a = np.arange(10.)

In [18]: %timeit np.add.reduce(a, 0)
The slowest run took 24.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.11 µs per loop

In [19]: %timeit np.add.reduce(a, axis=0)
The slowest run took 19.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.36 µs per loop

eric-wieser · 2018-06-28T21:40:08Z

Can you add (in a new PR) a benchmark that compares:

np.add(a, b, out=c)
np.add(a, b, c)
np.add(a, b, out=(c,))

for some small arrays? Preferably in some way that asv will show them all on the same graph

mhvk · 2018-06-28T23:28:24Z

I pushed a version that has the benchmarks as the first commit. I tried installing asv and it did run with the simple ./runtest.py --bench command, but nothing else in bench/README.rst just works (no comparisons, no publishing). I don't really have time to debug this, though, and would prefer to focus on whether the PR itself makes sense (I feel it is an improvement over the previous parsing even ignoring the speed gain!).

eric-wieser · 2018-06-29T08:28:10Z

benchmarks/benchmarks/bench_ufunc.py

@@ -148,3 +148,62 @@ def time_add_scalar_conv(self):

    def time_add_scalar_conv_complex(self):
        (self.y + self.z)
+
+
+class ArgParsing(Benchmark):


I'd like these in a separate PR, simply because the hosted asv runs nightly, so would not produce a before / after if they get merged together.

Good point, see #11453

eric-wieser · 2018-06-29T08:30:47Z

numpy/core/src/umath/ufunc_object.c

+            }
+            va_end(va);
+            if (output) {
+                convert(NULL, output);


You'll get a system error here if convert tries to do any python stuff since there's an exception in flight. Possibly worth Fetching the exception, and checking the error code here, just in case.

eric-wieser · 2018-06-29T08:32:10Z

numpy/core/src/umath/ufunc_object.c

+         * Note that if any of the out_* is NULL, parse_ufunc_keywords
+         * will treat the keyword as unwanted.
+         */
+        if (parse_ufunc_keywords(


Can you move the definition of kwnames to just above here, so that the two things that have to be in the same order are at least adjacent?

static variables can be declared at any scope, not just function scope, IIRC.

Yes, makes much more sense.

Indeed, I rewrote as

static *kwnames[13] = {NULL} if (kwnames[0] == NULL) { kwnames[0] = npy_um_... ... }

so that I don't have the triple ***kwnames any more. Should be just that little bit faster.

eric-wieser · 2018-06-29T08:34:31Z

numpy/core/src/umath/ufunc_object.c

+    if (_convert_casting == NULL) {
+        _convert_casting = (converter *)PyArray_CastingConverter;
+        _convert_order = (converter *)PyArray_OrderConverter;
+        _convert_dtype = (converter *)PyArray_DescrConverter2;


Is this really worth it vs just passing these functions directly into the parse_ufunc_keywords call below? You're just making copies of the pointer here, which at best might provide marginally improved cache locality

Duh, yes, of course that should work! Indeed, much clearer, especially with the keywords there too.

eric-wieser · 2018-06-29T08:36:33Z

numpy/core/src/umath/ufunc_object.c

+        _convert_dtype = (converter *)PyArray_DescrConverter2;
+    };
+    /*
+     * Initialize output objects so caller knows when outputs and optional
     * arguments are set (also means we can safely XDECREF on failure).
     */


Is this necessary now, or does parse_ufunc_keywords handle that?

Yes, still necessary. The parser only touches parts that are actually in kwds

eric-wieser · 2018-06-29T08:43:46Z

numpy/core/src/umath/ufunc_object.c

+            return NPY_FAIL;
+        }
+    }
+    else if (obj == NULL) {


This condition is never hit.

It is if the parser is calling the function again in the failure path to undo an earlier initialization, i.e., in this case when, say, the first keyword was where and the second keyword something that is not allowed.

This behaviour is the same as for PyArg_ParseArgsAndKeywords

p.s. Added a note about the converters to make this clear.

But looking more closely at the array ones, I'm not so sure they are actually written to follow the logic... And indeed, reading the documentation [1] more carefully, a function has to return Py_CLEANUP_SUPPORTED and this only works in python 3. I'll rewrite accordingly.

[1] https://docs.python.org/3/c-api/arg.html

What I meant here is that you had your ifs the wrong way around.

if (obj != PyTrue) { } else if (obj == NULL) { // but this can never happen, because PyTrue != NULL }

Ah, yes, that was silly!

eric-wieser · 2018-06-29T08:44:02Z

numpy/core/src/umath/ufunc_object.c

+NPY_NO_EXPORT int
+_convert_keepdims(PyObject *obj, int *keepdims)
+{
+    if (PyBool_Check(obj)) {


Not sure this is safe if obj == NULL

I thought surely it was but looking at the actual definition, it clearly is not.

Using the realization from inspecting similar CPython code that as we intern string names anyway, comparing keys with possible names by pointer will generally just work. Also rather drastically rewrote, to use a new parser with simple conversion functions to replace part of the enormous switch/case structure, process out separately, and check for duplication after the parsing. Reduces the overhead as follows: ``` import numpy as np a = np.array(1) b = np.empty_like(a) %timeit np.positive(a) # 352->348 ns %timeit np.positive(a, subok=True) # 606->501 ns %timeit np.positive(a, where=True) # 586->503 ns %timeit np.positive(a, where=True, subok=True) # 695->531 ns %timeit np.positive(a, b) # 354->352 ns %timeit np.positive(a, out=b) # 557->480 ns %timeit np.positive(a, out=b, subok=True) # 668->506 ns %timeit np.positive(a, out=b, where=True, subok=True) # 752->536 ns ```

mhvk · 2018-06-29T13:54:09Z

@eric-wieser - thanks for the (as usual) very good comments. Now having looked more at the normal parsers, it is clear that the cleanup is better not done inside it; being a bit more judicious with borrowed and new references allowed the code to become substantially simpler. I'm now sufficiently convinced this is the right path that I squashed all commits. In a follow-up commit, I'll use this parser for the reduction methods (which may need a bit of tinkering with the keyword-not-found handler, but should otherwise be very straightforward).

eric-wieser

Looks pretty good now - thanks for showing me a few nifty py3 c api features, even if we can't use them yet...

eric-wieser · 2018-06-30T03:49:56Z

Let's not merge this until the benchmark suite picks up your new benchmarks

eric-wieser · 2018-07-02T16:00:59Z

Alright, let's see how the benchmarks improve

mhvk · 2018-07-02T17:11:00Z

Great!

eric-wieser · 2018-07-10T15:01:42Z

Doesn't seem to have made any real difference (orange line is this commit)

mhvk · 2018-07-10T16:27:53Z

Very strange, I just checked and I do still get a speed-up, but it is less dramatic than I had in top, mostly because the just-before commit is faster (new numbers, from just before the commit to current master):

import numpy as np
a = np.array(1)
b = np.empty_like(a)
%timeit np.positive(a)                                 # 352->348 ns  # 313->320 ns
%timeit np.positive(a, subok=True)                     # 606->501 ns  # 529->483 ns
%timeit np.positive(a, where=True)                     # 586->503 ns  # 500->473 ns
%timeit np.positive(a, where=True, subok=True)         # 695->531 ns  # 595->515 ns
%timeit np.positive(a, b)                              # 354->352 ns  # 330->330 ns
%timeit np.positive(a, out=b)                          # 557->480 ns  # 504->455 ns
%timeit np.positive(a, out=b, subok=True)              # 668->506 ns  # 573->495 ns
%timeit np.positive(a, out=b, where=True, subok=True)  # 752->536 ns  # 637->526 ns

O well, at least the code is clearer now...

mhvk added 03 - Maintenance component: numpy.ufunc labels Jun 16, 2018

mhvk requested a review from eric-wieser June 16, 2018 00:29

mhvk changed the title ~~MAINT: Improve speed of ufunc kwargs parsing.~~ MAINT: Improve speed of ufunc kwargs parsing (alternative). Jun 16, 2018

eric-wieser reviewed Jun 16, 2018

View reviewed changes

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from f34960a to d41caa9 Compare June 16, 2018 17:17

mhvk mentioned this pull request Jun 18, 2018

MAINT: Ufunc parsing super fast (work in progress) #11372

Closed

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from d41caa9 to 82e02c6 Compare June 21, 2018 14:39

mhvk mentioned this pull request Jun 21, 2018

MAINT: Improve speed of ufunc kwargs parsing. #11333

Closed

mhvk commented Jun 21, 2018

View reviewed changes

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from 82e02c6 to 62a842f Compare June 21, 2018 17:51

mhvk changed the title ~~MAINT: Improve speed of ufunc kwargs parsing (alternative).~~ MAINT: Improve speed of ufunc kwargs parsing Jun 21, 2018

eric-wieser self-assigned this Jun 27, 2018

eric-wieser reviewed Jun 28, 2018

View reviewed changes

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from 62a842f to b292dcb Compare June 28, 2018 19:41

mhvk commented Jun 28, 2018

View reviewed changes

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from 6ed3bfc to 6d42d80 Compare June 28, 2018 20:44

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from 6d42d80 to fc5f537 Compare June 28, 2018 23:23

eric-wieser reviewed Jun 29, 2018

View reviewed changes

mhvk mentioned this pull request Jun 29, 2018

BENCH: add ufunc argument parsing benchmarks. #11453

Merged

mhvk force-pushed the ufunc-parsing-speedup-alternative branch from fc5f537 to f93a496 Compare June 29, 2018 13:51

eric-wieser approved these changes Jun 30, 2018

View reviewed changes

eric-wieser mentioned this pull request Jul 1, 2018

MAINT: Produce a more readable repr of argument packs in benchmark #11463

Merged

eric-wieser merged commit 58cefee into numpy:master Jul 2, 2018

mhvk deleted the ufunc-parsing-speedup-alternative branch July 2, 2018 17:10

		@@ -11,12 +11,22 @@ NPY_NO_EXPORT const char*
		ufunc_get_name_cstr(PyUFuncObject *ufunc);

		/* interned strings (on umath import) */

Uh oh!

MAINT: Improve speed of ufunc kwargs parsing #11351

MAINT: Improve speed of ufunc kwargs parsing #11351

Uh oh!

Conversation

mhvk commented Jun 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eric-wieser Jun 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser commented Jun 16, 2018

Uh oh!

mhvk commented Jun 16, 2018

Uh oh!

mattip commented Jun 19, 2018

Uh oh!

mhvk commented Jun 19, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhvk commented Jun 21, 2018

Uh oh!

mhvk commented Jun 27, 2018

Uh oh!

eric-wieser Jun 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser Jun 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattip commented Jun 28, 2018

Uh oh!

mhvk commented Jun 28, 2018

Uh oh!

eric-wieser commented Jun 28, 2018

Uh oh!

mhvk commented Jun 28, 2018

Uh oh!

eric-wieser commented Jun 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mhvk commented Jun 16, 2018 •

edited

Loading

eric-wieser Jun 16, 2018 •

edited

Loading

eric-wieser Jun 28, 2018 •

edited

Loading

eric-wieser Jun 28, 2018 •

edited

Loading

eric-wieser commented Jun 28, 2018 •

edited

Loading

eric-wieser Jun 29, 2018 •

edited

Loading

eric-wieser Jun 29, 2018 •

edited

Loading

eric-wieser Jun 29, 2018 •

edited

Loading