ENH: Implement axis for generalized ufuncs. #11018

mhvk · 2018-04-30T16:39:53Z

EDIT: this is now for axis only - keepdims has been added.

Follow-up on #8819, specifically for generalized ufuncs that are like reductions on a single axis.

EDIT: most of the below has become irrelevant.

~~As is, the implementation is not super-elegant:~~ EDIT: now quite elegant.

For keepdims, ~~checks are added every time the output core dimensions are accessed; it might make more sense to adjust just once.~~ EDIT: this is now solved.
For axis, ~~a full axes list is constructed rather than directly use the value (an advantage, though, is that an extension to passing in a tuple or None for axis would be that much easier).~~ (EDIT) currently a separate routine is used from that for axes, which means that any reinterpretation of what values it can have (currently, only int) would have to be done in two places. This would involve only a trivial refactor, though.

But I think this is worth having a careful look. It would be nice to have this in 1.15, together with the axes argument.

p.s. I tried to keep the commits adding keepdims and axis separate, as they are somewhat separate enhancements (and it might help review).

EDIT: they certainly helped to find an annoying reference counting bug. All seems OK now.

mhvk · 2018-04-30T16:41:51Z

numpy/core/src/umath/ufunc_object.c

    if (retval < 0) {
-        goto fail;


The change from goto fail to return here and below is because in fail I now need to decref axes (since it might be a new object created from axis). Fortunately, we have not taken any references here (and below), so it is safe to just return.

mhvk · 2018-04-30T17:06:44Z

numpy/core/src/umath/ufunc_object.c

+    /*
+     * If keepdims was passed in (and thus changed from the initial value
+     * on top), check the gufunc is suitable, i.e., that:
+     * - its input arguments share a single core dimension;


This is actually not strictly necessary (and checked); signatures like (i),(j)->() are fine for keepdims. Will change after having had further review.

shoyer · 2018-04-30T22:26:49Z

doc/release/1.15.0-notes.rst

+that act on a single, shared core dimension such as the inner product example
+above, one can pass an ``axis`` argument. This is equivalent to passing in
+``axes`` with identical entries for all arguments with that core dimension
+(e.g., for the example above, ``axes=[(axis,), (axis,)]``).


It would be nice to clarify (and test!) it this also works if the output has the same dimension. For example, a signature of (i)->(i).

Yes, indeed. The problem is that the _umath_test module does not contain any example of a gufunc with such a signature, and I don't feel quite up to writing one... (sadly, linalg also does not have one). Note, though, that the code path does get tested - with keepdims=True, any no-core arguments do get upgraded to having a single core dimension.

p.s. Of course this isn't a reason to clarify the documentation! Indeed, the keepdims case itself might be useful for that.

Now added a bit more text to keepdims to clarify this.

mhvk · 2018-04-30T23:09:48Z

doc/source/reference/ufuncs.rst

+
+    If this is set to `True`, an axis which is reduced will be left in the
+    result as a dimension with size one. With this option, the result will
+    correctly against the inputs. This option can only be used for generalized


Note to myself: missing broadcast and this incorrectly states the single dimension needs to be shared.

eric-wieser · 2018-05-01T04:29:41Z

numpy/core/src/umath/ufunc_object.c

@@ -565,7 +565,9 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
                    PyObject **out_typetup,
                    int *out_subok,
                    PyArrayObject **out_wheremask,
-                    PyObject **out_axes)
+                    PyObject **out_axes,
+                    PyObject **out_axis,


A comment with the python type would be nice for this and the above line

You mean something like

# list of tuples of specific axes for each argument # axis to use for a shared, single core dimension

Am not sure it will help much to have that here.

Yeah, something like that. Maybe even just /* type: T */ and /* type: List[Tuple[T]] */, adopting PEP484 syntax (and considering @shoyer's remarks about non-int types

eric-wieser · 2018-05-01T04:30:43Z

numpy/core/src/umath/ufunc_object.c

+                                        "'keepdims' must be a boolean");
+                            goto fail;
+                        }
+                        *out_keepdims = (value == Py_True);


What if value is np.false_? How do we handle this in subok?

I simply copied the stanza from subok, but agree it is somewhat odd. I'd prefer to leave it for a quick MAINT PR, though.

eric-wieser · 2018-05-01T04:31:31Z

numpy/core/src/umath/ufunc_object.c

-                    PyObject **out_axes)
+                    PyObject **out_axes,
+                    PyObject **out_axis,
+                    int *out_keepdims)


Any reason not to use a npy_bool for this and out_subok?

In the main routine, it is int, though, and I use -1 as a value to indicate "unitialized". Overall, perhaps best done in separate cleanup.

eric-wieser · 2018-05-01T04:32:46Z

numpy/core/src/umath/ufunc_object.c

+     */
+    if (keepdims != -1) {
+        for (i = 0; i < nop; i++) {
+            if (ufunc->core_num_dims[i] != (i < nin? 1 : 0)) {


I would prefer to see the test separate from the error message. Something like bool _ufunc_supports_keepdims(ufunc)

Yes, that makes sense. I'll do the same for axis (even though it is a trivial one).

eric-wieser · 2018-05-01T04:33:49Z

numpy/core/src/umath/ufunc_object.c

@@ -2260,7 +2317,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
     */
    iter_ndim = broadcast_ndim;
    for (i = nin; i < nop; ++i) {
-        iter_ndim += ufunc->core_num_dims[i];
+        iter_ndim += keepdims ? 1 : ufunc->core_num_dims[i];


Why is keepdims being added in the loop? That seems wrong to me

It does need to be there, so that the iterator can allocate dimensions in the correct order (see comment just above this piece of code)

eric-wieser · 2018-05-01T04:35:19Z

numpy/core/tests/test_ufunc.py

@@ -708,6 +708,58 @@ def test_axes_argument(self):
        assert_raises(ValueError, mm, z[1], z, axes=[0, 1])
        assert_raises(ValueError, mm, z, z, out=z[0], axes=[0, 1])

+    def test_keepdims_argument(self):
+        # inner1d signature: '(i),(i)->()'
+        in1d = umt.inner1d


A little misleading for someone greping for in1d and looking for np.in1d

eric-wieser · 2018-05-01T04:39:52Z

numpy/core/src/umath/ufunc_object.c

@@ -2456,7 +2497,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
     */
    core_dim_ixs_size = 0;
    for (i = 0; i < nop; ++i) {
-        core_dim_ixs_size += ufunc->core_num_dims[i];
+        core_dim_ixs_size += keepdims && i >= nin ? 1: ufunc->core_num_dims[i];


Instead of computing this over and over again, why not something like this at the start:

npy_intp *core_num_dims; if (keepdims) { // or could just use a static buffer core_num_dims = alloca(sizeof(*core_num_dims) * NPY_MAX_DIMS); for (i = 0; i < nin; ++i) { core_num_dims[i] = ufunc->core_num_dims[i]; } for(; i < nop; i++) { core_num_dims[i] = 1; } } else { core_num_dims = ufunc->core_num_dims; }

OK, better indeed, especially since all dimensions are just 1.

It seems appveyor/windows does not like alloca, so I now just use a static buffer.

eric-wieser · 2018-05-01T04:42:20Z

numpy/core/src/umath/ufunc_object.c

-
+    /*
+     * If keepdims was passed in (and thus changed from the initial value
+     * on top), check the gufunc is suitable.


I'm starting to think that we should have a type hierarchy for gufuncs, rather than making methods available or not based on their properties. Not something to deal with in this PR, but something to consider.

Yes, that makes sense. For that purpose, the separate _check_..._support functions are also sensible.

eric-wieser · 2018-05-01T04:42:57Z

numpy/core/src/umath/ufunc_object.c

            PyErr_Format(PyExc_ValueError,
                         "%s: %s operand %d does not have enough "
                         "dimensions (has %d, gufunc core with "
                         "signature %s requires %d)",
-                         ufunc_get_name_cstr(ufunc),
-                         i < nin ? "Input" : "Output",
+                         ufunc_name, i < nin ? "Input" : "Output",


This was better with the two ternaries line wrapped to match up

OK, now have all arguments on their own line.

eric-wieser · 2018-05-01T04:44:26Z

numpy/core/src/umath/ufunc_object.c

@@ -2230,7 +2246,9 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
    /* Get all the arguments */
    retval = get_ufunc_arguments(ufunc, args, kwds,
                op, &order, &casting, &extobj,
-                &type_tup, &subok, NULL, &axes, &keepdims);
+                &type_tup, &subok, NULL, &axes, &axis, &keepdims);
+    /* if given, need our own reference to axes, as we XDECREF it at the end */


Why not just leave it alone then? This doesn't seem to justify doing either at the moment.

Edit: Looks like the real reason we need it is we allocate a new axes below

I'll edit the comment to state this explicitly.

mhvk · 2018-05-01T16:52:58Z

OK, pushed up changes...

mhvk · 2018-05-02T17:05:43Z

I couldn't resist - I now allow keepdims for any case in which all inputs have the same number of core dimensions, and all outputs have none. With that, one can use keepdims on things like det (which has signature (m,m)->()). (Note that this is not exposed through the normal linalg interface; I did add tests to check that for the underlying gufunc it works as advertised.)

mhvk · 2018-05-11T23:26:52Z

Rebased to get rid of the merge conflict. Am fairly happy with this and hope it can go in 1.15.

charris · 2018-05-12T17:26:17Z

@eric-wieser @shoyer Be good for the previous reviewers to sign off on this, or not.

eric-wieser · 2018-05-12T20:14:47Z

numpy/core/src/umath/override.c

    /* finally, ufuncs accept 'sig' or 'signature' normalize to 'signature' */
-    return normalize_signature_keyword(*normal_kwds);
+    return nkwds == 0 ? 0 : normalize_signature_keyword(*normal_kwds);


I think this optimization might make more sense inside normalize_signature_keyword

The reason I put it here is that normalize_signature_keyword gets passed normal_kwds, which by this time may well have gotten an out entry even if there were no original keywords.

eric-wieser · 2018-05-12T20:16:47Z

numpy/core/src/umath/override.c

+        PyErr_SetString(PyExc_TypeError,
+                        "cannot specify both 'axis' and 'axes'");
+        return -1;
+    }


Should axis be converted to axes here, so that __array_ufunc__ only needs to support axes?

Perhaps a normalize_axis_keyword(*normal_kwds)

In general, I think __array_ufunc__ should just get whatever is passed in, since in principle at least it may act differently than the normal ufunc. In fact, I am somewhat torn about even including the check here; we do just pass on everything else that the user passes in, even if it would give an error message if no __array_ufunc__ is present. But then, we do check sig and signature. But then again, that is a deprecation...

Isn't the whole point in these normalization function not to pass on whatever was passed? I'd argue that by not normalizing axis, we make it harder for overriders to implement it and axes correctly.

The question really is - is your description of how axis maps to axes an invariant of ufuncs, or just one for the specific case of ndarrays? I think that if a short-hand can mean different things for different subclasses, then it's a dangerous shorthand.

As I saw it, the normalization was mostly to ensure all but the inputs were in keyword arguments, so that __array_ufunc__ implementations didn't have to worry about that; sig and signature seemed to be there just because sig is deprecated.

But I can see your argument as well that one should just translate axis because we should not encourage __array_ufunc__ implementations that can handle axis but not axes. I'll try to see how it looks if I split out the conversion of axis to axes as a helper function, as you suggest below.

I'm finding myself slightly stuck: I now have a nice _build_axes_tuple in ufunc_object.c, but to use this in override.c, I'd need to include it in a .h file -- but in umath, those seem to contain only stuff to be exported, which this obviously is not. Most logical might be to start a new ufunc_parser.c or ufunc_helper.c which contains all these helper functions for parsing, but that feels a bit beyond this PR...

An alternative might be just to remove the override commit for now, with the logic that __array_ufunc__ implementations should do whatever they want; they'll get an error anyway if they pass this back in.

but let's make sure it works for arbitrary Python objects, not just integers.

+1 on this. axis='rows' for a ufunc like (i),(i) -> () should end up as axes=[['rows'], ['rows'], []]

The current normalization does work for arbitrary objects. It just puts axis in a single-element tuple, and creates a list with that tuple as entries (or an empty one if an argument has no core dimensions).

@mhvk: Can you just put it in numpy\core\src\umath\ufunc_object.h? Those functions are not exported either.

Yes, putting it in ufunc_object.h works (I had mistakenly looked in ufuncobject.h, which does have things for export...), but it has a problem that then we need even more parsing... (I'll move to the main thread, since I think this discussion should not disappear after another change.)

Yeah, the fact that both of those filenames exist is super confusing...

eric-wieser · 2018-06-04T01:34:34Z

numpy/core/src/umath/ufunc_object.c

-                    PyArrayObject **out_wheremask,
-                    PyObject **out_axes,
-                    int *out_keepdims)
+                    PyObject **out_typetup,  /* tuple of dtype */


nit: type: Tuple[np.dtype] would match the mypy-style annotation used below

eric-wieser · 2018-06-04T01:37:28Z

numpy/core/src/umath/ufunc_object.c

-    /* When provided, extobj, typetup, and axes contain borrowed references */
-    PyObject *extobj = NULL, *type_tup = NULL, *axes = NULL;
+    /* other possible keyword arguments */
+    PyObject *extobj = NULL, *type_tup = NULL, *axes=NULL, *axis = NULL;


nit: inconsistent spacing. Could break this over multiple lines too.

added spaces.

eric-wieser

Do we want to add this to the ufunc doc signature template? Arguments against: That template is only really applied to non-gufuncs right now, which don;t support this argument. Making me think that #8868 might be worth picking back up.

eric-wieser · 2018-06-04T01:40:06Z

doc/source/reference/ufuncs.rst

+
+    .. versionadded:: 1.15
+
+    A single axis over which a generalized ufunc should operate. This is a


"Not supported for non-generalized ufuncs" might be nice. Does np.add([1], [2], axis=0) fail?

It does say specifically "over which a generalized ufunc should operate. And, yes, an error is raised:

In [2]: np.add(1., 1., axis=0) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-2-578c404b564c> in <module>() ----> 1 np.add(1., 1., axis=0) TypeError: 'axis' is an invalid keyword to ufunc 'add'

But this was not tested - now added.

mhvk · 2018-06-04T13:49:36Z

I agree that we should not add axes, axis, keepdims to the docstring template, since we currently do not expose any generalized ufunc directly.

mhvk · 2018-06-06T14:25:45Z

Rebased on top of #11257 - which means axis now has its own reference too. Should be all set now.

mattip · 2018-06-06T18:23:20Z

@charris are you comfortable with this going into 1.15?

mhvk · 2018-06-06T19:37:25Z

It does complement axes and keepdims...

charris · 2018-06-07T16:58:44Z

@mattip I'm letting the __array_ufunc__ folks drive this train as I am not a user of that functionality. I'd like @shoyer to sign off on it though. I think @eric-wieser is OK with it now?

charris · 2018-06-07T17:01:04Z

I'll probably kick all this off to 1.16 come Friday otherwise, as I'd like to get 1.15.0rc1 out this weekend.

shoyer

LGTM, though I would still be happier if we had a test case ufunc that included an output core dimension (e.g., with signature='(i)->(i)').

shoyer · 2018-06-07T17:08:18Z

numpy/core/src/umath/ufunc_object.c

@@ -827,9 +831,23 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
                case 'a':
                    /* possible axes argument for generalized ufunc */
                    if (out_axes != NULL && strcmp(str, "axes") == 0) {
+                        if (out_axis != NULL && *out_axis != NULL) {
+                            PyErr_SetString(PyExc_RuntimeError,


Why is this RuntimeError here but TypeError above (in override.c)? I would expect both to give the same error (probably TypeError).

That was for symmetry with sig and signature, but I must admit a TypeError makes much more sense, so I've changed it.

It is only allowed for gufuncs with a single, shared core dimension.

Both in the general documentation and in the release notes.

Much faster, and much less code duplication than I feared.

mhvk · 2018-06-07T18:39:22Z

@shoyer - now that I've been writing other gufuncs in _umath_tests.c.src, it was much easier to just add a new function with the signature you requested, (i)->(i) - so I did and added some tests.

shoyer

This looks good to me. Thanks for adding the cumsum tests!

mhvk commented Apr 30, 2018

View reviewed changes

mhvk added this to the 1.15.0 release milestone Apr 30, 2018

mhvk commented Apr 30, 2018

View reviewed changes

mhvk force-pushed the gufunc-axis-and-keepdims branch from 1ba5e99 to 4c98f44 Compare April 30, 2018 18:08

mhvk mentioned this pull request Apr 30, 2018

DO NOT MERGE: Add keepdims argument for generalized ufuncs. #11019

Closed

mhvk force-pushed the gufunc-axis-and-keepdims branch from 4c98f44 to 7e8763e Compare April 30, 2018 20:18

shoyer reviewed Apr 30, 2018

View reviewed changes

mhvk commented Apr 30, 2018

View reviewed changes

eric-wieser reviewed May 1, 2018

View reviewed changes

mhvk force-pushed the gufunc-axis-and-keepdims branch from 7e8763e to 1842a95 Compare May 1, 2018 16:45

mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from 7f9032e to 83bbb65 Compare May 2, 2018 15:32

charris added 01 - Enhancement component: numpy._core labels May 5, 2018

mhvk force-pushed the gufunc-axis-and-keepdims branch from 83bbb65 to 93297e0 Compare May 11, 2018 23:25

eric-wieser reviewed May 12, 2018

View reviewed changes

eric-wieser reviewed Jun 4, 2018

View reviewed changes

mhvk mentioned this pull request Jun 4, 2018

MAINT: Split ufunc into two subtypes, and remove methods where innapropriate #8868

Closed

mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from 0283828 to 93d4b6f Compare June 4, 2018 16:59

mhvk mentioned this pull request Jun 4, 2018

BUG: do not decref borrowed refs for extobj and axes #11244

Closed

mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from cf5eaa6 to 580399d Compare June 6, 2018 14:24

mhvk force-pushed the gufunc-axis-and-keepdims branch from 580399d to 5a1fd86 Compare June 6, 2018 16:37

shoyer approved these changes Jun 7, 2018

View reviewed changes

mhvk added 6 commits June 7, 2018 14:37

ENH: add "axis" argument to generalized ufuncs.

e782814

It is only allowed for gufuncs with a single, shared core dimension.

DOC: Describe new axis argument.

2abef6c

Both in the general documentation and in the release notes.

MAINT: let ufunc override reject passing in both axis and axes.

2d67af5

MAINT: Interpret gufunc axis directly rather than construct axes.

f004493

Much faster, and much less code duplication than I feared.

DOC: Add release note about __array_ufunc__.

c5ef411

TST: Add a new gufunc with (i)->(i) signature for testing with axis.

b02fa32

mhvk force-pushed the gufunc-axis-and-keepdims branch from 5a1fd86 to b02fa32 Compare June 7, 2018 18:38

shoyer approved these changes Jun 7, 2018

View reviewed changes

charris merged commit 57d17c3 into numpy:master Jun 7, 2018

mhvk deleted the gufunc-axis-and-keepdims branch June 8, 2018 00:42

mhvk mentioned this pull request Jul 2, 2018

Feature request: axis and keepdims for gufuncs (and use it in np.median) #8810

Closed

7 tasks


		.. versionadded:: 1.15

		A single axis over which a generalized ufunc should operate. This is a

Uh oh!

ENH: Implement axis for generalized ufuncs. #11018

ENH: Implement axis for generalized ufuncs. #11018

Uh oh!

Conversation

mhvk commented Apr 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser May 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser May 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhvk commented May 1, 2018

Uh oh!

mhvk commented May 2, 2018

Uh oh!

mhvk commented May 11, 2018

Uh oh!

charris commented May 12, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhvk commented Apr 30, 2018 •

edited

Loading

eric-wieser May 1, 2018 •

edited

Loading

eric-wieser May 1, 2018 •

edited

Loading

eric-wieser May 12, 2018 •

edited

Loading

eric-wieser May 14, 2018 •

edited

Loading