Skip to content

ENH: Implement axis for generalized ufuncs. #11018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 7, 2018

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Apr 30, 2018

EDIT: this is now for axis only - keepdims has been added.

Follow-up on #8819, specifically for generalized ufuncs that are like reductions on a single axis.

EDIT: most of the below has become irrelevant.

As is, the implementation is not super-elegant: EDIT: now quite elegant.

  • For keepdims, checks are added every time the output core dimensions are accessed; it might make more sense to adjust just once. EDIT: this is now solved.
  • For axis, a full axes list is constructed rather than directly use the value (an advantage, though, is that an extension to passing in a tuple or None for axis would be that much easier). (EDIT) currently a separate routine is used from that for axes, which means that any reinterpretation of what values it can have (currently, only int) would have to be done in two places. This would involve only a trivial refactor, though.

But I think this is worth having a careful look. It would be nice to have this in 1.15, together with the axes argument.

p.s. I tried to keep the commits adding keepdims and axis separate, as they are somewhat separate enhancements (and it might help review).

EDIT: they certainly helped to find an annoying reference counting bug. All seems OK now.

if (retval < 0) {
goto fail;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from goto fail to return here and below is because in fail I now need to decref axes (since it might be a new object created from axis). Fortunately, we have not taken any references here (and below), so it is safe to just return.

@mhvk mhvk added this to the 1.15.0 release milestone Apr 30, 2018
/*
* If keepdims was passed in (and thus changed from the initial value
* on top), check the gufunc is suitable, i.e., that:
* - its input arguments share a single core dimension;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not strictly necessary (and checked); signatures like (i),(j)->() are fine for keepdims. Will change after having had further review.

that act on a single, shared core dimension such as the inner product example
above, one can pass an ``axis`` argument. This is equivalent to passing in
``axes`` with identical entries for all arguments with that core dimension
(e.g., for the example above, ``axes=[(axis,), (axis,)]``).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to clarify (and test!) it this also works if the output has the same dimension. For example, a signature of (i)->(i).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. The problem is that the _umath_test module does not contain any example of a gufunc with such a signature, and I don't feel quite up to writing one... (sadly, linalg also does not have one). Note, though, that the code path does get tested - with keepdims=True, any no-core arguments do get upgraded to having a single core dimension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. Of course this isn't a reason to clarify the documentation! Indeed, the keepdims case itself might be useful for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now added a bit more text to keepdims to clarify this.


If this is set to `True`, an axis which is reduced will be left in the
result as a dimension with size one. With this option, the result will
correctly against the inputs. This option can only be used for generalized
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: missing broadcast and this incorrectly states the single dimension needs to be shared.

@@ -565,7 +565,9 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
PyObject **out_typetup,
int *out_subok,
PyArrayObject **out_wheremask,
PyObject **out_axes)
PyObject **out_axes,
PyObject **out_axis,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment with the python type would be nice for this and the above line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean something like

# list of tuples of specific axes for each argument
# axis to use for a shared, single core dimension

Am not sure it will help much to have that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that. Maybe even just /* type: T */ and /* type: List[Tuple[T]] */, adopting PEP484 syntax (and considering @shoyer's remarks about non-int types

"'keepdims' must be a boolean");
goto fail;
}
*out_keepdims = (value == Py_True);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if value is np.false_? How do we handle this in subok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply copied the stanza from subok, but agree it is somewhat odd. I'd prefer to leave it for a quick MAINT PR, though.

PyObject **out_axes)
PyObject **out_axes,
PyObject **out_axis,
int *out_keepdims)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use a npy_bool for this and out_subok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the main routine, it is int, though, and I use -1 as a value to indicate "unitialized". Overall, perhaps best done in separate cleanup.

*/
if (keepdims != -1) {
for (i = 0; i < nop; i++) {
if (ufunc->core_num_dims[i] != (i < nin? 1 : 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see the test separate from the error message. Something like bool _ufunc_supports_keepdims(ufunc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I'll do the same for axis (even though it is a trivial one).

@@ -2260,7 +2317,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
*/
iter_ndim = broadcast_ndim;
for (i = nin; i < nop; ++i) {
iter_ndim += ufunc->core_num_dims[i];
iter_ndim += keepdims ? 1 : ufunc->core_num_dims[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is keepdims being added in the loop? That seems wrong to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does need to be there, so that the iterator can allocate dimensions in the correct order (see comment just above this piece of code)

@@ -708,6 +708,58 @@ def test_axes_argument(self):
assert_raises(ValueError, mm, z[1], z, axes=[0, 1])
assert_raises(ValueError, mm, z, z, out=z[0], axes=[0, 1])

def test_keepdims_argument(self):
# inner1d signature: '(i),(i)->()'
in1d = umt.inner1d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little misleading for someone greping for in1d and looking for np.in1d

@@ -2456,7 +2497,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
*/
core_dim_ixs_size = 0;
for (i = 0; i < nop; ++i) {
core_dim_ixs_size += ufunc->core_num_dims[i];
core_dim_ixs_size += keepdims && i >= nin ? 1: ufunc->core_num_dims[i];
Copy link
Member

@eric-wieser eric-wieser May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of computing this over and over again, why not something like this at the start:

npy_intp *core_num_dims;
if (keepdims) {
    // or could just use a static buffer
    core_num_dims = alloca(sizeof(*core_num_dims) * NPY_MAX_DIMS);
    for (i = 0; i < nin; ++i) {
         core_num_dims[i] = ufunc->core_num_dims[i];
    }
    for(; i < nop; i++) {
        core_num_dims[i] = 1;
    } 
}
else {
    core_num_dims = ufunc->core_num_dims;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, better indeed, especially since all dimensions are just 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems appveyor/windows does not like alloca, so I now just use a static buffer.


/*
* If keepdims was passed in (and thus changed from the initial value
* on top), check the gufunc is suitable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think that we should have a type hierarchy for gufuncs, rather than making methods available or not based on their properties. Not something to deal with in this PR, but something to consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. For that purpose, the separate _check_..._support functions are also sensible.

PyErr_Format(PyExc_ValueError,
"%s: %s operand %d does not have enough "
"dimensions (has %d, gufunc core with "
"signature %s requires %d)",
ufunc_get_name_cstr(ufunc),
i < nin ? "Input" : "Output",
ufunc_name, i < nin ? "Input" : "Output",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was better with the two ternaries line wrapped to match up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now have all arguments on their own line.

@@ -2230,7 +2246,9 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
/* Get all the arguments */
retval = get_ufunc_arguments(ufunc, args, kwds,
op, &order, &casting, &extobj,
&type_tup, &subok, NULL, &axes, &keepdims);
&type_tup, &subok, NULL, &axes, &axis, &keepdims);
/* if given, need our own reference to axes, as we XDECREF it at the end */
Copy link
Member

@eric-wieser eric-wieser May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just leave it alone then? This doesn't seem to justify doing either at the moment.

Edit: Looks like the real reason we need it is we allocate a new axes below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll edit the comment to state this explicitly.

@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch from 7e8763e to 1842a95 Compare May 1, 2018 16:45
@mhvk
Copy link
Contributor Author

mhvk commented May 1, 2018

OK, pushed up changes...

@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from 7f9032e to 83bbb65 Compare May 2, 2018 15:32
@mhvk
Copy link
Contributor Author

mhvk commented May 2, 2018

I couldn't resist - I now allow keepdims for any case in which all inputs have the same number of core dimensions, and all outputs have none. With that, one can use keepdims on things like det (which has signature (m,m)->()). (Note that this is not exposed through the normal linalg interface; I did add tests to check that for the underlying gufunc it works as advertised.)

@mhvk
Copy link
Contributor Author

mhvk commented May 11, 2018

Rebased to get rid of the merge conflict. Am fairly happy with this and hope it can go in 1.15.

@charris
Copy link
Member

charris commented May 12, 2018

@eric-wieser @shoyer Be good for the previous reviewers to sign off on this, or not.

/* finally, ufuncs accept 'sig' or 'signature' normalize to 'signature' */
return normalize_signature_keyword(*normal_kwds);
return nkwds == 0 ? 0 : normalize_signature_keyword(*normal_kwds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this optimization might make more sense inside normalize_signature_keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I put it here is that normalize_signature_keyword gets passed normal_kwds, which by this time may well have gotten an out entry even if there were no original keywords.

PyErr_SetString(PyExc_TypeError,
"cannot specify both 'axis' and 'axes'");
return -1;
}
Copy link
Member

@eric-wieser eric-wieser May 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should axis be converted to axes here, so that __array_ufunc__ only needs to support axes?

Perhaps a normalize_axis_keyword(*normal_kwds)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think __array_ufunc__ should just get whatever is passed in, since in principle at least it may act differently than the normal ufunc. In fact, I am somewhat torn about even including the check here; we do just pass on everything else that the user passes in, even if it would give an error message if no __array_ufunc__ is present. But then, we do check sig and signature. But then again, that is a deprecation...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the whole point in these normalization function not to pass on whatever was passed? I'd argue that by not normalizing axis, we make it harder for overriders to implement it and axes correctly.

The question really is - is your description of how axis maps to axes an invariant of ufuncs, or just one for the specific case of ndarrays? I think that if a short-hand can mean different things for different subclasses, then it's a dangerous shorthand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I saw it, the normalization was mostly to ensure all but the inputs were in keyword arguments, so that __array_ufunc__ implementations didn't have to worry about that; sig and signature seemed to be there just because sig is deprecated.

But I can see your argument as well that one should just translate axis because we should not encourage __array_ufunc__ implementations that can handle axis but not axes. I'll try to see how it looks if I split out the conversion of axis to axes as a helper function, as you suggest below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding myself slightly stuck: I now have a nice _build_axes_tuple in ufunc_object.c, but to use this in override.c, I'd need to include it in a .h file -- but in umath, those seem to contain only stuff to be exported, which this obviously is not. Most logical might be to start a new ufunc_parser.c or ufunc_helper.c which contains all these helper functions for parsing, but that feels a bit beyond this PR...

An alternative might be just to remove the override commit for now, with the logic that __array_ufunc__ implementations should do whatever they want; they'll get an error anyway if they pass this back in.

Copy link
Member

@eric-wieser eric-wieser May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but let's make sure it works for arbitrary Python objects, not just integers.

+1 on this. axis='rows' for a ufunc like (i),(i) -> () should end up as axes=[['rows'], ['rows'], []]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current normalization does work for arbitrary objects. It just puts axis in a single-element tuple, and creates a list with that tuple as entries (or an empty one if an argument has no core dimensions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhvk: Can you just put it in numpy\core\src\umath\ufunc_object.h? Those functions are not exported either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, putting it in ufunc_object.h works (I had mistakenly looked in ufuncobject.h, which does have things for export...), but it has a problem that then we need even more parsing... (I'll move to the main thread, since I think this discussion should not disappear after another change.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the fact that both of those filenames exist is super confusing...

PyArrayObject **out_wheremask,
PyObject **out_axes,
int *out_keepdims)
PyObject **out_typetup, /* tuple of dtype */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type: Tuple[np.dtype] would match the mypy-style annotation used below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

/* When provided, extobj, typetup, and axes contain borrowed references */
PyObject *extobj = NULL, *type_tup = NULL, *axes = NULL;
/* other possible keyword arguments */
PyObject *extobj = NULL, *type_tup = NULL, *axes=NULL, *axis = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent spacing. Could break this over multiple lines too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added spaces.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add this to the ufunc doc signature template? Arguments against: That template is only really applied to non-gufuncs right now, which don;t support this argument. Making me think that #8868 might be worth picking back up.


.. versionadded:: 1.15

A single axis over which a generalized ufunc should operate. This is a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Not supported for non-generalized ufuncs" might be nice. Does np.add([1], [2], axis=0) fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does say specifically "over which a generalized ufunc should operate. And, yes, an error is raised:

In [2]: np.add(1., 1., axis=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-578c404b564c> in <module>()
----> 1 np.add(1., 1., axis=0)

TypeError: 'axis' is an invalid keyword to ufunc 'add'

But this was not tested - now added.

@mhvk
Copy link
Contributor Author

mhvk commented Jun 4, 2018

I agree that we should not add axes, axis, keepdims to the docstring template, since we currently do not expose any generalized ufunc directly.

@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from 0283828 to 93d4b6f Compare June 4, 2018 16:59
@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch 2 times, most recently from cf5eaa6 to 580399d Compare June 6, 2018 14:24
@mhvk
Copy link
Contributor Author

mhvk commented Jun 6, 2018

Rebased on top of #11257 - which means axis now has its own reference too. Should be all set now.

@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch from 580399d to 5a1fd86 Compare June 6, 2018 16:37
@mattip
Copy link
Member

mattip commented Jun 6, 2018

@charris are you comfortable with this going into 1.15?

@mhvk
Copy link
Contributor Author

mhvk commented Jun 6, 2018

It does complement axes and keepdims...

@charris
Copy link
Member

charris commented Jun 7, 2018

@mattip I'm letting the __array_ufunc__ folks drive this train as I am not a user of that functionality. I'd like @shoyer to sign off on it though. I think @eric-wieser is OK with it now?

@charris
Copy link
Member

charris commented Jun 7, 2018

I'll probably kick all this off to 1.16 come Friday otherwise, as I'd like to get 1.15.0rc1 out this weekend.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I would still be happier if we had a test case ufunc that included an output core dimension (e.g., with signature='(i)->(i)').

@@ -827,9 +831,23 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
case 'a':
/* possible axes argument for generalized ufunc */
if (out_axes != NULL && strcmp(str, "axes") == 0) {
if (out_axis != NULL && *out_axis != NULL) {
PyErr_SetString(PyExc_RuntimeError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this RuntimeError here but TypeError above (in override.c)? I would expect both to give the same error (probably TypeError).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was for symmetry with sig and signature, but I must admit a TypeError makes much more sense, so I've changed it.

@mhvk mhvk force-pushed the gufunc-axis-and-keepdims branch from 5a1fd86 to b02fa32 Compare June 7, 2018 18:38
@mhvk
Copy link
Contributor Author

mhvk commented Jun 7, 2018

@shoyer - now that I've been writing other gufuncs in _umath_tests.c.src, it was much easier to just add a new function with the signature you requested, (i)->(i) - so I did and added some tests.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks for adding the cumsum tests!

@charris charris merged commit 57d17c3 into numpy:master Jun 7, 2018
@mhvk mhvk deleted the gufunc-axis-and-keepdims branch June 8, 2018 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants