-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Implement axis for generalized ufuncs. #11018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/core/src/umath/ufunc_object.c
Outdated
if (retval < 0) { | ||
goto fail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from goto fail
to return
here and below is because in fail
I now need to decref axes (since it might be a new object created from axis
). Fortunately, we have not taken any references here (and below), so it is safe to just return.
numpy/core/src/umath/ufunc_object.c
Outdated
/* | ||
* If keepdims was passed in (and thus changed from the initial value | ||
* on top), check the gufunc is suitable, i.e., that: | ||
* - its input arguments share a single core dimension; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not strictly necessary (and checked); signatures like (i),(j)->()
are fine for keepdims
. Will change after having had further review.
1ba5e99
to
4c98f44
Compare
4c98f44
to
7e8763e
Compare
that act on a single, shared core dimension such as the inner product example | ||
above, one can pass an ``axis`` argument. This is equivalent to passing in | ||
``axes`` with identical entries for all arguments with that core dimension | ||
(e.g., for the example above, ``axes=[(axis,), (axis,)]``). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to clarify (and test!) it this also works if the output has the same dimension. For example, a signature of (i)->(i)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. The problem is that the _umath_test
module does not contain any example of a gufunc with such a signature, and I don't feel quite up to writing one... (sadly, linalg
also does not have one). Note, though, that the code path does get tested - with keepdims=True
, any no-core arguments do get upgraded to having a single core dimension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. Of course this isn't a reason to clarify the documentation! Indeed, the keepdims
case itself might be useful for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now added a bit more text to keepdims
to clarify this.
doc/source/reference/ufuncs.rst
Outdated
|
||
If this is set to `True`, an axis which is reduced will be left in the | ||
result as a dimension with size one. With this option, the result will | ||
correctly against the inputs. This option can only be used for generalized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: missing broadcast
and this incorrectly states the single dimension needs to be shared.
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -565,7 +565,9 @@ get_ufunc_arguments(PyUFuncObject *ufunc, | |||
PyObject **out_typetup, | |||
int *out_subok, | |||
PyArrayObject **out_wheremask, | |||
PyObject **out_axes) | |||
PyObject **out_axes, | |||
PyObject **out_axis, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment with the python type would be nice for this and the above line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean something like
# list of tuples of specific axes for each argument
# axis to use for a shared, single core dimension
Am not sure it will help much to have that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like that. Maybe even just /* type: T */
and /* type: List[Tuple[T]] */
, adopting PEP484 syntax (and considering @shoyer's remarks about non-int types
numpy/core/src/umath/ufunc_object.c
Outdated
"'keepdims' must be a boolean"); | ||
goto fail; | ||
} | ||
*out_keepdims = (value == Py_True); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if value
is np.false_
? How do we handle this in subok
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply copied the stanza from subok
, but agree it is somewhat odd. I'd prefer to leave it for a quick MAINT
PR, though.
numpy/core/src/umath/ufunc_object.c
Outdated
PyObject **out_axes) | ||
PyObject **out_axes, | ||
PyObject **out_axis, | ||
int *out_keepdims) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use a npy_bool
for this and out_subok
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the main routine, it is int
, though, and I use -1
as a value to indicate "unitialized". Overall, perhaps best done in separate cleanup.
numpy/core/src/umath/ufunc_object.c
Outdated
*/ | ||
if (keepdims != -1) { | ||
for (i = 0; i < nop; i++) { | ||
if (ufunc->core_num_dims[i] != (i < nin? 1 : 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see the test separate from the error message. Something like bool _ufunc_supports_keepdims(ufunc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. I'll do the same for axis
(even though it is a trivial one).
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -2260,7 +2317,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc, | |||
*/ | |||
iter_ndim = broadcast_ndim; | |||
for (i = nin; i < nop; ++i) { | |||
iter_ndim += ufunc->core_num_dims[i]; | |||
iter_ndim += keepdims ? 1 : ufunc->core_num_dims[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is keepdims
being added in the loop? That seems wrong to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does need to be there, so that the iterator can allocate dimensions in the correct order (see comment just above this piece of code)
numpy/core/tests/test_ufunc.py
Outdated
@@ -708,6 +708,58 @@ def test_axes_argument(self): | |||
assert_raises(ValueError, mm, z[1], z, axes=[0, 1]) | |||
assert_raises(ValueError, mm, z, z, out=z[0], axes=[0, 1]) | |||
|
|||
def test_keepdims_argument(self): | |||
# inner1d signature: '(i),(i)->()' | |||
in1d = umt.inner1d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little misleading for someone grep
ing for in1d
and looking for np.in1d
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -2456,7 +2497,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc, | |||
*/ | |||
core_dim_ixs_size = 0; | |||
for (i = 0; i < nop; ++i) { | |||
core_dim_ixs_size += ufunc->core_num_dims[i]; | |||
core_dim_ixs_size += keepdims && i >= nin ? 1: ufunc->core_num_dims[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of computing this over and over again, why not something like this at the start:
npy_intp *core_num_dims;
if (keepdims) {
// or could just use a static buffer
core_num_dims = alloca(sizeof(*core_num_dims) * NPY_MAX_DIMS);
for (i = 0; i < nin; ++i) {
core_num_dims[i] = ufunc->core_num_dims[i];
}
for(; i < nop; i++) {
core_num_dims[i] = 1;
}
}
else {
core_num_dims = ufunc->core_num_dims;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, better indeed, especially since all dimensions are just 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems appveyor/windows does not like alloca
, so I now just use a static buffer.
numpy/core/src/umath/ufunc_object.c
Outdated
|
||
/* | ||
* If keepdims was passed in (and thus changed from the initial value | ||
* on top), check the gufunc is suitable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to think that we should have a type hierarchy for gufuncs, rather than making methods available or not based on their properties. Not something to deal with in this PR, but something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. For that purpose, the separate _check_..._support
functions are also sensible.
numpy/core/src/umath/ufunc_object.c
Outdated
PyErr_Format(PyExc_ValueError, | ||
"%s: %s operand %d does not have enough " | ||
"dimensions (has %d, gufunc core with " | ||
"signature %s requires %d)", | ||
ufunc_get_name_cstr(ufunc), | ||
i < nin ? "Input" : "Output", | ||
ufunc_name, i < nin ? "Input" : "Output", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was better with the two ternaries line wrapped to match up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now have all arguments on their own line.
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -2230,7 +2246,9 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc, | |||
/* Get all the arguments */ | |||
retval = get_ufunc_arguments(ufunc, args, kwds, | |||
op, &order, &casting, &extobj, | |||
&type_tup, &subok, NULL, &axes, &keepdims); | |||
&type_tup, &subok, NULL, &axes, &axis, &keepdims); | |||
/* if given, need our own reference to axes, as we XDECREF it at the end */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just leave it alone then? This doesn't seem to justify doing either at the moment.
Edit: Looks like the real reason we need it is we allocate a new axes below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll edit the comment to state this explicitly.
7e8763e
to
1842a95
Compare
OK, pushed up changes... |
7f9032e
to
83bbb65
Compare
I couldn't resist - I now allow |
83bbb65
to
93297e0
Compare
Rebased to get rid of the merge conflict. Am fairly happy with this and hope it can go in 1.15. |
@eric-wieser @shoyer Be good for the previous reviewers to sign off on this, or not. |
/* finally, ufuncs accept 'sig' or 'signature' normalize to 'signature' */ | ||
return normalize_signature_keyword(*normal_kwds); | ||
return nkwds == 0 ? 0 : normalize_signature_keyword(*normal_kwds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this optimization might make more sense inside normalize_signature_keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I put it here is that normalize_signature_keyword
gets passed normal_kwds
, which by this time may well have gotten an out
entry even if there were no original keywords.
PyErr_SetString(PyExc_TypeError, | ||
"cannot specify both 'axis' and 'axes'"); | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should axis
be converted to axes
here, so that __array_ufunc__
only needs to support axes
?
Perhaps a normalize_axis_keyword(*normal_kwds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think __array_ufunc__
should just get whatever is passed in, since in principle at least it may act differently than the normal ufunc. In fact, I am somewhat torn about even including the check here; we do just pass on everything else that the user passes in, even if it would give an error message if no __array_ufunc__
is present. But then, we do check sig
and signature
. But then again, that is a deprecation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the whole point in these normalization function not to pass on whatever was passed? I'd argue that by not normalizing axis
, we make it harder for overriders to implement it and axes
correctly.
The question really is - is your description of how axis
maps to axes
an invariant of ufuncs, or just one for the specific case of ndarrays? I think that if a short-hand can mean different things for different subclasses, then it's a dangerous shorthand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I saw it, the normalization was mostly to ensure all but the inputs were in keyword arguments, so that __array_ufunc__
implementations didn't have to worry about that; sig
and signature
seemed to be there just because sig
is deprecated.
But I can see your argument as well that one should just translate axis
because we should not encourage __array_ufunc__
implementations that can handle axis
but not axes
. I'll try to see how it looks if I split out the conversion of axis
to axes
as a helper function, as you suggest below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding myself slightly stuck: I now have a nice _build_axes_tuple
in ufunc_object.c
, but to use this in override.c
, I'd need to include it in a .h
file -- but in umath
, those seem to contain only stuff to be exported, which this obviously is not. Most logical might be to start a new ufunc_parser.c
or ufunc_helper.c
which contains all these helper functions for parsing, but that feels a bit beyond this PR...
An alternative might be just to remove the override
commit for now, with the logic that __array_ufunc__
implementations should do whatever they want; they'll get an error anyway if they pass this back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but let's make sure it works for arbitrary Python objects, not just integers.
+1 on this. axis='rows'
for a ufunc like (i),(i) -> ()
should end up as axes=[['rows'], ['rows'], []]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current normalization does work for arbitrary objects. It just puts axis
in a single-element tuple, and creates a list with that tuple as entries (or an empty one if an argument has no core dimensions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk: Can you just put it in numpy\core\src\umath\ufunc_object.h
? Those functions are not exported either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, putting it in ufunc_object.h
works (I had mistakenly looked in ufuncobject.h
, which does have things for export...), but it has a problem that then we need even more parsing... (I'll move to the main thread, since I think this discussion should not disappear after another change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the fact that both of those filenames exist is super confusing...
numpy/core/src/umath/ufunc_object.c
Outdated
PyArrayObject **out_wheremask, | ||
PyObject **out_axes, | ||
int *out_keepdims) | ||
PyObject **out_typetup, /* tuple of dtype */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: type: Tuple[np.dtype]
would match the mypy-style annotation used below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
numpy/core/src/umath/ufunc_object.c
Outdated
/* When provided, extobj, typetup, and axes contain borrowed references */ | ||
PyObject *extobj = NULL, *type_tup = NULL, *axes = NULL; | ||
/* other possible keyword arguments */ | ||
PyObject *extobj = NULL, *type_tup = NULL, *axes=NULL, *axis = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent spacing. Could break this over multiple lines too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add this to the ufunc doc signature template? Arguments against: That template is only really applied to non-gufuncs right now, which don;t support this argument. Making me think that #8868 might be worth picking back up.
|
||
.. versionadded:: 1.15 | ||
|
||
A single axis over which a generalized ufunc should operate. This is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Not supported for non-generalized ufuncs" might be nice. Does np.add([1], [2], axis=0)
fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does say specifically "over which a generalized ufunc should operate. And, yes, an error is raised:
In [2]: np.add(1., 1., axis=0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-578c404b564c> in <module>()
----> 1 np.add(1., 1., axis=0)
TypeError: 'axis' is an invalid keyword to ufunc 'add'
But this was not tested - now added.
I agree that we should not add |
0283828
to
93d4b6f
Compare
cf5eaa6
to
580399d
Compare
Rebased on top of #11257 - which means |
580399d
to
5a1fd86
Compare
@charris are you comfortable with this going into 1.15? |
It does complement |
@mattip I'm letting the |
I'll probably kick all this off to 1.16 come Friday otherwise, as I'd like to get 1.15.0rc1 out this weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I would still be happier if we had a test case ufunc that included an output core dimension (e.g., with signature='(i)->(i)'
).
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -827,9 +831,23 @@ get_ufunc_arguments(PyUFuncObject *ufunc, | |||
case 'a': | |||
/* possible axes argument for generalized ufunc */ | |||
if (out_axes != NULL && strcmp(str, "axes") == 0) { | |||
if (out_axis != NULL && *out_axis != NULL) { | |||
PyErr_SetString(PyExc_RuntimeError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this RuntimeError here but TypeError above (in override.c)? I would expect both to give the same error (probably TypeError).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was for symmetry with sig
and signature
, but I must admit a TypeError
makes much more sense, so I've changed it.
It is only allowed for gufuncs with a single, shared core dimension.
Both in the general documentation and in the release notes.
Much faster, and much less code duplication than I feared.
5a1fd86
to
b02fa32
Compare
@shoyer - now that I've been writing other gufuncs in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Thanks for adding the cumsum
tests!
EDIT: this is now for
axis
only - keepdims has been added.Follow-up on #8819, specifically for generalized ufuncs that are like reductions on a single axis.
EDIT: most of the below has become irrelevant.
As is, the implementation is not super-elegant:EDIT: now quite elegant.keepdims
,checks are added every time the output core dimensions are accessed; it might make more sense to adjust just once.EDIT: this is now solved.axis
,a full(EDIT) currently a separate routine is used from that foraxes
list is constructed rather than directly use the value (an advantage, though, is that an extension to passing in a tuple orNone
foraxis
would be that much easier).axes
, which means that any reinterpretation of what values it can have (currently, only int) would have to be done in two places. This would involve only a trivial refactor, though.But I think this is worth having a careful look. It would be nice to have this in 1.15, together with the
axes
argument.p.s. I tried to keep the commits adding
keepdims
andaxis
separate, as they are somewhat separate enhancements (and it might help review).EDIT: they certainly helped to find an annoying reference counting bug. All seems OK now.