Skip to content

ENH: Implement axes keyword argument for gufuncs. #8819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 28, 2018

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Mar 23, 2017

EDIT (2017-11-07): went with the least controversial solution, which is easy to extend if we wish. To avoid confusion with the normal use of axis, I went with an axes keyword, which nominally should be a list with length equal to the total number of operands (inputs & outputs), with each entry a tuple with length equal to the number of core dimensions of that operand, containing axis indices. I still allow two short-cuts that did not seem controversial:

  • If none of the outputs have core dimensions, the list can contain axes for just the inputs (since there is nothing to set).
  • If an operand has only one core dimension, one can pass in an integer instead of a tuple.

Previous attempt to resummarize, with above conclusions added in:

In general, axis should be a list with length equal to either the number of inputs or the total number of operands (inputs & outputs). Each entry should be a listtuple with length equal to the number of core dimensions of that operand, containing axis indices. Some shortcuts may be handy.

It is probably easiest to see for a given signature. Assuming this is (ij),(j)->(i):

  1. The defaults would correspond to axis=[(-2, -1), (-1,), (-1,)]; still true
  2. One can omit the output(s), so axis=[(-2, -1), (-1,)] is the same; not generally any more; only if output has no core dimensions
  3. a single integer in the list is OK if the operand has only a single core dimension, so one could also write [(-2, -1), -1, -1]; still true
  4. If the signature has only one core dimension, then a simple integer can replace the outer list, so for a signature (i),(i)->() one could just pass axis=-1 (this is not meant to cover (i),(j)->()). not implemented; maybe eventually have axis argument

Currently under discussion:

  • Name this axes so we can use axis as shortcut in wrappers that prepare axes. DONE
  • Somewhat related, remove the single-integer shortcut for the list (4) for now. DONE
  • Possibly also remove the single-integer shortcut for a single operand (3)? no, kept
  • Multi-dispatch ufuncs, determined by args, out, and/or axis should be safe now
    important mostly as the implementation adopted here should not create ambiguity resolving dispatches in future
    • useful for matmul and other ufuncs that right now have a special case for 1d vectors. For example, multiplication:
      • axis=[(-2,-1), (-2,-1)] → matrix-matrix
      • axis=[(-1,), (-2, -1)] → vector-matrix
      • axis=[(-2, -1), (-1,)] → matrix-vector
    • could potentially implement optional arguments

Ideas discarded:

  • None as a replacement for the default for a given operand; reasons:
    • keep None for possible other use
    • None elsewhere means "all axes" - a different meaning would be confusing
    • this would preclude possibly having multiple signatures, among which axes could choose.
  • list of lists instead of list of tuples (to keep tuples for using multiple axes to represent one); reason: too ugly.

NOTE: needs documentation before merging.

@@ -1205,6 +1216,10 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
*out_extobj = NULL;
Py_XDECREF(*out_typetup);
*out_typetup = NULL;
if (out_axis != NULL) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by analogy of the others, but seems actually wrong, since the reference is borrowed.

@@ -2013,7 +2030,8 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,
/* The sizes of the core dimensions (# entries is ufunc->core_num_dim_ix) */
npy_intp *core_dim_sizes = inner_dimensions + 1;
int core_dim_ixs_size;

/* swapping around of axis (TODO: allocate this when needed) */
int remap_axis[NPY_MAXARGS][NPY_MAXDIMS];
Copy link
Contributor Author

@mhvk mhvk Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether it is better to malloc these 32*32 elements; same holds for op_axes_arrays above.

case 'a':
/* possible axis argument for generalized ufunc */
if (strcmp(str, "axis") == 0 && ufunc->core_enabled) {
*out_axis = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work if out_axis is NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was captured by the ufunc->core_enabled, but I don't know why I didn't just go for the clarity of checking that out_axis != NULL (which is also done for where).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, putting the checks in the other order should be faster

* possibly remap axes.
*/
int axis_rqst;
int axis_list = PyList_Check(axis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe axis on np.sum actually only allows tuples, not lists -- it would be good to be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the code be a little more readable if this variable was called axis_is_list?

if (axis_tuple) {
rqst = (int)PyInt_AsSsize_t(PyTuple_GET_ITEM(axis_item,
iax));
if (rqst == -1 && PyErr_Occurred()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an error_converting(rqst) function in common.h that we'd normally use here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice. The same pattern is all over the file...

/*
* possibly remap axes.
*/
int axis_rqst;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Py_Ssize_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly since your check_and_adjust_axis seems to want an int. I'm eternally confused about these C types...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, you're right. I did that because that's what we used elsewhere in the public API, not because it was the right decision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be a little influenced by having to follow this in my daily work, but wouldn't things be more readable if this was requested_axis, or axis_for_all, or something more descriptive?

@@ -1025,6 +1029,13 @@ get_ufunc_arguments(PyUFuncObject *ufunc,
}

switch (str[0]) {
case 'a':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're not using PyArg_ParseTupleAndKeywords here? Out of scope for this PR, but still

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably speed – I haven't checked if this particular case makes a meaningful difference, but the ufunc call path is a place where microseconds count.

Copy link
Member

@charris charris Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch statements actually compile as hardwired binary search, so for small lists of possibilities the savings probably don't amount to much. I think the main virtues are compactness, clarity, and fall through, which can be used to select code entry points. The last is probably frowned upon by structured code fanatics...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If speed really matters here, then presumably we could save a little more with strcmp(str+1, "xis") on the following lines...

Either way, I think this probably needs a comment explaining that we're avoiding PyArg_ParseTupleAndKeywords for speed reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't add the comment, but I'm not going to demand it before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment on the structure is a bit out-of-scope of this PR (and I don't really know what the original reason was anyway...)

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

@shoyer (commenting in main thread, since a good point): For reduce operations the usage of axis is a bit different: one can tell to go over multiple rather than a single axis. In contrast, here we want to define where the core dimensions of the gufunc are to be found.

Now a counterargument is that np.sum (and any reduce operation) could be seen as a gufunc with signature (i)->(). In that respect, what the tuple does is, effectively, combining those axes (with None combining all).

But perhaps that argues that the items inside the list should also be lists, so that we keep open the option of using tuples to define multiple operand axes as a single core axis. I'll do that...

@eric-wieser
Copy link
Member

eric-wieser commented Mar 24, 2017

But perhaps that argues that the items inside the list should also be lists, so that we keep open the option of using tuples to define multiple operand axes as a single core axis. I'll do that...

How about using sets for that meaning? None of the reducer functions care about the ordering of their operand axes, right?

Alternatively: I'd be tempted to allow the reducers to be defined as (i...), which explicitly allows this dimensionality reduction (and allows keepdims to restore them). Then the caller can just pass a tuple of indices to this as they would (i,j,k)

Right now, do gufuncs incur a copy if their non-core dimensions are not contiguous?

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

@eric-wieser - A set would be logical, but I think @shoyer's point was in part that for np.sum, etc., a tuple is already used.

"axis list element %d should be a tuple "
"with %d elements or None", i, num_dims);
"axis item %d should be a list with %d "
"elements or None", i, num_dims);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be mistakenly read as "a list with no elements". A comma would help, or so would swapping the order

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

@eric-wieser - I'm not sure I understand what you mean by the "reducer". Would this be in the signature?

For the non-core dimensions, an iterator is used, so no copy is made (strides are passed to the inner loop, so no copy is made there either).

@eric-wieser
Copy link
Member

@eric-wieser - I'm not sure I understand what you mean by the "reducer". Would this be in the signature?

I mean the functions with a signature ending in ->(), which are also the only ones that should allow keepdims IMO. I guess reduce is a bad word, because argmin is not a reduce operation.

Yes, (i...) would be the literal signature. This would need special handling in the inner loop, since that would now need to iterate over a variable number of strides.

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

I think I'd prefer to keep discussion of the signature out of this PR -- it is complicated enough as it is!

I also think you're right about keepdims making sense only for an output with no core dimensions (and probably only when all inputs have the same core dimensions).

I'm also wondering if, rather than special-case so much, it is better to go with @njsmith's suggestion of having axis for the simple case of just one core dimension, and axes always requiring full details.

@eric-wieser
Copy link
Member

I think I'd prefer to keep discussion of the signature out of this PR -

My point here is that defining the signature in this way would let us use tuples for the inner axis argume without ambiguity

  • [(1, 2), 3] for (m,n)->(n), as previously proposed
  • (1, 2) for (m,n)->(), as previously proposed
  • (1, 2) for (m...)->(), which would implement sum(axis=(1,2))

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

Hmm, I had thought that, really, the np.sum one corresponded to treating several axes as a single one. And that, thus, it would be allowed for any input core dimension to specify multiple axes. But at some level that can already be done by an appropriate reshape inside the actual call (though that argument also holds for np.sum...). I'll think a bit more.

@eric-wieser
Copy link
Member

it would be allowed for any input core dimension to specify multiple axes

I'm struggling to think of a case when this is useful and not just cryptic.

can already be done by an appropriate reshape inside the actual call (though that argument also holds for np.sum...)

Except we don't want to do that, (and I think currently don't?), because reshape would cause a copy. We need an nd-iter at that point, and the ufunc machinery starts to fall apart

@mhvk
Copy link
Contributor Author

mhvk commented Mar 24, 2017

But .reduce can go over multiple axes. Looking at the reduction.c, I found this gem:

 * TODO FIXME: if you squint, this is essentially an second independent
 * implementation of generalized ufuncs with signature (i)->(), plus a few
 * extra bells and whistles. (Indeed, as far as I can tell, it was originally
 * split out to support a fancy version of count_nonzero... which is not
 * actually a reduction function at all, it's just a (i)->() function!) So
 * probably these two implementation should be merged into one. (In fact it
 * would be quite nice to support axis= and keepdims etc. for arbitrary
 * generalized ufuncs!)

(no surprise now that @njsmith has been bemused by our discussion...)

Copy link
Member

@jaimefrio jaimefrio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for doing this, Marten! This is some of the most forsaken NumPy code, so it's great to see more people figuring it out.

I commented inline on a couple of things I don't fully like of the way axis= has to be specified. On top of that comment on operands with no core dimensions, and of making axes for output operands optional, I am a little wary of reusing a single integer axis for several operands, and lean more towards only accepting an integer if there is a single operand with core dimensions, and it happens to have a single core dimension, i.e. axis=1 would be ok for (i),()->(), but not for (i),(i)->() of for (i)->(i). Do you have a use case in mind where reusing that single value would be a good thing with signatures like these last two examples?

I think this whole matter is confusing enough that we shouldn't approve this without explicitly writing out some form of specification and putting it through some formal approval, probably in the mailing list. It would be tragic if we missed some relevant corner case, only find out in a couple of releases, and end up stuck with a suboptimal behavior because of backwards compatibility.

I think it was @shoyer that hashed out a couple of years ago a pretty comprehensive document of how this should work, and even implemented a Python parser? Not sure where to find it, but it would be good to compare it to this and see if there are any major discrepancies.

* possibly remap axes.
*/
int axis_rqst;
int axis_list = PyList_Check(axis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the code be a little more readable if this variable was called axis_is_list?

*/
int axis_rqst;
int axis_list = PyList_Check(axis);
int item_list = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here item_is_list reads easier below for me.

} else {
rqst = axis_rqst; /* use same axis for all operands */
}
for (iax = num_dims - 1; iax >= 0; --iax) {
Copy link
Member

@jaimefrio jaimefrio Mar 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic is unnecessarily complicated. And unless I am missing something, it doesn't check that there are no repeated items in item, does it? Rather than shuffling entries in remap_axis around, you could fill it directly doing something like:

int have_seen_axis[NPY_NUM_DIMS] = {0};

for (j = 0; j < num_dims; ++j) {
    if (item_list) {
        rqst = (int)PyInt_AsSsize_t(PyList_GET_ITEM(item, j));
        if (error_converting(rqst)) {
            retval = -1;
            goto fail;
        }
    }
    if (check_and_adjust_axis(&rqst, op_ndim) < 0) {
        retval = -1;
        goto fail;
    }
    if (have_seen_axis[rqst]) {
        PyErr_Format(PyExc_ValueError,
                     "axis item %d has value %d repeated", i, rqst);
    }
    have_seen_axis[rqst] = 1;
    remap_axis[i][op_ndim - num_dims + j] = rqst;
}
iax = 0;
for (j = 0; j < op_ndim - num_dims; ++j) {
    while (have_seen_axis[iax]) {
        ++iax;
    }
    remap_axis[i][j] = iax;
}

}
for (i = 0; i < nop; ++i) {
int op_ndim, iax, rqst, dflt;
int num_dims = ufunc->core_num_dims[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, num_core_dims or core_ndim would be easier on my brain below, where I have to keep remembering that these are only the core dimensions...

op_ndim = broadcast_ndim + ufunc->core_num_dims[i];
}
/* initialize remap to straight one to one */
for (j = 0; j < op_ndim; j++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you follow my suggestion below, this loop is no longer needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized this is still needed for when item is None...

/*
* possibly remap axes.
*/
int axis_rqst;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be a little influenced by having to follow this in my daily work, but wouldn't things be more readable if this was requested_axis, or axis_for_all, or something more descriptive?

@mhvk
Copy link
Contributor Author

mhvk commented Mar 28, 2017

@jaimefrio - thanks for the comments! Your way to fill remap_axis is much better; I now keep a flag for whether remapping is necessary at all, so I could just use it verbatim. (Well, with better names, also following your suggestions.)

I also agree with you that this better go past the mailing list! To help that, I've started writing some test cases (now included) and will add documentation. I'm still pondering whether to try to get keepdims working before bothering people.

On the syntax: I thought a single axis still work for a (i),(i)->() or (i),(i)->(i) signature. For instance, I could imagine using the new all_equal on vectors, where the two inputs would not necessarily have xyz in the last dimension.

To all those still interested: I went back to a list of tuples, which I find much clearer (perhaps it should even be a tuple of tuples), and I ended up not seeing how we could ever sensibly support turning multiple axes into one anyway.

@shoyer
Copy link
Member

shoyer commented Mar 28, 2017

Responses to @mhvk's first and latest posts are below:

In general, axis should be a list with length equal to either the number of inputs or the total number of operands (inputs & outputs).

Instead, I might say: axis should be a list with length equal to the number of inputs with a core dimension, or the total number of inputs and outputs with core dimensions.

The following shortcuts are supported (to be confirmed with mailing list): an entry in the (outer) list of None is taken to imply the default

I don't see much point in this, unless you have a large number of core dimensions. Writing -1 is more explicit and requires less text, too.

Alternatively, we could use None as a sentinel for "all dimensions", which is harder to write (i.e., tuple(range(-array.ndim, 0, 1)), which took me some experimentation to figure out).

an integer is OK if the operand has only a single core dimension.

Yes, this makes sense.

Furthermore, if the signature has only one core dimension, then a simple integer can replace the outer list.

By one core dimension, do you mean one core dimension on exactly one argument?

On the syntax: I thought a single axis still work for a (i),(i)->() or (i),(i)->(i) signature.

Do you still think this works? I'm not sure how. "Broadcasting" an integer axis over multiple arguments seems like a slippery road to go down.


I think we have discussed this before, but I wanted to bring up again that one use of an axis argument is to control dispatching to gufunc kernels, even for higher dimensional inputs.

e.g., for matmul:

  • np.matmul(x, y, axis=[(0,), (0,)] could dispatch to the kernel (n),(m)->()
  • np.matmul(x, y, axis=[(0, 1), (0,)] could dispatch to the kernel (n,m),(m)->(n)
  • np.matmul(x, y, axis=[(0,), (0, 1)] could dispatch to the kernel (n),(n,m)->(m)

We don't need to support this yet, but we should be sure we don't do anything that precludes it.


For reference, the Python parser I wrote for handling gufunc axis argument can be found here: http://nbviewer.jupyter.org/gist/shoyer/7740d32850084261d870

I'm not sure how helpful this is, but probably it's still a good idea to prototype the logic for axis canonicalization in Python.

@eric-wieser
Copy link
Member

Instead, I might say: axis should be a list with length equal to the number of inputs with a core dimension, or the total number of inputs and outputs with core dimensions.

I'm not sure - I think it's easier to understand when there's a 1-to-1 mapping between arguments and axes - even if it comes at the cost of having to specify () for arguments with no core dimensions

I don't see much point in this, unless you have a large number of core dimensions. Writing -1 is more explicit and requires less text, too. [...] Alternatively, we could use None as a sentinel for "all dimension".

+1 from me on this. For now, let's just reserve None for that use, rather than actually implementing it.

On the syntax: I thought a single axis still work for a (i),(i)->() or (i),(i)->(i) signature.

The former would be something like all_equal, and the latter np.cross? I guess that seems reasonable.

I'm not sure how helpful this is, but probably it's still a good idea to prototype the logic for axis canonicalization in Python.

Can we just do the canonicalization in _internal.py? Presumably we'll want to pass a canonicalized axis to __array_ufunc__ anyway, so need to go via the full list of tuples

@mhvk
Copy link
Contributor Author

mhvk commented Mar 28, 2017

@shoyer, @eric-wieser - OK, I agree, let's reserve None for possible different use. I now removed that from the code. I agree with Eric that an empty tuple for an operand with no core dimensions is cleaner (note that in many cases, this will be the output, which, following @jaimefrio's suggestion, can now be omitted).

@shoyer - did the examples from @eric-wieser and myself convince you it makes sense to keep the single-integer case as applying to any single core dimension?

I very much like the idea of being able to use the axis as a way to select the signature in, e.g., matmul, even though that would quite obviously need a much more major intervention (since inside PyUFunc_GeneralizedFunction, I'd seem stuck with a single signature ufunc). But I don't think the current API prevents this from happening at a later stage.

I do agree that, in principle, one could move the translation from integer to single-element tuple (or set of single-element tuples) outside, although this does mean that ufunc_override.c would have to become rather uncomfortably aware of ufunc internals. Anyway, I would propose to leave this for later... (but do have a look at #8247 -- which (re)implements __array_ufunc__ -- if you want to think about the interaction).

@shoyer
Copy link
Member

shoyer commented Mar 28, 2017

@shoyer - did the examples from @eric-wieser and myself convince you it makes sense to keep the single-integer case as applying to any single core dimension?

I think I missed those examples -- can you point to them again?

One option that I believe @njsmith and I discussed previously is is to have two separate keyword arguments:

  • axes for providing the full axes specification in long form
  • axis for specifying an integer axis.

I would actually lean toward only making axes part of the gufunc specification, and making handling of an axis argument function specific, e.g., if we wanted the core of numpy.cumsum to use a gufunc, we could write:

def cumsum(array, axis):
    axes = [(axis,), (axis,)]
    # cumsum_impl is a gufunc with signature (n)->(n)
    return cumsum_impl(array, axes=axes)

This allows for the simplifying logic of an axis argument when it makes sense, but keeps the implementation specific to each function that uses it.

The only slightly confusing part about this is that implementors of __array_ufunc__ would need to check for cumsum_impl instead of cumsum if they override it's functionality.

@mhvk
Copy link
Contributor Author

mhvk commented Mar 28, 2017

@shoyer - the case for allowing axis=i for any set of single core-dimension signatures, is that for the examples we could come up with, it seemed obvious what was the intent:

  • (i),()->(): we all agree
  • (i),(i)->(): for something like all_equal or inner1d it would make sense certainly if one knows one is passing in data with the same structure (e.g., vector arrays where the xyz dimension is 0 rather than -1).
  • (i),(i)->(i): same, e.g., a cross-product of the above vectors.

I can see the appeal of axis vs axes, though, with each fairly strictly defined. I mostly ended up not (yet) doing it as it seemed not quite necessary. But if, overall, people feel this whole PR is going in the right direction, and it would be better with axis and axes, I'm happy to implement it. (of course, PRs to my PR are even more welcome...)

I do wonder a bit about keepdims. Earlier it was suggested to only allow it for the case of a single core dimension, but that would remove, e.g., the possibility to let a matrix determinant ufunc with signature (i,i)->() return an array with shape ending in (1,1). One might just more generally add 1 as necessary to match the longest input core dimension.

}
else {
Py_INCREF(op_axes_tuple);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this outdated comment on not doing tuple promotion unless op_ncore == 1

int nout = ufunc->nout;
int nop = nin + nout;
int iop, list_size;
PyObject *op_axes_tuple;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also move to the loop scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it there, but since it is decref'd in fail, I got compilation errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail can move within the loop too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I ended up having only two instances of goto fail, I just removed that altogether, and moved the op_axes_tuple object in the loop.

@mhvk mhvk force-pushed the gufunc-axis-argument branch 3 times, most recently from 44f0ddc to 02f89d9 Compare January 7, 2018 02:11
@ahaldane
Copy link
Member

LGTM, though I didn't check super carefully for little bugs.

If I understand correctly, a future plan may be to allow collapsing multiple axes to one axis like reduce does. The syntax might allow nested tuples like [((-4,-3), -1), (-1,), (-2,)] for signature (i,j),(j)->(i) where for the 0th op axes -4 and -3 were collapsed to a single axis.

That's starting to get a bit cluttered, so I was thinking about other syntaxes. What about supporting a dictionary like axes={0: ((-4,-3), -1), 2: (-2,)} where the key is the op index and the value is the axes? That gives a way to only specify some op axes and leaves the others as default, like you attempted with the None placeholder, now gone. And indstead of having to mentally count out which op a tuple corresponds to, you can just read off the key.

@mhvk
Copy link
Contributor Author

mhvk commented Feb 15, 2018

@ahaldane - I quite like the dict idea, in particular that it is so much more obvious that one can override the axes for just some operands. On the other hand, it feels like it deviates a bit from "standard" practice, in that maybe more typically extra attributes to what in essence is a tuple of arguments are also given as iterables. But I'm not sure this "standard" practice really exists anywhere but in my mind.

I also feel having keys that are just numbers is slightly odd, but perhaps a related alternative would resolve this: one could look in kwargs for any keyword arguments starting with axes (i.e., axes0, axes1; still plural, since it can be more than one) and pass those on. In terms of implementation, this would likely end up quite similar to your suggestion, since it would be easiest to just create a dict with just what you suggest in the parser.

@eric-wieser, what do you think?

@ahaldane
Copy link
Member

@mhvk I quite like the keyword arg idea. I also found some other alternatives in this thread.

Here's an example of what the different options we've mentioned look like:

np.matmul(a, b, out=c, axes=[((-4,-3), -1), (-1,), (-2,)])

np.matmul(a, b, out=c, axes={0: ((-4,-3), -1), 2: (-2,)})

np.matmul(a, b, out=c, axes0=((-4,-3), -1), axes2=(-2,))

np.matmul(a, b, out=c, axes0=((-4,-3), -1), axes_out0=(-2,))

np.matmul(a, b, out=c, axes=[{"m": (-4, -3), "n": -1}, {"n": -1}, {"m": -2}].

I think my favorites right now are the axes0= styles.

@shoyer
Copy link
Member

shoyer commented Feb 16, 2018

The nested tuple syntax feels most natural to me, though I agree that it can get cumbersome for complex expressions.

I would be OK with optionally replacing tuples at either the argument or axis level with dictionaries, keyed by either argument number or axis name, e.g., any of

  • axes={0: ((-4,-3), -1), 2: (-2,)})
  • axes=[{"m": (-4, -3), "n": -1}, {"n": -1}, {"m": -2}]
  • axes={0: {"m": (-4, -3), "n": -1}, 2: {"m": -2}}

That said, I'm skeptical that we actually need this. If the main argument is readability, then dictionary for axis names is probably the most valuable (e.g., {"m": (-4, -3), "n": -1}).

I don't like the axis0= styles because they rely on programmatically generated argument names.

@ahaldane
Copy link
Member

That sounds fair, let's support the list-of-tuples format in any case. As far as I can see, everyone is happy with this PR and it is ready to merge.

So how about we merge this PR after 1.14.1 gets released in the next few days? We can leave any additional syntaxes or options like 'keepdims' for discussion in future PRs, aiming for 1.15.

@mhvk
Copy link
Contributor Author

mhvk commented Feb 17, 2018

Sounds all OK to me.

@hameerabbasi
Copy link
Contributor

I think my favorites right now are the axes0= styles.

Would they scale well to an arbitrary amount if input arguments? It seems einsum can easily call a gufunc version of a fused-multiply-add if arbitrary amounts of arguments were kept.

My favourite is the int-indexed dict style. A perfect balance of verbosity and explicitness, IMO.

axes={0: ((-4,-3), -1), 2: (-2,)}

@ahaldane
Copy link
Member

Now that 1.14.1 is in, let's go ahead and merge this one. Then we can debate alternate syntaxes.

I'll give this one another read-through myself, and then I plan to merge in half a day. Sounds good?

If anyone wants more time to check anything, or if anyone who did the more thorough reviews above wants to give the final say, please speak up before then.

*/
for (axis = op_nbroadcast; axis < op_ndim; axis++) {
axis_item = PyTuple_GetItem(op_axes_tuple, axis - op_nbroadcast);
op_axis = (int)PyNumber_AsSsize_t(axis_item, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think this is partly copied from PyUFunc_GenericReduction. There, we used PyTuple_GETITEM and PyArray_PyIntAsInt instead of these two functions. I don't think it's too important so I'm happy to merge as is, but probably those functions are slightly better: faster and avoids the cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite happy to change, but think that the GetItem ended up this way after discussion with @eric-wieser - suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see there was debate over this and use of PyTuple_CheckExact. I'm fine leaving it as you have it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my point about PyTuple_GetItem was wrong - there's no reason not to use PyTuple_GETITEM here, as the former doesn't respect subclasses __getitem__ anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, replaced both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this for the tuple, I assume I should also use GETITEM for the list?

if (axes) {
remap_axis = PyArray_malloc(sizeof(remap_axis[0]) * nop);
remap_axis_memory = PyArray_malloc(sizeof(remap_axis_memory[0]) *
nop * NPY_MAXDIMS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid mallocs. In principle we could replace nop by NPY_MAXARGS and allocate on the stack, but that would be 32*32 ints, or 8kb (4kb on 32bit), which seems a little large to put on the stack. But I see other places in numpy we have already done this, eg a grep shows:

numpy/core/src/umath/ufunc_object.c:2061:    int op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS];
numpy/core/src/multiarray/nditer_pywrap.c:742:    int op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS];
numpy/core/src/multiarray/einsum.c.src:2610:    char op_labels[NPY_MAXARGS][NPY_MAXDIMS];
numpy/core/src/multiarray/einsum.c.src:2617:    int op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS];

I like the strategy in the 3rd line there to use a char, so it is only 1kb. I also like how those other examples use C's 2d static array syntax, instead of allocating a separate index array.

In summary, I actually think it would be a significant improvement to allocate remap_axis_memory as a static 2d char array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that originally, but thought that it was a waste of memory for what should normally be a rare use case. A yet more complicated alternative would be to allocate a, say, 3x3 2-D array and not use it is there are more arguments. (note that the char is for op_labels, which are in fact characters.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few ways to tidy this along those lines.

Perhaps do this: Get rid of remap_axis and only have char remap_axis_memory[NPY_MAXARGS][NPY_MAXDIMS]; Then redefine #define REMAP_AXIS(iop, axis) ((axes ? remap_axis_memory[iop][axis] : axis)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I didn't see your comment. Yeah it is debatable.

I guess if you already investigated both options, I am happy to go with whichever you thin kis better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to combine these into one allocation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remap_axis is of pointers, remap_axis_memory of int, so in principle they can have different stride size, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stride size isn't relevant if you're allocating a void*. I suppose alignment could be an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, I think this is an implementation detail - the present code is at least somewhat clear, so maybe best to leave it for later.

@ahaldane
Copy link
Member

All right finished reviewing, and it looks good.

I think after your replies, I am still happy to merge as is. We can leave the mallocs. I'll wait until this evening to merge in case there are any more comments.

@mhvk mhvk force-pushed the gufunc-axis-argument branch from 02f89d9 to 5dfe2f0 Compare February 27, 2018 20:10
@mhvk
Copy link
Contributor Author

mhvk commented Feb 27, 2018

Pushed up a rebased version with GETITEM (twice) PyArray_PyIntAsInt (once), but leaving the malloc in place.

My main question is whether we stick with the list of tuples or a dict of tuples (which I quite like, but only felt it was not "typical", for reasons I am not altogether sure about). We can of course merge and decide later, but that tends to commit us to what is there (and I don't like the idea of having alternatives for a rarely used keyword!).

EDIT: the main advantage of a dict remains that one takes care of defaults by not including those operands.

@hameerabbasi
Copy link
Contributor

+1 for putting things into a final form before pushing to master. If we release before a change, it could bind us to what's already there.

@eric-wieser
Copy link
Member

eric-wieser commented Feb 27, 2018

I'm -1 on the dict simply because it's inconsistent with the out argument:

out_fr, out_exp = np.frexp(x, out=(out_fr, None))

is supported, but

out_fr, out_exp = np.frexp(x, out={0: out_fr})

is not.

If we want to support dictionaries, I think it should come in a much later PR that also adds support for named ufunc arguments.

mhvk added 3 commits February 27, 2018 20:03
The axes argument allows one to specify the axes on which
the gufunc will operate (by default, the trailing ones).
It has to be a list with length equal to the number of
operands, and each element a tuple of length equal to the
number of core dimensions, with each element an axis index.

If there is only one core dimension, the tuple can be
replaced by a single index, and if none of the outputs have
core dimensions, the corresponding empty tuples can be
omitted.
This ensures we do not have to guard against any operand having
fewer dimensions than required by the ufunc in, e.g.,
_parse_axes_argument.
@mhvk mhvk force-pushed the gufunc-axis-argument branch from 5dfe2f0 to 05a420a Compare February 28, 2018 01:03
@mhvk
Copy link
Contributor Author

mhvk commented Feb 28, 2018

@eric-wieser - agree with you somewhat that the dict is a break of how we pass in other arguments (good to have an actual example!). Though by the comparison with a tuple for out it should perhaps be a tuple of lists rather than a list of tuples... While visually for simple cases, I think I prefer the current approach of a list of tuples, and also the signature uses "tuples" to indicate sets of axes (like "(m,n),(n,o)->(m,o)"), the tuple of lists might make it easier to distinguish a single axis made up of several components from multiple single axes, so let me at least show examples (noting of course, that combining multiple axes for a single one is currently not supported!):

# matrix-vector example now
np.matmul(a, b, out=c, axes=[((-4, -3), -1), (-1,), (-2,)])
# removing tuples for single axis (possible now)
np.matmul(a, b, out=c, axes=[((-4, -3), -1), -1, -2])
# tuple of lists
np.matmul(a, b, out=c, axes=([(-4, -3), -1], [-1], [-2]))
# simplified by removing lists for single axis (would be possible)
np.matmul(a, b, out=c, axes=([(-4, -3), -1], -1, -2))

# matrix-matrix example now, for comparison
np.matmul(a, b, out=c, axes=[((-4, -3), -1), (-1, -2), (-2, -1)])
# tuple of lists
np.matmul(a, b, out=c, axes=([(-4, -3), -1], [-1, -2], [-2, -1]))

# matrix-vector with complicated vector now:
np.matmul(a, b, out=c, axes=[((-4, -3), -1), ((-4, -3),), (-2,)])
# Cannot simplify single axis for b now, since it would suggest 2 axes
np.matmul(a, b, out=c, axes=[((-4, -3), -1), ((-4, -3),), -2])
# tuple of lists
np.matmul(a, b, out=c, axes=([(-4, -3), -1], [(-4, -3)], [-2]))
# simplified tuple of lists: now can simplify single combined axis.
np.matmul(a, b, out=c, axes=([(-4, -3), -1], (-4, -3), -2))

@ahaldane
Copy link
Member

All right, I think we are close enough to a consensus that I will merge now. We can always change things in future PRs. I think we are all at least OK with the list-of-tuples syntax, I for one happy with it.

Thanks @mhvk and all reviewers!

@ahaldane ahaldane merged commit c7459dd into numpy:master Feb 28, 2018
@mhvk mhvk deleted the gufunc-axis-argument branch February 28, 2018 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants