
ENH: Adding logical gufuncs, ie all_equal #8528


Closed
wants to merge 26 commits into from

Conversation

mattharrigan
Contributor

@mattharrigan mattharrigan commented Jan 24, 2017

Currently a work in progress. Adds functionality from issue #8513

Questions:

  • How should this be tied into the NumPy setup? Currently a clean build fails because logical_gufuncs.c.src isn't being expanded. If I uncomment line 905 of numpy/core/setup.py, that file gets expanded into the *.c, but then the compile fails because it's included twice. If I leave the expanded *.c files and re-comment line 905, the build works. "generate_umath_templated_sources" sounds like a helpful function, but I don't think it is ever called. Sorry, this is my first time in the bowels of a complicated setup.py. Any advice?

  • I still need to add complex data types. Should I also add datetimes? I also need to add unit tests, docstrings, release notes, and probably more.

  • Any other suggestions are welcome

Thanks

@juliantaylor
Contributor

only have the .c.src in the setup.py, not both.

couldn't this just be a regular ufunc?

@charris
Member

charris commented Jan 25, 2017

I have the same question as @juliantaylor: what is the use case for making this a gufunc? If multiple axes are supported, I think the same could be accomplished with a ufunc. There would be some work involved, as this isn't the standard reduce operation. We should probably give some thought to what general class of operations the all_* functions fall under.

@mhvk
Contributor

mhvk commented Jan 25, 2017

@juliantaylor, @charris - the difference is the reduction in dimensionality here. Are there ways other than reduce that do this for a ufunc? (reduce would not apply here, of course, as it only takes a single input array, and the whole point is to avoid calculating the full boolean array).
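For concreteness, here is a pure-Python sketch of the semantics under discussion (the name `all_equal_ref` is invented; the actual PR implements this in a C inner loop):

```python
import numpy as np

def all_equal_ref(x1, x2):
    """Reference semantics for the proposed gufunc: reduce the last axis
    with equality, stopping at the first mismatch instead of materializing
    the full (x1 == x2) boolean array."""
    x1, x2 = np.broadcast_arrays(x1, x2)
    out = np.empty(x1.shape[:-1], dtype=bool)
    for idx in np.ndindex(*out.shape):
        row_equal = True
        for a, b in zip(x1[idx], x2[idx]):
            if a != b:          # short circuit: stop scanning this row
                row_equal = False
                break
        out[idx] = row_equal
    return out
```

The result matches `(x1 == x2).all(axis=-1)`, but the point is that no full boolean temporary is ever allocated.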

@mattharrigan
Contributor Author

The latest commit is probably a hack: it builds on my machine but fails for some builds. I tried @juliantaylor's advice but couldn't get it building on my machine, probably due to my unfamiliarity with NumPy builds. At least it allows me to start working on tests and other aspects.

It seems that creating similar functionality with a regular ufunc would require a large rewrite of the ufunc machinery. NumPy would need to add something like reduce for non-binary ufuncs and also a short-circuit mechanism; neither seems easy IMHO. It seems like generalized ufuncs were created for cases like this.

@mattharrigan
Contributor Author

I am unsure whether this functionality is welcomed or not. If not, I would prefer to know so that I don't spend any more time on it. Should it be explicitly asked in discussion?

@mhvk
Contributor

mhvk commented Jan 29, 2017

It is a bit from the peanut gallery, but I would definitely use it!

@charris
Member

charris commented Jan 30, 2017

Looks like you may need to add something to MANIFEST.in for the sdist. Not sure why...

@shoyer
Member

shoyer commented Jan 30, 2017

It seems like generalized ufuncs were created for cases like this.

I agree -- this feels like an excellent fit for generalized ufuncs.

I am unsure whether this functionality is welcomed or not. If not, I would prefer to know so that I don't spend any more time on it. Should it be explicitly asked in discussion?

I am also a fan and would support including this in NumPy. These types of comparisons are done all the time, and for some use cases this could be a significant improvement.

@charris
Member

charris commented Jan 31, 2017

@juliantaylor Unused function compiler warning.

@juliantaylor
Contributor

yes it should just use a header file instead of including a C file.

@mattharrigan mattharrigan force-pushed the logical_gufuncs branch 3 times, most recently from e0b484a to 52bedaf Compare February 2, 2017 01:02
@mattharrigan
Contributor Author

mattharrigan commented Feb 2, 2017

Can you please explain what should go in the header and why? I tried a few things but only got segfaults. I apologize for the newbie questions. Thanks.

Contributor

@juliantaylor juliantaylor left a comment

gufuncs cannot broadcast over their core dimensions, right?
There is some concern about confusing users who try to use them like normal ufuncs.
If I am not missing something, you have to use stride_tricks to do e.g. np.all_equal(d, 1), right?

I am also not too sure how much use they would really get, but then we do have very specialized functions already, so it's probably not that big a deal. One thing that would be needed is vectorization, so that the non-short-circuiting code is not slower than the vectorized two-step code for cache-sized arrays.


/* -------------------------------------------------------------------------- */
/* Create type arrays for each gufunc, which are all identical*/
static char types[48] = {NPY_BYTE, NPY_BYTE, NPY_BOOL,
Contributor

shouldn't need the explicit size, the compiler can figure that out

"Return True if x1 == x2 for all elements along the last axis, False\n"
"otherwise. Similar to (x1==x2).all(axis=-1), except the last dimension\n"
"of x1 and x2 must be equal and greater than 1. This function short\n"
"circuits after the first nonequal element."
Contributor

This should be clearer, e.g. "short circuits on the first nonequal element within the reduced axis". I think it might also not hold when the axis is larger than the iterator buffer size, due to limitations of the current implementation.
Might be best not to document that behaviour yet.

Contributor

the docstring for all variants can probably be generated from our templating script.

Member

Docstrings should be handled as ufunc docstrings are, not in C code. See numpy/core/code_generators/ufunc_docstrings.py.

@@ -0,0 +1,13 @@
typedef PyObject* (*PyUFunc_FromFuncAndDataAndSignature_t)(PyUFuncGenericFunction*,
void**,
Contributor

spaces instead of tabs

a_i += a_I; \
b_i += b_I; \
} \
}
Contributor

these could all be inlined templated functions.

@seberg
Member

seberg commented Feb 6, 2017

Maybe a bit silly, but I am still wondering about something like np.equal.all(), i.e. some ufunc mechanism which uses the typical ufunc loops instead of specialized loops (and would probably support loops returning bools, though that does not matter much). On the upside it would allow much more generality. On the downside, you would have to live in a buffered world (even for the trivial cases you would have to do chunking logic) to get the "early stop" behaviour; i.e. you do not stop exactly as soon as one False/True occurs, but only "soon after". That probably does not matter much speed-wise, though it may give some wrong warnings (or even errors for objects).
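This chunked "stop soon after" evaluation can be sketched in Python (the function name and chunk size are invented for illustration; equal-sized inputs assumed):

```python
import numpy as np

def equal_all_chunked(a, b, chunk=8192):
    """Run the ordinary vectorized equality loop over buffer-sized chunks
    and stop after the first chunk containing a mismatch, rather than
    short-circuiting at the exact mismatching element."""
    a = np.ravel(a)
    b = np.ravel(b)
    for start in range(0, a.size, chunk):
        if not np.equal(a[start:start + chunk], b[start:start + chunk]).all():
            return False        # "soon after" the mismatch, not exactly at it
    return True
```

Each chunk still uses the normal ufunc inner loop, which is the generality seberg describes, at the cost of scanning up to one full chunk past the first mismatch.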

@mattharrigan
Contributor Author

@juliantaylor Thanks for the feedback. I fixed the easy ones. I don't know what you mean by "templated functions" or the docstring "templating script". Can you point me to examples?

@juliantaylor
Contributor

The /**begin repeat comments are our form of templates. They can be applied to everything in a file, so they should also be usable to create all the very similar docstrings from one template.

/* create the family of functions using a template */

/**begin repeat
* #TYPE = byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong,float,double,longdouble,cfloat,cdouble,clongdouble#
Member

The long line should be broken. Indeed, all the long lines should be < 80 characters.

Contributor Author

Agreed about long lines, but it wasn't clear to me how to break up lines in the template format. I checked here, but the docs didn't seem to mention multiline type statements. How does that work?

@charris
Member

charris commented Feb 8, 2017

Grep is your friend.

/**begin repeat
 * #TYPE = BYTE, UBYTE, SHORT, USHORT, INT, UINT,
 *         LONG, ULONG, LONGLONG, ULONGLONG#
 * #type = npy_byte, npy_ubyte, npy_short, npy_ushort, npy_int, npy_uint,
 *         npy_long, npy_ulong, npy_longlong, npy_ulonglong#
 * #ftype = npy_float, npy_float, npy_float, npy_float, npy_double, npy_double,
 *          npy_double, npy_double, npy_double, npy_double#
 * #SIGNED = 1, 0, 1, 0, 1, 0, 1, 0, 1, 0#
 */

@mattharrigan
Contributor Author

mattharrigan commented Feb 11, 2017

@juliantaylor I agree vectorization would be nice. This function is indeed slower for cache-sized arrays where the entire array must be checked. My personal machine is a little old (i3 M370, no AVX):

In [3]: x = np.arange(2**16, dtype=float)
In [5]: y = x.copy()
In [8]: %timeit (x==y).all()
10000 loops, best of 3: 81.8 µs per loop

In [9]: %timeit np.all_equal(x,y)
10000 loops, best of 3: 116 µs per loop

The new function is still faster for huge arrays, tiny arrays, or cases where an early element is not equal. However it would be nice if it were faster for all problems. I would like to learn more about vectorization, this seems like my chance!

@juliantaylor
Contributor

vectorization is not necessary for now, it can be added later too.

@mattharrigan
Contributor Author

mattharrigan commented Feb 12, 2017

Here is a toy function which vectorizes nicely:

#define ARR_LENGTH 4096
typedef float afloat __attribute__ ((__aligned__(32)));
int curly(afloat *a, afloat *b){
    unsigned int i;
    unsigned int result = 0;
    for (i=0; i<ARR_LENGTH; i++){
        result += (*a) == (*b);
        a++;
        b++;
    }
    return result == i;
}

Compiled with gcc 6.3 -O3 -march=haswell, it yields a nice-looking AVX loop:

.L2:
        vmovaps (%rsi,%rax), %ymm0
        vcmpeqps        (%rdi,%rax), %ymm0, %ymm0
        addq    $32, %rax
        vpsubd  %ymm0, %ymm1, %ymm1
        cmpq    $16384, %rax
        jne     .L2

@homu
Contributor

homu commented Feb 16, 2017

☔ The latest upstream changes (presumably #8043) made this pull request unmergeable. Please resolve the merge conflicts.

@mattharrigan
Contributor Author

mattharrigan commented Feb 17, 2017

I still need to do housekeeping for docstrings and release notes. However, I think this is getting close.

This is my first serious attempt at vectorization. I'm relying on the compiler to automagically vectorize the code. It seems to be pretty quick, though there is probably room for improvement. Here is a crude benchmark on my old personal machine; I would like to know how it does with AVX. Also, how much should be vectorized? Right now it's not much: just all_equal for int, float, and double.

In [5]: x = np.ones(2**18, dtype=np.float32);y=x.copy()
In [6]: %timeit (x==y).all()
1000 loops, best of 3: 203 µs per loop
In [7]: %timeit np.all_equal(x, y)
10000 loops, best of 3: 145 µs per loop

@mattharrigan
Contributor Author

@charris previously commented:

Docstrings should be handled as ufunc docstrings are, not in C code. See numpy/core/code_generators/ufunc_docstrings.py.

AFAIK, in order to have docstrings like that you also need to use numpy/core/code_generators/generate_ufunc_api.py. I have only looked at that very briefly, but it's not clear to me that mixing a Python generator script with C code helps all that much here. The .src templating scripts work really well here IMHO. The only real benefit I see in switching is consistency with what already exists.

  1. Am I correct that you need to use generate_ufunc_api to use ufunc_docstrings.py?
  2. Should I make that switch?

@eric-wieser
Member

eric-wieser commented Mar 3, 2017

Maybe a bit silly, but I am still wondering a bit about something like np.equal.all()

+1 on this. This would then also provide np.logical_and.(any|all) and np.logical_or.(any|all), although these don't strike me as immediately useful.

With a bit of type coercion, you could allow np.add.any(a, b) == np.any(a + b) == np.any(a + b != 0), although that's a little magic

@mhvk
Contributor

mhvk commented Mar 3, 2017

Maybe a bit silly, but I am still wondering a bit about something like np.equal.all()

Would be visually nice, but in the context of ufuncs it remains odd to me. The functions here rather logically fit the gufunc mold, as each really is two ufuncs chained together (logical_and.reduce(equal(a, b))), but nicely done in a single inner loop. Furthermore, the short-circuiting is very particular to the logical functions, not general to all ufuncs.

Now, the first point does hint at a broader question: whether it would be possible to chain more generally (e.g., add.reduce(power(a, 2)), or the ability to multiply an array by a constant before applying a ufunc, as in sin(multiply(a, b)) or add(a, multiply(b, c)), i.e., the stuff done by numexpr...)
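The chaining described above can be checked directly with existing ufuncs; the proposed gufunc fuses these two calls into one inner loop:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2, 3], [4, 0, 6]])

# Two chained ufunc calls: equal produces a full boolean temporary,
# then logical_and.reduce collapses the last axis.
chained = np.logical_and.reduce(np.equal(a, b), axis=-1)

# What the fused all_equal gufunc would compute, without the temporary:
fused_equivalent = (a == b).all(axis=-1)

assert np.array_equal(chained, fused_equivalent)
```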

@ahaldane
Member

ahaldane commented Feb 13, 2018

Maybe this PR is a first step towards what I just said.

In that case, it's interesting to look at some timings:

[1]: z = np.eye(10000, dtype='i')
[2]: x = np.zeros((10000,10000), dtype='i')

# test for every possible index of first inequality
[3]: %timeit [np.any_not_equal(zi, xi) for zi,xi in zip(z,x)]
30.8 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[4]: %timeit [np.logical_or.reduce(zi != xi) for zi,xi in zip(z,x)]
99.2 ms ± 5.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# test when the first inequality is at index 10
[5]: zi, xi = z[10], x[10]
[6]: %timeit np.any_not_equal(zi, xi)
1.36 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
[7]: %timeit np.logical_or.reduce(zi != xi)
8.33 µs ± 295 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So this PR does appear to give a speedup of about 3x on average, or much more if we short-circuit early.

Maybe the way forward is this: merge this PR, and then in future PRs specialize it to 1) add loops comparing to a scalar, (i),()->(), and loops comparing to 0, (i)->(), and 2) adapt this same code to compute first/count.

If @mattharrigan is still around, I would be interested in following this through.

@eric-wieser
Member

eric-wieser commented Feb 13, 2018

I'd prefer to go straight to ufunc.first and ufunc.count rather than first adding more API surface to np by merging this.
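One possible pure-Python reading of a ufunc.first for np.not_equal (the name and semantics here are speculative, not an existing NumPy API) is the flat index of the first mismatch, computed chunk-wise so most of the work stays vectorized:

```python
import numpy as np

def first_not_equal(a, b, chunk=4096):
    """Hypothetical ufunc.first semantics for np.not_equal: return the
    flat index of the first element where a != b, or -1 if the arrays
    are equal everywhere. Chunking keeps the inner work vectorized
    while still stopping early."""
    a = np.ravel(a)
    b = np.ravel(b)
    for start in range(0, a.size, chunk):
        mask = np.not_equal(a[start:start + chunk], b[start:start + chunk])
        if mask.any():
            return start + int(np.argmax(mask))   # argmax finds first True
    return -1
```

An any_not_equal is then just `first_not_equal(a, b) != -1`, which is why ahaldane suggests the same code could serve both.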

@ahaldane
Member

ahaldane commented Feb 13, 2018

This PR currently adds these gufuncs at top level, np.all_not_equal and so on. That's what you're talking about, right?

I agree I don't think they should be top level, I think they should be exposed as np.not_equal.all.
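A toy sketch of how the np.not_equal.all spelling might look from Python (the wrapper class is invented for illustration and does not short-circuit; the actual proposal attaches methods to the C ufunc object and dispatches to the gufunc loops):

```python
import numpy as np

class LogicalUfunc:
    """Wraps a comparison ufunc and exposes all/any methods, mimicking
    the proposed np.not_equal.all / np.not_equal.any spelling."""

    def __init__(self, ufunc):
        self._ufunc = ufunc

    def __call__(self, *args, **kwargs):
        return self._ufunc(*args, **kwargs)

    def all(self, x1, x2, axis=-1):
        # Naive: full temporary plus reduction, no short circuit.
        return self._ufunc(x1, x2).all(axis=axis)

    def any(self, x1, x2, axis=-1):
        return self._ufunc(x1, x2).any(axis=axis)

not_equal = LogicalUfunc(np.not_equal)
```

With this, `not_equal.any(a, b)` spells the same operation as the PR's top-level np.any_not_equal.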

@mattharrigan
Contributor Author

mattharrigan commented Feb 13, 2018 via email

dt_nat = np.array([np.datetime64('NaT', 'D')])
dt_other = np.array([np.datetime64('2000-01-01')])
td_nat = np.array([np.timedelta64('NaT', 'h')])
td_other = np.array([np.timedelta64(1, 'h')])
Member

I'd prefer not to lose the 0d tests we already have here

yield check, np.all_less, x1, x2, (x1<x2).all()
yield check, np.any_less, x1, x2, (x1<x2).any()
yield check, np.all_less_equal, x1, x2, (x1<=x2).all()
yield check, np.any_less_equal, x1, x2, (x1<=x2).any()
Member

yield tests are no more, I'm afraid - much though I liked them

@ahaldane
Member

@mattharrigan it might ultimately be nice to have all of them: np.equal.all, np.equal.any, np.equal.first, and np.equal.count.

So maybe, instead of trying to do everything at once, we should get any and all in since they are already here.

@ahaldane
Member

ahaldane commented Feb 15, 2018

Technical question: How can we add these two new gufuncs as attributes on the ufunc objects (like np.equal.all)? It's not trivial.

One possibility is to add an any and all method to all ufuncs (in ufunc_methods), and make them raise an error for the ufuncs where this operation is inapplicable, which is most of them. That's how we already do it with reduce/outer/at. To complete this PR that way we would need to write some setup code in ufunc_object.c to detect and reject all the invalid ufuncs, and we would possibly need another entry in the PyUFuncObject struct to store the location of the corresponding gufunc loops. It seems a bit messy and clutters the attributes of all ufuncs.

Another possibility is to use monkeypatching to only give the valid ufuncs the new methods in python. One obstacle here is that ufuncs do not currently support monkeypatching (or inheritance). But I figured out the minimal changes to this PR to get monkeypatched any and all to work, see the code here (link). I had to take over an unused pointer in the PyUFuncObject struct to do this without affecting the struct layout. It's not perfect and a bit ugly, but it works.

Does anyone find one of these options (or a third one) preferable? Is it acceptable to take over the unused pointer?

Or, should we put off this question to another PR, and merge this one? In that case I suggest we add a leading underscore to these gufuncs (so all_equal becomes _all_equal), thus hiding them for now until we decide what to do. We then only need conflict resolution and an update to the test.

@mhvk
Contributor

mhvk commented Feb 15, 2018

@ahaldane - I quite like the monkey-patching, except that it does seem to lead to illogical behaviour for __array_ufunc__: for reduce etc. one gets the regular ufunc with method="reduce", whereas here, for a similar attribute, one would get passed a gufunc.

In that respect, it might be better to define them generally, although as you say it is quite annoying that they then would appear among the methods for all ufuncs. Ideally, we'd have inheritance and define logical ufuncs that do have all and any (and similarly have the 2-argument ufuncs that can deal with reduce inherit from a base class that does not have that attribute...).

p.s. Not sure that I like merging anything unless we're exposing it: a private ufunc either means that it will see no use, or that people use it and it effectively becomes public anyway.

@ahaldane
Member

@mhvk I hadn't realized the __array_ufunc__ problem. That may also be a problem the day we convert the reduce methods to their own gufuncs, which I think is the plan. Perhaps we can get around it by giving all these gufuncs an attribute pointing to their "base/scalar" ufunc, and arranging it so they pass the scalar ufunc to __array_ufunc__. Or maybe we can one day deprecate the "method" keyword once we have only ufuncs/gufuncs.

As for inheritance, I also thought about it but I think it will be messier than monkey-patching. The problem is that some ufuncs may belong to multiple types, eg both to "reduce-like" and "logical", meaning we may need to create complex mixtures of mixins for each ufunc. Each ufunc would be a singleton for its subclassed type, and the scalar ufuncs will no longer simply have numpy.ufunc type. That seemed uglier than monkey-patching.

As for merging, you're probably right. Probably the best course of action is to make a separate PR to decide how to add new methods to ufuncs, and once that is merged fix up this one.

@mattharrigan
Contributor Author

I very much like np.equal.all and would like to help make it happen. I also don't like that some ufuncs have methods defined which don't actually work (np.sin.reduce, for instance). As another option, could each ufunc be its own class, defining zero or more methods in C, rather than every ufunc being an instance of the same class? Also, I'm curious about the history: why are all ufuncs instances of a single class, and what would the obstacles be to changing that now?

@charris
Member

charris commented Jun 15, 2022

There seems to be consensus that something like this would be useful.

@InessaPawson InessaPawson added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Jun 16, 2022
@InessaPawson InessaPawson changed the title Adding logical gufuncs, ie all_equal ENH: Adding logical gufuncs, ie all_equal Jun 24, 2022
@charris
Member

charris commented Feb 20, 2023

@mattharrigan Are you interested in continuing this? It needs some rework at this point, but it seems folks would like having it.

@charris
Member

charris commented Mar 7, 2023

I'm going to close this as inactive. Adding this functionality seems popular, so I'm marking it as a good idea.

@charris charris closed this Mar 7, 2023
@charris charris added 52 - Inactive Pending author response 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. labels Mar 7, 2023
Labels
01 - Enhancement 52 - Inactive Pending author response 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. component: numpy._core triaged Issue/PR that was discussed in a triage meeting
10 participants