ENH: Adding logical gufuncs, ie all_equal #8528
Conversation
Only have the `.c.src` in the setup.py, not both. Couldn't this just be a regular ufunc?
I have the same question as @juliantaylor: what is the use case for making this a gufunc? If multiple axes are supported, I think the same could be accomplished with a ufunc. There would be some work involved, as this isn't the standard reduce operation. We should probably give some thought to what general class of operations …
@juliantaylor, @charris: the difference is the reduction in dimensionality here. Are there ways other than …
The latest commit is probably a hack to get it to build on my machine, but it fails for some builds. I tried @juliantaylor's advice but couldn't get it building on my machine, probably due to my unfamiliarity with NumPy builds. At least it allows me to start working on tests and other aspects. It seems that creating similar functionality with a regular ufunc would require a large rewrite of the ufunc machinery: NumPy would need to add something like reduce for non-binary ufuncs and also have a short-circuit mechanism, and neither seems easy IMHO. It seems like generalized ufuncs were created for cases like this.
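For context, the proposed gufunc has core signature `(n),(n)->()`, so its semantics (minus the short-circuiting) can be sketched in a few lines of plain NumPy. `all_equal_ref` below is a hypothetical reference, not the C implementation in this PR:

```python
import numpy as np

def all_equal_ref(x1, x2, axis=-1):
    """Reference semantics for the proposed all_equal gufunc:
    reduce the core dimension with an equality test.
    (No short-circuiting; that is the whole point of the C version.)"""
    return np.equal(x1, x2).all(axis=axis)

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2, 3], [4, 0, 6]])
print(all_equal_ref(a, b))   # [ True False]
```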
I am unsure whether this functionality is welcome or not. If not, I would prefer to know so that I don't spend any more time on it. Should it be explicitly asked in a discussion?
It is a bit from the peanut gallery, but I would definitely use it!
Looks like you may need to add something to …
I agree -- this feels like an excellent fit for generalized ufuncs.
I am also a fan and would support including this in NumPy. These types of comparisons are done all the time, and for some use cases this could be a significant improvement.
@juliantaylor Unused function compiler warning.
Yes, it should just use a header file instead of including a C file.
Can you please explain what should go in the header and why? I tried a few things but only got segfaults. I apologize for the newbie questions. Thanks.
gufuncs cannot broadcast over their core dimensions, right? There is some concern about confusing users who try using them like normal ufuncs. If I am not missing something, you have to use stride_tricks to do e.g. `np.all_equal(d, 1)`, right?

I am also not too sure how much use they would really get, but then we do have very specialized functions already, so it's probably not that big a deal. One thing that would be needed is vectorization, so that the non-short-circuiting code is not slower than the vectorized two-step code for cache-sized arrays.
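On the stride_tricks point above: `np.broadcast_to` (itself a stride-tricks view) is one way to expand a scalar over the core dimension without copying. A small illustration using only public NumPy APIs, with the plain-NumPy equivalent of the proposed call:

```python
import numpy as np

d = np.ones((3, 4))
# Expand the scalar 1 to d's shape without copying: the view has zero
# strides, so a (n),(n)->() gufunc could iterate over it like a real array.
ones = np.broadcast_to(1.0, d.shape)
print(ones.strides)              # (0, 0)
# Plain-NumPy equivalent of the proposed np.all_equal(d, 1):
print((d == ones).all(axis=-1))  # [ True  True  True]
```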
/* -------------------------------------------------------------------------- */
/* Create type arrays for each gufunc, which are all identical */
static char types[48] = {NPY_BYTE, NPY_BYTE, NPY_BOOL,
Shouldn't need the explicit size; the compiler can figure that out.
"Return True if x1 == x2 for all elements along the last axis, False\n"
"otherwise. Similar to (x1==x2).all(axis=-1), except the last dimension\n"
"of x1 and x2 must be equal and greater than 1. This function short\n"
"circuits after the first nonequal element."
Should be clearer, e.g. "short circuits on the first nonequal element within the reduced axis". I think it might also not hold when the axis is larger than the iterator buffer size, due to limitations of the current implementation. Might be best to not document that behaviour yet.
The docstrings for all variants can probably be generated from our templating script.
Docstrings should be handled as ufunc docstrings are, not in C code. See `numpy/core/code_generators/ufunc_docstrings.py`.
@@ -0,0 +1,13 @@
typedef PyObject* (*PyUFunc_FromFuncAndDataAndSignature_t)(PyUFuncGenericFunction*,
                                                           void**,
spaces instead of tabs
a_i += a_I; \
b_i += b_I; \
} \
}
These could all be inlined templated functions.
Maybe a bit silly, but I am still wondering a bit about something like …
@juliantaylor Thanks for the feedback. I fixed the easy ones. I don't know what you mean by "templated functions" or the docstring "templating script". Can you point me to examples?
/* create the family of functions using a template */

/**begin repeat
 * #TYPE = byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong,float,double,longdouble,cfloat,cdouble,clongdouble#
The long line should be broken. Indeed, all the long lines should be < 80 characters.
Agree about long lines, but it wasn't clear to me how to break up lines in the template format. I checked here, but the docs didn't seem to mention multiline type statements. How does that work?
Grep is your friend.
@juliantaylor I agree vectorization would be nice. This function is indeed slower for cache-sized arrays where the entire array must be checked. My personal machine is a little old (i3 M370, no AVX):

```
In [3]: x = np.arange(2**16, dtype=float)
In [5]: y = x.copy()
In [8]: %timeit (x==y).all()
10000 loops, best of 3: 81.8 µs per loop
In [9]: %timeit np.all_equal(x,y)
10000 loops, best of 3: 116 µs per loop
```

The new function is still faster for huge arrays, tiny arrays, or cases where an early element is not equal. However, it would be nice if it were faster for all problems. I would like to learn more about vectorization; this seems like my chance!
Vectorization is not necessary for now; it can be added later too.
Here is a toy function which vectorizes nicely:

```c
#define ARR_LENGTH 4096

typedef float afloat __attribute__ ((__aligned__(32)));

int curly(afloat *a, afloat *b){
    unsigned int i;
    unsigned int result = 0;
    for (i=0; i<ARR_LENGTH; i++){
        result += (*a) == (*b);
        a++;
        b++;
    }
    return result == i;
}
```

Compiled with gcc 6.3 `-O3 -march=haswell`, this yields a nice-looking AVX loop:

```
.L2:
        vmovaps (%rsi,%rax), %ymm0
        vcmpeqps (%rdi,%rax), %ymm0, %ymm0
        addq    $32, %rax
        vpsubd  %ymm0, %ymm1, %ymm1
        cmpq    $16384, %rax
        jne     .L2
```
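The counting trick in `curly` (accumulate the elementwise comparisons, then test the total against the length) maps directly onto NumPy; a quick pure-Python sanity check of the same logic, not part of the PR:

```python
import numpy as np

def curly_ref(a, b):
    # Same idea as the C toy above: count elementwise matches,
    # then compare the count to the length. Vectorizes, but
    # cannot short-circuit.
    return int((a == b).sum()) == len(a)

a = np.arange(4096, dtype=np.float32)
b = a.copy()
print(curly_ref(a, b))   # True
b[100] = -1.0
print(curly_ref(a, b))   # False
```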
☔ The latest upstream changes (presumably #8043) made this pull request unmergeable. Please resolve the merge conflicts.
I still need to do housekeeping for docstrings and release notes, but I think this is getting close. This is my first serious attempt at vectorization: I'm relying on the compiler to automagically vectorize the code. It seems to be pretty quick, though there is probably room for improvement. Here is a crude benchmark on my old personal machine; I would like to know how it does with AVX. Also, how much should be vectorized? Right now it's not much, just `all_equal` for int, float, and double.

```
In [5]: x = np.ones(2**18, dtype=np.float32); y = x.copy()
In [6]: %timeit (x==y).all()
1000 loops, best of 3: 203 µs per loop
In [7]: %timeit np.all_equal(x, y)
10000 loops, best of 3: 145 µs per loop
```
@charris previously commented:

AFAIK, in order to have docstrings like that you also need to use `numpy/core/code_generators/generate_ufunc_api.py`. I have only looked at that very briefly, but it's not clear to me that mixing a Python generator script with C code helps all that much here. The `.src` templating scripts work really well here, IMHO. The only real benefit I see in switching is consistency with what already exists.
+1 on this. This would then also provide … With a bit of type coercion, you could allow …
Would be visually nice, but in the context of …

Now, the first point does hint at a broader question: whether it would be possible to chain more generally (e.g., …).
Maybe this PR is a first step towards what I just said. In that case, it's interesting to look at some timings:

```
[1]: z = np.eye(10000, dtype='i')
[2]: x = np.zeros((10000,10000), dtype='i')

# test for every possible index of first inequality
[3]: %timeit [np.any_not_equal(zi, xi) for zi,xi in zip(z,x)]
30.8 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[4]: %timeit [np.logical_or.reduce(zi != xi) for zi,xi in zip(z,x)]
99.2 ms ± 5.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# test when the first inequality is at index 10
[5]: zi, xi = z[10], x[10]
[6]: %timeit np.any_not_equal(zi, xi)
1.36 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
[7]: %timeit np.logical_or.reduce(zi != xi)
8.33 µs ± 295 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

So this PR does appear to give a speedup of about 3x on average, or much more if we short-circuit early. Maybe the way forward is this: merge this PR, and then in future PRs specialize it to 1) add loops comparing to a scalar …

If @mattharrigan is still around, I would be interested in following this through.
I'd prefer to go straight to …
This PR currently adds these ufuncs at top level. I agree; I don't think they should be top level. I think they should be exposed as …
I like "first" a lot. I think that is better than "all" or "any". Another reason why I like it is that its meaning is much clearer than the alternatives', and it is fairly common. For instance, when first learning Python/NumPy, I was confused about `np.argmax(x == 3)` to find the index of the first 3; that is essentially a cryptic way of writing "first". Maybe others would also appreciate it.

I'm not opposed to count, but it seems less useful to me. I don't feel confident enough to speak to the syntax. I think the code here is nicely optimized/vectorized/special-cased.
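On the `np.argmax(x == 3)` idiom mentioned above: it is indeed a cryptic "first", and it has a trap, since `argmax` returns 0 when there is no match. A sketch of a pure-NumPy stand-in (`first_index` is made up for illustration, and unlike the proposed gufunc it does not short-circuit):

```python
import numpy as np

def first_index(x, value):
    """Index of the first element equal to `value`, or -1 if absent.
    (A pure-NumPy stand-in for the proposed short-circuiting `first`.)"""
    mask = (x == value)
    idx = int(np.argmax(mask))   # first True, but 0 if mask is all False
    return idx if mask[idx] else -1

x = np.array([7, 1, 3, 3, 5])
print(first_index(x, 3))   # 2
print(first_index(x, 9))   # -1
```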
On Tue, Feb 13, 2018 at 1:25 PM, Allan Haldane wrote:
TL;DR: What about implementing this as a gufunc `np.equal.first(a, b)` instead?
first instead of any and all
I was recently thinking about a different pair of generalized ufuncs,
which I will call "first" and "count". I happily came across this PR since
it already basically implements first, but slightly differently. I want
to suggest implementing first instead of any and all since the latter can
be implemented in terms of the former, but not vice versa. Here's my
proposal:
Both methods would be attributes of logical ufuncs (like reduce, reduceat, etc.), and return intp type.
np.ufunc.first(a, b): returns the first index where the condition is True, -1 otherwise
np.ufunc.count(a, b): returns the number of elements where the condition is True
np.any_equal(a, b) <-- same as --> np.equal.first(a, b) >= 0
np.all_equal(a, b) <-- same as --> np.not_equal.first(a, b) < 0
But first is more powerful because it also returns the index of match,
which is useful in many algorithms. For instance, it could be used to
implement the efficient "row sort" functionality we have wanted for so
long, since to efficiently compare two rows of a 2d array you could do:
```python
def cmp_rows(a, b):
    ind = np.not_equal.first(a, b)
    return 0 if ind < 0 else cmp(a[ind], b[ind])
```
In my half decade of using numpy, I have reasonably often wanted a fast
short-circuiting "first" function for other algorithms too.
Performance
The point of implementing these gufuncs is performance, since we can
already compute these values using np.logical_or.reduce(a == b) (current
implementation of np.any) or np.count_nonzero. So we should do timing.
These logical gufuncs have the advantages that 1) they short-circuit
instead of testing every element (this can give a huge speed boost if we
expect a match near the start, which is often the case in code I've
written, or for row sorts), and 2) they avoid allocation of a temporary
boolean array. They have the disadvantage that they may not vectorize as
nicely because of the branching. I suspect they will be faster than
np.logical_or.reduce, though.
np.greater_than.count(a, b) might be faster than count_nonzero(a >= b)
since it avoids the temporary boolean array.
Also, what about this (possibly premature) optimization: Could we separate
out a special loop for the case where b is a constant instead of an
array? Similarly, make the second arg optional, so you can call
np.not_equal.first(a), in which case it compares to the 0-element for
that type. Then the C loop would be over while(a[i]) instead of while(a[i]
== b[j]) for boolean array, which should be faster for a few reasons. My
goal here is to really optimize the hell out of this gufunc so it beats
np.logical_or.reduce(a) in most cases.
dt_nat = np.array([np.datetime64('NaT', 'D')])
dt_other = np.array([np.datetime64('2000-01-01')])
td_nat = np.array([np.timedelta64('NaT', 'h')])
td_other = np.array([np.timedelta64(1, 'h')])
I'd prefer not to lose the 0d tests we already have here
yield check, np.all_less, x1, x2, (x1<x2).all()
yield check, np.any_less, x1, x2, (x1<x2).any()
yield check, np.all_less_equal, x1, x2, (x1<=x2).all()
yield check, np.any_less_equal, x1, x2, (x1<=x2).any()
`yield` tests are no more, I'm afraid, much though I liked them.
@mattharrigan it might ultimately be nice to have all of them: … So maybe, instead of trying to do everything at once, we should get …
Technical question: how can we add these two new gufuncs as attributes on the ufunc objects (like `reduce` is)?

One possibility is to add an … Another possibility is to use monkeypatching to only give the valid ufuncs the new methods in Python. One obstacle here is that ufuncs do not currently support monkeypatching (or inheritance), but I figured out the minimal changes to this PR to get monkeypatched …

Does anyone find one of these options (or a third one) preferable? Is it acceptable to take over the unused pointer? Or should we put off this question to another PR and merge this one? In that case, I suggest we add a leading underscore to these gufuncs (so …).
@ahaldane - I quite like the monkey-patching, except that it does seem to lead to illogical behaviour for … In that respect, it might be better to define them generally, although, as you say, it is quite annoying that they then would appear among the methods for all ufuncs. Ideally, we'd have inheritance and define logical ufuncs that do have …

p.s. Not sure that I like merging anything unless we're exposing it: a private ufunc either means that it will see no use, or that people use it and it effectively becomes public anyway.
@mhvk I hadn't realized the … As for inheritance, I also thought about it, but I think it will be messier than monkey-patching. The problem is that some ufuncs may belong to multiple types, e.g. both to "reduce-like" and "logical", meaning we may need to create complex mixtures of mixins for each ufunc. Each ufunc would be a singleton for its subclassed type, and the scalar ufuncs will no longer simply have …

As for merging, you're probably right. Probably the best course of action is to make a separate PR to decide how to add new methods to ufuncs, and once that is merged, fix up this one.
I very much like `np.equal.all` and would like to help make it happen. I also don't like that some ufuncs have methods defined which don't actually work (`np.sin.reduce`, for instance). As another option, could each ufunc be a class that simply defines zero or more methods in C, as opposed to every ufunc being an instance of the same class? Also, I'm curious about the history, why all ufuncs are objects of a single class, and what the obstacles might be to changing that now.
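Since ufunc instances currently reject new attributes, one way to prototype the `np.equal.all` idea in pure Python is a thin wrapper class. `LogicalUfunc` below is illustrative only, not a proposed API, and unlike the gufuncs in this PR it does not short-circuit:

```python
import numpy as np

class LogicalUfunc:
    """Wrap a comparison ufunc and expose .all/.any over the last axis,
    as a pure-Python prototype of the np.equal.all idea."""
    def __init__(self, ufunc):
        self.ufunc = ufunc

    def __call__(self, *args, **kwargs):
        return self.ufunc(*args, **kwargs)

    def all(self, a, b, axis=-1):
        return self.ufunc(a, b).all(axis=axis)

    def any(self, a, b, axis=-1):
        return self.ufunc(a, b).any(axis=axis)

equal = LogicalUfunc(np.equal)
a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 0]])
print(equal.all(a, b))   # [ True False]
```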
There seems to be consensus that something like this would be useful.
@mattharrigan Are you interested in continuing this? It needs some rework at this point, but it seems folks would like having it.
I'm going to close this as inactive. Adding this functionality seems popular, so marking as a good idea.
Currently a work in progress. Adds functionality from issue #8513.

Questions:

How should this be tied into the NumPy setup? Currently the build fails when clean because logical_gufuncs.c.src isn't being expanded. If I uncomment line 905 of numpy.core.setup, that file gets expanded into the *.c, but then the compile fails because it's included twice. If I leave the expanded *.c files and re-comment line 905, the build works. "generate_umath_templated_sources" sounds like a helpful function, but I don't think it is ever called. Sorry, but this is my first time in the bowels of a complicated setup.py. Any advice?

I still need to add complex data types. Should I add datetimes also? I also need to add unit tests, docstrings, release notes, and probably more.

Any other suggestions welcome.

Thanks