-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Use count_nonzero instead of sum for boolean arrays #8913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If so then an even better solution would be to port the |
That's basically what I thought. |
Note that boolean sum currently involves a type promotion from You can time this via:
since the latter just calls into sum, while the former uses the optimized calll |
It might work to create a "lb->l" loop, or similar, and insert it in front of the "ll->l" loop. It will require modifying the strange logic currently used to force the "ll->l" loop... I am right now looking into changing the type resolution a bit, hoping to make it a bit saner before I attempt replacing it with a completely new version. But... it will mean ABI/API breaking of the type resolution and maybe loop finding slots, so (scipy doesn't use them, numba doesn't use them, and it is pretty crazy to use them in the first place, so hmmm). |
I started trying to create a lb->l loop locally, but the speed up was rather disappointing. Some parts of the patch are salvageable |
@eric-wieser I guess the overhead the issue with that patch, or that it just was not very fast? Might be interesting. There is some really strange logic in the whole TypeResolution about how scalars and how the specified dtypes are handled. I am trying to pry that out a bit (and delete a bunch of code, because I do not believe it makes a difference). |
There's a use case we had not considered in our type resolution discussion. Normally type resolution takes input types and an optional output type, and deduces a full loop - so For the case of a reduction, we don't actually know the input type yet, because we want it to match the output type. We're looking for a loop of the form |
Pushed the patch to my |
Yeah, the loop lookup will have to know that it is a reduction or accumulation (also because of other things such as np.multiply.reduce actually needing to count the number of items being reduced in principle to decide the final output unit – not the dtype though). Possibly, that would just not implement reduce though for starters. We also have some types already specified/fixed, which makes the problem a bit more high dimensional to cche, but that should be OK. About the scalar logic: I would like to move it to the beginning of the ufunc parsing even before starting on actual UfuncImpl stuff. I am slow with the big step, so hope that such things may help prepare it... |
Don't assume nobody uses their own type resolver... https://github.com/astropy/astropy/blob/master/astropy/_erfa/ufunc.c.templ#L518. That said, I'd quite happily adapt to anything that is saner. Also, I recall being rather annoyed at when the functions get called - though I don't remind the details off the top of my head. Anyway, would be very interested to look at ideas for how to change things. |
Seriously, ah dang... So maybe call it if it exists and otherwise call a new one... (and put a Deprecation on it). The other problem is, that to get really sane in the long run, we need to put the TypeResolution behind a first dispatching step. But it certainly should be possible to support this indefinitely (for ufuncs you create yourself in any case as at least you do not replace a TypeResolution on a ufunc someone else created). |
@seberg - in my case, I was definitely trying to circumvent problems with the current type casting, so my override would hopefully not be necessary any more. Please ignore it while you think about how you'd want it to be!! |
@mhvk yes, of course I will not think about it much. But, I thought I could try some cleaning up first. The |
Hello,
I was computing some correlations while I noticed that I can easily make an implementation significantly faster than
coffcoef
by replacingsum()
withcount_nonzero()
. Therefore, I think wheneversum()
is called on a boolean array, it should usecount_nonzero
or such instead of addition.The text was updated successfully, but these errors were encountered: