-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: added axis param for np.count_nonzero #7177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
☔ The latest upstream changes (presumably #7246) made this pull request unmergeable. Please resolve the merge conflicts. |
@jaimefrio : Unfortunately, I have not been able to make much progress on refactoring the code into C. However, I think it should be good to merge as is since it performs relatively well, and the tests pass. If someone else wants to take a look at this, that would be great as well! |
Can somebody take a look at this? I've been pinging this PR for a couple of weeks now with no response from anyone. The code should be ready to land. |
☔ The latest upstream changes (presumably #7346) made this pull request unmergeable. Please resolve the merge conflicts. |
Alright, this is the second time I've encountered a merge conflict on this PR. Could someone take a look at this and try to land it? |
☔ The latest upstream changes (presumably #7178) made this pull request unmergeable. Please resolve the merge conflicts. |
This code currently has a lot of branches. Given the apparent lack of existing test coverage, I'm not even sure all of them are being exercised. It needs to test every valid dtype, and to test invalid (e.g., I have also have some broader concerns about
|
return (multiarray.count_nonzero(a) if axis is None | ||
else (a != '').sum(axis=axis, dtype=intp)) | ||
|
||
return (multiarray.count_nonzero(a) if axis is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What falls through to this case? Does it really make sense to use a potentially very slow loop with apply_along_axis
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I covered most of the common cases above, and when the objects start getting really weird (i.e. dtype=object
cases), behaviour is unfortunately quite unpredictable with regards to how it will behave when counting along multiple axes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we had a simple is_nonzero
function that would return a boolean array. Then we could simply use sum
here instead of apply_along_axis (which feels very hacky).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bleh, I was going to say that casting-to-bool was is_nonzero
, but it's not true -- string->bool type coercion doesn't check truthiness, it tries to parse the strings or something :-/. I guess internally we don't care about truthiness except in the case of if arr: ...
, which can be handled by checking shape + count_nonzero
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njsmith: Yeah...I hit my head against that option as well. Needless to say, my definition of 'nonzero-ness' has been forever altered...:smile:
I haven't looked at the rest, but I can explain the name :-). In py2, "nonzero" is the official name for the property that's more commonly known as "truthiness" (see |
OK, that makes sense for the definition of "nonzero". It would be good to document that :). |
@njsmith : If we could rename it, what should it be called? That feels like a |
@shoyer: +1 for the documentation. That has been added along with |
This still needs comprehensive test coverage, as I mentioned in my comment above |
@shoyer: Ah, good point. Thanks for the reminder! |
Could the folks involved here take another look. |
size = (5, 5, 5, 5) | ||
msg = "Mismatch for axis: %s" | ||
|
||
m = randint(-100, 100, size=size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a random seed to ensure that this test can't fail in a stochastic fashion. I recommend using np.random.RandomState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@rgommers : I can try rewriting the current |
@rgommers : simplicity seems best - ended up using the current |
☔ The latest upstream changes (presumably #7689) made this pull request unmergeable. Please resolve the merge conflicts. |
@rgommers : any update on this? |
Travis is happy, except for the spurious failure on |
@rgommers : any update on this? |
return a.sum(axis=axis, dtype=np.intp) | ||
|
||
if issubdtype(a.dtype, np.number): | ||
return (a != 0).sum(axis=axis, dtype=np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates a new boolean array of the same shape as the original. I thought the whole point was to avoid doing that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madphysicist: When was that the whole point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought wrong apparently. It just seems a bit hacky to do that with a function that is implemented in C exactly to avoid such an operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hacky, a bit, but it does get the job done without too much sadness.
+1 for the overall idea. I started something in C along similar lines. My main objection is the creation of boolean temp arrays in corner cases. I do like the idea of adding the results along multiple axes once you have applied it to one. |
@madphysicist : thanks for taking a look! Would be great to get feedback from the maintainers (@rgommers or someone else) too so that this can actually be merged. |
Can somebody take a look at this? |
+1 from the peanut gallery. I think getting the |
I second that. I would really like to see this in numpy regardless of whether it could be done a different way. |
Can somebody take a look at this? |
OK, this looks pretty sane to me at this point, so I will merge after CI tests pass. |
Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.
array([2, 3]) | ||
|
||
""" | ||
if axis is None or axis == (): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider this a bug, described in #9728.
Fixes numpy#9728 This bug was introduced with the `axis` keyword in numpy#7177, as a misguided optimization.
Addresses feature request in #391 to add an axis parameter to
np.count_nonzero
.Duplicate of #7138 after a forced push on another branch accidentally reset my origin branch for that PR to
master
, closing it automatically (I had reset hard on the #7138 branch tomaster
so I could do the C refactoring @jaimefrio suggested in the spirit of #4330, whoops).