-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Added axis param for np.count_nonzero #7138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Below is some timing information On
On
As can be observed, these changes cause a noticeable speed-up in the cases where |
array([2, 3]) | ||
|
||
""" | ||
if not isinstance(a, ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as a = np.asanyarray(a)
, or a = np.asarray(a)
if you do not want to let subclasses through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part? I'm not sure I follow with what you boxed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.asanyarray
checks if its argument is an array (or a subclass), and returns it unchanged if it is. If it is not, it tries to convert its input to an array. So instead of doing the check yourself and then calling np.array
, you can unconditionally call a = np.asanyarray(a)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that is true, and it is more compact. Done.
Unfortunately not really what we want. An alternative would be to figure out a way to use the count_nonzero code in the add.reduce ufunc (+ overhead avoiding) As those two options currently look relatively far away maybe as a middle ground the code could call into the flattened version when possible and use sum like here when not. |
also please try to keep pep8 changes and new code in separate commits ( |
@juliantaylor : Yeah...I had a lot of issues incorporating the |
probably adding something along the lines of this could work: if axis is None and a.dtype == bool:
return multiarray.core.count_nonzero(a.ravel()) |
How about this?
|
|
||
return a.sum(axis=axis) | ||
|
||
nonzeros = (a != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would not work for object arrays. You need to use bool(...)
logic there. Though I admit that the current inner loop func is bugged, since it should be checking for errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I quite follow your comment here. Could you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"asdf" != 0, but bool("asdf") evaluates True of course, and so it is counted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"asdf" != 0 also evaluates True
, right? I'm not sure I follow the point your explanation is trying to make.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, starting to be unconcentrated, but then "" != 0 as well, and bool("") evaluates False, same thing. Also you are not limited to strings, does not make it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay, I see what you mean. Empty strings, arrays, etc. should be counted as zero's. So it really isn't counting zero's then? It's more counting how many False's we get when iterating over "elements" of the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is counting "nonzero", which for objects maps to obj.__nonzero__()
, which is the mechanism behind bool(obj)
. bool(np.array([1, 2]))
raises an error (unlike lists), and this error seems unhandled, and should probably raise even here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At line 432, you mean? If that's the case, I think backwards compatibility gets broken then. Perhaps maybe I should divide logic between whether axis is provided or not like this:
if axis is None:
...optimize if possible but call multiarray.count_nonzero if possible
else:
...messy logic here
Rewrote the function so that we don't have to use element-wise comparisons. |
This is just a "restart" if you may. I will also attempt to optimize based on |
elif issubdtype(a.dtype, str): | ||
a = (a != '') | ||
|
||
if a.dtype.name == 'bool': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
a.dtype == bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Not sure why I keep NOT changing that.
don't know if it is very useful, but as this puts a ufunc like interface on top of count_nonzero it should also support the an issue is that it then has a ufunc like interface but without the |
@juliantaylor : The issue is that the interface is only applied in certain cases, not to mention it adds additional complexity to a function that has enough issues as it is. I'm not sure I want to tackle that in this PR given all those considerations. |
I guess you know see why it didn't exist up to now, adding new functions needs a lot of consideration, there is only one chance at getting it right, after release it cannot be changed anymore. but I accept it might be a bit much for this PR and can be added later, but a ticket with a milestone needs to be created. |
I do concede that changing an API after release is a lot more difficult than beforehand, though it is something I'm not all too unfamiliar with given my other PR's. 😄 |
Rewrote some of the important optimizations used in |
# multiarray is faster with | ||
# booleans when axis is None | ||
if a.dtype == bool: | ||
return multiarray.count_nonzero(a) if axis is None \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backslashes are not very PEP8 compliant, use parentheses:
return (multiarray.count_nonzero(a) is axis is None
else a.sum(axis=axis, dtype=intp))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Is the increased memory use of always creating a boolean array before counting a concern? Because if it is not it should be relatively simple to rewrite all of this, including multiple axes, in C without needing to use the iterator, just relying on operating on a newly created contiguous array. |
I am not aware of any such issues ATM. I do agree it certainly is preferable to write the code in C (more consistent with other |
There's a stalled PR, #4330, that has a working implementation of an axis argument for |
Alrighty! Sounds good. I'll have a look, and if I can't make much sense of it, I'll let you know. IMHO, this PR is probably ready to merge in its current form because it adds the |
Addresses feature request in #391 to add an axis parameter to
np.count_nonzero
.@jaimefrio (from the discussion, it seems you had done some work on it already?)