ENH: Adding support to the range keyword for estimation of the optimal number of bins and associated tests #7243
Conversation
I'm happy with it, thanks @madphysicist.

I also like the idea of raising a

Looks good to me, thanks @nayyarv and @madphysicist! I feel strongly that we should raise the error rather than print a warning and return possibly nonsensical results. If nothing else, once we start returning nonsensical results we may get stuck doing that forever due to backcompat issues, while an error is something we can easily fix once we decide what the right answer is.

@charris I think this should get a backport to 1.11, in which case it doesn't need a release note because it will be part of the initial implementation of automatic bin selection (which should already have a release note). I'll leave you to make the final decision and hit merge here (or not), though.

@njsmith A backport would be fine with me. So if you want to merge this, remove the release note tag and go ahead.
    mn, mx = data_range
    keep = (a >= mn)
    keep &= (a <= mx)
    if not np.logical_and.reduce(keep):
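For reference, a minimal standalone sketch (array values made up) showing that the two spellings under discussion agree on a plain boolean array:

```python
import numpy as np

a = np.array([0.1, 0.5, 2.5])
mn, mx = 0.0, 1.0

# Mask of samples inside the requested range, as in the diff above.
keep = (a >= mn) & (a <= mx)

# np.logical_and.reduce and np.all give the same answer here;
# 2.5 falls outside [0, 1], so both report False.
via_reduce = np.logical_and.reduce(keep)
via_all = np.all(keep)
```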
Use `np.all(keep)` instead.
I agree that maintainability and legibility outweigh a fractional improvement in speed. However, @nayyarv made a pretty good argument for keeping it this way. Could I have one more person weigh in to break the tie?
Previous discussion with @nayyarv's comment was here: #6288 (comment)

I.e., this file uses `logical_and.reduce` elsewhere for this same computation. I'm not too bothered either way; using `logical_and.reduce` consistently isn't really worse than using `all` in some places and `logical_and.reduce` in others, and I'd like to get this moving for 1.11. So @shoyer -- I'm going to merge this and we can figure out whether we care about `logical_and.reduce` versus `all` as a separate matter? Feel free to squawk if you think this is a grave mistake :-)
Oops... I forgot I already made this comment!
This is fine :)
I guess this was an inadvertent double-blind test of your review consistency. You did better than NIPS ;-)
There are some advantages to `np.all`:

    arr = asanyarray(a)
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    return arr.all(axis=axis, out=out, **kwargs)
I don't know if they apply here, but I do know that when I wrote the initial nanfunctions, using the higher-level `np.sum` and friends was urged upon me, and it did indeed improve behavior with subclasses.
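To illustrate the subclass point above, here is a small sketch (the `Logged` subclass is hypothetical, written just for this demonstration): `np.all` dispatches to a subclass's `.all()` method, while `np.logical_and.reduce` goes through the ufunc machinery and bypasses it.

```python
import numpy as np

calls = []

class Logged(np.ndarray):
    # Hypothetical subclass that records when its .all() method runs.
    def all(self, axis=None, out=None, **kwargs):
        calls.append("all")
        return super().all(axis=axis, out=out, **kwargs)

a = np.array([True, True, False]).view(Logged)

r1 = np.all(a)                 # dispatches to Logged.all
r2 = np.logical_and.reduce(a)  # pure ufunc reduction, skips the method
```

Both calls compute the same boolean result, but only the first one gives the subclass a chance to customize the reduction.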
I have been doing some thinking about how to do weighted partitioning for functions like median and percentile. Looking at the existing C code, it should be a reasonable amount of work to add a

@charris: Okay, merging
@madphysicist: that sounds like useful functionality in general, sure, but this probably isn't the place to discuss it :-). The mailing list or a new issue might be a good place if you want to get some feedback on the general idea...
This PR is a quick fixup of #6288 made on behalf of @nayyarv. I was unable to make a PR directly back into Varun's repo for him to update his branch.
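For context, a small usage sketch of the behavior this PR is about (the data and range values are made up): when `bins` is an estimator string, the `range` keyword both clips the histogram and restricts which samples feed the bin-count estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=1000)

# 'auto' picks the number of bins from the data; with `range` given,
# only samples inside [-2, 2] inform the estimate.
hist, edges = np.histogram(a, bins='auto', range=(-2.0, 2.0))
```

With an explicit `range`, the first and last bin edges land exactly on the requested endpoints, and samples outside the range are simply not counted.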