
ENH: Adding support to the range keyword for estimation of the optimal number of bins and associated tests #7243


Merged: 1 commit into numpy:master on Feb 15, 2016

Conversation

madphysicist (Contributor)

This PR is a quick fixup of #6288 made on behalf of @nayyarv. I was unable to make a PR directly back into Varun's repo for him to update his branch.

@nayyarv (Contributor) commented Feb 14, 2016

I'm happy with it, thanks @madphysicist.

@seberg, @njsmith

@nayyarv (Contributor) commented Feb 14, 2016

I also like the idea of issuing a RuntimeWarning instead of raising a TypeError when weighted data is passed in.

@charris added the 01 - Enhancement, component: numpy.lib, and 56 - Needs Release Note labels Feb 14, 2016
@njsmith added this to the 1.11.0 release milestone Feb 14, 2016
@njsmith added the 08 - Backport label Feb 14, 2016
@njsmith (Member) commented Feb 14, 2016

Looks good to me, thanks @nayyarv and @madphysicist!

I feel strongly that we should raise the error rather than print a warning and return possibly nonsensical results. If nothing else, once we start returning nonsensical results we may get stuck doing that forever due to backcompat issues, while an error is something we can easily fix once we decide what the right answer is.
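
For context, here is a minimal sketch of the two behaviors under discussion; the helper name and messages are illustrative, not the PR's actual code:

    import warnings

    def _check_weights_for_auto_bins(weights, strict=True):
        # Hypothetical helper: the automatic bin estimators ignore weights.
        if weights is None:
            return
        if strict:
            # The behavior argued for above: fail loudly, so the semantics
            # can be changed later without a backwards-compatibility burden.
            raise TypeError("Automated estimation of the number of bins is "
                            "not supported for weighted data")
        # The alternative floated earlier in the thread: warn and continue,
        # at the risk of returning a misleading number of bins.
        warnings.warn("weights are ignored when estimating the number of bins",
                      RuntimeWarning)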

@njsmith (Member) commented Feb 14, 2016

@charris I think this should get a backport to 1.11, in which case it doesn't need a release note because it will be part of the initial implementation of automatic bin selection (which should already have a release note). I'll leave you to make the final decision and hit merge here (or not), though.

@charris (Member) commented Feb 14, 2016

@njsmith A backport would be fine with me. So if you want to merge this, remove the release note tag and go ahead.

# keep marks the elements that fall inside the user-supplied range;
# the reduction below is True only when every element is in range.
mn, mx = data_range
keep = (a >= mn)
keep &= (a <= mx)
if not np.logical_and.reduce(keep):
Member:

use np.all(keep) instead
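
For a plain boolean ndarray the two spellings are interchangeable; a quick illustrative check (not part of the PR):

    import numpy as np

    keep = np.array([True, True, False, True])

    # Both spellings reduce with logical AND over the whole array and
    # return the same scalar answer for a plain boolean ndarray.
    assert np.all(keep) == np.logical_and.reduce(keep)
    assert not np.all(keep)  # one element is False, so the result is False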

Contributor Author:

I agree that maintainability and legibility outweigh a fractional improvement in speed. However, @nayyarv made a pretty good argument for keeping it this way. Could I have one more person weigh in to break the tie?

Member:

Previous discussion with @nayyarv's comment was here: #6288 (comment)

I.e., this file uses and.reduce elsewhere for this same computation. I'm not too bothered either way; using and.reduce consistently isn't really worse than using all in some places and and.reduce in others, and I'd like to get this moving for 1.11. So @shoyer -- I'm going to merge this and we can figure out whether we care about and.reduce versus all as a separate matter. Feel free to squawk if you think this is a grave mistake :-)

Member:

Oops... I forgot I already made this comment!

This is fine :)

Member:

I guess this was an inadvertent double-blind test of your review consistency. You did better than NIPS ;-)

Member:

There are some advantages to np.all

    # Body of np.all: coerce with asanyarray, then delegate to the array's
    # own .all() method, so ndarray subclasses keep their own behavior.
    arr = asanyarray(a)
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    return arr.all(axis=axis, out=out, **kwargs)

I don't know if they apply here, but I do know that when I did the initial nanfunctions, using the higher-level np.sum and such was urged upon me, and it did indeed improve behavior with subclasses.
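
As an illustration of the delegation point (illustrative only, not from the PR), the two reductions can be compared on an ndarray subclass such as a masked array:

    import numpy as np

    # np.all() goes through asanyarray and the array's own .all() method, so
    # a subclass like MaskedArray applies its own semantics; the raw ufunc
    # reduction may or may not, depending on the subclass and numpy version.
    m = np.ma.array([True, False, True], mask=[False, True, False])

    print(np.all(m))                  # dispatches to MaskedArray.all()
    print(np.logical_and.reduce(m))   # plain logical_and reduction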

@madphysicist (Contributor Author)

I have been doing some thinking about how to do weighted partitioning for functions like median and percentile. Looking at the existing C code, it should be a reasonable amount of work to add a weights parameter to all those functions, as long as the weights are restricted to reals. I don't think that a separate module is a good idea, since weights are meaningless for many of the functions in function_base; an extra parameter on the relevant ones will be fine. Is there interest in me proceeding with this?
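
As a rough illustration of the idea (a pure-NumPy sketch with a midpoint interpolation convention, not the proposed C implementation; the function name is hypothetical):

    import numpy as np

    def weighted_percentile(a, q, weights):
        # Hypothetical sketch: interpolate on the normalized cumulative
        # weight instead of on equal-mass ranks.
        a = np.asarray(a, dtype=float)
        w = np.asarray(weights, dtype=float)
        order = np.argsort(a)
        a, w = a[order], w[order]
        cw = np.cumsum(w)
        # Place each sample at the midpoint of its weight interval,
        # rescaled to the [0, 100] percentile range.
        p = 100.0 * (cw - 0.5 * w) / cw[-1]
        return np.interp(q, p, a)

    # With unit weights this reduces to the familiar midpoint convention:
    # weighted_percentile([1, 2, 3, 4], 50, [1, 1, 1, 1]) -> 2.5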

@njsmith removed the 56 - Needs Release Note label Feb 15, 2016
@njsmith (Member) commented Feb 15, 2016

@charris: Okay, merging

njsmith added a commit that referenced this pull request Feb 15, 2016
ENH: Adding support to the range keyword for estimation of the optimal number of bins and associated tests
@njsmith merged commit 401ebba into numpy:master Feb 15, 2016
@njsmith (Member) commented Feb 15, 2016

@madphysicist: that sounds like useful functionality in general, sure, but this probably isn't the place to discuss it :-). The mailing list or a new issue might be a good place if you want to get some feedback on the general idea...
