
ENH: Adding support to the range keyword for estimation of the optimal number of bins and associated tests #7243


Merged: 1 commit into numpy:master on Feb 15, 2016

Conversation

madphysicist (Contributor)

This PR is a quick fixup of #6288 made on behalf of @nayyarv. I was unable to make a PR directly back into Varun's repo for him to update his branch.

@nayyarv (Contributor) commented Feb 14, 2016

I'm happy with it, thanks @madphysicist.

@seberg, @njsmith

@nayyarv (Contributor) commented Feb 14, 2016

I also like the idea of issuing a RuntimeWarning instead of raising a TypeError when weighted data is passed in.

@charris added the 01 - Enhancement, component: numpy.lib, and 56 - Needs Release Note labels Feb 14, 2016
@njsmith added this to the 1.11.0 release milestone Feb 14, 2016
@njsmith added the 08 - Backport label Feb 14, 2016
@njsmith (Member) commented Feb 14, 2016

Looks good to me, thanks @nayyarv and @madphysicist!

I feel strongly that we should raise the error rather than print a warning and return possibly nonsensical results. If nothing else, once we start returning nonsensical results we may get stuck doing that forever due to backcompat issues, while an error is something we can easily fix once we decide what the right answer is.
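
For context, here is a minimal sketch of the two behaviors under discussion; the helper name and messages are illustrative, not the PR's actual code:

    import warnings

    def _check_weights_for_auto_bins(weights, strict=True):
        # Hypothetical helper: the automatic bin estimators ignore weights.
        if weights is None:
            return
        if strict:
            # The behavior argued for above: fail loudly, so the semantics
            # can be changed later without a backwards-compatibility burden.
            raise TypeError("Automated estimation of the number of bins is "
                            "not supported for weighted data")
        # The alternative floated earlier in the thread: warn and continue,
        # at the risk of returning a misleading number of bins.
        warnings.warn("weights are ignored when estimating the number of bins",
                      RuntimeWarning)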

@njsmith (Member) commented Feb 14, 2016

@charris I think this should get a backport to 1.11, in which case it doesn't need a release note because it will be part of the initial implementation of automatic bin selection (which should already have a release note). I'll leave you to make the final decision and hit merge here (or not), though.

@charris (Member) commented Feb 14, 2016

@njsmith A backport would be fine with me. So if you want to merge this, remove the release note tag and go ahead.

# keep marks the elements that fall inside the user-supplied range;
# the reduction below is True only when every element is in range.
mn, mx = data_range
keep = (a >= mn)
keep &= (a <= mx)
if not np.logical_and.reduce(keep):
Member:

use np.all(keep) instead
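
For a plain boolean ndarray the two spellings are interchangeable; a quick illustrative check (not part of the PR):

    import numpy as np

    keep = np.array([True, True, False, True])

    # Both spellings reduce with logical AND over the whole array and
    # return the same scalar answer for a plain boolean ndarray.
    assert np.all(keep) == np.logical_and.reduce(keep)
    assert not np.all(keep)  # one element is False, so the result is False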

Contributor Author:

I agree that maintainability and legibility outweigh a fractional improvement in speed. However, @nayyarv made a pretty good argument for keeping it this way. Could I have one more person weigh in to break the tie?

Member:

Previous discussion with @nayyarv's comment was here: #6288 (comment)

I.e., this file uses and.reduce elsewhere for this same computation. I'm not too bothered either way; using and.reduce consistently isn't really worse than using all in some places and and.reduce in others, and I'd like to get this moving for 1.11. So @shoyer -- I'm going to merge this and we can figure out whether we care about and.reduce versus all as a separate matter. Feel free to squawk if you think this is a grave mistake :-)

Member:

Oops... I forgot I already made this comment!

This is fine :)

Member:

I guess this was an inadvertent double-blind test of your review consistency. You did better than NIPS ;-)

Member:

There are some advantages to np.all

    # Body of np.all: coerce with asanyarray, then delegate to the array's
    # own .all() method, so ndarray subclasses keep their own behavior.
    arr = asanyarray(a)
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    return arr.all(axis=axis, out=out, **kwargs)

I don't know if they apply here, but I do know that when I did the initial nanfunctions, using the higher-level np.sum and such was urged upon me, and it did indeed improve behavior with subclasses.
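
As an illustration of the delegation point (illustrative only, not from the PR), the two reductions can be compared on an ndarray subclass such as a masked array:

    import numpy as np

    # np.all() goes through asanyarray and the array's own .all() method, so
    # a subclass like MaskedArray applies its own semantics; the raw ufunc
    # reduction may or may not, depending on the subclass and numpy version.
    m = np.ma.array([True, False, True], mask=[False, True, False])

    print(np.all(m))                  # dispatches to MaskedArray.all()
    print(np.logical_and.reduce(m))   # plain logical_and reduction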

@madphysicist (Contributor Author)

I have been doing some thinking about how to do weighted partitioning for functions like median and percentile. Looking at the existing C code, it should be a reasonable amount of work to add a weights parameter to all those functions, as long as the weights are restricted to reals. I don't think that a separate module is a good idea, since weights are meaningless for many of the functions in function_base; an extra parameter on the relevant ones will be fine. Is there interest in me proceeding with this?
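
As a rough illustration of the idea (a pure-NumPy sketch with a midpoint interpolation convention, not the proposed C implementation; the function name is hypothetical):

    import numpy as np

    def weighted_percentile(a, q, weights):
        # Hypothetical sketch: interpolate on the normalized cumulative
        # weight instead of on equal-mass ranks.
        a = np.asarray(a, dtype=float)
        w = np.asarray(weights, dtype=float)
        order = np.argsort(a)
        a, w = a[order], w[order]
        cw = np.cumsum(w)
        # Place each sample at the midpoint of its weight interval,
        # rescaled to the [0, 100] percentile range.
        p = 100.0 * (cw - 0.5 * w) / cw[-1]
        return np.interp(q, p, a)

    # With unit weights this reduces to the familiar midpoint convention:
    # weighted_percentile([1, 2, 3, 4], 50, [1, 1, 1, 1]) -> 2.5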

@njsmith removed the 56 - Needs Release Note label Feb 15, 2016
@njsmith (Member) commented Feb 15, 2016

@charris: Okay, merging

njsmith added a commit that referenced this pull request Feb 15, 2016
ENH: Adding support to the range keyword for estimation of the optimal number of bins and associated tests
@njsmith merged commit 401ebba into numpy:master Feb 15, 2016
@njsmith (Member) commented Feb 15, 2016

@madphysicist: that sounds like useful functionality in general, sure, but this probably isn't the place to discuss it :-). The mailing list or a new issue might be a good place if you want to get some feedback on the general idea...
