Skip to content

[MRG] Add interpolation to _weighted_percentile (Addresses #6189) #7662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mrecachinas
Copy link

@mrecachinas mrecachinas commented Oct 13, 2016

Reference Issue

Addresses #6189.

What does this implement/fix? Explain your changes.

It follows the Weighted Percentile method to implement interpolation in _weighted_percentile.

Any other comments?

Changing and adding test cases currently.

An example that this changes is

y_true = [0, 1]
weights = [1, 1]
_weighted_percentile(y_true, weights, 50)
# before: output ==> 0
# after: output ==> 0.5

@mrecachinas mrecachinas changed the title [WIP] Add interpolation to _weighted_percentile [WIP] Add interpolation to _weighted_percentile (Addresses #6189) Oct 13, 2016
@amueller
Copy link
Member

tests are failing

@mrecachinas
Copy link
Author

mrecachinas commented Oct 15, 2016

Looked into it. One of the cases where it's failing is in test_gradient_boosting_loss_functions.test_weighted_percentile_zero_weight

The issue is the 100. / weight_cdf[-1] in

p_n = 100. / weight_cdf[-1] * ( weight_cdf - sample_weight / 2. )

when the weights are 0. What behavior is expected in this case? Uniform weighting?

@mrecachinas
Copy link
Author

The other case where tests are failing involves test_gradient_boosting.test_boston.

The issue seems to be the interpolation is causing the result to be slightly different than the expected. This seems consistent with introducing interpolation -- e.g., consider the original example:

weight = np.array([1, 1])
data = np.array([0, 1])
_weighted_percentile(data, weight, 50)

Originally, this returned 0. Now it returns 0.5.

@mrecachinas
Copy link
Author

mrecachinas commented Oct 15, 2016

It seems like currently in the zero-weight case, the smallest value is chosen, i.e., if percentile_idx = 0 (which happens in the 0-weight case), then sorted_idx[percentile_idx] is simply choosing the min value in the array. (Relevant lines 56-59)

@mrecachinas mrecachinas force-pushed the robust-weighted-percentile branch 2 times, most recently from 7b45cd9 to b309f9e Compare November 10, 2018 15:12
@mrecachinas mrecachinas changed the title [WIP] Add interpolation to _weighted_percentile (Addresses #6189) [MRG] Add interpolation to _weighted_percentile (Addresses #6189) Nov 10, 2018
@mrecachinas
Copy link
Author

@amueller Sorry for the 2 year delay! Should be ready to merge.

@mrecachinas mrecachinas force-pushed the robust-weighted-percentile branch from b309f9e to 6811f15 Compare May 3, 2020 22:01
@glemaitre
Copy link
Member

Arfff. It is only now that I am finding this PR. I am sorry I opened #17768 instead.
I will add you in the contributor.

@mrecachinas
Copy link
Author

@glemaitre No worries at all, and thanks!

I'll close this PR in favor of yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants