
Make _weighted_percentile more robust #6189


Closed
MechCoder opened this issue Jan 19, 2016 · 5 comments

Comments

@MechCoder
Member

As reported by @maniteja123

```
y_true = [0, 1]
weights = [1, 1]
_weighted_percentile(y_true, weights, 50)
# returns 0
```

Do we want to do some sort of linear interpolation as described in this method?
https://en.wikipedia.org/wiki/Percentile#The_Weighted_Percentile_method
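For reference, here is a minimal sketch of that interpolation scheme (the helper name `weighted_percentile_interp` is hypothetical, not scikit-learn's `_weighted_percentile`):

```python
import numpy as np

def weighted_percentile_interp(values, weights, percentile):
    # Sort the values and carry the weights along.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Plotting positions p_n = 100 * (s_n - w_n / 2) / S_N,
    # with s_n the cumulative weight and S_N the total weight.
    cum_weights = np.cumsum(weights)
    p = 100.0 * (cum_weights - 0.5 * weights) / cum_weights[-1]
    # Linearly interpolate between adjacent (p_n, v_n) pairs.
    return np.interp(percentile, p, values)

weighted_percentile_interp([0, 1], [1, 1], 50)  # 0.5
```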

@MechCoder MechCoder changed the title Make _weighted_percentile more stronger Make _weighted_percentile more robust Jan 19, 2016
@mrecachinas

@amueller If this still needs a contributor, I'd like to tackle it. I'm confused, though, about why this differs from what is being proposed in #6217.

@amueller
Member

amueller commented Oct 9, 2016

@mrecachinas I need to look more deeply into that, but it doesn't seem to touch _weighted_percentile.

mrecachinas pushed a commit to mrecachinas/scikit-learn that referenced this issue Oct 13, 2016
This addresses scikit-learn#6189. It follows [the Weighted Percentile method](https://en.wikipedia.org/wiki/Percentile#The_Weighted_Percentile_method)
from Wikipedia. An example of the behavior this fixes:
```
y_true = [0, 1]
weights = [1, 1]
_weighted_percentile(y_true, weights, 50)
# before: output ==> 0
# after:  output ==> 0.5
```
@lorentzenchr
Member

Another way to get this feature would be for numpy/numpy#9211 to be merged.
@MechCoder @mrecachinas Are there use cases within scikit-learn?
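If that proposal were merged, usage could look like the sketch below. This assumes the `weights` keyword on `np.quantile` proposed in numpy/numpy#9211, restricted to `method="inverted_cdf"`; it is not an API that existed at the time of this comment.

```python
import numpy as np

# Assumed signature from the numpy/numpy#9211 proposal: per-sample
# weights for np.quantile, supported for method="inverted_cdf".
y = np.asarray([0, 1])
w = np.asarray([1, 1])
np.quantile(y, 0.5, weights=w, method="inverted_cdf")  # 0
```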

@glemaitre
Member

@lorentzenchr I just found this old issue.
Basically, I think that we are going to solve it in #17377.
It has some effect on the GBDTs, and I think the implementation will be limited by the assumptions that we make about sample_weight.

@lorentzenchr
Member

Actually, this is a non-issue:

```python
import numpy as np

y = np.asarray([0, 1])
np.quantile(y, 0.5, method="inverted_cdf")  # 0
```

results in 0. And that is totally correct. In fact, each value in [0, 1] minimizes the mean absolute error:

```python
from sklearn.metrics import mean_absolute_error

mean_absolute_error(y, [0, 0])  # 0.5
mean_absolute_error(y, [1, 1])  # 0.5
```

Also, each value $q_{0.5} \in [0, 1]$ fulfills the classical nonexceedance criteria $P(Y < q_\alpha) \leq \alpha$ and $P(Y \leq q_\alpha) \geq \alpha$ for $\alpha = 0.5$.
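A quick numerical check of that criterion (a minimal sketch; the grid of candidate quantiles is arbitrary):

```python
import numpy as np

y = np.asarray([0, 1])
alpha = 0.5
# Every candidate q in [0, 1] satisfies both quantile conditions.
for q in np.linspace(0, 1, 5):
    assert np.mean(y < q) <= alpha   # P(Y < q) <= alpha
    assert np.mean(y <= q) >= alpha  # P(Y <= q) >= alpha
```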
