
Add quantile loss as metric #18911


Closed
lorentzenchr opened this issue Nov 25, 2020 · 16 comments · Fixed by #19415
Labels
Moderate (Anything that requires some knowledge of conventions and best practices) · module:metrics · New Feature

Comments

@lorentzenchr
Member

Describe the workflow you want to enable

I'd like to evaluate and compare the predictive performance of (conditional) quantiles as predicted by GradientBoostingRegressor(loss='quantile', alpha=0.9) for example.

Describe your proposed solution

Implement a new metric quantile_loss(y_true, y_pred, alpha=0.5), Eq. (24) of https://arxiv.org/pdf/0912.0902.pdf.
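For reference, Eq. (24) there is the pinball loss at level alpha (my transcription, writing x for the predicted quantile and y for the observation):

S_\alpha(x, y) = (\mathbf{1}\{x \ge y\} - \alpha)\,(x - y)
               = \begin{cases} \alpha\,(y - x) & \text{if } y \ge x \\ (1 - \alpha)\,(x - y) & \text{if } y < x \end{cases}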

This is the same loss as in Koenker's book "Quantile Regression" and in scikit-learn's existing gradient boosting implementation:

# context from the GradientBoostingRegressor quantile loss (my annotation):
# diff = y - pred, and mask = y > pred selects under-predictions
if sample_weight is None:
    loss = (alpha * diff[mask].sum()
            - (1 - alpha) * diff[~mask].sum()) / y.shape[0]
else:
    loss = (alpha * np.sum(sample_weight[mask] * diff[mask])
            - (1 - alpha) * np.sum(sample_weight[~mask] * diff[~mask])
            ) / sample_weight.sum()
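A minimal, self-contained sketch of the proposed metric with the signature suggested above (the function name and defaults are the proposal's, not a settled API; the sample_weight support is my addition by analogy with other metrics):

import numpy as np

def quantile_loss(y_true, y_pred, alpha=0.5, sample_weight=None):
    """Average pinball loss at quantile level alpha (sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # under-predictions are weighted by alpha, over-predictions by (1 - alpha)
    loss = np.where(diff >= 0, alpha * diff, (alpha - 1) * diff)
    return np.average(loss, weights=sample_weight)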

@lorentzenchr
Member Author

Based on discussion in #18849, @ogrisel and @Bougeant might be interested.

@ogrisel
Member

ogrisel commented Nov 27, 2020

Also pinging @GaelVaroquaux (just to let you know), as we coincidentally exchanged about this yesterday.

@ogrisel
Member

ogrisel commented Nov 27, 2020

We would also need to document how to register and use this as a scorer, to be able to do model selection on quantile regressors:

from sklearn.metrics import quantile_loss, make_scorer
from sklearn.model_selection import GridSearchCV

neg_90_percentile_loss = make_scorer(
    quantile_loss,
    greater_is_better=False,  # lower pinball loss is better
    alpha=0.9,
)
...
search_cv = GridSearchCV(
    quantile_regressor,
    param_grid,
    scoring=neg_90_percentile_loss,
    cv=5,
)
search_cv.fit(X_train, y_train)

@GaelVaroquaux
Member

After a cursory look at the reference above, I have the impression that it states that the alpha-quantile loss is a useful surrogate loss, in the sense that minimizing it gives a consistent estimate of alpha-quantiles, but not a calibrated metric, in the sense that one could compare two models from different model families using it. This is similar to the logistic loss, which is a surrogate for the zero-one classification loss.

Am I wrong? Are there theoretical arguments for using values of alpha-quantile loss as a metric across models? Thanks!

@GaelVaroquaux
Member

Note that my comment probably does not rule out the use of the alpha-quantile loss as a metric in GridSearchCV, because surrogate losses can be useful to compare models that are close enough. I would worry about using it to evaluate a model before putting it in production.

@lorentzenchr
Member Author

Are there theoretical arguments for using values of alpha-quantile loss as a metric across models?

Yes, the (alpha-)quantile loss gives strictly consistent (Fisher-consistent) estimates of the true (alpha-)quantile. In my understanding, it is very well suited to comparing different models' predictions/forecasts (of the true conditional alpha-quantile), just as the log-loss is for probabilistic forecasts/predictions of binary events (actually, I much prefer log-loss over zero-one loss/accuracy whenever applicable, i.e. when predict_proba exists).

What exactly do you mean by calibration and surrogate loss?
Again, in my understanding, MSE, log-loss and the quantile loss are all suited for out-of-sample/out-of-time comparison of different models of any kind, e.g. neural nets vs linear models vs gut feeling (predicting the expectation for MSE and log-loss*, and the alpha-quantile for the quantile loss).

*For binary events, having the expectation is equivalent to having the whole distribution.
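To illustrate the consistency claim numerically (a sketch of my own, not from the thread): among constant predictions on a skewed sample, the average pinball loss at alpha = 0.9 is minimized near the empirical 0.9-quantile.

import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(size=100_000)  # any skewed distribution works here

alpha = 0.9
candidates = np.linspace(0.5, 10.0, 500)
# average pinball loss of each constant prediction c
losses = [
    np.mean(np.where(y >= c, alpha * (y - c), (1 - alpha) * (c - y)))
    for c in candidates
]
print(candidates[np.argmin(losses)], np.quantile(y, alpha))  # both close to 3.6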

@GaelVaroquaux
Member

Surrogate loss = a loss that you can minimize (e.g. because it is differentiable) and that serves as a proxy for the actual error measure you are interested in.

The hinge loss and the logistic loss are two surrogates of the zero-one classification error. But comparing two models based on their logistic loss does not guarantee that the best performer will also give the smallest zero-one classification error.

You state that the quantile loss is well suited for out-of-sample comparison of different models. Do you have a paper / a theoretical argument that establishes this? For the log loss, I actually am pretty certain that it is not advised to conclude on zero-one classification error after comparing log losses. For the MSE, it is different: squared error is, at the sample level, an unbiased estimate of the distance to the expectation. So summing squared errors (as with MSE) gives an unbiased estimate of the error to the conditional expectation. I do not know a similar result for conditional quantile. But again, I may have missed something.
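As a side note (my addition, not from the thread), the MSE claim follows from the bias-variance decomposition: for any fixed prediction a,

E[(Y - a)^2] = \operatorname{Var}(Y) + (E[Y] - a)^2,

so the expected squared error penalizes a prediction exactly by its squared distance to the (conditional) expectation, which is what makes averaged squared errors comparable across models.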

@lorentzenchr
Member Author

🤔 Let's focus on regression to make things easier. The reference given above, Making and Evaluating Point Forecasts, is about point forecast comparison. All you have to do is translate it to a regression setting. (Some dictionary: replace "forecaster" by "model" and "forecast" by "model prediction". Note that forecasts are evaluated out-of-sample, so to speak.)

In this type of situation, competing point forecasters or forecasting procedures are compared and assessed by means of an error measure, such as the absolute error or the squared error, which is averaged over forecast cases.

So, any strictly consistent scoring function (for a functional of interest like a quantile) is suited for:

  1. Fitting => M-estimation
  2. Model comparison: compare the true error measure of each model (= E[scoring function(Y, model prediction(X))]). Estimate the true error measure preferably on out-of-sample (test) data, so that the models are independent of the (test) data => a (more) unbiased estimate of the true expected error.

For goodness of fit (focus on in-sample), have a look at "Goodness of Fit and Related Inference Processes for Quantile Regression", R. Koenker, J. A. Machado, http://dx.doi.org/10.1080/01621459.1999.10473882 (free PDF available).

@lorentzenchr added the "help wanted" and "Moderate (Anything that requires some knowledge of conventions and best practices)" labels on Dec 13, 2020
@ogrisel
Member

ogrisel commented Dec 14, 2020

I am not sure if we should name it quantile_loss in scikit-learn as it might not be the only way to score conditional quantile prediction models. I believe this loss is often referred to as the pinball loss. If we decide not to name it the pinball loss, I think the docstring (and possibly the user guide) should at least mention the name pinball loss and possibly the following reference:

Estimating conditional quantiles with the help of the pinball loss
Ingo Steinwart, Andreas Christmann
https://arxiv.org/abs/1102.2101

By googling, I also found this recent paper, which seems relevant to this discussion:

Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification
Youngseog Chung, Willie Neiswanger, Ian Char, Jeff Schneider
https://arxiv.org/abs/2011.09588

The new method introduced in this paper is too recent to be considered for inclusion in scikit-learn, but it might be an interesting reference on how (some) researchers approach the problem of evaluating models for conditional quantile prediction.

@lorentzenchr
Member Author

@ogrisel I'm fine with calling it pinball_loss, though I don't know who first called it that; this is from 2006. The term "asymmetric piecewise linear, strictly consistent scoring function for the quantile" would also nail it down :smirk:

Indeed, there are infinitely many strictly consistent scoring functions for the quantile, all of the form (see the paper cited above) S(y, x) = (1_{x ≥ y} − α)(g(x) − g(y)) with g a strictly increasing function. A small sketch of this family follows (the function name is hypothetical; g = identity recovers the pinball loss discussed above):
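import numpy as np

def consistent_quantile_score(y_true, y_pred, alpha, g=lambda t: t):
    # S(y, x) = (1{x >= y} - alpha) * (g(x) - g(y)), g strictly increasing
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    indicator = (y_pred >= y_true).astype(float)
    return np.mean((indicator - alpha) * (g(y_pred) - g(y_true)))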

I know, you only googled, but I'm not that convinced by arXiv:2011.09588.

@Sandy4321

hello friends
Is it possible to use the pinball loss
https://scikit-learn.org/dev/modules/model_evaluation.html#pinball-loss
with estimators other than GBM?

The thing is, I get good performance with
https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html
but performance with GradientBoostingRegressor is an issue at this scale:
say categorical data is one-hot encoded, then it becomes
200,000 sparse binary features and 20 million rows.

It would be great to apply the pinball loss
to some simple regressor like
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html?highlight=lasso#sklearn.linear_model.Lasso
or
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html
or
SGD?

@lorentzenchr
Member Author

Have a look at #18997.

@Sandy4321

@lorentzenchr
@GaelVaroquaux
@cmarmo

hello friends

Six months have passed, but it is still not clear where the pinball loss can be used other than with GradientBoostingRegressor.
For example, how can the pinball loss be used for linear regression, e.g. through SGD?
In other words, a custom metric for SGD:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor
Or any other linear regression method?

@GaelVaroquaux
Member

Linear quantile regression is also implemented in QuantileRegressor:
https://scikit-learn.org/stable/auto_examples/linear_model/plot_quantile_regression.html
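For instance (a minimal sketch; X_train, y_train, X_test are placeholders, and note that alpha in QuantileRegressor is the L1 penalty strength, not the quantile level):

from sklearn.linear_model import QuantileRegressor

# fit the conditional 0.9-quantile with no L1 penalty
reg = QuantileRegressor(quantile=0.9, alpha=0.0)
reg.fit(X_train, y_train)
y_pred_90 = reg.predict(X_test)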

Please note that this is a user question, and that pinging developers on an issue to ask a user question is not very good practice (we are overwhelmed with requests). Best

@Sandy4321

Great, thanks for the answer, but my guesses about
https://scikit-learn.org/stable/auto_examples/linear_model/plot_quantile_regression.html

1. It is not as flexible as the SGD regression implementation? For example, SGD regression is very good for sparse one-hot data.
2. It is not ridge regression; as written in
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.QuantileRegressor.html#sklearn.linear_model.QuantileRegressor
"This model uses an L1 regularization like Lasso."

Thank you very much in advance for your answer.

@cmarmo
Contributor

cmarmo commented Jan 7, 2022

@Sandy4321 this issue is closed and you are asking a general question. Please open a GitHub Discussion for general questions. Thanks for your understanding.
