
ENH Add mean_pinball_loss metric for quantile regression #19415


Merged
merged 63 commits into from
Feb 18, 2021
Commits
468c156
add pinball_error
sdpython Feb 9, 2021
05dca44
fix \r issue
sdpython Feb 9, 2021
367b265
add \r
sdpython Feb 9, 2021
d710a6f
add \r
sdpython Feb 9, 2021
72d2d51
remove \r
sdpython Feb 9, 2021
5bc2bc5
fix lint issue
sdpython Feb 9, 2021
621abdd
lint
sdpython Feb 9, 2021
67f3c02
lint
sdpython Feb 9, 2021
789442c
rename pinball_error into pinball_loss
sdpython Feb 9, 2021
076452d
check exception is raised
sdpython Feb 9, 2021
359538e
Fix failing unit test
sdpython Feb 10, 2021
1c2d6d2
refactor example on gradient boosting
sdpython Feb 10, 2021
ef64dd0
lint
sdpython Feb 10, 2021
9e56a19
lint
sdpython Feb 10, 2021
f07ad17
add dependency on tqdm for examples, fix documentation
sdpython Feb 10, 2021
c41f2e3
add new unit test
sdpython Feb 10, 2021
1bdc94e
improve example, add another test on pinball_error with sample_weights
sdpython Feb 10, 2021
0aa34df
lint
sdpython Feb 10, 2021
31d721c
fix failing test due to very small discrepencies
sdpython Feb 10, 2021
f8988c7
documentation
sdpython Feb 10, 2021
04e192e
improve example
sdpython Feb 11, 2021
6e787a6
Update plot_gradient_boosting_quantile.py
sdpython Feb 11, 2021
e9aa68c
Rework quantile regression example to use skewed noise
ogrisel Feb 11, 2021
94ec164
Typo
ogrisel Feb 11, 2021
a7a5aaa
add neg_pinball_loss
sdpython Feb 11, 2021
99d3457
test for more quantiles
sdpython Feb 11, 2021
b8d4628
fix wrong change
sdpython Feb 11, 2021
fc6e62c
Update _scorer.py
sdpython Feb 11, 2021
c3e9431
Phrasing
ogrisel Feb 11, 2021
182b066
Update examples/ensemble/plot_gradient_boosting_quantile.py
ogrisel Feb 11, 2021
d5957d1
Tuner hyper-params of quantile regressors
ogrisel Feb 11, 2021
cbcd70e
Reduce verbosity, faster example
ogrisel Feb 11, 2021
37d3e93
Restore support for old matplotlib
ogrisel Feb 11, 2021
31915f1
Restore support for old matplotlib (take 2)
ogrisel Feb 11, 2021
a258dea
pprint best params to avoid horizontal scroll
ogrisel Feb 11, 2021
95c5709
Update sklearn/metrics/_regression.py
ogrisel Feb 12, 2021
f343f3f
More comprehensive test for the pinball loss with constant predictions
ogrisel Feb 12, 2021
9958903
Skip new test on old numpy
ogrisel Feb 12, 2021
0fa7e10
Expand documentation to present scorer API for quantile regression
ogrisel Feb 12, 2021
22d56b7
Use heteroschedastic noise
ogrisel Feb 12, 2021
619c138
More intuitive examples for pinball_loss
ogrisel Feb 12, 2021
50a65f0
Even more explicit examples
ogrisel Feb 12, 2021
1e5ff82
remove neg_pinball_loss, update what's new
sdpython Feb 12, 2021
3c1161a
remove unnecessary import in _scorer.py
sdpython Feb 12, 2021
e6687f6
Further improve example
ogrisel Feb 15, 2021
7f1c249
Update examples/ensemble/plot_gradient_boosting_quantile.py
ogrisel Feb 15, 2021
bf2033f
Small fix + mention biais of LAD as a (robust) estimator of the mean
ogrisel Feb 15, 2021
070cbab
Assess coverage
ogrisel Feb 15, 2021
ecf1a95
Apply suggestions from code review
ogrisel Feb 15, 2021
4ce9196
Stricter tests for regression metrics
ogrisel Feb 15, 2021
1b2d0ed
Minimize pinball loss with Nelder-Mead
ogrisel Feb 15, 2021
f6482e6
Small fixes and improvements in model_evaluation.rst
ogrisel Feb 15, 2021
bc1882a
Rename variable in test
ogrisel Feb 15, 2021
53e0230
Rename pinball_loss to mean_pinball_loss
ogrisel Feb 15, 2021
6694b1f
Fix linter
ogrisel Feb 15, 2021
d139dc2
Fix missing indent in math formula
ogrisel Feb 15, 2021
728d632
Fix phrasing
ogrisel Feb 15, 2021
eb22059
Add integration test
ogrisel Feb 15, 2021
8edbed1
Change optimization test to make it run faster
ogrisel Feb 15, 2021
1b36537
Missing cell marker
ogrisel Feb 15, 2021
85ab9d3
Missing comas
ogrisel Feb 15, 2021
70d9323
DOC small improvements
ogrisel Feb 16, 2021
7820618
Update sklearn/metrics/tests/test_regression.py
ogrisel Feb 18, 2021
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -991,6 +991,7 @@ details.
metrics.mean_poisson_deviance
metrics.mean_gamma_deviance
metrics.mean_tweedie_deviance
metrics.mean_pinball_loss

Multilabel ranking metrics
--------------------------
71 changes: 68 additions & 3 deletions doc/modules/model_evaluation.rst
@@ -416,7 +416,7 @@ defined as

.. math::

  \texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)

where :math:`1(x)` is the `indicator function
<https://en.wikipedia.org/wiki/Indicator_function>`_.
@@ -1960,8 +1960,8 @@ Regression metrics
The :mod:`sklearn.metrics` module implements several loss, score, and utility
functions to measure regression performance. Some of those have been enhanced
to handle the multioutput case: :func:`mean_squared_error`,
:func:`mean_absolute_error`, :func:`explained_variance_score`,
:func:`r2_score` and :func:`mean_pinball_loss`.


These functions have a ``multioutput`` keyword argument which specifies the
@@ -2354,6 +2354,71 @@ the difference in errors decreases. Finally, by setting, ``power=2``::
we would get identical errors. The deviance when ``power=2`` is thus only
sensitive to relative errors.

.. _pinball_loss:

Pinball loss
------------

The :func:`mean_pinball_loss` function is used to evaluate the predictive
performance of quantile regression models. The `pinball loss
<https://en.wikipedia.org/wiki/Quantile_regression#Computation>`_ is equivalent
to half of :func:`mean_absolute_error` when the quantile parameter ``alpha`` is
set to 0.5, since each term then reduces to :math:`0.5 |y_i - \hat{y}_i|`.

.. math::

  \text{pinball}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left( \alpha \max(y_i - \hat{y}_i, 0) + (1 - \alpha) \max(\hat{y}_i - y_i, 0) \right)

Here is a small example of using the :func:`mean_pinball_loss` function::

  >>> from sklearn.metrics import mean_pinball_loss
  >>> y_true = [1, 2, 3]
  >>> mean_pinball_loss(y_true, [0, 2, 3], alpha=0.1)
  0.03...
  >>> mean_pinball_loss(y_true, [1, 2, 4], alpha=0.1)
  0.3...
  >>> mean_pinball_loss(y_true, [0, 2, 3], alpha=0.9)
  0.3...
  >>> mean_pinball_loss(y_true, [1, 2, 4], alpha=0.9)
  0.03...
  >>> mean_pinball_loss(y_true, y_true, alpha=0.1)
  0.0
  >>> mean_pinball_loss(y_true, y_true, alpha=0.9)
  0.0

It is possible to build a scorer object with a specific choice of ``alpha``::

  >>> from sklearn.metrics import make_scorer
  >>> mean_pinball_loss_95p = make_scorer(mean_pinball_loss, alpha=0.95)

Such a scorer can be used to evaluate the generalization performance of a
quantile regressor via cross-validation:

  >>> from sklearn.datasets import make_regression
  >>> from sklearn.model_selection import cross_val_score
  >>> from sklearn.ensemble import GradientBoostingRegressor
  >>>
  >>> X, y = make_regression(n_samples=100, random_state=0)
  >>> estimator = GradientBoostingRegressor(
  ...     loss="quantile",
  ...     alpha=0.95,
  ...     random_state=0,
  ... )
  >>> cross_val_score(estimator, X, y, cv=5, scoring=mean_pinball_loss_95p)
  array([11.1..., 10.4... , 24.4..., 9.2..., 12.9...])

It is also possible to build scorer objects for hyper-parameter tuning. In that
case the sign of the loss must be flipped (for instance by passing
``greater_is_better=False`` to :func:`make_scorer`) so that greater means
better, as explained in the example linked below.

.. topic:: Example:

   * See :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_quantile.py`
     for an example of using the pinball loss to evaluate and tune the
     hyper-parameters of quantile regression models on data with non-symmetric
     noise and outliers.


.. _clustering_metrics:

Clustering metrics
4 changes: 4 additions & 0 deletions doc/whats_new/v1.0.rst
@@ -134,6 +134,10 @@ Changelog
  class methods and will be removed in 1.2.
  :pr:`18543` by `Guillaume Lemaitre`_.

- |Feature| :func:`metrics.mean_pinball_loss` exposes the pinball loss for
  quantile regression. :pr:`19415` by :user:`Xavier Dupré <sdpython>`
  and :user:`Olivier Grisel <ogrisel>`.

:mod:`sklearn.naive_bayes`
..........................
