ENH Add Poisson, Gamma and Tweedie deviances to regression metrics #14263


Merged · 18 commits · Jul 19, 2019

Conversation

lorentzenchr
Member

Reference Issues/PRs

Contributes to #9405

What does this implement/fix?

Adds new regression metrics that are natural for GLMs.

Comments

Name `mean_tweedie_deviance_error` open for discussion ;-)

@rth rth changed the title WIP ENH add new metric Tweedie deviance ENH Add Tweedie deviance to regression metrics Jul 5, 2019
@rth
Member

rth commented Jul 5, 2019

Thanks @lorentzenchr! I added a few minor fixes.

To reviewers: this adds the deviance metric commonly used to evaluate GLMs; it is a small part of #9405. The Tweedie deviance can be seen as a generalization of the squared error to non-Gaussian error distributions (such as counts or frequencies). The deviances of the Poisson, Gamma and Gaussian distributions are special cases of the Tweedie deviance, parametrized by the power parameter p.

I find that a good and easy-to-follow example of why this is useful can be found in the "Interpretable ML" book by C. Molnar (you need to scroll down). Even when a model is trained with one loss (e.g. least squares error) it can be useful to evaluate it with a different metric (e.g. median absolute error or the Poisson deviance).

The description of the different distributions is not included here and will be part of the main GLM PR. Empirically, as one increases the power p of the Tweedie deviance, large errors have less impact. I have tried to illustrate this in the documentation section (rendered here). This is particularly useful when scoring predictions of targets that have a large range. For instance, when predicting insurance claims, an absolute error of 2. on a claim of 10. (in arbitrary units) is much worse than the same error of 2. on a claim of 100. The effect is somewhat analogous to using mean_squared_log_error in that case.
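To make the effect of the power parameter concrete, here is a minimal, self-contained sketch of the metric (the merged function is `sklearn.metrics.mean_tweedie_deviance`; the formulas below follow the documentation added in this PR, and the p=1 branch assumes strictly positive targets):

```python
import math

def mean_tweedie_deviance(y_true, y_pred, p=0):
    """Sketch of the mean Tweedie deviance for power p.

    p=0 is the Normal case (squared error), p=1 Poisson, p=2 Gamma;
    the general branch covers other powers outside (0, 1).
    """
    def unit_deviance(y, mu):
        if p == 0:                       # Normal: plain squared error
            return (y - mu) ** 2
        if p == 1:                       # Poisson (assumes y > 0 here)
            return 2 * (y * math.log(y / mu) - y + mu)
        if p == 2:                       # Gamma
            return 2 * (math.log(mu / y) + y / mu - 1)
        return 2 * (max(y, 0.0) ** (2 - p) / ((1 - p) * (2 - p))
                    - y * mu ** (1 - p) / (1 - p)
                    + mu ** (2 - p) / (2 - p))
    return sum(unit_deviance(y, mu) for y, mu in zip(y_true, y_pred)) / len(y_true)

# A 50% over-prediction gives the same Gamma deviance at any scale:
small = mean_tweedie_deviance([1.0], [1.5], p=2)    # ≈ 0.144
large = mean_tweedie_deviance([100.], [150.], p=2)  # ≈ 0.144
```

With p=0 the same two errors instead differ by a factor of 10^4 (0.25 vs 2500), which is the "large errors have less impact as p grows" behaviour described above.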

@rth rth left a comment

A few explanations below.

2\left(\frac{\max(y_i,0)^{2-p}}{(1-p)(2-p)}-
\frac{y_i\,\hat{y}_i^{1-p}}{1-p}+\frac{\hat{y}_i^{2-p}}{2-p}\right),
& \text{otherwise}
\end{cases}
Member Author

Full disclosure: this Wikipedia entry might not be independent of this PR's implementation. I thought it would be best to be able to cite/refer to Wikipedia.

@@ -902,6 +902,7 @@ details.
metrics.mean_squared_log_error
metrics.median_absolute_error
metrics.r2_score
metrics.mean_tweedie_deviance_error
Member

For the name, maybe mean_tweedie_deviance or mean_deviance_error could be enough?

There is also a d2_score, the analogue of r2_score, that wasn't added in this PR.

Member Author

Maybe mean_tweedie_error?
Should we add the d2_score to this PR?

Member

Maybe mean_tweedie_error

That could work, though it would be a shame to drop the "deviance" term.

Should we add the d2_score to this PR?

+1 to add it, but in a follow-up PR. Reviewers' availability and interest are the main concern, and smaller PRs often help.

Member

The convention is that all functions ending in _score return a number where higher is better, while functions ending in _error or _loss return a number where lower is better.

For this particular case I would have preferred to just use mean_tweedie_deviance. I assume that "lower deviance is better" is explicit enough without having to add an _error suffix, but others might disagree.

The higher `p` the less weight is given to extreme deviations between true and
predicted targets.

For instance, let's consider two data samples: `[1.0]` and `[100]`,
Member Author

On first read, this was a bit unclear to me. What about: "let's compare the two predictions 1.0 and 100 that are both 50% of their corresponding true value"?

>>> mean_tweedie_deviance_error([100.], [150.], p=2)
0.14...

we would get identical errors in this example.
Member Author

This is a result of the Tweedie deviance being a homogeneous function of degree 2-p. For Gamma (p=2) this means that scaling y_true and y_pred has no effect on the deviance; for Poisson (p=1) the deviance scales linearly, and for Normal (p=0) quadratically.
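The homogeneity can be checked numerically on the closed forms of the three special cases (a throwaway sketch; `dev` here is the unit deviance, not a scikit-learn function):

```python
import math

def dev(y, mu, p):
    """Unit Tweedie deviance for the special cases p = 0, 1, 2."""
    if p == 0:                           # Normal
        return (y - mu) ** 2
    if p == 1:                           # Poisson
        return 2 * (y * math.log(y / mu) - y + mu)
    if p == 2:                           # Gamma
        return 2 * (math.log(mu / y) + y / mu - 1)

# Scaling both arguments by c multiplies the deviance by c**(2 - p):
y, mu, c = 3.0, 2.0, 10.0
for p in (0, 1, 2):
    ratio = dev(c * y, c * mu, p) / dev(y, mu, p)
    print(p, round(ratio, 10))  # 100.0 for p=0, 10.0 for p=1, 1.0 for p=2
```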

Member

Yes, add a sentence above saying that for p=2 the model is only sensitive to relative errors, and say here again that this example illustrates that fact.

@agramfort agramfort left a comment

Besides that, LGTM!

@rth

rth commented Jul 16, 2019

Thanks for the review @agramfort, I think I addressed all your comments.

The occasional failure of test_ridge.py::test_dtype_match[sag] in one of the jobs is unrelated: none of the ridge code is modified here, and I have also seen it in another PR. Maybe something changed in the dependencies of that CI job.

@rth

rth commented Jul 16, 2019

Also any opinion on the naming between the following?

  • mean_tweedie_deviance_error (currently)
  • mean_deviance_error
  • mean_tweedie_error
  • mean_tweedie_deviance (I guess it should end with "error"?)

@agramfort agramfort left a comment

Good to go from my end!
Thx @lorentzenchr and @rth

@ogrisel ogrisel left a comment

Here is a first review, overall it looks good to me besides the following comments:


@rth

rth commented Jul 17, 2019

Thanks a lot for the review @ogrisel! I think I addressed all your comments. Waiting for CI.

0.14...

we would get identical errors. The deviance when `p=2` is thus only
sensitive to relative errors.
Member

Maybe we should also add mean_gamma_deviance (and mean_poisson_deviance) as public functions to ease discoverability/googlability, and state in this part of the documentation (as well as in the docstrings) that those are aliases.
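If that route is taken, the aliases could simply fix the power parameter; a hypothetical sketch (the deviance body below is a stand-in, not the scikit-learn implementation):

```python
import math
from functools import partial

def mean_tweedie_deviance(y_true, y_pred, p=0):
    """Stand-in for the metric added in this PR (p = 0, 1 or 2 only)."""
    def unit(y, mu):
        if p == 1:                       # Poisson
            return 2 * (y * math.log(y / mu) - y + mu)
        if p == 2:                       # Gamma
            return 2 * (math.log(mu / y) + y / mu - 1)
        return (y - mu) ** 2             # p == 0: Normal / squared error
    return sum(unit(y, mu) for y, mu in zip(y_true, y_pred)) / len(y_true)

# The proposed public aliases are just the general metric with p fixed:
mean_poisson_deviance = partial(mean_tweedie_deviance, p=1)
mean_gamma_deviance = partial(mean_tweedie_deviance, p=2)
```

Documenting them as aliases of mean_tweedie_deviance, as suggested, keeps the three names discoverable while leaving a single implementation.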

@rth

rth commented Jul 17, 2019

@amueller Any thoughts on this? Not necessarily asking for a detailed review; more on the general idea of adding these metrics as a first step toward adding GLMs in #14300 (and #9405).

@@ -110,6 +111,7 @@
'mean_absolute_error',
'mean_squared_error',
'mean_squared_log_error',
'mean_tweedie_deviance',
'median_absolute_error',
'multilabel_confusion_matrix',
'mutual_info_score',
Member

We would also need mean_poisson_deviance and mean_gamma_deviance here.

@ogrisel ogrisel left a comment

LGTM besides the above comments. Would be good to have confirmation by others that the new names are ok with everybody though.

@agramfort

@rth or @lorentzenchr you need to rebase to fix conflicts

Good to go from my end

thx

@lorentzenchr
Member Author

@ogrisel @agramfort thx for your reviews and approvals.

@rth can you rebase? I'm unavailable right now.

@rth rth changed the title ENH Add Tweedie deviance to regression metrics ENH Add Poisson, Gamma and Tweedie deviances to regression metrics Jul 19, 2019
@rth
Member

rth commented Jul 19, 2019

Thanks for the reviews!

I fixed the merge conflict and added the mean_poisson_deviance and mean_gamma_deviance functions. There is now a single user guide section called "Mean Poisson, Gamma and Tweedie deviances", as it makes sense to introduce them together.

Please let me know if there are any other comments.

(The failure of test_fastica_simple in one Azure job is unrelated; see #14414.)

@agramfort
Member

agramfort commented Jul 19, 2019

approved by @ogrisel and myself. merging. thx @lorentzenchr and @rth

@agramfort agramfort merged commit 89da7f7 into scikit-learn:master Jul 19, 2019
@lorentzenchr lorentzenchr deleted the TweedieDeviance branch September 10, 2019 06:16
@rth rth mentioned this pull request Oct 14, 2019