ENH Add Poisson, Gamma and Tweedie deviances to regression metrics #14263


Merged · 18 commits · Jul 19, 2019

Conversation

lorentzenchr
Member

Reference Issues/PRs

Contributes to #9405

What does this implement/fix?

Adds new regression metrics that are natural for GLMs.

Comments

Name `mean_tweedie_deviance_error` open for discussion ;-)

@rth rth changed the title WIP ENH add new metric Tweedie deviance ENH Add Tweedie deviance to regression metrics Jul 5, 2019
@rth
Member

rth commented Jul 5, 2019

Thanks @lorentzenchr! I added a few minor fixes.

To reviewers: this adds the deviance metric commonly used to evaluate GLMs; it is a small part of #9405. The Tweedie deviance can be seen as a generalization of the squared error to non-Gaussian error distributions (such as counts or frequencies). The deviances of the Poisson, Gamma and Gaussian distributions are special cases of the Tweedie deviance, parametrized by the power parameter p.

I find that a good and easy-to-follow example of why this is useful can be found in the "Interpretable ML" book by C. Molnar (you need to scroll down). Even when a model is trained with one loss (e.g. least squares error) it can be useful to evaluate it with a different metric (e.g. median absolute error or the Poisson deviance).

The description of the different distributions is not included here and will be part of the main GLM PR. Empirically, as one increases the power p of the Tweedie deviance, large errors have less impact. I have tried to illustrate this in the documentation section (rendered here). This is particularly useful when scoring predictions of targets that have a large range. For instance, when predicting insurance claims, an absolute error of 2. on a claim of 10. (in arbitrary units) is much worse than the same error of 2. on a claim of 100. The effect is somewhat analogous to using mean_squared_log_error in that case.
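To make the effect of the power parameter concrete, here is a minimal, self-contained sketch of the metric (the merged function is `sklearn.metrics.mean_tweedie_deviance`; the formulas below follow the documentation added in this PR, and the p=1 branch assumes strictly positive targets):

```python
import math

def mean_tweedie_deviance(y_true, y_pred, p=0):
    """Sketch of the mean Tweedie deviance for power p.

    p=0 is the Normal case (squared error), p=1 Poisson, p=2 Gamma;
    the general branch covers other powers outside (0, 1).
    """
    def unit_deviance(y, mu):
        if p == 0:                       # Normal: plain squared error
            return (y - mu) ** 2
        if p == 1:                       # Poisson (assumes y > 0 here)
            return 2 * (y * math.log(y / mu) - y + mu)
        if p == 2:                       # Gamma
            return 2 * (math.log(mu / y) + y / mu - 1)
        return 2 * (max(y, 0.0) ** (2 - p) / ((1 - p) * (2 - p))
                    - y * mu ** (1 - p) / (1 - p)
                    + mu ** (2 - p) / (2 - p))
    return sum(unit_deviance(y, mu) for y, mu in zip(y_true, y_pred)) / len(y_true)

# A 50% over-prediction gives the same Gamma deviance at any scale:
small = mean_tweedie_deviance([1.0], [1.5], p=2)    # ≈ 0.144
large = mean_tweedie_deviance([100.], [150.], p=2)  # ≈ 0.144
```

With p=0 the same two errors instead differ by a factor of 10^4 (0.25 vs 2500), which is the "large errors have less impact as p grows" behaviour described above.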

@rth rth left a comment

A few explanations below.

2\left(\frac{\max(y_i,0)^{2-p}}{(1-p)(2-p)}-
\frac{y_i\,\hat{y}_i^{1-p}}{1-p}+\frac{\hat{y}_i^{2-p}}{2-p}\right),
& \text{otherwise}
\end{cases}
Member Author

Full disclosure: this Wikipedia entry might not be independent of this PR's implementation. I thought it would be best to be able to cite/refer to Wikipedia.

@@ -902,6 +902,7 @@ details.
metrics.mean_squared_log_error
metrics.median_absolute_error
metrics.r2_score
metrics.mean_tweedie_deviance_error
Member

For the name, maybe mean_tweedie_deviance or mean_deviance_error could be enough?

There is also a d2_score, the analogue of r2_score, that wasn't added in this PR.

Member Author

Maybe mean_tweedie_error?
Should we add the d2_score to this PR?

Member

Maybe mean_tweedie_error

That could work, though it would be a shame to drop the "deviance" term.

Should we add the d2_score to this PR?

+1 to add it, but in a follow-up PR. Reviewers' availability and interest are the main concern, and smaller PRs often help.

Member

The convention is that all functions ending in _score return a number where higher is better, while functions ending in _error or _loss return a number where lower is better.

For this particular case I would have preferred to just use mean_tweedie_deviance. I assume that "lower deviance is better" is explicit enough without having to add an _error suffix, but others might disagree.

The higher `p` the less weight is given to extreme deviations between true and
predicted targets.

For instance, let's consider two data samples: `[1.0]` and `[100]`,
Member Author

On first read, this was a bit unclear to me. What about: "let's compare the two predictions 1.0 and 100 that are both 50% of their corresponding true value"?

>>> mean_tweedie_deviance_error([100.], [150.], p=2)
0.14...

we would get identical errors in this example.
Member Author

This is a result of the Tweedie deviance being a homogeneous function of degree 2-p. For Gamma (p=2) this means that scaling y_true and y_pred has no effect on the deviance; for Poisson (p=1) the deviance scales linearly, and for Normal (p=0) quadratically.
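The homogeneity can be checked numerically on the closed forms of the three special cases (a throwaway sketch; `dev` here is the unit deviance, not a scikit-learn function):

```python
import math

def dev(y, mu, p):
    """Unit Tweedie deviance for the special cases p = 0, 1, 2."""
    if p == 0:                           # Normal
        return (y - mu) ** 2
    if p == 1:                           # Poisson
        return 2 * (y * math.log(y / mu) - y + mu)
    if p == 2:                           # Gamma
        return 2 * (math.log(mu / y) + y / mu - 1)

# Scaling both arguments by c multiplies the deviance by c**(2 - p):
y, mu, c = 3.0, 2.0, 10.0
for p in (0, 1, 2):
    ratio = dev(c * y, c * mu, p) / dev(y, mu, p)
    print(p, round(ratio, 10))  # 100.0 for p=0, 10.0 for p=1, 1.0 for p=2
```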

Member

Yes, add a sentence above saying that for p=2 the model is only sensitive to relative errors, and say here again that this example illustrates that fact.

@agramfort agramfort left a comment

Besides that, LGTM!

@rth

rth commented Jul 16, 2019

Thanks for the review @agramfort, I think I addressed all your comments.

The occasional failure of test_ridge.py::test_dtype_match[sag] in one of the jobs is unrelated: none of the ridge code is modified here, and I have also seen it in another PR. Maybe something changed in the dependencies of that CI job.

@rth

rth commented Jul 16, 2019

Also any opinion on the naming between the following?

  • mean_tweedie_deviance_error (currently)
  • mean_deviance_error
  • mean_tweedie_error
  • mean_tweedie_deviance (I guess it should end with "error"?)

@agramfort agramfort left a comment

Good to go from my end!
Thx @lorentzenchr and @rth

@ogrisel ogrisel left a comment

Here is a first review, overall it looks good to me besides the following comments:


@rth

rth commented Jul 17, 2019

Thanks a lot for the review @ogrisel! I think I addressed all your comments. Waiting for CI.

0.14...

we would get identical errors. The deviance when `p=2` is thus only
sensitive to relative errors.
Member

Maybe we should also add mean_gamma_deviance (and mean_poisson_deviance) as public functions to ease discoverability/googlability, and state in this part of the documentation (as well as in the docstrings) that those are aliases.
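If that route is taken, the aliases could simply fix the power parameter; a hypothetical sketch (the deviance body below is a stand-in, not the scikit-learn implementation):

```python
import math
from functools import partial

def mean_tweedie_deviance(y_true, y_pred, p=0):
    """Stand-in for the metric added in this PR (p = 0, 1 or 2 only)."""
    def unit(y, mu):
        if p == 1:                       # Poisson
            return 2 * (y * math.log(y / mu) - y + mu)
        if p == 2:                       # Gamma
            return 2 * (math.log(mu / y) + y / mu - 1)
        return (y - mu) ** 2             # p == 0: Normal / squared error
    return sum(unit(y, mu) for y, mu in zip(y_true, y_pred)) / len(y_true)

# The proposed public aliases are just the general metric with p fixed:
mean_poisson_deviance = partial(mean_tweedie_deviance, p=1)
mean_gamma_deviance = partial(mean_tweedie_deviance, p=2)
```

Documenting them as aliases of mean_tweedie_deviance, as suggested, keeps the three names discoverable while leaving a single implementation.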

@rth

rth commented Jul 17, 2019

@amueller Any thoughts on this? Not necessarily asking for a detailed review; more on the general idea of adding these metrics as a first step toward adding GLMs in #14300 (and #9405).

@@ -110,6 +111,7 @@
'mean_absolute_error',
'mean_squared_error',
'mean_squared_log_error',
'mean_tweedie_deviance',
'median_absolute_error',
'multilabel_confusion_matrix',
'mutual_info_score',
Member

We would also need mean_poisson_deviance and mean_gamma_deviance here.

@ogrisel ogrisel left a comment

LGTM besides the above comments. Would be good to have confirmation by others that the new names are ok with everybody though.

@agramfort

@rth or @lorentzenchr you need to rebase to fix conflicts

Good to go from my end

thx

@lorentzenchr
Member Author

@ogrisel @agramfort thx for your reviews and approvals.

@rth can you rebase? I'm unavailable right now.

@rth rth changed the title ENH Add Tweedie deviance to regression metrics ENH Add Poisson, Gamma and Tweedie deviances to regression metrics Jul 19, 2019
@rth
Member

rth commented Jul 19, 2019

Thanks for the reviews!

I fixed the merge conflict and added the mean_poisson_deviance and mean_gamma_deviance functions. There is now a single user guide section called "Mean Poisson, Gamma and Tweedie deviances", as it makes sense to introduce them together.

Please let me know if there are any other comments.

(The failure of test_fastica_simple in one Azure job is unrelated; see #14414.)

@agramfort
Member

agramfort commented Jul 19, 2019

approved by @ogrisel and myself. merging. thx @lorentzenchr and @rth

@agramfort agramfort merged commit 89da7f7 into scikit-learn:master Jul 19, 2019
@lorentzenchr lorentzenchr deleted the TweedieDeviance branch September 10, 2019 06:16
@rth rth mentioned this pull request Oct 14, 2019