ENH Add Poisson, Gamma and Tweedie deviances to regression metrics #14263
Conversation
Thanks @lorentzenchr! I added a few minor fixes.

To reviewers: this adds the deviance metric, which is commonly used to evaluate GLMs, and is a small part of #9405. The Tweedie deviance can be seen as a generalization of the squared error to non-Gaussian error distributions (such as counts, frequencies, etc.). The deviances of the Poisson, Gamma and Gaussian distributions are special cases of the Tweedie deviance, parametrized by the power parameter `p`.

I find that a good and easy-to-follow example illustrating why this is useful can be found in the "Interpretable ML" book by C. Molnar (scroll down). Even when a model is trained with some loss (e.g. least squares), it can be useful to evaluate it with a different metric (e.g. median absolute error or Poisson deviance). The description of the different distributions is not included here and will be part of the main GLM PR.

Empirically, when one increases the
A few explanations below.
2\left(\frac{\max(y_i,0)^{2-p}}{(1-p)(2-p)}-
\frac{y_i\,\hat{y}_i^{1-p}}{1-p}+\frac{\hat{y}_i^{2-p}}{2-p}\right),
& \text{otherwise}
\end{cases}
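The piecewise definition above (Normal for `p=0`, Poisson for `p=1`, Gamma for `p=2`, and the general Tweedie case otherwise) can be sketched in plain Python. This is an illustrative, standalone implementation based on the quoted Wikipedia formula, not the final scikit-learn function; it assumes strictly positive targets and predictions for `p >= 1`.

```python
import math

def tweedie_deviance(y, y_hat, p):
    """Unit Tweedie deviance d_p(y, y_hat) for a single observation.

    Sketch following the formula quoted from Wikipedia; assumes
    y > 0 and y_hat > 0 when p >= 1.
    """
    if p == 0:      # Normal distribution: ordinary squared error
        return (y - y_hat) ** 2
    if p == 1:      # Poisson distribution
        return 2 * (y * math.log(y / y_hat) - y + y_hat)
    if p == 2:      # Gamma distribution
        return 2 * (math.log(y_hat / y) + y / y_hat - 1)
    # General Tweedie case, p not in {0, 1, 2}
    return 2 * (max(y, 0) ** (2 - p) / ((1 - p) * (2 - p))
                - y * y_hat ** (1 - p) / (1 - p)
                + y_hat ** (2 - p) / (2 - p))

def mean_deviance(y_true, y_pred, p):
    """Average the unit deviance over all samples."""
    return sum(tweedie_deviance(y, yh, p)
               for y, yh in zip(y_true, y_pred)) / len(y_true)
```

For example, `mean_deviance([100.], [150.], p=2)` reproduces the `0.14...` value shown in the doctest below, while `p=0` gives the familiar squared error of `2500.0`.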
LaTex expression taken from https://en.wikipedia.org/wiki/Tweedie_distribution#The_Tweedie_deviance
full disclosure: This wikipedia entry might not be independent from this PR's implementation. I thought it would be best to be able to cite/refer to wikipedia.
doc/modules/classes.rst (outdated)

@@ -902,6 +902,7 @@ details.
   metrics.mean_squared_log_error
   metrics.median_absolute_error
   metrics.r2_score
   metrics.mean_tweedie_deviance_error
For the name, maybe `mean_tweedie_deviance` or `mean_deviance_error` could be enough? There is also a `d2_score`, the analogue of `r2_score`, that wasn't added in this PR.
Maybe `mean_tweedie_error`?

Should we add the `d2_score` to this PR?
> Maybe `mean_tweedie_error`?

That could work, though it would be a shame to drop the "deviance" term.

> Should we add the `d2_score` to this PR?

+1 to add it, but in a follow-up PR. Reviewer availability and interest is the main concern, and making smaller PRs often helps.
The convention is that all functions ending with `_score` return a number where "higher is better", while functions ending with `_error` or `_loss` return a number where "lower is better".

For this particular case I would have preferred to just use `mean_tweedie_deviance`. I assume that "lower deviance is better" is explicit enough without having to add an `_error` suffix, but others might disagree.
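The `_score` vs. `_error`/`_loss` convention can be illustrated with a minimal sketch: a lower-is-better error becomes a higher-is-better score by negation, which is essentially what `sklearn.metrics.make_scorer(..., greater_is_better=False)` does. The helper names below are hypothetical, chosen only for this illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Ends in `_error`: lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def neg_mean_squared_error(y_true, y_pred):
    """Negated version: higher is better, following the `_score` convention."""
    return -mean_squared_error(y_true, y_pred)

y_true = [3.0, 4.0]
good_pred = [3.0, 5.0]   # close to the truth
bad_pred = [0.0, 0.0]    # far from the truth
```

Under the score convention, `neg_mean_squared_error(y_true, good_pred)` is greater than `neg_mean_squared_error(y_true, bad_pred)`, so "maximize the score" is always the right objective for model selection.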
doc/modules/model_evaluation.rst (outdated)

The higher `p`, the less weight is given to extreme deviations between true and
predicted targets.

For instance, let's consider two data samples: `[1.0]` and `[100]`,
On first read, this was a bit unclear to me. What about: "let's compare the two predictions 1.0 and 100, which are both 50% of their corresponding true value"?
doc/modules/model_evaluation.rst (outdated)

    >>> mean_tweedie_deviance_error([100.], [150.], p=2)
    0.14...

we would get identical errors in this example.
This is a result of the Tweedie deviance being a homogeneous function of degree `2-p`. For Gamma with `p=2`, scaling `y_true` and `y_pred` has no effect on the deviance; for Poisson (`p=1`) it scales linearly, and for Normal (`p=0`) quadratically.
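This homogeneity claim can be checked numerically: scaling both `y_true` and `y_pred` by a factor `c` should multiply the deviance by `c**(2-p)`. A self-contained sketch (not the sklearn API) covering the three classical special cases:

```python
import math

def dev(y, yh, p):
    """Unit Tweedie deviance for the three classical special cases only."""
    if p == 0:    # Normal: squared error
        return (y - yh) ** 2
    if p == 1:    # Poisson
        return 2 * (y * math.log(y / yh) - y + yh)
    if p == 2:    # Gamma
        return 2 * (math.log(yh / y) + y / yh - 1)
    raise ValueError("only p in {0, 1, 2} handled in this sketch")

c = 10.0
# (p, homogeneity degree 2 - p): Normal scales quadratically,
# Poisson linearly, Gamma not at all.
for p, degree in [(0, 2.0), (1, 1.0), (2, 0.0)]:
    base = dev(2.0, 3.0, p)
    scaled = dev(c * 2.0, c * 3.0, p)
    assert abs(scaled - base * c ** degree) < 1e-9
```

In particular, for `p=2` the loop verifies that `dev(20, 30, 2) == dev(2, 3, 2)`, i.e. the Gamma deviance only sees relative errors.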
Yes, add a sentence above saying that for `p=2` the model is only sensitive to relative errors, and say here again that the example illustrates this fact.
Besides that, LGTM!
Thanks for the review @agramfort, I think I addressed all your comments. The occasional failure in
Also, any opinion on the naming between the following?
good to go from my end !
thx @lorentzenchr and @rth
Here is a first review, overall it looks good to me besides the following comments:
Thanks a lot for the review @ogrisel! I think I addressed all your comments. Waiting for CI...
    0.14...

we would get identical errors. The deviance when `p=2` is thus only
sensitive to relative errors.
Maybe we should also add `mean_gamma_deviance` and `mean_poisson_deviance` as public functions to ease discoverability / googlability, and mention in this part of the documentation (as well as in the docstring) that those are aliases.
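One way such aliases could be exposed is as thin wrappers that fix the power parameter, e.g. via `functools.partial`. This is only a sketch of the idea, not the actual PR code (the sketch also only handles `p` in `{0, 1, 2}`):

```python
import math
from functools import partial

def mean_tweedie_deviance(y_true, y_pred, p=0):
    """Mean Tweedie deviance; this sketch only handles p in {0, 1, 2}."""
    def d(y, yh):
        if p == 1:   # Poisson
            return 2 * (y * math.log(y / yh) - y + yh)
        if p == 2:   # Gamma
            return 2 * (math.log(yh / y) + y / yh - 1)
        return (y - yh) ** 2   # p == 0, Normal
    return sum(d(y, yh) for y, yh in zip(y_true, y_pred)) / len(y_true)

# The proposed discoverable aliases, each fixing the power parameter:
mean_poisson_deviance = partial(mean_tweedie_deviance, p=1)
mean_gamma_deviance = partial(mean_tweedie_deviance, p=2)
```

With this, `mean_gamma_deviance([100.], [150.])` gives the same `0.14...` value as calling `mean_tweedie_deviance` with `p=2` directly, while the alias name makes the intended distribution obvious.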
@@ -110,6 +111,7 @@
   'mean_absolute_error',
   'mean_squared_error',
   'mean_squared_log_error',
   'mean_tweedie_deviance',
   'median_absolute_error',
   'multilabel_confusion_matrix',
   'mutual_info_score',
We would also need `mean_poisson_deviance` and `mean_gamma_deviance` here.
LGTM besides the above comments. It would be good to have confirmation from others that the new names are OK with everybody, though.
@rth or @lorentzenchr, you need to rebase to fix conflicts. Good to go from my end, thx!
@ogrisel @agramfort thx for your reviews and approvals. @rth can you rebase? I'm unavailable right now.
Thanks for the reviews! I fixed the merge conflict and added. Please let me know if there are any other comments. (The failure in one Azure job in
Approved by @ogrisel and myself. Merging. Thx @lorentzenchr and @rth!
Reference Issues/PRs
Contributes to #9405
What does this implement/fix?
Adds new regression metrics that are natural for GLMs.
Comments
Name `mean_tweedie_deviance_error` open for discussion ;-)