ENH migrate GLMs / TweedieRegressor to linear loss #22548
Changes from all commits
@@ -25,6 +25,7 @@
    CyHalfPoissonLoss,
    CyHalfGammaLoss,
    CyHalfTweedieLoss,
    CyHalfTweedieLossIdentity,
    CyHalfBinomialLoss,
    CyHalfMultinomialLoss,
)
@@ -770,6 +771,52 @@ def constant_to_optimal_zero(self, y_true, sample_weight=None):
        return term


class HalfTweedieLossIdentity(BaseLoss):
Review comment: My curiosity: when does it make sense to use the identity link with power != 0?

Reply: Years ago, I thought it a good idea. Meanwhile, I don't think it is useful. Therefore, I opened #19086 without much response.
Review comment: I have a problem where I know the expectation y' follows the linear model y' = w x. My measurements, y, have Poisson errors. (The specific problem involves analysis of radiation measurements: the expectation is linear in the amount of source, and the measurements are Poisson distributed.) Using a log link function is simply not the right description of my problem. Yes, the whole thing breaks down when evaluating negative values of w, but it seems much better to offer a constraint that avoids ever evaluating negative values of w than to exclude the situations where there is an actual linear relationship with Poisson measurements.
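The use case described in the comment above can be sketched numerically. The following is a minimal illustration, not code from this PR: it assumes a made-up radiation-counting setup in which the mean is linear in the covariates (identity link) while the noise is Poisson, and fits it by projected gradient descent on the half Poisson deviance, keeping the coefficients non-negative as the commenter suggests.

```python
import numpy as np

# Hypothetical setup: expectation linear in X (identity link), Poisson noise.
rng = np.random.default_rng(0)
n, d = 500, 2
X = rng.uniform(0.5, 2.0, size=(n, d))
w_true = np.array([3.0, 5.0])      # non-negative source amounts
y = rng.poisson(X @ w_true)        # Poisson counts around a *linear* mean

# Projected gradient descent on the mean half Poisson deviance of
# y_pred = X @ w; the projection step keeps w (and hence y_pred) >= 0.
w = np.ones(d)                     # feasible starting point
for _ in range(5000):
    y_pred = np.clip(X @ w, 1e-12, None)
    grad = X.T @ (1.0 - y / y_pred) / n   # gradient of the deviance in w
    w = np.clip(w - 0.05 * grad, 0.0, None)

print(w)  # close to w_true
```

This only illustrates why an identity link with a non-negativity constraint can be the right model; the actual solver in scikit-learn is of course not this hand-rolled loop.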
    """Half Tweedie deviance loss with identity link, for regression.
Review comment (on lines +774 to +775): This new loss class is needed for
    Domain:
    y_true in real numbers for power <= 0
    y_true in non-negative real numbers for 0 < power < 2
    y_true in positive real numbers for 2 <= power
    y_pred in positive real numbers for power != 0
    y_pred in real numbers for power = 0
    power in real numbers

    Link:
    y_pred = raw_prediction

    For a given sample x_i, half Tweedie deviance loss with p=power is defined
    as::

        loss(x_i) = max(y_true_i, 0)**(2-p) / (1-p) / (2-p)
                    - y_true_i * raw_prediction_i**(1-p) / (1-p)
                    + raw_prediction_i**(2-p) / (2-p)

    Note that the minimum value of this loss is 0.

    Note furthermore that although no Tweedie distribution exists for
    0 < power < 1, it still gives a strictly consistent scoring function for
    the expectation.
    """

    def __init__(self, sample_weight=None, power=1.5):
        super().__init__(
            closs=CyHalfTweedieLossIdentity(power=float(power)),
            link=IdentityLink(),
        )
        if self.closs.power <= 0:
            self.interval_y_true = Interval(-np.inf, np.inf, False, False)
        elif self.closs.power < 2:
            self.interval_y_true = Interval(0, np.inf, True, False)
        else:
            self.interval_y_true = Interval(0, np.inf, False, False)

        if self.closs.power == 0:
            self.interval_y_pred = Interval(-np.inf, np.inf, False, False)
        else:
            self.interval_y_pred = Interval(0, np.inf, False, False)


class HalfBinomialLoss(BaseLoss):
    """Half Binomial deviance loss with logit link, for binary classification.
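As a sanity check on the docstring's formula above, the half Tweedie deviance with identity link can be written directly in NumPy. This is a sketch only, independent of the Cython implementation the PR adds; it shows the two properties the docstring claims: the loss attains its minimum value 0 at raw_prediction == y_true, and is positive elsewhere on its domain.

```python
import numpy as np

def half_tweedie_identity(y_true, raw_prediction, power=1.5):
    # Direct transcription of the docstring formula; this parametrization
    # is valid for 0 < power < 2 with power not equal to 1 or 2 (those
    # cases need the Poisson/Gamma limit forms).
    p = power
    return (
        np.maximum(y_true, 0) ** (2 - p) / ((1 - p) * (2 - p))
        - y_true * raw_prediction ** (1 - p) / (1 - p)
        + raw_prediction ** (2 - p) / (2 - p)
    )

y = np.array([0.5, 1.0, 4.0])
loss_at_truth = half_tweedie_identity(y, y)        # elementwise zeros
loss_off = half_tweedie_identity(y, y + 0.3)       # strictly positive
```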