FIX Corrects negative gradient of AdaBoost loss in GBDT #22050
Conversation
Whaaaat? Why did no other test fire up? And why do …

@glemaitre Thanks for this fix.
LGTM with or without the proposed more general test (maybe better without).
@@ -320,3 +320,18 @@ def test_lad_equals_quantiles(seed, alpha):
         y_true, raw_predictions, sample_weight=weights, alpha=alpha
     )
     assert pbl_weighted_loss == approx(ql_weighted_loss)
+
+
+def test_exponential_loss():
I would prefer to compare the gradients to numerical differentiation for all losses, but this certainly is a regression test for this bug.
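For illustration, a minimal sketch of such a finite-difference check against the (now fixed) `negative_gradient`, assuming the private `ExponentialLoss` from `sklearn.ensemble._gb_losses` as it existed at the time of this PR (private API, so it may not run on later releases):

```python
import numpy as np
from sklearn.ensemble._gb_losses import ExponentialLoss


def numerical_gradient(loss, y, raw_pred, eps=1e-6):
    """Central finite-difference approximation of d(loss)/d(raw_pred)."""
    n = raw_pred.shape[0]
    grad = np.empty(n)
    for i in range(n):
        plus, minus = raw_pred.copy(), raw_pred.copy()
        plus[i] += eps
        minus[i] -= eps
        # __call__ returns the mean loss, so multiply by n to undo the averaging
        grad[i] = (loss(y, plus) - loss(y, minus)) * n / (2 * eps)
    return grad


rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=20).astype(float)
raw_pred = rng.normal(size=20)

loss = ExponentialLoss(n_classes=2)
# negative_gradient should be the exact negative of the numerical gradient
np.testing.assert_allclose(
    loss.negative_gradient(y, raw_pred),
    -numerical_gradient(loss, y, raw_pred),
    rtol=1e-5,
)
```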
We can add these tests for the common losses, I hope :)
That's the point: they are already there 😉
It is what I saw as well. It is indeed written in the …
From the original issue, it seems that the way we update the leaves meant that we were fine, but I am not absolutely sure, since we don't have a good integration test that uses the AdaBoost loss. Now we have a unit test at least.
I agree :) You can hit a weird, unexpected numerical precision bug in 32 bits, but forgetting to take the negative of the gradient when you should is much less likely :)
    y_pred = np.array([0])
    # we expect to have loss = exp(0) = 1
    assert loss(y_true, y_pred) == pytest.approx(1)
    # we expect to have negative gradient = -1 * (1 * exp(0)) = 1
Suggested change:
- # we expect to have negative gradient = -1 * (1 * exp(0)) = 1
+ # we expect to have negative gradient = -1 * (1 * exp(0)) = -1
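Spelling out the arithmetic behind the suggested comment (an illustrative snippet, not code from the PR): with `y_true = 0` the sign term is `2*0 - 1 = -1` and a raw prediction of 0 gives `exp(0) = 1`, so the negative gradient is `-1 * 1 = -1`.

```python
import numpy as np

y_true, y_pred = 0.0, 0.0
sign = 2.0 * y_true - 1.0                 # -1 for the negative class
loss_value = np.exp(-sign * y_pred)       # exp(0) = 1
neg_grad = sign * np.exp(-sign * y_pred)  # -1 * exp(0) = -1
print(loss_value, neg_grad)               # 1.0 -1.0
```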
sklearn/ensemble/_gb_losses.py
@@ -936,7 +936,7 @@ def negative_gradient(self, y, raw_predictions, **kargs):
             tree ensemble at iteration ``i - 1``.
         """
         y_ = -(2.0 * y - 1.0)
-        return y_ * np.exp(y_ * raw_predictions.ravel())
+        return -y_ * np.exp(y_ * raw_predictions.ravel())
I think the unary negations make a copy, and we can avoid an extra copy with:
y_ = 2.0 * y - 1.0
return y_ * np.exp(-y_ * raw_predictions.ravel())
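A quick numerical sanity check that the committed form and this allocation-saving variant agree (an illustrative sketch with made-up inputs, not from the PR):

```python
import numpy as np

rng = np.random.RandomState(42)
y = rng.randint(0, 2, size=1000).astype(float)
raw_predictions = rng.normal(size=1000)

# form used in the patch
y_a = -(2.0 * y - 1.0)
grad_a = -y_a * np.exp(y_a * raw_predictions)

# suggested form with one fewer negated temporary array
y_b = 2.0 * y - 1.0
grad_b = y_b * np.exp(-y_b * raw_predictions)

np.testing.assert_allclose(grad_a, grad_b)
```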
LGTM
closes #9666
Indeed, this seems fishy. We used the following as a reference: https://pbil.univ-lyon1.fr/CRAN/web/packages/gbm/vignettes/gbm.pdf
The loss for a given point is given by:

$$L(y, f) = \exp\bigl(-(2y - 1)\, f\bigr), \qquad y \in \{0, 1\}$$

Thus, taking the derivative with respect to `y_pred` (denoted $f$ here), we have:

$$\frac{\partial L}{\partial f} = -(2y - 1)\, \exp\bigl(-(2y - 1)\, f\bigr)$$

Thus, the negative gradient is indeed:

$$-\frac{\partial L}{\partial f} = (2y - 1)\, \exp\bigl(-(2y - 1)\, f\bigr)$$

All gradients reported in the document seem to be negative gradients, except for the AdaBoost loss.
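As a quick cross-check of that derivation, a throwaway sympy sketch (illustrative only, not part of scikit-learn):

```python
import sympy as sp

y, f = sp.symbols("y f")
# AdaBoost/exponential loss for a single point, with y in {0, 1}
loss = sp.exp(-(2 * y - 1) * f)

gradient = sp.diff(loss, f)
negative_gradient = sp.simplify(-gradient)
# Prints an expression equivalent to (2*y - 1)*exp(-(2*y - 1)*f),
# i.e. the corrected negative_gradient in the patch.
print(negative_gradient)
```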