ENH Replace loss module HGBT #20811
Conversation
- function is_in_interval_range -> method Interval.includes
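For readers unfamiliar with the renamed helper, here is a minimal sketch of how it might be used. It assumes the Interval class from the private sklearn._loss.link module introduced with the common loss module, so the exact import path and signature may differ between versions.

```python
# Hedged sketch: Interval is assumed to live in the private
# sklearn._loss.link module of the common loss code; not a public API.
import numpy as np
from sklearn._loss.link import Interval

# Open interval (0, inf), e.g. the valid target range of a Poisson loss.
interval = Interval(0.0, np.inf, False, False)

y = np.array([0.5, 1.0, 3.0])
# includes() reports whether all values of y lie inside the interval.
print(interval.includes(y))  # True
```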
Marking as high priority because it enables #21800 very easily.
if self._loss.constant_hessian:
    self._loss.gradient(
        y_true=y_train,
        raw_prediction=raw_predictions.T,
Interesting, I thought there would have been some CPU cache performance issue with either "C" or "F" order (depending on which axis prange iterates over).
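As a side note (not from the PR), a small NumPy illustration of the layout point: transposing a C-contiguous array yields an F-contiguous view, so passing raw_predictions.T flips which axis is contiguous without copying any data.

```python
import numpy as np

n_trees_per_iteration, n_samples = 3, 1000
raw_predictions = np.zeros((n_trees_per_iteration, n_samples))  # C-contiguous

view = raw_predictions.T  # shape (n_samples, n_trees_per_iteration)
print(view.flags["C_CONTIGUOUS"], view.flags["F_CONTIGUOUS"])  # False True
print(view.base is raw_predictions)  # True: a view, no copy was made

# A loop (or prange) over samples only walks contiguous memory if the samples
# axis is the contiguous one; otherwise every step strides across cache lines.
```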
Overall looks good. There is some overhead with remembering that raw_predictions has shape (n_trees_per_iteration, n_samples) while gradient and hessians have shape (n_samples, n_trees_per_iteration). Was there a reason for BaseLoss to prefer (n_samples, n_trees_per_iteration) over the reverse?
TBH, that's just my preferred way of looking at it. This way, I could try to make it consistent and change all HGBT functions to use (n_samples, n_trees_per_iteration) as well. Edit: 27e818f is an attempt to do so. @thomasjpfan what do you think?
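To make the convention concrete, a hedged sketch of calling the common loss module with (n_samples, n_trees_per_iteration)-shaped arrays. It assumes the private sklearn._loss.loss API from #20567; class and method names may differ between versions.

```python
import numpy as np
from sklearn._loss.loss import HalfMultinomialLoss  # private API, may change

n_samples, n_classes = 4, 3  # n_classes plays the role of n_trees_per_iteration
loss = HalfMultinomialLoss(n_classes=n_classes)

y_true = np.array([0.0, 2.0, 1.0, 1.0])            # shape (n_samples,)
raw_prediction = np.zeros((n_samples, n_classes))  # (n_samples, n_trees_per_iteration)

gradient, hessian = loss.gradient_hessian(y_true=y_true, raw_prediction=raw_prediction)
print(gradient.shape, hessian.shape)  # (4, 3) (4, 3)
```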
Yea, I think everything is much nicer now. (The diff is surprisingly smaller than I thought it would be.)
LGTM
Thank you for this PR, @lorentzenchr.
Here are a few comments.
LGTM.
Edit: #22173 has been opened for a follow-up.
2 approvals 🎉
I did one more pass. This looks good for merging. Thank you for working on this!
I think it's mostly a convention. I do not see it documented anywhere.
Reference Issues/PRs
Follow-up of #20567.
What does this implement/fix? Explain your changes.
This PR replaces the losses of HGBT in sklearn/ensemble/_hist_gradient_boosting with the new common loss module of #20567.
Any other comments?
Similar to #19089, but only for HGBT.
This PR is based on 7bee26b.
Edit: #20567 was merged 🚀
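For context, a rough sketch of the call pattern the HGBT code switches to, assuming the private sklearn._loss API from #20567 (HalfSquaredError, constant_hessian, gradient/gradient_hessian with preallocated output arrays); exact names and signatures may differ.

```python
import numpy as np
from sklearn._loss.loss import HalfSquaredError  # private API, may change

rng = np.random.default_rng(0)
n_samples = 1_000
y_train = rng.normal(size=n_samples)
raw_predictions = np.zeros(n_samples)

loss = HalfSquaredError()
gradient = np.empty_like(raw_predictions)
hessian = np.empty_like(raw_predictions)

if loss.constant_hessian:
    # Squared error has a constant hessian, so only gradients are recomputed.
    loss.gradient(
        y_true=y_train,
        raw_prediction=raw_predictions,
        gradient_out=gradient,
        n_threads=1,
    )
else:
    loss.gradient_hessian(
        y_true=y_train,
        raw_prediction=raw_predictions,
        gradient_out=gradient,
        hessian_out=hessian,
        n_threads=1,
    )
```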