[MRG] Monotonic constraints for GBDT #15582
Conversation
why not make it an init argument for now?
I'm open to anything, but ideally we wouldn't merge this PR if we knew for sure that the API would change in the future.
```python
if gain > best_gain and gain > self.min_gain_to_split:
    found_better_split = True
    best_gain = gain
    best_bin_idx = bin_idx
    best_sum_gradient_left = sum_gradient_left
    best_sum_hessian_left = sum_hessian_left
    best_n_samples_left = n_samples_left

if found_better_split:
    split_info.gain = best_gain
    split_info.bin_idx = best_bin_idx
    # we scan from left to right so missing values go to the right
    split_info.missing_go_to_left = False
    split_info.sum_gradient_left = best_sum_gradient_left
    split_info.sum_gradient_right = sum_gradients - best_sum_gradient_left
    split_info.sum_hessian_left = best_sum_hessian_left
    split_info.sum_hessian_right = sum_hessians - best_sum_hessian_left
    split_info.n_samples_left = best_n_samples_left
    split_info.n_samples_right = n_samples - best_n_samples_left

    # We recompute best values here but it's cheap
    split_info.value_left = compute_value(
        split_info.sum_gradient_left, split_info.sum_hessian_left,
        lower_bound, upper_bound, self.l2_regularization)

    split_info.value_right = compute_value(
        split_info.sum_gradient_right, split_info.sum_hessian_right,
        lower_bound, upper_bound, self.l2_regularization)
```
For reviewers: apart from the leaf value computation (which is new), the logic here is actually unchanged.
It's mostly just a small optimization: instead of setting all the attributes of split_info at each iteration, we only do it once at the end.
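For context, here is a hedged sketch of what a compute_value helper like the one called above plausibly computes; the exact signature, epsilon handling, and shrinkage in the PR may differ:

```python
def compute_value(sum_gradient, sum_hessian, lower_bound, upper_bound,
                  l2_regularization):
    # Unconstrained (XGBoost-style) optimal value for the node. The small
    # epsilon guarding against division by zero is an assumption, not
    # necessarily what the PR does.
    value = -sum_gradient / (sum_hessian + l2_regularization + 1e-15)
    # Clip to the bounds propagated from monotonic constraints on ancestors.
    return min(max(value, lower_bound), upper_bound)
```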
```python
gain = negative_loss(sum_gradient_left, sum_hessian_left,
                     l2_regularization)
gain += negative_loss(sum_gradient_right, sum_hessian_right,
                      l2_regularization)
gain -= negative_loss_current_node
```
For reviewers:
The gain computation has been updated to:
- first compute the values of the left and right children
- cap those values according to the bounds derived from the monotonic constraints
- discard any split that does not respect left <= right (INC) or left >= right (DEC)
- compute the loss reduction from the (bounded) values computed above (see the sketch just below)
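As a rough illustration of that flow, here is a minimal Python sketch (not the PR's actual Cython; it reuses the hypothetical compute_value above, and the names and sign conventions are assumptions):

```python
def split_gain(sum_gradient_left, sum_hessian_left,
               sum_gradient_right, sum_hessian_right,
               loss_current_node, monotonic_cst,
               lower_bound, upper_bound, l2_regularization):
    value_left = compute_value(sum_gradient_left, sum_hessian_left,
                               lower_bound, upper_bound, l2_regularization)
    value_right = compute_value(sum_gradient_right, sum_hessian_right,
                                lower_bound, upper_bound, l2_regularization)

    # Discard splits violating the constraint: INC needs left <= right,
    # DEC needs left >= right.
    if monotonic_cst == 1 and value_left > value_right:
        return -1  # sentinel for "no valid split"
    if monotonic_cst == -1 and value_left < value_right:
        return -1

    # Loss reduction computed from the (bounded) child values,
    # where loss(child) = value * sum_gradient.
    gain = loss_current_node
    gain -= value_left * sum_gradient_left
    gain -= value_right * sum_gradient_right
    return gain
```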
```python
# Considering the following tree with a monotonic INC constraint, we
# should have:
```
Note for reviewers: start here
Thanks a lot for the review!
I addressed most comments and will try a fix tomorrow for the gain computation.
```python
# or all decreasing (or neither) depending on the monotonic constraint.
nodes = predictor.nodes

def get_leaves_values():
```
For me that would suggest that there is only one leaf and that one leaf has multiple values.
Thoughts @jnothman ?
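For what it's worth, a hedged sketch of what such a helper might look like over the predictor's flat nodes array (it relies on the `nodes = predictor.nodes` line from the fragment above, and the structured-array field names 'is_leaf' and 'value' are assumptions):

```python
def get_leaves_values():
    # Collect the value of every leaf in the predictor's nodes array.
    return [node['value'] for node in nodes if node['is_leaf']]
```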
```python
    dfs(node.left_child)
    dfs(node.right_child)

dfs(grower.root)
```
I'm not sure we can do that here because, unlike the predictor object, the grower does not provide an array of the nodes. It only has the root and a finalized_leaves list, which isn't enough for us here.
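For illustration, a hedged sketch of an equivalent recursive check directly on the grower's tree, assuming each grower node exposes left_child, right_child and value, with both children set to None on leaves:

```python
def assert_children_values_monotonic(node, monotonic_cst):
    # monotonic_cst: +1 for INC (left <= right), -1 for DEC (left >= right).
    if node.left_child is None and node.right_child is None:
        return  # leaf
    if monotonic_cst == 1:
        assert node.left_child.value <= node.right_child.value
    elif monotonic_cst == -1:
        assert node.left_child.value >= node.right_child.value
    assert_children_values_monotonic(node.left_child, monotonic_cst)
    assert_children_values_monotonic(node.right_child, monotonic_cst)

assert_children_values_monotonic(grower.root, monotonic_cst=1)
```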
```python
gain -= _loss_from_value(value_right, sum_gradient_right)  # with bounds
# Note that the losses for both children are computed with bounded values,
# while the loss of the current node isn't. It's OK since all the gain
# comparisons will be made with the same loss_current_node anyway.
```
I'm reasonably confident that LightGBM does it similarly: see https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L160, where the gain of the current node to be split does not take constraints into account (follow BeforeNumercal and then GetLeafGain). But I could be missing something.
You're right that it doesn't work well with min_gain_to_split. I guess the correct way would be to pass the current node's value into find_node_split.
I'll try tomorrow and will report back!
OK, I pushed something and added a test. I'm still a little bit hesitant about what we should be doing. I think that in the end, it all boils down to how we define the loss at a given node:
1. the closed-form loss computed directly from the node's gradient and hessian sums (ignoring any bound on the node value), or
2. the loss computed from the node's actual value, which may have been clipped to satisfy the monotonic constraints.
They're both equivalent if no clipping happens, as in the XGBoost paper. In any case, I agree we should be consistent and use the same formula for all nodes. In the previous version (and in LightGBM as far as I understand), we were using 1 for the current node and 2 for the children. Now, we're using 2 for all nodes. WDYT @ogrisel, good to go?
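To make the two options concrete, a hedged sketch (the names and constant factors are mine, not necessarily the PR's; constant factors don't change which split wins):

```python
def loss_from_sums(sum_gradients, sum_hessians, l2_regularization):
    # Definition 1: closed-form optimum, ignoring any bound on the node value.
    return -sum_gradients ** 2 / (sum_hessians + l2_regularization)

def node_value(sum_gradients, sum_hessians, l2_regularization,
               lower_bound=float('-inf'), upper_bound=float('inf')):
    value = -sum_gradients / (sum_hessians + l2_regularization)
    return min(max(value, lower_bound), upper_bound)

def loss_from_value(value, sum_gradients):
    # Definition 2: loss evaluated at the actual (possibly clipped) node value.
    return value * sum_gradients

# Without clipping, value * G == -G**2 / (H + l2), so the two definitions agree.
```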
LGTM. Can you just do a quick benchmark to check that the latest change did not cause a significant performance regression w.r.t. master?
Thanks for the reminder. There actually was a slow-down: I get about 51 sec on the current branch and 47 sec on master now. So there's still a slow-down, but as far as I can tell the Cython code is just about as fast; it seems we're spending more time in the Python part.
Maybe you can use py-spy with native code profiling enabled both on master and on this branch and compare the two flamegraphs to help identify the discrepancy?
Great suggestion! It turns out the grower is spending a significant amount of time setting the bounds of the children. That's where the difference came from. I'm quite surprised because there's no constraint at all in the benchmark, but I guess it adds up in the end since this is done for every single node. I added a fast path in 83dba40, and now both times are similar.
(I ran them on a different machine from #15582 (comment), in case you're wondering why it's faster in both cases.)
Thanks @NicolasHug, other than the one concern, this all looks good.
Alright, very nice work. Let's merge!
Thanks a lot @adrinjalali and @ogrisel for the reviews!
This PR adds support for monotonic constraints for the histogram-based GBDTs.
Addresses #6656
(see https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html)
The API is to pass e.g. HistGradientBoostingRegressor(monotonic_cst=[-1, 1]).
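A minimal usage sketch of the proposed API on synthetic data (the experimental import is only needed on versions where the estimator is still experimental; the data and constraint choices here are purely illustrative):

```python
import numpy as np
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
# y is (noisily) decreasing in the first feature and increasing in the second.
y = -2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# -1: monotonically decreasing, +1: increasing, 0: no constraint.
gbdt = HistGradientBoostingRegressor(monotonic_cst=[-1, 1])
gbdt.fit(X, y)
```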
For reviewers: The overall logic is pretty simple, but this still involved a lot of changes because I had to refactor the splitter, since now we require all nodes to have a value (previously only leaves would have a value).
@ogrisel @adrinjalali @thomasjpfan @amueller @glemaitre might be interested in this (after the release of course)