-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] FIX max_leaf_node and max_depth interaction in GBDT #16183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] FIX max_leaf_node and max_depth interaction in GBDT #16183
Conversation
@@ -280,6 +283,18 @@ def test_max_depth(max_depth): | |||
assert depth == max_depth | |||
|
|||
|
|||
def test_max_depth_max_leaf_nodes(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't think that we should move this test in test_gradient_boosting.py
instead of the test_grower.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion. It's really a test about the grower code more than the estimator but yeah it's not obvious to come up with something that would only involve the grower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this may seem a little crazy, but:
def test_max_depth_max_leaf_nodes():
n_samples = 1000
n_bins = 64
max_depth = 4
max_leaf_nodes = 9 # 2**(max_depth - 1) + 1
X, y = make_classification(n_samples=n_samples, random_state=42)
X_binned = _BinMapper(n_bins=n_bins, random_state=42).fit_transform(X)
y = y.astype(Y_DTYPE)
loss = LeastSquares()
baseline = loss.get_baseline_prediction(y, 1)
raw_predictions = np.full(shape=(1, n_samples), fill_value=baseline)
gradients, hessians = loss.init_gradients_and_hessians(
n_samples=n_samples, prediction_dim=1)
for i in range(5):
loss.update_gradients_and_hessians(gradients, hessians, y,
raw_predictions)
grower = TreeGrower(X_binned, gradients[0, :], hessians[0, :],
n_bins=n_bins, max_leaf_nodes=max_leaf_nodes,
max_depth=max_depth)
grower.grow()
_update_raw_predictions(raw_predictions[0, :], grower)
assert len(grower.finalized_leaves) <= max_leaf_nodes
you might like this new solution better @thomasjpfan . |
failure is unrelated |
…x_maxleafnodes_maxdepth
…x_maxleafnodes_maxdepth
…kit-learn into fix_maxleafnodes_maxdepth
@@ -355,16 +355,16 @@ def split_next(self): | |||
|
|||
self.n_nodes += 2 | |||
|
|||
if self.max_depth is not None and depth == self.max_depth: | |||
if (self.max_leaf_nodes is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff might be confusing but all I did was check for max_leaf_nodes before checking for max_depth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reordered was noticed ;)
Thanks for the reviews so far. I have moved the test and simplified it to a 3-liner. I confirm it's a non-regression test that fails on master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Can confirm that test fails on master.
@@ -355,16 +355,16 @@ def split_next(self): | |||
|
|||
self.n_nodes += 2 | |||
|
|||
if self.max_depth is not None and depth == self.max_depth: | |||
if (self.max_leaf_nodes is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reordered was noticed ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love the final fix lol.
* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo
* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo
Fixes #16179
The fix is simply to check for max_samples_leaf before checking for max_depth.
This makes sure that
_finalize_splittable_nodes
is called, which was the cause of the bug.