
Fix max_depth overshoot in BFS expansion of trees #12344


Merged

Conversation

adrinjalali
Member

This fixes an issue with BFS expansion of trees, which overshoots the max_depth of a tree by 1. The output of the following two cases should be the same, but isn't:

>>> from sklearn.datasets import load_iris
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> 
>>> clf = DecisionTreeClassifier(random_state=0, max_depth=1, max_leaf_nodes=100)
>>> clf = clf.fit(X, y)
>>> clf.get_depth()
2
>>> clf.get_n_leaves()
3
>>> 
>>> clf = DecisionTreeClassifier(random_state=0, max_depth=1)
>>> clf = clf.fit(X, y)
>>> clf.get_depth()
1
>>> clf.get_n_leaves()
2
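The off-by-one comes from the depth condition used when deciding whether a node on the best-first frontier may still be split. The sketch below is a simplified, hypothetical model of such a builder (a priority queue of `(priority, depth)` pairs; it is not the actual Cython code in `sklearn/tree/_tree.pyx`), showing how treating a node at `depth == max_depth` as splittable yields leaves one level too deep:

```python
import heapq
import itertools

def grown_depth(max_depth, n_splits=10, buggy=False):
    """Grow a toy binary tree best-first; return the depth of the deepest node.

    The frontier is a heap of (priority, depth) pairs, where priority
    stands in for impurity improvement.  With ``buggy=True`` a node at
    ``depth == max_depth`` is still treated as splittable, reproducing
    the pre-fix overshoot.
    """
    tick = itertools.count()          # stand-in priority: insertion order
    frontier = [(next(tick), 0)]      # the root sits at depth 0
    deepest = 0
    for _ in range(n_splits):
        if not frontier:
            break
        _, depth = heapq.heappop(frontier)
        limit = max_depth if buggy else max_depth - 1
        if depth > limit:             # node is a leaf: too deep to split
            continue
        for _child in range(2):       # binary split: push both children
            heapq.heappush(frontier, (next(tick), depth + 1))
        deepest = max(deepest, depth + 1)
    return deepest
```

With `max_depth=1`, the buggy condition grows leaves at depth 2, matching the `get_depth() == 2` output above; the fixed condition stops at depth 1.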

This PR fixes the issue and warns the user about the changed behavior when the code path that previously overshot is actually reached.

I'm not sure about the warning, but right now, this would be the output after this PR:

>>> from sklearn.datasets import load_iris
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> 
>>> clf = DecisionTreeClassifier(random_state=0, max_depth=1, max_leaf_nodes=100)
>>> clf = clf.fit(X, y)
.../tree.py:380: UserWarning: Due to a bugfix in v0.21 the maximum depth of a tree now does not pass the given max_depth!
  builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
>>> clf.get_depth()
1
>>> clf.get_n_leaves()
2
>>> 
>>> clf = DecisionTreeClassifier(random_state=0, max_depth=1)
>>> clf = clf.fit(X, y)
>>> clf.get_depth()
1
>>> clf.get_n_leaves()
2

@amueller
Member

I'm -1 on the warning. I don't like warnings that the user can't disable / avoid.

Member

@jnothman jnothman left a comment

I'm okay with no warning. We need to note it clearly in what's new, though

        impurity <= min_impurity_split)):
    with gil:
        warnings.warn("Due to a bugfix in v0.21 the maximum depth of a"
                      " tree now does not pass the given max_depth!",
Member

pass -> exceed

@jnothman
Member

Although: all warnings can be disabled if you try hard enough
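For reference, a specific warning like this one can be silenced without suppressing anything else, using only the stdlib `warnings` module (a sketch; `fit_quietly` is an illustrative helper, not part of scikit-learn):

```python
import warnings

def fit_quietly(fit, *args, **kwargs):
    """Call ``fit`` while silencing only the max_depth bugfix warning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="Due to a bugfix in v0.21",  # matched as a regex prefix
            category=UserWarning,
        )
        return fit(*args, **kwargs)
```

Other warnings raised inside the call still propagate; only messages starting with the given prefix are dropped.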

@adrinjalali
Member Author

The warning is removed now.

Member

@jnothman jnothman left a comment

Please also add an entry under "Changed models" in what's new.

@adrinjalali
Member Author

I'm not sure whether I should also add the ensemble methods that use trees to the changed classes and the Fix section of whats_new, though. Their tests had to be modified as well, so their behavior has changed in the same way.

@jnothman
Member

Yes, it doesn't hurt to mention the forests etc
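One way such a test can check the fix independently of `get_depth()` (a hypothetical helper, not the code in this PR): scikit-learn trees expose `children_left`/`children_right` arrays with `-1` marking a leaf, so the depth of every fitted tree can be recomputed and compared against `max_depth`:

```python
def tree_depth(children_left, children_right, node=0):
    """Depth of a tree stored as parallel child-index arrays (-1 = leaf)."""
    if children_left[node] == -1:     # leaves have no children
        return 0
    return 1 + max(
        tree_depth(children_left, children_right, children_left[node]),
        tree_depth(children_left, children_right, children_right[node]),
    )
```

A forest test could then assert `all(tree_depth(t.tree_.children_left, t.tree_.children_right) <= max_depth for t in forest.estimators_)`.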

@@ -18,6 +18,22 @@ occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- please add class and reason here (see version 0.20 what's new)
- :class:`ensemble.AdaBoostClassifier` (bug fix)
Member

Okay. This is excessive. Firstly, the user can't specify max_depth in adaboost (or bagging) without explicitly constructing a decision tree.
Secondly, I think here we are trying to keep things succinct. Saying decision trees and derived ensembles are affected should be sufficient. It might also be appropriate to say "with max_depth and max_leaf_nodes set" to avoid undue panic.

@amueller amueller merged commit 02dc9ed into scikit-learn:master Nov 13, 2018
@adrinjalali adrinjalali deleted the bug/tree/maxdepthandleafnodes branch November 13, 2018 18:39
thoo added a commit to thoo/scikit-learn that referenced this pull request Nov 13, 2018
…ybutton

* upstream/master:
  Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344)
  TST don't test utils.fixes docstrings (scikit-learn#12576)
  DOC Fix typo (scikit-learn#12563)
  FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566)
  MNT bare asserts (scikit-learn#12571)
  FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443)
thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 13, 2018
* fix the issue with max_depth and BestFirstTreeBuilder

* fix the test

* fix max_depth overshoot in BFS expansion

* fix forest tests

* remove the warning, add whats_new entry

* remove extra line

* add affected classes to changed classes

* add other affected estimators to the whats_new changed models

* shorten whats_new changed models entry
thoo added a commit to thoo/scikit-learn that referenced this pull request Nov 13, 2018
…ikit-learn into add_codeblock_copybutton

* 'add_codeblock_copybutton' of https://github.com/thoo/scikit-learn:
  Move an extension under sphinx_copybutton/
  Move css/js file under sphinxext/
  Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344)
  TST don't test utils.fixes docstrings (scikit-learn#12576)
  DOC Fix typo (scikit-learn#12563)
  FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566)
  MNT bare asserts (scikit-learn#12571)
  FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443)
  Retrigger travis:max time limit error
  DOC: Clarify `cv` parameter description in `GridSearchCV` (scikit-learn#12495)
  FIX remove FutureWarning in _object_dtype_isnan and add test (scikit-learn#12567)
  DOC Add 's' to "correspond" in docs for Hamming Loss. (scikit-learn#12565)
  EXA Fix comment in plot-iris-logistic example (scikit-learn#12564)
  FIX stop words validation in text vectorizers with custom preprocessors / tokenizers (scikit-learn#12393)
  DOC Add skorch to related projects (scikit-learn#12561)
  MNT Don't change self.n_values in OneHotEncoder.fit (scikit-learn#12286)
  MNT Remove unused assert_true imports (scikit-learn#12560)
  TST autoreplace assert_true(...==...) with plain assert (scikit-learn#12547)
  DOC: add a testimonial from JP Morgan (scikit-learn#12555)
thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
3 participants