max_depth of DecisionTreeRegressor ignored when using max_leaf_nodes #13149

Closed
reneburghardt opened this issue Feb 12, 2019 · 2 comments

Description

There seems to be a bug when max_depth and max_leaf_nodes are used together. Whenever I set max_leaf_nodes on a DecisionTreeRegressor, the given max_depth is ignored (or at least does not work as expected). When I remove max_leaf_nodes, max_depth seems to work.

Steps/Code to Reproduce

from sklearn.tree import DecisionTreeRegressor

# Three samples with seven features each
X = [[52, 34, 1, 4, 305, 1, 253],
     [78, 39, 1, 1, 382, 4, 304],
     [241, 34, 1, 4, 1127, 4, 886]]
y = [[5152], [5635], [23940]]

# max_depth=1 should cap the tree at a root plus two leaves
mt = DecisionTreeRegressor(
    max_depth=1,
    max_leaf_nodes=99,
)
mt.fit(X, y)
print(mt)
print("Number of nodes: {}".format(mt.tree_.node_count))

Expected Results

I expect a tree with a maximum of 3 nodes (the root and two leaves).

Actual Results

DecisionTreeRegressor(criterion='mse', max_depth=1, max_features=None,
max_leaf_nodes=99, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
Number of nodes: 5
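
The gap between the expected and reported node counts can be checked with a little arithmetic. The sketch below is plain Python and does not touch scikit-learn; `max_nodes` and `min_depth_for` are hypothetical helper names introduced only for illustration. It assumes the usual convention that the root sits at depth 0.

```python
def max_nodes(depth):
    """Largest node count of a binary tree whose deepest leaf is at `depth` (root = depth 0)."""
    return 2 ** (depth + 1) - 1


def min_depth_for(node_count):
    """Smallest depth a binary tree needs in order to hold `node_count` nodes."""
    depth = 0
    while max_nodes(depth) < node_count:
        depth += 1
    return depth


print(max_nodes(1))      # 3: root plus two leaves, the expected upper bound
print(min_depth_for(5))  # 2: a 5-node tree must reach at least depth 2
```

So a tree with 5 nodes cannot fit within max_depth=1; it has to be at least one level deeper than requested.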

Exporting the tree via export_graphviz results in this image:
[Image: tree exported via export_graphviz]

Versions

System:
python: 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
executable: /opt/conda/bin/python
machine: Linux-4.14.79-boot2docker-x86_64-with-debian-buster-sid

BLAS:
macros: HAVE_CBLAS=None
lib_dirs: /opt/conda/lib
cblas_libs: openblas, openblas

Python deps:
pip: 19.0.1
setuptools: 40.6.3
sklearn: 0.20.1
numpy: 1.13.3
scipy: 1.1.0
Cython: 0.28.5
pandas: 0.23.4

reneburghardt commented Feb 13, 2019

It's highly probable that 'BestFirstTreeBuilder.build()' currently ignores max_depth; see lines 349-366:

        # Use BestFirst if max_leaf_nodes given; use DepthFirst otherwise
        if max_leaf_nodes < 0:
            builder = DepthFirstTreeBuilder(splitter, min_samples_split,
                                            min_samples_leaf,
                                            min_weight_leaf,
                                            max_depth,
                                            self.min_impurity_decrease,
                                            min_impurity_split)
        else:
            builder = BestFirstTreeBuilder(splitter, min_samples_split,
                                           min_samples_leaf,
                                           min_weight_leaf,
                                           max_depth,
                                           max_leaf_nodes,
                                           self.min_impurity_decrease,
                                           min_impurity_split)

        builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)

As far as I can tell, max_depth is never used inside BestFirstTreeBuilder.build().

Is there any reason why max_depth might be ignored when using max_leaf_nodes? I can't see one, but I am quite new to machine learning. If there is a reason, it should probably be added to the docs, shouldn't it?

@adrinjalali
Member

It's not really ignored; it overshoots by one. #12344 has fixed this in master, and the fix will be out in v0.21. Please reopen if I'm mistaken.
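
To illustrate how an overshoot of exactly one level can arise, here is a toy best-first tree grower in plain Python. It is not scikit-learn's code: the function names, the 1-D setup, and the rule of expanding the most impure node first are all simplifying assumptions made for this sketch. The only point is the depth check, where a strict comparison (`depth > max_depth`) allows one extra level compared with `depth >= max_depth`. Whether this matches the actual change in #12344 is an assumption, not something taken from that pull request.

```python
import heapq
from itertools import count


def sse(values):
    """Sum of squared errors around the mean (a simple impurity measure)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow_best_first(values, max_depth, max_leaf_nodes, strict_check):
    """Grow a toy regression tree on 1-D targets, most impure node first.

    strict_check=True  stops splitting only when depth >  max_depth.
    strict_check=False stops splitting when      depth >= max_depth.
    Returns (node_count, deepest_depth).
    """
    values = sorted(values)
    tie = count()  # tie-breaker so the heap never has to compare lists
    # Heap entries: (-impurity, tie, depth, samples)
    frontier = [(-sse(values), next(tie), 0, values)]
    node_count, leaves, deepest = 1, 1, 0
    while frontier and leaves < max_leaf_nodes:
        _, _, depth, samples = heapq.heappop(frontier)
        depth_stop = depth > max_depth if strict_check else depth >= max_depth
        if depth_stop or len(samples) < 2 or sse(samples) == 0:
            continue  # this node stays a leaf
        # Pick the cut point that minimises the children's total SSE.
        cut = min(range(1, len(samples)),
                  key=lambda i: sse(samples[:i]) + sse(samples[i:]))
        for child in (samples[:cut], samples[cut:]):
            heapq.heappush(frontier, (-sse(child), next(tie), depth + 1, child))
        node_count += 2   # one split adds two child nodes
        leaves += 1       # and turns one leaf into two
        deepest = max(deepest, depth + 1)
    return node_count, deepest


targets = [5152, 5635, 23940]  # the y values from the report above
print(grow_best_first(targets, 1, 99, strict_check=True))   # (5, 2)
print(grow_best_first(targets, 1, 99, strict_check=False))  # (3, 1)
```

With the strict check the toy tree reaches depth 2 and 5 nodes, the same node count as in the report; with the non-strict check it stops at depth 1 with 3 nodes.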
