efficient grid search for random forests #3652

Closed

mblondel opened this issue Sep 10, 2014 · 4 comments

@mblondel
Member

To get the best performance out of Random Forests, it is necessary to tune parameters like max_depth, min_samples_split and min_samples_leaf. If we wrap an RF in GridSearchCV, trees are built from scratch every time. However, for depth-first tree induction, deeper trees share the same base as shallower trees. An idea to speed up grid search is thus to not rebuild trees from scratch every time. Here's an example for max_depth=[10, 9, ..., 1].

  1. Set max_depth=10
  2. Build n_estimators fully grown trees
  3. Prune the trees to a maximum depth of max_depth
  4. Create an RF for this max_depth and evaluate it on the current train/test split
  5. Decrease max_depth and go to step 3

Such an algorithm could be wrapped in RandomForestClassifierCV / RandomForestRegressorCV classes.
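
A rough sketch of what steps 2–5 could look like, using only the public tree_ arrays (children_left, children_right, feature, threshold, value). predict_tree_at_depth is a helper made up here for illustration, not an existing API, and it assumes node values are available for internal nodes (see the next comment):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def predict_tree_at_depth(tree, X, max_depth):
    """Predict with one fitted tree, treating nodes at max_depth as leaves."""
    left, right = tree.children_left, tree.children_right
    feature, threshold, value = tree.feature, tree.threshold, tree.value
    out = np.empty(X.shape[0])
    for i, x in enumerate(X):
        node, depth = 0, 0
        while left[node] != -1 and depth < max_depth:
            node = left[node] if x[feature[node]] <= threshold[node] else right[node]
            depth += 1
        out[i] = value[node][0][0]  # needs node values stored at internal nodes
    return out


X, y = make_regression(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Steps 1-2: build the fully grown trees only once.
forest = RandomForestRegressor(n_estimators=50, max_depth=None, random_state=0)
forest.fit(X_tr, y_tr)

# Steps 3-5: re-evaluate the same trees at decreasing depths, with no refitting.
for depth in range(10, 0, -1):
    preds = np.mean(
        [predict_tree_at_depth(e.tree_, X_te, depth) for e in forest.estimators_],
        axis=0,
    )
    print(depth, r2_score(y_te, preds))
```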

@mblondel
Member Author

Just remembered that our implementation currently only stores node values for terminal nodes. We would need an option to compute the node values for non-terminal nodes too.

Another idea is that trees don't need to be explicitly pruned. It would be easier/faster to add max_depth, min_samples_split and min_samples_leaf options directly to the apply and predict methods of Tree. When evaluating a tree, we can just return the current node's value as soon as the max_depth, min_samples_split or min_samples_leaf stopping condition is reached.
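
A possible shape for that, sketched as a free function rather than a Tree method (apply_constrained is hypothetical, not part of scikit-learn, and min_samples_leaf is left out here): it walks the fitted tree and stops as soon as the requested max_depth or min_samples_split would have stopped the induction.

```python
import numpy as np


def apply_constrained(tree, X, max_depth=None, min_samples_split=2):
    """For each sample, return the id of the node where descent stops."""
    left, right = tree.children_left, tree.children_right
    feature, threshold = tree.feature, tree.threshold
    n_node_samples = tree.n_node_samples
    node_ids = np.empty(X.shape[0], dtype=np.intp)
    for i, x in enumerate(X):
        node, depth = 0, 0
        while left[node] != -1:  # not a real leaf
            if max_depth is not None and depth >= max_depth:
                break
            if n_node_samples[node] < min_samples_split:
                break
            node = left[node] if x[feature[node]] <= threshold[node] else right[node]
            depth += 1
        node_ids[i] = node
    return node_ids
```

Predictions would then just read tree.value at the returned node ids, which again requires values to be stored for internal nodes.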

@glouppe
Contributor

glouppe commented Sep 13, 2014

This would not work for min_samples_leaf, as this criterion discards candidate splits during training that might otherwise have been better, e.g. an unbalanced split at the root.

@glouppe
Contributor

glouppe commented May 11, 2015

Just remembered that our implementation currently only stores node values for terminal nodes. We would need an option to compute the node values for non-terminal nodes too.

This is now no longer true :)
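
A quick check with a recent scikit-learn, using only the public tree_ attributes: value now has one row per node, internal nodes included, so the pruning-at-prediction ideas above can read them directly.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y).tree_
# value has shape (node_count, n_outputs, n_classes), one entry per node
print(tree.node_count, tree.value.shape)
```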

@adrinjalali
Member

I don't think we're adding CV classes for random forests, and the idea of warm_start is discussed in other places. So closing this one.
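
For reference, the warm_start mechanism referred to covers the n_estimators part of the grid: with warm_start=True, each successive fit call keeps the trees built so far and only adds the newly requested ones. A minimal sketch, not a substitute for the max_depth idea above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(warm_start=True, random_state=0)
for n in (25, 50, 100, 200):
    clf.set_params(n_estimators=n)
    clf.fit(X_tr, y_tr)  # only the newly added trees are built
    print(n, clf.score(X_te, y_te))
```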

@adrinjalali closed this as not planned on Apr 18, 2024