efficient grid search for random forests #3652
Comments
Just remembered that our implementation currently only stores node values for terminal nodes. We would need an option to compute the node values for non-terminal nodes too. Another idea is that trees don't need to be explicitly pruned. It would be easier/faster to add …
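For context, current scikit-learn releases do populate `tree_.value` for every node, not only leaves, so per-node statistics are available without any extra option. A minimal check, using a small `DecisionTreeClassifier` on the iris data purely as an illustration (not part of the original issue):

```python
# Minimal check that node statistics exist for internal nodes too,
# using a small illustrative tree (not from the original issue).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = clf.tree_

# One value entry per node, internal nodes included.
print(t.value.shape)       # (node_count, n_outputs, n_classes)
root_dist = t.value[0, 0]  # class distribution at the root, an internal node
print(root_dist)
```

Because values exist at internal nodes, a depth-capped prediction can simply stop traversal early and read the distribution there, rather than physically pruning the tree.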
This would not work for `min_samples_leaf`, as this criterion might discard …
This is now no longer true :)
I don't think we're adding CV classes for random forests, and the idea of …
To get the best performance out of Random Forests, it is necessary to tune parameters like `max_depth`, `min_samples_split` and `min_samples_leaf`. If we wrap a RF in `GridSearchCV`, trees are built from scratch every time. However, for depth-first tree induction, deeper trees share the same base as shallower trees. An idea to speed up grid search is thus to not rebuild trees from scratch every time. Here's an example for `max_depth=[10, 9, .., 1]`:

1. Set `max_depth=10`
2. Build `n_estimators` fully developed trees
3. Prune each tree to `max_depth` and evaluate it using the current train/test split
4. Decrement `max_depth` and go to step 3

Such an algorithm could be wrapped in `RandomForestClassifierCV`/`RandomForestRegressorCV` classes.