-
-
Notifications
You must be signed in to change notification settings - Fork 26k
ENH Updating defaults for RandomForestRegressor and Classifier #20803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Updating defaults for RandomForestRegressor and Classifier #20803
Conversation
The code to actually raise a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bsun94 Thanks for tackling this issue. I've some comments and think that we should also deprecate "auto"
in the decision trees.
thanks @lorentzenchr ! I've put in the suggested changes for the forest classes. Please let me know how they look - if all good, I can do the same for the trees. Regarding the test fail above, it looks like a certain step in the CI pipeline took too long a time. I did add a test to ensure full test coverage of the _forest.py module after my changes, but I ran that test locally and it succeeded (in 0.67s), so shouldn't be a problem. I'm not sure if this would be a problem on my side? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments.
sklearn/ensemble/_forest.py
Outdated
@@ -398,6 +398,27 @@ def fit(self, X, y, sample_weight=None): | |||
FutureWarning, | |||
) | |||
|
|||
if self.max_features == "auto": | |||
warn( | |||
"The prior default of 'auto' for max_features is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the message ambiguous. I don't really think that we should speak about "default" because the only way that a user gets this warning is that it did not use the default value but explicitly provide a value (that is the same as the default one). So it could be confusing. I would find better a message along these lines:
`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features=1.0` or
remove this parameter as it is also the default value for
RandomForestRegressor and ExtraTreesRegressor.
sklearn/ensemble/_forest.py
Outdated
"The prior default of 'auto' for max_features is " | ||
"deprecated. Results of the fit, however, do not " | ||
"change. Please use the new default for " | ||
"RandomForestClassifiers and ExtraTreesClassifiers, " | ||
"which is 'sqrt'.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here:
`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features='sqrt'` or
remove this parameter as it is also the default value for
RandomForestClassifier and ExtraTreesClassifier.
sklearn/ensemble/_forest.py
Outdated
randomness can be achieved by setting smaller values, e.g. 0.3. | ||
|
||
.. deprecated:: 1.0 | ||
Previous default option "auto" has been deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need two entries here. One to highlight the change of default and another for the deprecation:
.. versionchanged:: 1.0
The default of `max_features` changed from `"auto"` to 1.0.
.. deprecated:: 1.0
The `"auto"` option was deprecated in 1.0 and will be removed
in 1.2
Could you change for the different entry of all estimators?
thanks @glemaitre ! please let me know if these changes look good, and I can go do the same for trees. |
@bsun94 I think version 1.0 is now sailing. Could you increment, i.e. 1.0->1.1 and 1.2->1.3. For instance, the whatsnew entry should also be in 1.1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, I am fine with the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added "Request changes" to spot that this PR has been reviewed.
Hi @glemaitre @lorentzenchr , please let me know if this looks good. I've put in the version changes. Given this overall is a decently-sized chunk, should we go forward with it (if all looks good) and I can do the trees stuff under a separate PR (to avoid holding this up)? |
updated a version number I missed in the latest commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost good to go after merging with main and these few nitpicks.
@lorentzenchr I corrected the last small typo and resolve the conflict with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@glemaitre Thanks for finishing.
…t-learn#20803) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
Reference Issues/PRs
Enhancement for #20111 - set
max_features
defaults 1 for Regressors and "sqrt" for Classifiers.What does this implement/fix? Explain your changes.
For the
RandomForest
andExtraTreesRegressors
classes, sets default formax_features
to be 1 with updates to the docstrings. For the correspondingClassifiers
, sets the default to be"sqrt"
with updates to docstrings. An update was also made to the documentation on the default setting of 1 corresponding to bagged trees.Any other comments?
N/A