Skip to content

ENH Updating defaults for RandomForestRegressor and Classifier #20803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Nov 29, 2021

Conversation

bsun94
Copy link
Contributor

@bsun94 bsun94 commented Aug 21, 2021

Reference Issues/PRs

Enhancement for #20111 - set max_features defaults 1 for Regressors and "sqrt" for Classifiers.

What does this implement/fix? Explain your changes.

For the RandomForest and ExtraTreesRegressors classes, sets default for max_features to be 1 with updates to the docstrings. For the corresponding Classifiers, sets the default to be "sqrt" with updates to docstrings. An update was also made to the documentation on the default setting of 1 corresponding to bagged trees.

Any other comments?

N/A

@ogrisel
Copy link
Member

ogrisel commented Aug 23, 2021

The code to actually raise a FutureWarning with the appropriate deprecation message when the users passes "auto" is missing.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsun94 Thanks for tackling this issue. I've some comments and think that we should also deprecate "auto" in the decision trees.

@bsun94
Copy link
Contributor Author

bsun94 commented Aug 28, 2021

thanks @lorentzenchr ! I've put in the suggested changes for the forest classes. Please let me know how they look - if all good, I can do the same for the trees.

Regarding the test fail above, it looks like a certain step in the CI pipeline took too long a time. I did add a test to ensure full test coverage of the _forest.py module after my changes, but I ran that test locally and it succeeded (in 0.67s), so shouldn't be a problem. I'm not sure if this would be a problem on my side?

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments.

@@ -398,6 +398,27 @@ def fit(self, X, y, sample_weight=None):
FutureWarning,
)

if self.max_features == "auto":
warn(
"The prior default of 'auto' for max_features is "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the message ambiguous. I don't really think that we should speak about "default" because the only way that a user gets this warning is that it did not use the default value but explicitly provide a value (that is the same as the default one). So it could be confusing. I would find better a message along these lines:

`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features=1.0` or
remove this parameter as it is also the default value for
RandomForestRegressor and ExtraTreesRegressor.

Comment on lines 414 to 418
"The prior default of 'auto' for max_features is "
"deprecated. Results of the fit, however, do not "
"change. Please use the new default for "
"RandomForestClassifiers and ExtraTreesClassifiers, "
"which is 'sqrt'.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here:

`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features='sqrt'` or
remove this parameter as it is also the default value for
RandomForestClassifier and ExtraTreesClassifier.

randomness can be achieved by setting smaller values, e.g. 0.3.

.. deprecated:: 1.0
Previous default option "auto" has been deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need two entries here. One to highlight the change of default and another for the deprecation:

.. versionchanged:: 1.0
   The default of `max_features` changed from `"auto"` to 1.0.

.. deprecated:: 1.0
   The `"auto"` option was deprecated in 1.0 and will be removed
   in 1.2

Could you change for the different entry of all estimators?

@bsun94
Copy link
Contributor Author

bsun94 commented Sep 8, 2021

thanks @glemaitre ! please let me know if these changes look good, and I can go do the same for trees.

@lorentzenchr
Copy link
Member

@bsun94 I think version 1.0 is now sailing. Could you increment, i.e. 1.0->1.1 and 1.2->1.3. For instance, the whatsnew entry should also be in 1.1.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, I am fine with the changes.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added "Request changes" to spot that this PR has been reviewed.

@bsun94
Copy link
Contributor Author

bsun94 commented Sep 21, 2021

Hi @glemaitre @lorentzenchr , please let me know if this looks good. I've put in the version changes. Given this overall is a decently-sized chunk, should we go forward with it (if all looks good) and I can do the trees stuff under a separate PR (to avoid holding this up)?

@bsun94
Copy link
Contributor Author

bsun94 commented Sep 22, 2021

updated a version number I missed in the latest commit

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good to go after merging with main and these few nitpicks.

@lorentzenchr lorentzenchr linked an issue Nov 15, 2021 that may be closed by this pull request
@glemaitre glemaitre self-assigned this Nov 29, 2021
@glemaitre
Copy link
Member

@lorentzenchr I corrected the last small typo and resolve the conflict with main.
This would be ready to be merged if you are still up to it.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@glemaitre Thanks for finishing.

@lorentzenchr lorentzenchr merged commit 4d68797 into scikit-learn:main Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 7, 2022
…ivalTrees

The new default is "sqrt", which does the same as the previous default "auto".

See scikit-learn/scikit-learn#20803
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 7, 2022
…ivalTrees

The new default is "sqrt", which does the same as the previous default "auto".

See scikit-learn/scikit-learn#20803
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022
…ivalTrees

The new default is "sqrt", which does the same as the previous default "auto".

See scikit-learn/scikit-learn#20803
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022
…ivalTrees

The new default is "sqrt", which does the same as the previous default "auto".

See scikit-learn/scikit-learn#20803
eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reconsider default of max_features of RandomForestRegressor
4 participants