[MRG+1]: Add clarification on random forest regressor default params #13248

abenbihi · 2019-02-25T14:04:39Z

Continues #5862.
The previous PR pointed out that, by default, RandomForestRegressor uses all the input features to pick the best split. The documentation only mentioned that a random subset of input features is used, which is not the case with the default max_features='auto'. I made this explicit.

…emble of bagged trees with its default max_features parameter

Merge branch 'ensemble-doc-edits' of https://github.com/jonoleson/scikit-learn into ensemble-doc-edits. State explicitly that, by default, the RandomForestRegressor uses all input features when picking the best split instead of just a random subset of input features.

doc/modules/ensemble.rst

robintibor · 2019-03-14T16:19:29Z

Not sure if this is the appropriate place to ask, but is the inconsistent behavior of max_features='auto' between regressor and classifier intended? It means max_features=n_features for the regressor and max_features=sqrt(n_features) for the classifier, as the two docstring lines below show. This caused some confusion when we used it.

scikit-learn/sklearn/ensemble/forest.py

Line 1111 in a62775e

- If "auto", then `max_features=n_features`.

scikit-learn/sklearn/ensemble/forest.py

Line 1362 in a62775e

- If "auto", then `max_features=sqrt(n_features)`.
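For readers following along, the difference between the two docstring lines quoted above can be sketched in plain Python. This is only an illustration: `resolve_max_features` and its `is_classifier` flag are hypothetical names, not scikit-learn's API, and flooring the square root to an integer (with a minimum of 1) is an assumption made here.

```python
import math


def resolve_max_features(max_features, n_features, is_classifier):
    """Hypothetical helper: how many features are examined at each split."""
    if max_features is None:
        # None means "consider every input feature".
        return n_features
    if max_features == "auto":
        # Rule as quoted from the two docstrings at a62775e.
        if is_classifier:
            # Assumption: floor the square root, never go below 1.
            return max(1, int(math.sqrt(n_features)))
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if isinstance(max_features, int):
        return max_features
    raise ValueError(f"unsupported max_features: {max_features!r}")


# Same 'auto' setting, different meaning depending on the estimator type.
auto_classifier = resolve_max_features("auto", 16, is_classifier=True)
auto_regressor = resolve_max_features("auto", 16, is_classifier=False)
print(auto_classifier)  # 4
print(auto_regressor)   # 16
```

With 16 input features, the same `'auto'` string resolves to 4 features per split for a classifier and to all 16 for a regressor, which is the source of the confusion described here.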

banilo · 2019-03-16T21:21:47Z

intended?

As far as I know, Leo Breiman himself suggested different default choices for this hyper-parameter of random forests. Perhaps you are right and different default behavior in RFClassifier versus RFRegressor should be made more explicit in the docstring.

jnothman · 2019-03-18T09:05:33Z

doc/modules/ensemble.rst

+from a sample drawn with replacement (i.e., a bootstrap sample) from the
+training set. When splitting a node during the construction of the tree, the
+best split can be computed using either all input features
+(``max_features=’auto’``) or using the best feature subset of size


It should indeed be clearer that in RFC, max_features='auto' indicates using the square root of the number of input features.

Do you think it could also make sense to have the defaults 'sqrt' (for the classifier) and None (for the regressor), or to introduce a new string such as 'max' or 'all' meaning max_features=n_features, and to remove the 'auto' option altogether? 'auto' seems ambiguous anyway.

+1 but this would require a deprecation cycle.

ogrisel · 2019-04-18T09:57:59Z

I have further rephrased the paragraph to explain the meaning of max_features and the bias-variance trade-off.

ogrisel · 2019-04-18T10:06:58Z

The CircleCI failure is caused by an invalid file in the Debian apt repository:

Get:6 http://deb.debian.org jessie/main amd64 Packages [9098 kB]
Fetched 10.1 MB in 1s (9787 kB/s)
W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/InRelease  Unable to find expected entry 'main/binary-amd64/Packages' in Release file (Wrong sources.list entry or malformed file)

E: Some index files failed to download. They have been ignored, or old ones used instead.
Exited with code 100

I hope this is transient. I have already restarted the build once but it gave the same error.

thomasjpfan · 2019-04-18T11:54:25Z

Merging with master should resolve the CircleCI issue (master uses Debian stretch implicitly). I opened a PR to use Debian stretch explicitly: #13642

glemaitre · 2019-04-24T14:12:48Z

I restarted the CI since #13642 has been merged.

glemaitre · 2019-04-24T14:21:34Z

doc/modules/ensemble.rst

+from a sample drawn with replacement (i.e., a bootstrap sample) from the
+training set.
+
+Furthermore, when splitting each node during the construction of a tree, the


It is difficult to parse "to split on can be". What about:

the best split is found either from all input features or a random subset of size ``max_features``.
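The suggested wording can be illustrated with a minimal sketch of how the candidate features for one node might be chosen. The function `candidate_features` is hypothetical and uses only the standard library. It assumes an integer `max_features` and is not scikit-learn's implementation.

```python
import random


def candidate_features(n_features, max_features, rng):
    """Hypothetical helper: feature indices examined when splitting one node."""
    if max_features >= n_features:
        # All input features are candidates (plain bagged trees).
        return list(range(n_features))
    # Otherwise draw a random subset of size max_features without replacement.
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(0)  # seeded only to make the sketch repeatable
all_feats = candidate_features(5, 5, rng)
subset = candidate_features(5, 2, rng)
print(all_feats)    # [0, 1, 2, 3, 4]
print(len(subset))  # 2
```

A fresh subset would be drawn at every node, so different splits in the same tree can examine different features.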

glemaitre · 2019-04-24T14:26:42Z

Apart from this, LGTM.

glemaitre · 2019-06-20T12:45:59Z

Just pushed some nitpicks and merged it. Thanks @abenbihi

jonoleson and others added 3 commits November 16, 2015 15:53

Add clarification about random forest regressor actually being an ens…

26136f5

…emble of bagged trees with its default max_features parameter

Fixed line break issue

14ec831

abenbihi changed the title ~~Ensemble doc edits~~ Add clarification on random forest regressor default params Feb 25, 2019

abenbihi mentioned this pull request Feb 25, 2019

Add clarification on random forest regressor default params #5862

Closed

abenbihi changed the title ~~Add clarification on random forest regressor default params~~ DOC: Add clarification on random forest regressor default params Feb 25, 2019

agramfort approved these changes Feb 25, 2019

agramfort changed the title ~~DOC: Add clarification on random forest regressor default params~~ [MRG+1]: Add clarification on random forest regressor default params Feb 25, 2019

qinhanmin2014 reviewed Feb 25, 2019

doc/modules/ensemble.rst (outdated, resolved)

qinhanmin2014 reviewed Feb 25, 2019

doc/modules/ensemble.rst (outdated, resolved)

banilo reviewed Feb 27, 2019

doc/modules/ensemble.rst (outdated, resolved)

DOC: rephrasing ensemble forst doc on max_features

7763f08

jnothman reviewed Mar 18, 2019

Rephrase description of the max_features parameter

1b208b6

glemaitre reviewed Apr 24, 2019

glemaitre self-requested a review April 25, 2019 10:04

glemaitre added 3 commits April 25, 2019 15:50

Merge remote-tracking branch 'origin/master' into pr/assiaben/13248

fd98ee1

nitpicks

255436f

Merge remote-tracking branch 'origin/master' into pr/assiaben/13248

b7d9348

glemaitre approved these changes Jun 20, 2019

glemaitre merged commit ab4b4ec into scikit-learn:master Jun 20, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

DOC add clarification on random forest default params (scikit-learn#1…

5c4cc6a

…3248)

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Jul 29, 2019

DOC add clarification on random forest default params (scikit-learn#1…

7323081

…3248)
