-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Fixes #11128 : Default n_estimator value should be 100 #11172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Upcoming commits with deprecation warning in progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see our documentation on deprecating default values. Changing it like this would break backwards compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please addd a test that the warning works correctly. We have helpers like assert_warns
sklearn/ensemble/forest.py
Outdated
@@ -157,6 +158,10 @@ def __init__(self, | |||
self.warm_start = warm_start | |||
self.class_weight = class_weight | |||
|
|||
if self.n_estimators == 10: | |||
warnings.warn("'n_estimators' default value was changed to 100 in version 0.20.", DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we want is "n_estimators default value will change to 100 in version 0.20".
By convention, this warning should only be issued in fit
, not __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this in my latest commit.
doc/whats_new/v0.20.rst
Outdated
@@ -142,6 +142,10 @@ Classifiers and regressors | |||
several times in a row. :issue:`9234` by :user:`andrewww <andrewww>` | |||
and :user:`Minghui Liu <minghui-liu>`. | |||
|
|||
- In both :class:`ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be under API changes, rather than enhancements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this too in my latest commit.
@jnothman For the test, do I create a new, say, |
add a function in |
|
||
@pytest.mark.parametrize('name', FOREST_ESTIMATORS) | ||
def test_n_estimator_raises_warning(name): | ||
check_raises_n_estimator_warning(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to have a separate function.
Please also test that no warning is raised if the user explicitly passed a value
sklearn/ensemble/forest.py
Outdated
@@ -242,6 +243,11 @@ def fit(self, X, y, sample_weight=None): | |||
------- | |||
self : object | |||
""" | |||
# Validate hyperparameters | |||
if self.n_estimators == 10: | |||
warnings.warn("'n_estimators' default value will change to 100 in version 0.20", DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to break the user code.
Currently, with your PR, a user using the default value will witness a change in its code, in particular increasing its fit time.
We probably want to just warn about future change:
- we should not change the value to 100
- the DeprecationWarning should be a FutureWarning
- the warning needs to say "will be changed in 0.22", instead of "has changed in 0.20"
- the warning could also be more pedagogical, explaining this value needs to be changed manually, as suggested in the issue Change default n_estimators in RandomForest (to 100?) #11128 (comment)
…20 to 0.22. Provided a simple helper on changing the default value to 100
…20 to 0.22. Provided a simple helper on changing the default value to 100
@@ -0,0 +1,118 @@ | |||
|
|||
======= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't belong here...
sklearn/ensemble/forest.py
Outdated
@@ -242,6 +243,10 @@ def fit(self, X, y, sample_weight=None): | |||
------- | |||
self : object | |||
""" | |||
# Validate hyperparameters | |||
if self.n_estimators == 10: | |||
warnings.warn("'n_estimators' default value will be changed to 100 in version 0.22. For optimal performance, you should make change this value to 100", FutureWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that you should have the statement about optimal performance.
Also line length must be <80
ForestClassifier = FOREST_CLASSIFIERS[name] | ||
clf = ForestClassifier(n_estimators=100, random_state=1, max_features=1, | ||
max_depth=1) | ||
assert_warns(DeprecationWarning, func=clf.fit, X=iris.data, y=iris.target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be assert_no_warnings
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally missed that.
I've done well to fix it in my latest commit.
sklearn/ensemble/forest.py
Outdated
@@ -242,6 +243,10 @@ def fit(self, X, y, sample_weight=None): | |||
------- | |||
self : object | |||
""" | |||
# Validate hyperparameters | |||
if self.n_estimators == 10: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Often we would do something like change the default to None, and treat it as "warn and use 10" to avoid warning when the user explicitly sets n_estimators=10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in the dev guide? I don't think so, right? Also that deprecations should be in fit
is not, I think? New issue? ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnorthman, I'm not sure I understand you.
Are you advising I change the n_estimators
value to None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amueller I've created two issues to take care of both of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, defaulting n_estimators=None would be fine. n_estimators='warn'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have some good PRs to serve as examples (e.g., #9397 #10331 ) ? And I think @TomDLT 's comment(#11172 (comment)) has summarized what to do here. |
@@ -46,7 +46,7 @@ | |||
cm = metrics.confusion_matrix(y_test, y_predicted) | |||
print(cm) | |||
|
|||
#import matplotlib.pyplot as plt | |||
#import matlotlib.pyplot as plt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this unrelated change
sklearn/ensemble/forest.py
Outdated
@@ -755,7 +760,7 @@ class RandomForestClassifier(ForestClassifier): | |||
|
|||
Parameters | |||
---------- | |||
n_estimators : integer, optional (default=10) | |||
n_estimators : integer, optional (default=100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is still 10. There should be a note that it'll change in 0.22
@Olamyy |
@Olamyy Are you still willing to fix this issue, otherwise I would take care about it. |
Hi @glemaitre, yes I am. Been a little busy. I should be able to complete it tonight. |
#response_container_BBPPID{font-family: initial; font-size:initial; color: initial;} Great thanks ;) Sent from my phone - sorry to be brief and potential misspell.
|
…ent. Made sure all tests are passing.
@@ -46,7 +46,7 @@ | |||
cm = metrics.confusion_matrix(y_test, y_predicted) | |||
print(cm) | |||
|
|||
#import matplotlib.pyplot as plt | |||
#import matlotlib.pyplot as plt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this unrelated change
@@ -44,8 +44,8 @@ | |||
# more useful. | |||
# Fit the pipeline on the training set using grid search for the parameters | |||
|
|||
# TASK: print the cross-validated scores for the each parameters set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this unrelated change
ForestClassifier = FOREST_CLASSIFIERS[name] | ||
clf = ForestClassifier(n_estimators=10, random_state=1, max_features=1, | ||
max_depth=1) | ||
assert_warns(FutureWarning, func=clf.fit, X=iris.data, y=iris.target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly passing n_estimators=10 should not result in a warning.
Passing nothing for n_estimators should result in n warning
sklearn/ensemble/forest.py
Outdated
@@ -242,6 +243,10 @@ def fit(self, X, y, sample_weight=None): | |||
------- | |||
self : object | |||
""" | |||
# Validate hyperparameters | |||
if self.n_estimators == 10: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, defaulting n_estimators=None would be fine. n_estimators='warn'
This pull request introduces 1 alert when merging e8147fc into 8083ea4 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value doesn't matter so much. The default behaviour must be to warn and use 10 estimators. And the behaviour when a number is passed must be to not warn.
@@ -204,6 +204,31 @@ def check_probability(name): | |||
np.exp(clf.predict_log_proba(iris.data))) | |||
|
|||
|
|||
@pytest.mark.parametrize('name', FOREST_CLASSIFIERS) | |||
def test_warning_raised_with_deprecated_n_estimator(name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be called "no_warning_raised".
sklearn/ensemble/forest.py
Outdated
@@ -242,6 +242,12 @@ def fit(self, X, y, sample_weight=None): | |||
------- | |||
self : object | |||
""" | |||
# Validate hyperparameters | |||
if not self.n_estimators: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be "is None".
@@ -652,6 +652,10 @@ Linear, kernelized and related models | |||
:class:`linear_model.LogisticRegression` when ``verbose`` is set to 0. | |||
:issue:`10881` by :user:`Alexandre Sevin <AlexandreSev>`. | |||
|
|||
- In both :class:`ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing :class:
for the extra trees. Also, the message should read something like " n_estimators
will be changed to 100 in version 0.22. A FutureWarning is raised when the default value is used." or similar. n_estimators=10
doesn't raise a warning (as is tested).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have the default n_estimators=None.
We would like to have this finished before an upcoming release, as we believe it is important that we correct old wrongs.
I hope it is okay if another contributor completes this work
Fixes #11128