-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] ENH Consistent loss name for squared error #19310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] ENH Consistent loss name for squared error #19310
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly on board with this change. In the short term it would be a bit painful for users and educational material, but I am all for being more consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments otherwise LGTM. Thanks!
How to write a whatsnew entry for this?
|
I would prefer one entry that contains a list where each item maps from the old name to the new name for each estimator. I suspect we would need to update this list as we change other names. This would be similar to what we did for gradient boosting when there were multiple features: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once a changelog is added. Also +1 for a single entry. Thanks!
Also there are CI failure and merge conflicts currently... |
@thomasjpfan I placed an entry in the what's new directly below Changelog before the sections of submodules as this change concerns many different submodules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment regarding BaseEnsemble
sklearn/ensemble/_base.py
Outdated
if getattr(estimator, "criterion", None) == "mse": | ||
estimator.set_params(criterion="squared_error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimators such as BaggingClassifier
accepts a base_estimator
parameter. This means that third-party estimators passed as base_estimator
could have a "criterion" parameter that does not support "squared_error". Currently, I see two ways around this:
-
The hacky workaround would be to detect the classes we made the name change to and only make this change for those classes.
-
A cleaner solution would be to remove
criterion
fromself.estimator_params
and have a "additional_estimator_params" kwarg in_make_estimator
. This way the caller can set aself.criterion_
and pass it into_make_estimator
. This would be...more code tho and maybe a little over-engineered for something we are going to remove anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we're going to remove this piece of code anyway, I'd go for solution 1. Deprecation for criterion="mse"
was made for:
- ExtraTreesRegressor
- RandomForestRegressor
- DecisionTreeRegressor
- ExtraTreeRegressor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the first solution ("detect the classes we made the name change to and only make this change for those classes"). It feels like that a saner and more predictable way to handle deprecation.
But actually, I don't understand why we even need to do this. Why not just let the warning passthrough if the user has code with a base estimator explicitly constructed with criterion="mse"
?
If they just use the default, they should not get any warning, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a user sets RandomForestRegressor(criterion='mse')
, this PR currently raises a warning in RandomForestRegressor.fit
. Later in _make_estimator
, criterion
would be passed down to base estimators through self.estimator_params
. This would raise warnings again when the base estimator fit
methods are called.
The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?
I think we only need to re-set criterion for RandomForestRegressor
and ExtraTreesRegressor
, where base_estimator
can not be passed in and criterion can be passed in. So for maintainability, we only need to detect for ExtraTreeRegressor
and DecisionTreeRegressor
in _make_estimator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another "more correct" OOP solution would be to remove "criterion" from estimator_params
in RandomForestRegressor
and ExtraTreesRegressor
, and when we warn about criterion
in BaseForest.fit
, we set criterion
correctly:
self._validate_estimator()
# TODO: Remove in v1.2
if isinstance(self, (RandomForestRegressor, ExtraTreesRegressor)) self.criterion == "mse":
warn(...)
self.base_estimator_.criterion = "squared_error"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check again with 7d220b8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small comments but otherwise LGTM!
sklearn/ensemble/_base.py
Outdated
if getattr(estimator, "criterion", None) == "mse": | ||
estimator.set_params(criterion="squared_error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the first solution ("detect the classes we made the name change to and only make this change for those classes"). It feels like that a saner and more predictable way to handle deprecation.
But actually, I don't understand why we even need to do this. Why not just let the warning passthrough if the user has code with a base estimator explicitly constructed with criterion="mse"
?
If they just use the default, they should not get any warning, no?
@thomasjpfan's remaining comment has been addressed, let's merge :) |
Thanks @lorentzenchr! |
To fix pip installation due to scikit-learn change of option names in versions >1.2.0 (scikit-learn/scikit-learn#19310)
Reference Issues/PRs
Partially solves #18248.
What does this implement/fix? Explain your changes.
This PR renames all variations of squared error to
"squared_error"
.Questions
One open question is: What to do with the export of Tree in sklearn/tree/_export.py? Replace "mse" with "squared_error" or keep the name "mse"?
Currently, the text for
criterion="squared_error"
is set to"mse"
.