[MRG] ENH Consistent loss name for squared error #19310

lorentzenchr · 2021-01-31T13:51:44Z

Reference Issues/PRs

Partially solves #18248.

What does this implement/fix? Explain your changes.

This PR renames all variations of squared error to "squared_error".
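As a rough illustration of the rename, the old spellings named in this PR's commit titles can be collected in one lookup table. This is only a sketch: the table layout and the helper `modernize` are hypothetical, and the estimator names are inferred from the commit titles and the review discussion rather than taken from the scikit-learn source.

```python
# Old spellings that this PR replaces with "squared_error".
# Keys are (estimator name, parameter name); values are the deprecated option.
# The table and helper are illustrative only, not scikit-learn API.
DEPRECATED_SQUARED_ERROR_NAMES = {
    ("DecisionTreeRegressor", "criterion"): "mse",
    ("ExtraTreeRegressor", "criterion"): "mse",
    ("RandomForestRegressor", "criterion"): "mse",
    ("ExtraTreesRegressor", "criterion"): "mse",
    ("GradientBoostingRegressor", "criterion"): "mse",
    ("GradientBoostingRegressor", "loss"): "ls",
    ("HistGradientBoostingRegressor", "loss"): "least_squares",
    ("SGDRegressor", "loss"): "squared_loss",
    ("RANSACRegressor", "loss"): "squared_loss",
}


def modernize(estimator_name, param, value):
    """Return "squared_error" if `value` is the deprecated spelling, else `value`."""
    old = DEPRECATED_SQUARED_ERROR_NAMES.get((estimator_name, param))
    if old is not None and value == old:
        return "squared_error"
    return value
```

For example, `modernize("SGDRegressor", "loss", "squared_loss")` gives `"squared_error"`, while unrelated options such as `"huber"` pass through unchanged.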

Questions

One open question is: What to do with the export of Tree in sklearn/tree/_export.py? Replace "mse" with "squared_error" or keep the name "mse"?
Currently, the text for criterion="squared_error" is set to "mse".

thomasjpfan

I am mostly on board with this change. In the short term it would be a bit painful for users and educational material, but I am all for being more consistent.

doc/modules/ensemble.rst

sklearn/ensemble/_gb.py

sklearn/tree/_export.py

rth

A few minor comments otherwise LGTM. Thanks!

doc/modules/ensemble.rst

examples/ensemble/plot_gradient_boosting_quantile.py

sklearn/linear_model/_ransac.py

lorentzenchr · 2021-02-27T17:31:06Z

How should the whatsnew entry for this be written?

Option 1: Several entries, one per affected module.
Option 2: One meta entry for all changes together.

thomasjpfan · 2021-02-27T23:30:33Z

I would prefer one entry that contains a list where each item maps from the old name to the new name for each estimator. I suspect we would need to update this list as we change other names. This would be similar to what we did for gradient boosting when there were multiple features:

rth

Once a changelog is added. Also +1 for a single entry. Thanks!

rth · 2021-03-02T11:17:34Z

Also, there are CI failures and merge conflicts currently.

lorentzenchr · 2021-03-12T17:05:31Z

@thomasjpfan I placed an entry in the what's new directly below Changelog before the sections of submodules as this change concerns many different submodules.

thomasjpfan

One comment regarding BaseEnsemble

thomasjpfan · 2021-03-13T19:11:47Z

sklearn/ensemble/_base.py

+        if getattr(estimator, "criterion", None) == "mse":
+            estimator.set_params(criterion="squared_error")


Estimators such as BaggingClassifier accept a base_estimator parameter. This means that third-party estimators passed as base_estimator could have a "criterion" parameter that does not support "squared_error". Currently, I see two ways around this:

The hacky workaround would be to detect the classes we made the name change to and only make this change for those classes.

A cleaner solution would be to remove criterion from self.estimator_params and have an "additional_estimator_params" kwarg in _make_estimator. This way the caller can set a self.criterion_ and pass it into _make_estimator. This would be...more code tho and maybe a little over-engineered for something we are going to remove anyways.

lorentzenchr · 2021-03-14

As we're going to remove this piece of code anyway, I'd go for solution 1. Deprecation for criterion="mse" was made for:

ExtraTreesRegressor

RandomForestRegressor

DecisionTreeRegressor

ExtraTreeRegressor

lorentzenchr · 2021-03-14

The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?

ogrisel · 2021-03-15T09:44:11Z

+1 for the first solution ("detect the classes we made the name change to and only make this change for those classes"). It feels like a saner and more predictable way to handle deprecation.

But actually, I don't understand why we even need to do this. Why not just let the warning pass through if the user has code with a base estimator explicitly constructed with criterion="mse"?

If they just use the default, they should not get any warning, no?

thomasjpfan · 2021-03-15

When a user sets RandomForestRegressor(criterion='mse'), this PR currently raises a warning in RandomForestRegressor.fit. Later in _make_estimator, criterion would be passed down to base estimators through self.estimator_params. This would raise warnings again when the base estimator fit methods are called.
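The repeated warning described in this comment can be reproduced with toy stand-ins. The sketch below does not use scikit-learn: `ToyTree`, `ToyForest`, and `count_warnings` are hypothetical names, and the warning text is made up. It only shows how forwarding the deprecated value to each sub-estimator multiplies the warning.

```python
import warnings


class ToyTree:
    """Stand-in for a tree regressor (not scikit-learn code)."""

    def __init__(self, criterion="squared_error"):
        self.criterion = criterion

    def fit(self):
        if self.criterion == "mse":
            warnings.warn("criterion='mse' is deprecated", FutureWarning)
        return self


class ToyForest:
    """Stand-in for a forest that forwards `criterion` to its trees."""

    def __init__(self, criterion="squared_error", n_estimators=3):
        self.criterion = criterion
        self.n_estimators = n_estimators

    def fit(self):
        # First warning: raised once by the ensemble itself.
        if self.criterion == "mse":
            warnings.warn("criterion='mse' is deprecated", FutureWarning)
        # The deprecated value is forwarded unchanged, so every tree warns again.
        self.estimators_ = [
            ToyTree(criterion=self.criterion).fit()
            for _ in range(self.n_estimators)
        ]
        return self


def count_warnings(forest):
    """Fit `forest` and return how many warnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        forest.fit()
    return len(caught)
```

With three trees, fitting `ToyForest(criterion="mse")` emits one warning from the forest plus one per tree, while the default criterion emits none.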

thomasjpfan · 2021-03-15

> The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?

I think we only need to re-set criterion for RandomForestRegressor and ExtraTreesRegressor, where base_estimator cannot be passed in and criterion can be passed in. So for maintainability, we only need to detect ExtraTreeRegressor and DecisionTreeRegressor in _make_estimator.

Another "more correct" OOP solution would be to remove "criterion" from estimator_params in RandomForestRegressor and ExtraTreesRegressor, and when we warn about criterion in BaseForest.fit, we set criterion correctly:

    self._validate_estimator()
    # TODO: Remove in v1.2
    if isinstance(self, (RandomForestRegressor, ExtraTreesRegressor)) and self.criterion == "mse":
        warn(...)
        self.base_estimator_.criterion = "squared_error"

lorentzenchr · 2021-03-15

Can you check again with 7d220b8?
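The first option from this thread (translate the old name only for the classes that were actually renamed) might look roughly like the sketch below. It is not the code from the PR: `ToyTreeRegressor`, `ThirdPartyEstimator`, `RENAMED_CLASSES`, and `make_estimator` are hypothetical stand-ins for the tree regressors, a user-supplied estimator, and `_make_estimator`.

```python
import copy


class _ParamsMixin:
    """Minimal set_params so the toy classes behave a bit like estimators."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyTreeRegressor(_ParamsMixin):
    """Stand-in for DecisionTreeRegressor / ExtraTreeRegressor."""

    def __init__(self, criterion="squared_error"):
        self.criterion = criterion


class ThirdPartyEstimator(_ParamsMixin):
    """Hypothetical external estimator whose own "mse" option must stay as-is."""

    def __init__(self, criterion="mse"):
        self.criterion = criterion


# Only classes whose option was renamed get the translation.
RENAMED_CLASSES = (ToyTreeRegressor,)


def make_estimator(base_estimator, **estimator_params):
    """Copy the base estimator, apply params, and translate "mse" if needed."""
    estimator = copy.deepcopy(base_estimator)
    estimator.set_params(**estimator_params)
    # Temporary shim: to be dropped together with the deprecation.
    if (
        isinstance(estimator, RENAMED_CLASSES)
        and getattr(estimator, "criterion", None) == "mse"
    ):
        estimator.set_params(criterion="squared_error")
    return estimator
```

The `isinstance` check is what keeps a third-party `criterion="mse"` untouched, which was the concern with the original unconditional `getattr` check.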

ogrisel

Just a few small comments but otherwise LGTM!

doc/whats_new/v1.0.rst

ogrisel · 2021-03-15T09:44:11Z

sklearn/ensemble/_base.py

+        if getattr(estimator, "criterion", None) == "mse":
+            estimator.set_params(criterion="squared_error")


+1 for the first solution ("detect the classes we made the name change to and only make this change for those classes"). It feels like a saner and more predictable way to handle deprecation.

But actually, I don't understand why we even need to do this. Why not just let the warning pass through if the user has code with a base estimator explicitly constructed with criterion="mse"?

If they just use the default, they should not get any warning, no?

sklearn/ensemble/_forest.py

ogrisel · 2021-03-19T14:21:25Z

@thomasjpfan's remaining comment has been addressed, let's merge :)

ogrisel · 2021-03-19T14:21:43Z

Thanks @lorentzenchr!


lorentzenchr added 18 commits January 31, 2021 13:02

MNT deprecate mse criterion in tree module

820b7b7

MNT deprecate mse criterion for RandomForestRegressor

627c343

MNT deprecate criterion mse and loss ls in GradientBoosting

bc3b7f8

MNT deprecate loss least_squares in HistGradientBoostingRegressor

2fbc6ee

MNT deprecate loss squared_loss in linear_model SGD

fdd21f6

MNT/TST replace criterion 'mse' by 'squared_error' in PDP tests

590f2f6

MNT/TST forgot a few deprecated 'ls' in gradient boosting tests

fa7f8bd

MNT/TST replace squared_loss in test_sgd.py

ab4c861

MNT deprecate loss squared_loss in RANSACRegressor

7d3d2bd

MNT internally rename squared_loss to squared_error in neural_network

67ceac9

MNT replace losses in benchmarks

83bb09a

DOC replace losses in docs

baec17d

EXA replace losses in exampels

cb0c4e4

MNT replace least_squares in HGBT utils

fded6f7

CLN correct directive deprecated

0777251

CLN filter FutureWarning for squared_loss in SGD tests

68e1f9b

CLN hickups in SGD tests due to param checks in init of BaseSGD

8692240

Merge branch 'main' into consistent_squared_error

1d570bd

lorentzenchr changed the title from "[WIP] Consistent loss name for squared error" to "[MRG] ENH Consistent loss name for squared error" Feb 1, 2021

lorentzenchr mentioned this pull request Feb 1, 2021

RFC Consistent options/names for loss and criterion #18248

Closed

3 tasks

lorentzenchr added 2 commits February 18, 2021 20:00

Merge branch 'main' into consistent_squared_error

1bf1c0b

CLN fix double import of pytest

1e9683a

thomasjpfan reviewed Feb 20, 2021

doc/modules/ensemble.rst (outdated, resolved)

sklearn/ensemble/_gb.py (outdated, resolved)

sklearn/tree/_export.py (outdated, resolved)

rth reviewed Feb 23, 2021

lorentzenchr added 2 commits February 27, 2021 18:20

address review comments 1st round

91ec366

Merge branch 'main' into consistent_squared_error

cc94841

thomasjpfan mentioned this pull request Mar 1, 2021

FIX Deep copy criterion in trees to fix concurrency bug #19580

Merged

rth approved these changes Mar 2, 2021

lorentzenchr added 3 commits March 2, 2021 21:45

FIX test_export.py

e3e92d7

Merge branch 'main' into consistent_squared_error

0bfc742

DOC add whatsnew entry

0179ac9

lorentzenchr added this to the 1.0 milestone Mar 9, 2021

rth requested a review from thomasjpfan March 9, 2021 20:29

thomasjpfan reviewed Mar 13, 2021

ogrisel approved these changes Mar 15, 2021

thomasjpfan reviewed Mar 15, 2021

sklearn/ensemble/_forest.py (outdated, resolved)

lorentzenchr added 3 commits March 15, 2021 20:06

DOC use |API| tag in whatsnew

b50bd75

FIX criterion="mse" test in forest

e288a6a

FIX check for DecisionTreeRegressor ExtraTreeRegressor in ensemble base

7d220b8

ogrisel merged commit b9d6db8 into scikit-learn:main Mar 19, 2021

lorentzenchr deleted the consistent_squared_error branch March 19, 2021 19:06

glemaitre mentioned this pull request Apr 22, 2021

Release 0.24.2 #19954

Merged

12 tasks

lorentzenchr mentioned this pull request Apr 25, 2021

MNT remove deprecation of least_squares in HGBT #19976

Closed

This was referenced Apr 4, 2022

DEP deviance in favor of log_loss for GradientBoostingClassifier #23036

Merged

ENH add criterion log_loss as alternative to entropy in trees and forests #23047

Merged

eddiebergman mentioned this pull request Nov 15, 2022

Update scikit learn 1.2 automl/auto-sklearn#1611

Closed

54 tasks

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore: update criterion

b379b22

(scikit-learn/scikit-learn#19310)

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore(space): RandomForestRegressor criterion

db947ab

scikit-learn/scikit-learn#19310

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore(space): loss HistGradientBoostingRegressor

247fa6d

scikit-learn/scikit-learn#19310

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore(space): Loss SGDRegressor

490c6aa

scikit-learn/scikit-learn#19310

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore(space): DecisionTreeRegressor

68d489b

scikit-learn/scikit-learn#19310

eddiebergman added a commit to automl/auto-sklearn that referenced this pull request Nov 15, 2022

chore(space): ExtraTreesRegressor

277ddef

scikit-learn/scikit-learn#19310

rfwebster added a commit to rfwebster/scikit-optimize that referenced this pull request Jan 15, 2024

Update requirements.txt

3e565c0

To fix pip installation due to scikit-learn change of option names in versions >1.2.0 (scikit-learn/scikit-learn#19310)

rfwebster mentioned this pull request Jan 15, 2024

Update requirements.txt scikit-optimize/scikit-optimize#1199

Open
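Downstream projects like the ones referenced above sometimes need to support scikit-learn versions on both sides of the rename. A minimal sketch of one way to do that follows. It assumes the new spelling is available from 1.0 (this PR's milestone); the helper name `squared_error_criterion` is hypothetical, and in real code the version string would typically come from `sklearn.__version__`.

```python
import re


def squared_error_criterion(sklearn_version):
    """Pick the tree criterion spelling for a scikit-learn version string.

    Assumption: "squared_error" is accepted from version 1.0 onward,
    and "mse" is the spelling to use before that.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return "squared_error" if (major, minor) >= (1, 0) else "mse"
```

Pinning a scikit-learn version in requirements.txt, as the scikit-optimize commit does, is the alternative when supporting both spellings is not worth the extra branch.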
