[MRG] ENH Consistent loss name for squared error #19310

Merged
28 commits merged into scikit-learn:main on Mar 19, 2021

Conversation

@lorentzenchr (Member) commented Jan 31, 2021

Reference Issues/PRs

Partially solves #18248.

What does this implement/fix? Explain your changes.

This PR renames all variations of squared error to "squared_error".
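
For illustration (not part of the PR description), the user-facing effect on one of the affected estimators, assuming the usual scikit-learn deprecation cycle with a FutureWarning raised on fit:

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=20, n_features=3, random_state=0)

# New, consistent spelling introduced by this PR:
DecisionTreeRegressor(criterion="squared_error").fit(X, y)

# Old spelling: still accepted during the deprecation period, but warns on fit.
DecisionTreeRegressor(criterion="mse").fit(X, y)  # FutureWarning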

Questions

One open question: what should be done about the tree export in sklearn/tree/_export.py? Should "mse" be replaced with "squared_error", or should the name "mse" be kept?
Currently, the exported text for criterion="squared_error" is still set to "mse".
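
For context (an illustrative sketch, not from the PR): the tree exporters print the impurity under the criterion's display text in each node label, which is why this mapping matters. Assuming the post-PR spelling:

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_graphviz

X, y = make_regression(n_samples=30, n_features=2, random_state=0)
tree = DecisionTreeRegressor(max_depth=1, criterion="squared_error").fit(X, y)

# Each node label shows the impurity with its display text, historically "mse = ...".
print(export_graphviz(tree))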

@lorentzenchr changed the title from "[WIP] Consistent loss name for squared error" to "[MRG] ENH Consistent loss name for squared error" on Feb 1, 2021
@thomasjpfan (Member) left a comment

I am mostly on board with this change. In the short term it would be a bit painful for users and educational material, but I am all for being more consistent.

@rth (Member) left a comment

A few minor comments otherwise LGTM. Thanks!

@lorentzenchr (Member, Author)

How to write a whatsnew entry for this?

  1. Several entries per module affected.
  2. One meta entry for all changes together.

@thomasjpfan (Member)

I would prefer one entry that contains a list where each item maps from the old name to the new name for each estimator. I suspect we would need to update this list as we change other names. This would be similar to what we did for gradient boosting when there were multiple features:

[Screenshot: the gradient boosting changelog entry referenced above]

@rth (Member) left a comment

Once a changelog is added. Also +1 for a single entry. Thanks!

@rth (Member) commented Mar 2, 2021

Also, there are currently CI failures and merge conflicts...

@lorentzenchr added this to the 1.0 milestone on Mar 9, 2021
@rth requested a review from thomasjpfan on March 9, 2021 20:29
@lorentzenchr (Member, Author)

@thomasjpfan I placed an entry in the what's new directly below "Changelog", before the submodule sections, as this change concerns many different submodules.

@thomasjpfan (Member) left a comment

One comment regarding BaseEnsemble

Comment on lines 157 to 158
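# (annotation added for clarity, not part of the original diff) In
# BaseEnsemble._make_estimator: map the deprecated alias to the new name so that
# the sub-estimators built for the ensemble do not warn again in their own fit.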
if getattr(estimator, "criterion", None) == "mse":
estimator.set_params(criterion="squared_error")
@thomasjpfan (Member):

Estimators such as BaggingClassifier accept a base_estimator parameter. This means that a third-party estimator passed as base_estimator could have a "criterion" parameter that does not support "squared_error". Currently, I see two ways around this:

  1. The hacky workaround would be to detect the classes we made the name change to and only make this change for those classes (see the sketch after this list).

  2. A cleaner solution would be to remove criterion from self.estimator_params and add an "additional_estimator_params" kwarg to _make_estimator. That way the caller can set a self.criterion_ and pass it into _make_estimator. This would be more code, though, and maybe a little over-engineered for something we are going to remove anyway.
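
A minimal sketch of what option 1 could look like (hypothetical helper and constant names, not the actual diff):

from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor

# Classes whose criterion="mse" is renamed by this PR.
_RENAMED_CRITERION_CLASSES = (DecisionTreeRegressor, ExtraTreeRegressor)

def _remap_deprecated_criterion(estimator):
    # Only touch estimators known to accept the new spelling; third-party
    # base_estimators with their own "criterion" parameter are left alone.
    if (isinstance(estimator, _RENAMED_CRITERION_CLASSES)
            and getattr(estimator, "criterion", None) == "mse"):
        estimator.set_params(criterion="squared_error")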

@lorentzenchr (Member, Author):

As we're going to remove this piece of code anyway, I'd go for solution 1. The deprecation of criterion="mse" was introduced for:

  • ExtraTreesRegressor
  • RandomForestRegressor
  • DecisionTreeRegressor
  • ExtraTreeRegressor

@lorentzenchr (Member, Author):

The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?

@ogrisel (Member):

+1 for the first solution ("detect the classes we made the name change to and only make this change for those classes"). It feels like a saner and more predictable way to handle the deprecation.

But actually, I don't understand why we even need to do this. Why not just let the warning pass through if the user has code with a base estimator explicitly constructed with criterion="mse"?

If they just use the default, they should not get any warning, no?

@thomasjpfan (Member) commented Mar 15, 2021

When a user sets RandomForestRegressor(criterion='mse'), this PR currently raises a warning in RandomForestRegressor.fit. Later in _make_estimator, criterion would be passed down to base estimators through self.estimator_params. This would raise warnings again when the base estimator fit methods are called.

> The problem is that RandomForestRegressor inherits from BaseEnsemble. Suggestions?

I think we only need to re-set criterion for RandomForestRegressor and ExtraTreesRegressor, where base_estimator cannot be passed in but criterion can. So, for maintainability, we only need to detect ExtraTreeRegressor and DecisionTreeRegressor in _make_estimator.
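
For illustration only (not from the PR), a minimal way to observe the duplicated warning described above, assuming criterion="mse" raises a FutureWarning at the time of this PR:

import warnings

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=50, n_features=4, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    RandomForestRegressor(n_estimators=2, criterion="mse", random_state=0).fit(X, y)

# One warning from RandomForestRegressor.fit itself plus, without the remapping
# in _make_estimator, one per sub-estimator fitted with the deprecated name.
print([str(w.message) for w in caught])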

@thomasjpfan (Member) commented Mar 15, 2021

Another "more correct" OOP solution would be to remove "criterion" from estimator_params in RandomForestRegressor and ExtraTreesRegressor, and when we warn about criterion in BaseForest.fit, we set criterion correctly:

self._validate_estimator()
# TODO: Remove in v1.2
if isinstance(self, (RandomForestRegressor, ExtraTreesRegressor)) and self.criterion == "mse":
    warn(...)
    self.base_estimator_.criterion = "squared_error"

@lorentzenchr (Member, Author):

Can you check again with 7d220b8?

@ogrisel (Member) left a comment

Just a few small comments but otherwise LGTM!


@ogrisel (Member) commented Mar 19, 2021

@thomasjpfan's remaining comment has been addressed, let's merge :)

@ogrisel merged commit b9d6db8 into scikit-learn:main on Mar 19, 2021
@ogrisel (Member) commented Mar 19, 2021

Thanks @lorentzenchr!

@lorentzenchr deleted the consistent_squared_error branch on March 19, 2021 19:06
@glemaitre mentioned this pull request on Apr 22, 2021
eddiebergman added several commits to automl/auto-sklearn that referenced this pull request on Nov 15, 2022
rfwebster added a commit to rfwebster/scikit-optimize that referenced this pull request Jan 15, 2024
To fix pip installation due to scikit-learn change of option names in versions >1.2.0 (scikit-learn/scikit-learn#19310)