DEP auto, binary_crossentropy, categorical_crossentropy in HGBT #23040
Conversation
@NicolasHug might be interested.
I just checked whether the choice of the loss function could be used to over-parameterize the binary classification case as we do for the multiclass case, with one tree per class and per boosting iteration and a softmax inverse link function instead of the logistic sigmoid. At the moment it is not the case:

```python
>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_classes=2)
>>> HistGradientBoostingClassifier(loss="categorical_crossentropy").fit(X, y)
Traceback (most recent call last):
  Input In [19] in <cell line: 1>
    HistGradientBoostingClassifier(loss="categorical_crossentropy").fit(X, y)
  File ~/code/scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:326 in fit
    self._loss = self._get_loss(sample_weight=sample_weight)
  File ~/code/scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:1812 in _get_loss
    raise ValueError(
ValueError: loss='categorical_crossentropy' is not suitable for a binary classification problem. Please use loss='auto' or loss='binary_crossentropy' instead.
```

If we ever want to do this we can probably introduce a dedicated parameter instead, but this is probably a YAGNI.
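For context, here is a minimal check (not from the thread) of the one-tree-per-class scheme described above, using the documented fitted attribute `n_trees_per_iteration_`; the exact counts assume scikit-learn's current behavior of fitting one tree per iteration for binary problems and one tree per class for multiclass problems:

```python
# Sketch: inspect how many trees HGBT builds per boosting iteration.
# `n_trees_per_iteration_` is a documented fitted attribute of
# HistGradientBoostingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Binary problem: a single tree per iteration, combined with the
# logistic sigmoid inverse link.
X_bin, y_bin = make_classification(n_classes=2, random_state=0)
clf_bin = HistGradientBoostingClassifier(max_iter=10).fit(X_bin, y_bin)
print(clf_bin.n_trees_per_iteration_)  # 1

# Multiclass problem: one tree per class per iteration, combined with
# the softmax inverse link.
X_mc, y_mc = make_classification(n_classes=3, n_informative=4, random_state=0)
clf_mc = HistGradientBoostingClassifier(max_iter=10).fit(X_mc, y_mc)
print(clf_mc.n_trees_per_iteration_)  # 3
```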
We also need to document the deprecation in what's new once we agree on the new loss.
There are some tests that still pass the old loss names. The same is happening elsewhere.
@francoisgoupil Very good point. It took me a little longer to fix all occurrences and make the changes. I hope I've got them all.
We still have some failing tests. Maybe you can try a
LGTM once @jeremiedbb's comment above has been dealt with.
LGTM
Reference Issues/PRs
Partially addresses #18248
What does this implement/fix? Explain your changes.
This PR introduces `loss="log_loss"` for `HistGradientBoostingClassifier` and deprecates the other options.
Any other comments?
Currently, `loss` can be `"auto"`, `"binary_crossentropy"`, or `"categorical_crossentropy"`. Can we remove the two options `"binary_crossentropy"` and `"categorical_crossentropy"`? I don't see a meaningful use case. For instance, `"categorical_crossentropy"` raises a `ValueError` on binary problems.
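A minimal sketch (not from the PR itself) of the API after this change; it assumes `loss="log_loss"` works for both binary and multiclass targets, and that the old loss names emit a `FutureWarning` during the deprecation cycle:

```python
# Sketch of the post-deprecation API, assuming a scikit-learn version with
# this PR merged: "log_loss" covers binary and multiclass classification
# alike, while the old names are assumed to raise a FutureWarning until
# their removal.
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_classes=2, random_state=0)

# New, recommended spelling.
clf = HistGradientBoostingClassifier(loss="log_loss").fit(X, y)

# Deprecated spelling: assumed to warn but still fit during the cycle.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    HistGradientBoostingClassifier(loss="binary_crossentropy").fit(X, y)
print([str(w.message) for w in caught])
```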