DEP auto, binary_crossentropy, categorical_crossentropy in HGBT #23040
Conversation
@NicolasHug might be interested.
I just checked whether the choice of the loss function could be used to over-parameterize the binary classification case as we do for the multiclass case, with one tree per class and per boosting iteration and a softmax inverse link function instead of the logistic sigmoid. At the moment it is not the case:

```python
>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_classes=2)
>>> HistGradientBoostingClassifier(loss="categorical_crossentropy").fit(X, y)
Traceback (most recent call last):
  Input In [19] in <cell line: 1>
    HistGradientBoostingClassifier(loss="categorical_crossentropy").fit(X, y)
  File ~/code/scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:326 in fit
    self._loss = self._get_loss(sample_weight=sample_weight)
  File ~/code/scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:1812 in _get_loss
    raise ValueError(
ValueError: loss='categorical_crossentropy' is not suitable for a binary classification problem. Please use loss='auto' or loss='binary_crossentropy' instead.
```

If we ever want to do this we can probably introduce a dedicated parameter instead, but this is probably a YAGNI.
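For context, here is a minimal check (not from the thread) of the one-tree-per-class scheme described above, using the documented fitted attribute `n_trees_per_iteration_`; the exact counts assume scikit-learn's current behavior of fitting one tree per iteration for binary problems and one tree per class for multiclass problems:

```python
# Sketch: inspect how many trees HGBT builds per boosting iteration.
# `n_trees_per_iteration_` is a documented fitted attribute of
# HistGradientBoostingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Binary problem: a single tree per iteration, combined with the
# logistic sigmoid inverse link.
X_bin, y_bin = make_classification(n_classes=2, random_state=0)
clf_bin = HistGradientBoostingClassifier(max_iter=10).fit(X_bin, y_bin)
print(clf_bin.n_trees_per_iteration_)  # 1

# Multiclass problem: one tree per class per iteration, combined with
# the softmax inverse link.
X_mc, y_mc = make_classification(n_classes=3, n_informative=4, random_state=0)
clf_mc = HistGradientBoostingClassifier(max_iter=10).fit(X_mc, y_mc)
print(clf_mc.n_trees_per_iteration_)  # 3
```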
We also need to document the deprecation in what's new once we agree on the new loss.
There are some tests that still pass the old loss names. The same is happening elsewhere.
@francoisgoupil Very good point. It took me a little longer to fix all occurrences and make the changes. I hope I've got them all.
We still have some failing tests. Maybe you can try a
LGTM once @jeremiedbb's comment above has been dealt with.
LGTM
Reference Issues/PRs
Partially addresses #18248
What does this implement/fix? Explain your changes.
This PR introduces `loss="log_loss"` for `HistGradientBoostingClassifier` and deprecates the other options.
Any other comments?
Currently, `loss` can be `"auto"`, `"binary_crossentropy"`, or `"categorical_crossentropy"`. Can we remove the two options `"binary_crossentropy"` and `"categorical_crossentropy"`? I don't see a meaningful use case. For instance, `"categorical_crossentropy"` raises a `ValueError` on binary problems.
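A minimal sketch (not from the PR itself) of the API after this change; it assumes `loss="log_loss"` works for both binary and multiclass targets, and that the old loss names emit a `FutureWarning` during the deprecation cycle:

```python
# Sketch of the post-deprecation API, assuming a scikit-learn version with
# this PR merged: "log_loss" covers binary and multiclass classification
# alike, while the old names are assumed to raise a FutureWarning until
# their removal.
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_classes=2, random_state=0)

# New, recommended spelling.
clf = HistGradientBoostingClassifier(loss="log_loss").fit(X, y)

# Deprecated spelling: assumed to warn but still fit during the cycle.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    HistGradientBoostingClassifier(loss="binary_crossentropy").fit(X, y)
print([str(w.message) for w in caught])
```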