-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] FIX Correct brier_score_loss when there's only one class in y_true #13628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ping @jnothman |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. We could also raise an error if it's all not 1 and pos_label is not specified. Maybe in the future?
Add what's new?
|
||
if normalize: # Normalize predicted values into interval [0, 1] | ||
y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min()) | ||
elif y_prob.min() < 0 or y_prob.max() > 1: | ||
raise ValueError("y_prob has values outside [0, 1] and normalize is " | ||
"set to False.") | ||
|
||
y_true = _check_binary_probabilistic_predictions(y_true, y_prob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we no longer validating y_prob?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y_prob is already validated above.
sklearn/metrics/classification.py
Outdated
if pos_label is None: | ||
pos_label = y_true.max() | ||
if (np.array_equal(labels, [0, 1]) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Share this code with _binary_clf_curve
? And document the behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems difficult since the definition of pos_label=None is different?
Maybe in the future if someone complains? I guess it's not so important. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the parameter documentation please
docstring updated. IMO it's not so easy to describe current behavior. |
sklearn/metrics/classification.py
Outdated
Label of the positive class. If None, the maximum label is used as | ||
positive class | ||
Label of the positive class. | ||
When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults to the greater label unless y_true is all 0 or all -1 in which case pos_label defaults to 1.
How about that?
Much better, I also simplify the code. |
ping @jnothman since the previous PR is tagged as 0.21 by you. |
ping @jnothman since the previous PR is tagged as 0.21 by you :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as far as I got today!!
sklearn/metrics/classification.py
Outdated
labels = np.unique(y_true) | ||
if len(labels) > 2: | ||
raise ValueError("Only binary classification is supported. " | ||
"Provided labels %s." % labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Provided labels %s." % labels) | |
"Labels in y_true: %s." % labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @qinhanmin2014
doc/whats_new/v0.21.rst
Outdated
@@ -439,6 +439,10 @@ Support for Python 3.4 and below has been officially dropped. | |||
and now it returns NaN and raises :class:`exceptions.UndefinedMetricWarning`. | |||
:issue:`12855` by :user:`Pawel Sendyk <psendyk>`. | |||
|
|||
- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will sometimes | |||
return incorrect result when there's only one class in ``y_true``. | |||
:issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`. | |
:pr:`13628` by :user:`Hanmin Qin <qinhanmin2014>`. |
Thanks @qinhanmin2014 I made the change in the what's new |
…_true (scikit-learn#13628)" This reverts commit d01f763.
…_true (scikit-learn#13628)" This reverts commit d01f763.
closes #8459, closes #9300, closes #9301, closes #9562, closes #11245
Correct brier_score_loss when there's only one class in y_true.
This is an ugly solution, but will fix existing bugs, and avoid backward incompatibility.
Remove some redundant code in calibration_curve and brier_score_loss.