Skip to content

[MRG] FIX Correct brier_score_loss when there's only one class in y_true #13628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 26, 2019
Merged

[MRG] FIX Correct brier_score_loss when there's only one class in y_true #13628

merged 9 commits into from
Apr 26, 2019

Conversation

qinhanmin2014
Copy link
Member

closes #8459, closes #9300, closes #9301, closes #9562, closes #11245
Correct brier_score_loss when there's only one class in y_true.
This is an ugly solution, but will fix existing bugs, and avoid backward incompatibility.
Remove some redundant code in calibration_curve and brier_score_loss.

@qinhanmin2014
Copy link
Member Author

ping @jnothman

@qinhanmin2014 qinhanmin2014 changed the title FIX Correct brier_score_loss when there's only one class in y_true [MRG] FIX Correct brier_score_loss when there's only one class in y_true Apr 13, 2019
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We could also raise an error if it's all not 1 and pos_label is not specified. Maybe in the future?

Add what's new?


if normalize: # Normalize predicted values into interval [0, 1]
y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())
elif y_prob.min() < 0 or y_prob.max() > 1:
raise ValueError("y_prob has values outside [0, 1] and normalize is "
"set to False.")

y_true = _check_binary_probabilistic_predictions(y_true, y_prob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we no longer validating y_prob?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y_prob is already validated above.

if pos_label is None:
pos_label = y_true.max()
if (np.array_equal(labels, [0, 1]) or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Share this code with _binary_clf_curve? And document the behaviour?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems difficult since the definition of pos_label=None is different?

@qinhanmin2014
Copy link
Member Author

We could also raise an error if it's all not 1 and pos_label is not specified. Maybe in the future?

Maybe in the future if someone complains? I guess it's not so important.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the parameter documentation please

@qinhanmin2014
Copy link
Member Author

Update the parameter documentation please

docstring updated. IMO it's not so easy to describe current behavior.

Label of the positive class. If None, the maximum label is used as
positive class
Label of the positive class.
When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults to the greater label unless y_true is all 0 or all -1 in which case pos_label defaults to 1.

How about that?

@qinhanmin2014
Copy link
Member Author

How about that?

Much better, I also simplify the code.

@qinhanmin2014
Copy link
Member Author

ping @jnothman since the previous PR is tagged as 0.21 by you.

@qinhanmin2014
Copy link
Member Author

ping @jnothman since the previous PR is tagged as 0.21 by you :)

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as far as I got today!!

labels = np.unique(y_true)
if len(labels) > 2:
raise ValueError("Only binary classification is supported. "
"Provided labels %s." % labels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Provided labels %s." % labels)
"Labels in y_true: %s." % labels)

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qinhanmin2014 qinhanmin2014 added this to the 0.21 milestone Apr 24, 2019
@@ -439,6 +439,10 @@ Support for Python 3.4 and below has been officially dropped.
and now it returns NaN and raises :class:`exceptions.UndefinedMetricWarning`.
:issue:`12855` by :user:`Pawel Sendyk <psendyk>`.

- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will sometimes
return incorrect result when there's only one class in ``y_true``.
:issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.
:pr:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.

@glemaitre glemaitre self-requested a review April 26, 2019 09:15
@glemaitre glemaitre merged commit 296ee0c into scikit-learn:master Apr 26, 2019
@glemaitre
Copy link
Member

Thanks @qinhanmin2014 I made the change in the what's new

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
@qinhanmin2014 qinhanmin2014 deleted the brier_score branch April 29, 2019 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

brier_score_loss error brier_score_loss returns incorrect value when all y_true values are True/1
3 participants