[MRG] Fixed brier_score_loss function when y_true only has one label #8459

Closed · jorkro wants to merge 1 commit

Conversation


jorkro commented Feb 25, 2017

For issue #6980 a fix was introduced that allows the brier_score_loss function to accept y_true with a single label. Test cases were also added, but only for the scenario where the predicted probability is 0.5. That value masks the bug: a probability of 0.5 yields the same Brier score whether the outcome is 0 or 1, while any other probability produces an incorrect score in this scenario. Because brier_score_loss indirectly calls label_binarize, a y_true with a single label is converted to an all-zero array, no matter which positive label is set:

brier_score_loss([1], [0.4], pos_label=1) produces 0.16, because [1] is converted to [0]. The correct answer should have been 0.36.
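For reference, the Brier score is the mean squared difference between the predicted probability and the 0/1-encoded outcome, so both values can be checked by hand:

```python
import numpy as np

y_prob = np.array([0.4])

# Outcome correctly encoded as the positive class (1): (1 - 0.4) ** 2
print(np.mean((1 - y_prob) ** 2))  # 0.36, the correct score

# After label_binarize collapses [1] to [0]: (0 - 0.4) ** 2
print(np.mean((0 - y_prob) ** 2))  # 0.16, the buggy score
```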

The fix contains a small modification that allows binary arrays to pass through (so it skips label_binarize). Test cases have also been added.
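Roughly, the pass-through amounts to something like this (a sketch of the idea only, not the exact diff; the helper name is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import label_binarize

def _binarize_unless_already_binary(y_true):
    # Sketch (binary path only): if y_true already contains only 0s and
    # 1s, leave it untouched; running label_binarize on a single-label
    # y_true would otherwise collapse everything to 0.
    labels = np.unique(y_true)
    if set(labels) <= {0, 1}:
        return np.asarray(y_true)
    return label_binarize(y_true, classes=labels)[:, 0]
```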

Let me know if anything else is required.

Commit: Fixed brier_score_loss function when y_true only has one label by not letting label_binarize convert 1s to 0s.

codecov bot commented Feb 25, 2017

Codecov Report

Merging #8459 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #8459      +/-   ##
==========================================
- Coverage   95.47%   95.47%   -0.01%     
==========================================
  Files         342      342              
  Lines       60907    60911       +4     
==========================================
+ Hits        58154    58157       +3     
- Misses       2753     2754       +1
Impacted Files                                  Coverage Δ
sklearn/metrics/classification.py               97.51% <100%> (-0.27%)
sklearn/metrics/tests/test_classification.py    100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


jorkro commented Feb 26, 2017

Hello,

I see that code coverage decreased slightly with this fix. The reason I made the pass-through conditional is that _check_binary_probabilistic_predictions is also used by calibration_curve. brier_score_loss guarantees that all data is binary, via np.array(y_true == pos_label, int); calibration_curve does not.

I see 3 approaches here:

  1. I leave it as-is.
  2. I add non-binary test cases for calibration_curve so coverage is 100% again. However, I suspect that's not its intended use.
  3. I modify _check_binary_probabilistic_predictions so that it raises an error when the data is not binary, and skips the label_binarize step when it is (a rough sketch follows below).

Let me know which approach you prefer.
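For reference, a minimal sketch of what option 3 might look like (the exact check and error message are assumptions, not merged code):

```python
import numpy as np

def _check_binary_probabilistic_predictions(y_true, y_prob):
    # Sketch of option 3: validate rather than binarize. Callers such as
    # calibration_curve would get an explicit error for non-binary input
    # instead of a silent label_binarize conversion.
    labels = np.unique(y_true)
    if not set(labels) <= {0, 1}:
        raise ValueError(
            "Only binary classification is supported, got labels %r"
            % list(labels))
    return np.asarray(y_true)
```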

@@ -1463,5 +1463,7 @@ def test_brier_score_loss():
assert_raises(ValueError, brier_score_loss, y_true, y_pred + 1.)
assert_raises(ValueError, brier_score_loss, y_true, y_pred - 1.)
# calculate even if only single class in y_true (#6980)
assert_almost_equal(brier_score_loss([0], [0.5]), 0.25)
nelson-liu (Contributor) commented:

Sorry if I'm being slow, but is there any reason to remove these?

jorkro (Author) replied:

@nelson-liu Considering the other tests that I had added, I felt these were redundant. I can add them back if you want.

jnothman (Member) left a comment:

@qinhanmin2014 could you double check that this is the right solution?

qinhanmin2014 (Member) commented:

@jnothman
It's not completely right, since it does not consider -1 (it only considers 0 and 1).
I might be -1 on merging this. We lack some basic input validation in brier_score_loss (see #9562). Once we add those checks, I'm afraid we will have to rewrite this code.
I think the current problem is how to define pos_label=None (see #10010). Once we've made that decision, I'd prefer something like #9562 that solves all the known problems in brier_score_loss, to avoid unnecessary work.

jnothman (Member) commented:

@qinhanmin2014 wrote an age ago:

    It's not completely right since it does not consider -1 (it only considers 0 and 1).

That's not actually true, since y_true is binarized in brier_score_loss before being passed to _check_binary_probabilistic_predictions, which is modified here.

We should add tests with pos_label=1 where the negative class is represented as -1, and with pos_label='yes' where the negative class is represented as 'no', just to be sure.
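Something along these lines, for instance (a sketch of what such tests might look like; the expected values follow from the Brier definition):

```python
from numpy.testing import assert_almost_equal
from sklearn.metrics import brier_score_loss

# Negative class encoded as -1, positive as 1:
# ((0 - 0.4)**2 + (1 - 0.6)**2) / 2 = 0.16
assert_almost_equal(
    brier_score_loss([-1, 1], [0.4, 0.6], pos_label=1), 0.16)

# String labels: negative 'no', positive 'yes'.
assert_almost_equal(
    brier_score_loss(['no', 'yes'], [0.4, 0.6], pos_label='yes'), 0.16)
```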

Otherwise this LGTM

jnothman added this to the 0.21 milestone Apr 10, 2019
qinhanmin2014 (Member) commented:

    @qinhanmin2014 wrote an age ago:

        It's not completely right since it does not consider -1 (it only considers 0 and 1).

    That's not actually true, since y_true is binarized in brier_score_loss before being passed to _check_binary_probabilistic_predictions, which is modified here.

Sigh, I feel that I've again wasted a whole year.
@jnothman I mean that when y_true only contains -1, all of the samples will be treated as the positive class (they should be treated as the negative class).
I think a reasonable solution here would be to copy the definition of pos_label=None from precision_recall_curve/roc_curve and deprecate pos_label=None in 0.22. WDYT?
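Concretely, the -1-only case can be reproduced like this (a hypothetical illustration, assuming the fallback pos_label = y_true.max() that brier_score_loss used when pos_label is None):

```python
import numpy as np

# With pos_label=None, the fallback picks pos_label = y_true.max(),
# so a y_true containing only -1 makes -1 the positive class.
y_true = np.array([-1])
pos_label = y_true.max()                 # -1: the only label becomes "positive"
y_bin = (y_true == pos_label).astype(int)
print(y_bin)                             # [1], but [0] was intended
print(np.mean((y_bin - 0.4) ** 2))       # 0.36 instead of the intended 0.16
```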


jnothman (Member) commented Apr 10, 2019 via email:

    What do you mean by changing the definition but also deprecating? I would be happy to just deprecate the current behaviour towards doing the same as ROC.

qinhanmin2014 (Member) commented:

    What do you mean by changing the definition but also deprecating? I would be happy to just deprecate the current behaviour towards doing the same as ROC.

I mean first change the definition of the default in 0.21 (this will be a small PR which might be easier to review), then deprecate pos_label=None in 0.22.
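For concreteness, the convention being referenced is the one in _binary_clf_curve (the helper behind precision_recall_curve); a rough sketch, with names simplified for illustration:

```python
import numpy as np

def _resolve_pos_label(y_true, pos_label=None):
    # Rough sketch of the precision_recall_curve convention: the default
    # pos_label may only be inferred (as 1) when the labels are drawn
    # from {0, 1} or {-1, 1}; otherwise pos_label must be given.
    classes = np.unique(y_true)
    allowed = ([0, 1], [-1, 1], [0], [-1], [1])
    if pos_label is None:
        if not any(np.array_equal(classes, a) for a in allowed):
            raise ValueError("y_true takes values in %r; pos_label must "
                             "be specified explicitly" % list(classes))
        pos_label = 1
    return pos_label
```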


jnothman (Member) commented Apr 11, 2019 via email:

    I don't know if we should deprecate pos_label=None, but we could warn and then raise an error if y_true is constant...?

qinhanmin2014 (Member) commented:

    I don't know if we should deprecate pos_label=None

I'm aware that this might be controversial, which is why I propose to do it in another PR.

    but we could warn and then raise an error if y_true is constant...?

Why? A constant y_true is still a valid option.

I'll open a PR for you to review tomorrow evening.
