
[MRG + 1] brier_score_loss returns incorrect value when all y_true values are True/1 #9301


Closed
wants to merge 17 commits into from

Conversation

gnsiva

@gnsiva gnsiva commented Jul 8, 2017

Fixes #9300. Fixes #8459

This is in reference to this bug report.

Some of the added tests would not pass using the current version of brier_score_loss. I have added a potential fix for the issue. The solution isn't too pretty, but the only alternative I can see is to change the label_binarizer function in a way that is probably undesirable.
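
For context, a minimal reproduction of the reported behaviour (a sketch run against an unpatched scikit-learn; the exact wrong value returned depends on the version):

from sklearn.metrics import brier_score_loss

y_true = [1, 1, 1, 1]          # every sample belongs to the positive class
y_prob = [1.0, 1.0, 1.0, 1.0]  # perfectly confident, correct predictions

# The Brier score is the mean squared difference between the predicted
# probability and the actual outcome, so this should be 0.0. Per issue #9300,
# the unpatched brier_score_loss returns an incorrect value whenever y_true
# contains only the positive class.
print(brier_score_loss(y_true, y_prob))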

@amueller amueller changed the title brier_score_loss returns incorrect value when all y_true values are True/1 [MRG + 1] brier_score_loss returns incorrect value when all y_true values are True/1 May 23, 2018
@amueller
Member

looks good. Can you please add an entry to the changelog in whatsnew?

@gnsiva
Author

gnsiva commented May 24, 2018

Sure, thanks for reviewing it @amueller

@@ -434,6 +434,10 @@ Metrics
:issue:`9515` by :user:`Alan Liddell <aliddell>` and
:user:`Manh Dao <manhdao>`.

- Fixed a bug in: :func:`metrics.brier_score_loss`, when all `y_true` values are 1,
Member

single backticks don't do anything. please use double backticks! (they are fine for issue and user below, but not on their own)

Author

oops, will fix now!

@@ -1951,7 +1952,7 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):

pos_label : int or str, default=None
Label of the positive class. If None, the maximum label is used as
positive class
positive class. If all values are 0/False, then 1 is used as pos_label.
Member

maybe say "0 or False"

Author

sure

@amueller
Member

@jnothman do you think this is an ok fix? We probably need a general test, right?
@gnsiva could you see if you can generalize the test to apply to all metrics with a pos_label parameter, or possibly to all metrics? There are tests in metrics/test_common.py.

@gnsiva
Author

gnsiva commented May 24, 2018

Sounds good, I'll look into it



@ignore_warnings
def test_all_true_pos_label():
Member

Thanks for this.

Should we be explicitly passing pos_label=1?

And I'm not sure about what we're saying about metrics where the second arg is a score or probability here...

Author

Changed it to explicitly pass that.
I thought the test would be like the equivalent of check_classifiers_one_label but for those metrics.

@@ -444,6 +444,10 @@ Metrics
:issue:`9515` by :user:`Alan Liddell <aliddell>` and
:user:`Manh Dao <manhdao>`.

- Fixed a bug in: :func:`metrics.brier_score_loss`, when all ``y_true`` values are 1,
Member

drop the first :. Change "1," to "1." and start a new sentence.

Author

will do

@gnsiva
Author

gnsiva commented Jun 7, 2018

@jnothman thanks for taking a look before. I've made the changes suggested, please let me know if there is anything else you would like me to do.

Member

@jnothman jnothman left a comment

Otherwise LGTM

all_ones = np.array([1, 1, 1])

for name in METRICS_WITH_POS_LABEL:
if not name == 'roc_curve':
Member

This should be !=

for name in METRICS_WITH_POS_LABEL:
if not name == 'roc_curve':
metric = ALL_METRICS[name]
perfect_score = metric(examples, examples, pos_label=1)
Member

metric(examples, examples) is not necessarily perfect prediction score if the second argument is meant to be a score rather than a label. You've already seen that in ROC AUC. Is there some better way to characterise metrics for which 0 score does not necessarily mean "predicting the negative class", so that we can give the excepted metrics some name (like how THRESHOLDED_METRICS is used elsewhere in this file)?

Author

Having a tough time thinking of a snappy name, but would something like METRICS_TARGET_VS_PREDICTION_WITH_POS_LABEL work? It would then encompass all the functions in METRICS_WITH_POS_LABEL except for roc_curve.

Member

POSITIVE_SCORE_MEANS_POSITIVE_CLASS?

@jnothman jnothman added this to the 0.20 milestone Jun 12, 2018
@gnsiva
Author

gnsiva commented Jun 24, 2018

@jnothman thanks, I've made that change.

Member

@jnothman jnothman left a comment

Now I'm just trying to work out how to express it in the negative. Perhaps just MAY_NOT_BE

@gnsiva
Author

gnsiva commented Jun 25, 2018

To have a separate list containing only "roc_curve"?
Maybe it could be METRICS_COMPARE_TO_SCORE and METRICS_COMPARE_TO_LABEL?
Or perhaps POS_LABEL_METRICS_COMPARE_TO_SCORE and POS_LABEL_METRICS_COMPARE_TO_LABEL?

@jnothman
Member

jnothman commented Jun 25, 2018 via email

@gnsiva
Author

gnsiva commented Jun 25, 2018

Yes, sure. The second argument of roc_curve is y_score: the classifier's decision scores (or predicted probabilities, etc.), whereas for all the other metrics in METRICS_WITH_POS_LABEL the second argument is y_pred, i.e. the class labels predicted by the classifier.

So by COMPARE_TO_SCORE I meant that the metric compares the actual labels to a score, and by COMPARE_TO_LABEL that it compares the actual labels to the predicted class labels.
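
To illustrate the distinction with public scikit-learn metrics (a small sketch; the inputs are made up):

from sklearn.metrics import roc_curve, precision_score

y_true = [0, 0, 1, 1]

# roc_curve's second argument is y_score: continuous decision values or
# probabilities, which are thresholded internally.
fpr, tpr, thresholds = roc_curve(y_true, [0.1, 0.4, 0.35, 0.8], pos_label=1)

# Most of the other metrics in METRICS_WITH_POS_LABEL take y_pred: hard class labels.
precision = precision_score(y_true, [0, 1, 1, 1], pos_label=1)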

@jnothman
Member

jnothman commented Jun 25, 2018 via email

@gnsiva
Author

gnsiva commented Jun 25, 2018

oops, good point

@amueller
Member

what's the status of this?

@qinhanmin2014
Member

I think the main issue here is the definition of pos_label=None (See #10010).
Also, seems that there's a bug here, e.g., brier_score_loss([-1, 0, -1], [0, -1, -1], pos_label=None) will be interpreted as brier_score_loss([-1, 0, -1], [0, -1, -1], pos_label=1)
I don't think we need so many tests for such a minor fix and I doubt whether the new common test is necessary.
Seems that the problem is not so serious, so my suggestion is to leave it to 0.21. I think we need to have a consistent definition of pos_label=None within the repo (or simply deprecate it).
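
For reference, a rough reconstruction of the pos_label=None rule under discussion (the helper name is hypothetical, inferred from the docstring change and the example above rather than taken from scikit-learn):

import numpy as np

def resolve_pos_label(y_true, pos_label=None):
    y_true = np.asarray(y_true)
    if pos_label is not None:
        return pos_label
    pos_label = y_true.max()   # documented default: the maximum label
    if pos_label == 0:         # meant for the all-0/False case, but it also
        pos_label = 1          # fires for label sets like {-1, 0}
    return pos_label

# With the example above, the maximum label is 0, so the rule yields
# pos_label=1 even though 1 never occurs in y_true.
print(resolve_pos_label([-1, 0, -1]))   # -> 1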
