MNT Improve error message with implicit pos_label in brier_score_loss #15412

Closed

Conversation

qinhanmin2014
Member

It seems that people often specify pos_label when y_true is str :)

import numpy as np
from sklearn.metrics import brier_score_loss
y_true = np.array(["neg", "pos", "pos", "neg"])
y_pred = np.array([0.8, 0.6, 0.4, 0.2])
brier_score_loss(y_true, y_pred)
TypeError                                 Traceback (most recent call last)
<ipython-input-1-0a4406be1adf> in <module>
      3 y_true = np.array(["neg", "pos", "pos", "neg"])
      4 y_pred = np.array([0.8, 0.6, 0.4, 0.2])
----> 5 brier_score_loss(y_true, y_pred)

d:\github\scikit-learn\sklearn\metrics\_classification.py in brier_score_loss(y_true, y_prob, sample_weight, pos_label)
   2487             pos_label = 1
   2488         else:
-> 2489             pos_label = y_true.max()
   2490     y_true = np.array(y_true == pos_label, int)
   2491     return np.average((y_true - y_prob) ** 2, weights=sample_weight)

D:\Anaconda3\envs\dev\lib\site-packages\numpy\core\_methods.py in _amax(a, axis, out, keepdims, initial)
     26 def _amax(a, axis=None, out=None, keepdims=False,
     27           initial=_NoValue):
---> 28     return umr_maximum(a, axis, None, out, keepdims, initial)
     29 
     30 def _amin(a, axis=None, out=None, keepdims=False,

TypeError: cannot perform reduce with flexible type
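For context, here is a minimal pure-numpy sketch of what the metric computes once pos_label is known (the `brier_score` helper name is hypothetical, not scikit-learn API); passing pos_label explicitly sidesteps the failing inference above:

```python
import numpy as np

def brier_score(y_true, y_prob, pos_label):
    # Hypothetical minimal re-implementation: binarize against the
    # given positive label, then take the mean squared difference.
    y_bin = (np.asarray(y_true) == pos_label).astype(float)
    return float(np.mean((y_bin - np.asarray(y_prob)) ** 2))

y_true = np.array(["neg", "pos", "pos", "neg"])
y_prob = np.array([0.8, 0.6, 0.4, 0.2])
print(brier_score(y_true, y_prob, pos_label="pos"))  # ~0.3
```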

@@ -2180,6 +2180,16 @@ def test_brier_score_loss():
assert_almost_equal(
brier_score_loss(['foo'], [0.4], pos_label='foo'), 0.36)

# correctly infer pos_label
Member

Should this be tested in test_common.py?

Member Author

I guess not, because we only infer pos_label in this way in brier_score_loss.

Member

@jnothman jnothman left a comment

Does this need a what's new entry?

Member

@glemaitre glemaitre left a comment

We should modify the documentation to be more specific about the inference when the labels are strings.

@@ -2486,6 +2486,6 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
np.array_equal(labels, [-1])):
pos_label = 1
else:
pos_label = y_true.max()
pos_label = labels[-1]
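The fix above can be sketched end-to-end (the `infer_pos_label` helper name is hypothetical; the branching mirrors the diff, assuming `labels = np.unique(y_true)` as in the surrounding code). `np.unique` returns sorted labels, so `labels[-1]` is the greater label even for strings, whereas `y_true.max()` raises TypeError on flexible (string) dtypes:

```python
import numpy as np

def infer_pos_label(y_true):  # hypothetical helper name
    labels = np.unique(y_true)
    # All-0 or all--1 targets default to pos_label=1 ...
    if np.array_equal(labels, [0]) or np.array_equal(labels, [-1]):
        return 1
    # ... otherwise the greater (last sorted) label is positive.
    return labels[-1]

print(infer_pos_label(np.array(["neg", "pos", "pos", "neg"])))  # pos
print(infer_pos_label(np.array([0, 0, 0])))                     # 1
```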
Member

We need to update the comment at the top of the if statement.
We also need to improve the documentation to be more specific in the docstring.

Member Author

the comment is still valid and we already noted down the expected behaviour: Defaults to the greater label unless y_true is all 0 or all -1 in which case pos_label defaults to 1.

Member

What I meant is to move this comment in the first branch (if np.array_equal ...) of the if and add the behavior in the second one.

Member

I think the comment is OK since "otherwise pos_label is set to the greater label" is still true here. But yeah, the docstring could be clarified to say that strings are also ordered by lexicographic order (which makes absolutely no sense to me).
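A small illustration of why lexicographic inference can surprise (the mixed-case labels are chosen purely for illustration): uppercase sorts before lowercase, so the "greater" label here is the negative one.

```python
import numpy as np

# 'P' < 'n' in lexicographic (ASCII) order, so np.unique sorts
# "Positive" first and labels[-1] -- the inferred positive class --
# would be "negative".
labels = np.unique(np.array(["Positive", "negative", "negative"]))
print(labels)      # ['Positive' 'negative']
print(labels[-1])  # negative
```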

qinhanmin2014 and others added 2 commits November 7, 2019 08:10
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@qinhanmin2014
Member Author

assert score1 == pytest.approx(score2)

should we deprecate assert_almost_equal?

@glemaitre
Member

should we deprecate assert_almost_equal?

This is still an undeprecated numpy function so I would say no.

@qinhanmin2014
Member Author

This is still an undeprecated numpy function so I would say no.

This seems confusing, which one should we encourage contributors to use?

@glemaitre
Member

glemaitre commented Nov 7, 2019 via email

@qinhanmin2014
Member Author

We encourage using pytest. Only when we want to check something on arrays
do we use assert_***.

Actually we've already deprecated lots of things, e.g., assert_equal, assert_not_equal.
And I don't understand what you mean by "check something on arrays".

@glemaitre
Member

glemaitre commented Nov 7, 2019 via email

@qinhanmin2014
Member Author

assert_array_equal, assert_allclose, ... these functions work on
arrays, while we do not use the numpy asserts for scalars
(assert_equal, assert_greater, ...).

thanks, that's strange but now I understand. Is this the final decision in scikit-learn?
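The convention being described might be sketched like this (stdlib `math.isclose` stands in for `pytest.approx` so the snippet carries no pytest dependency; the scores are made-up values for illustration):

```python
import math
import numpy as np

# Scalar comparison: the preferred test style is
#   assert score1 == pytest.approx(score2)
# math.isclose plays the same role here.
score1, score2 = 0.1 + 0.2, 0.3
assert math.isclose(score1, score2)

# Array comparison: numpy's own assert helpers remain in use.
np.testing.assert_allclose(np.array([0.1, 0.2]) * 3,
                           np.array([0.3, 0.6]))
```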

@glemaitre
Member

glemaitre commented Nov 7, 2019 via email

Member

@NicolasHug NicolasHug left a comment

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

I think that 'pos_label' should be specified if strings are passed. There is no sensible way to infer the positive labels when targets are strings (the lexicographic order is, IMHO, completely arbitrary).

@@ -604,6 +604,10 @@ Changelog
used as the :term:`scoring` parameter of model-selection tools.
:pr:`14417` by `Thomas Fan`_.

- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will raise an error
Member

Suggested change
- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will raise an error
- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` would raise an error

@qinhanmin2014
Member Author

I think that #14222 was going towards this direction.

Yes, we've already deprecated things like assert_equal because of pytest, so I want to ask whether we should also deprecate assert_almost_equal.

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

Actually we do so in plot_roc_curve and plot_precision_recall_curve :)
But personally I agree. The thing we need to decide is whether we want to keep pos_label=None and infer when y_true is in {0, 1} / {-1, 1}. I prefer to deprecate pos_label=None and always use pos_label=1.

@ogrisel
Member

ogrisel commented Nov 8, 2019

pos_label=1 is going to be the cause of some weird issues when y_true takes values in [1, 2], for instance. Dealing with backward compatibility is likely to be very hard in this case.
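A sketch of that weirdness (the array values are chosen for illustration): with labels {1, 2}, forcing pos_label=1 would flip the current greater-label convention, making class 2 the negative class.

```python
import numpy as np

y_true = np.array([1, 2, 2, 1])

# Forced pos_label=1: class 1 becomes positive ...
print((y_true == 1).astype(int))                      # [1 0 0 1]

# ... whereas the current greater-label inference picks 2.
print((y_true == np.unique(y_true)[-1]).astype(int))  # [0 1 1 0]
```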

@ogrisel
Member

ogrisel commented Nov 8, 2019

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

I agree. I am not sure we should merge this PR as it is because we don't have a consensus. For now I think it's fine to raise an error when the users use string labels and the brier score together without explicit pos_label, as long as the error message is explicit enough.

@qinhanmin2014
Member Author

I agree. I am not sure we should merge this PR as it is because we don't have a consensus.

@ogrisel the aim of this PR is to fix bugs in the current behaviour, not to introduce new behaviour.

@NicolasHug
Member

I agree with @ogrisel's suggestion. This is not a new behavior; this is a bug fix.

@qinhanmin2014
Member Author

I agree with @ogrisel's suggestion. This is not a new behavior; this is a bug fix.

But "this is not a new behavior; this is a bug fix" is actually my suggestion?

@NicolasHug
Member

No, the bugfix is to raise a proper error, which is also what I proposed.

@jnothman
Member

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

Although it's dealing with scorers rather than underlying metrics, the usability issues could be partially addressed by a solution to #12385, since this "scorer builder" could construct a Brier scorer for each class, and only consider a single one positive if requested explicitly by the user... That tool would explicitly have the job of "describe your task, and I'll give you appropriate scorers".

@NicolasHug
Member

Yes, it fixes the incorrect inference of pos_label for strings. It made no sense before; now we raise a proper error message. This is a changed behavior (as a bug fix) and it's worth a what's new entry.

@qinhanmin2014
Member Author

This is a changed behavior (as a bug fix) and it's worth a what's new.

We also raise an error before, this PR only improves the error message, do you still want a what's new? @NicolasHug

@NicolasHug
Member

right, sorry

no need for a whatsnew

@qinhanmin2014
Member Author

ping @NicolasHug @ogrisel let's merge? The doc is wrong so perhaps we should put this into 0.22.

@NicolasHug
Member

NicolasHug commented Nov 22, 2019

Why put back the 'O' check?

@qinhanmin2014
Member Author

@NicolasHug
See #15562 (comment)
@ogrisel merged his PR himself so I think he's confident.

@NicolasHug
Member

@ogrisel what about #15412 (comment)

@qinhanmin2014
Member Author

@ogrisel what about #15412 (comment)

ping @ogrisel ? thanks

@qinhanmin2014 qinhanmin2014 mentioned this pull request Nov 27, 2019
@jnothman jnothman modified the milestones: 0.22, 0.23 Dec 5, 2019
@adrinjalali
Member

removing from the milestone, happy for it to be back when we get back to it.

@adrinjalali adrinjalali removed this from the 0.23 milestone Apr 22, 2020
Member

@jnothman jnothman left a comment

Otherwise lgtm

if pos_label is None:
if (np.array_equal(labels, [0]) or
if labels.dtype.kind in ('O', 'U', 'S'):
Member

Suggested change
if labels.dtype.kind in ('O', 'U', 'S'):
if any(isinstance(label, str) for label in labels):

assert score1 == pytest.approx(score2)

# positive class is correctly inferred for an object array with all ints
y_pred_num_obj = np.array([0, 1, 1, 0], dtype=object)
Member

This test case was added; it was enabled by #15412 (comment) and by this fb199bd diff.
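The difference between the two checks can be seen directly (a small sketch; the arrays are chosen for illustration):

```python
import numpy as np

labels_obj_ints = np.array([0, 1, 1, 0], dtype=object)
labels_strs = np.array(["neg", "pos"])

# dtype.kind flags the object array even though it only holds ints ...
assert labels_obj_ints.dtype.kind == 'O'

# ... while the element-wise isinstance check lets it through
# and still catches real string labels (np.str_ subclasses str).
assert not any(isinstance(lab, str) for lab in labels_obj_ints)
assert any(isinstance(lab, str) for lab in labels_strs)
```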

@cmarmo
Contributor

cmarmo commented Aug 15, 2020

@lucyleeow @glemaitre, another interesting PR about pos_label and scores?

8 participants