MNT Improve error message with implicit pos_label in brier_score_loss #15412

qinhanmin2014 · 2019-10-31T03:48:13Z

Seems that people often specify pos_label when y_true is str :)

import numpy as np
from sklearn.metrics import brier_score_loss
y_true = np.array(["neg", "pos", "pos", "neg"])
y_pred = np.array([0.8, 0.6, 0.4, 0.2])
brier_score_loss(y_true, y_pred)

TypeError                                 Traceback (most recent call last)
<ipython-input-1-0a4406be1adf> in <module>
      3 y_true = np.array(["neg", "pos", "pos", "neg"])
      4 y_pred = np.array([0.8, 0.6, 0.4, 0.2])
----> 5 brier_score_loss(y_true, y_pred)

d:\github\scikit-learn\sklearn\metrics\_classification.py in brier_score_loss(y_true, y_prob, sample_weight, pos_label)
   2487             pos_label = 1
   2488         else:
-> 2489             pos_label = y_true.max()
   2490     y_true = np.array(y_true == pos_label, int)
   2491     return np.average((y_true - y_prob) ** 2, weights=sample_weight)

D:\Anaconda3\envs\dev\lib\site-packages\numpy\core\_methods.py in _amax(a, axis, out, keepdims, initial)
     26 def _amax(a, axis=None, out=None, keepdims=False,
     27           initial=_NoValue):
---> 28     return umr_maximum(a, axis, None, out, keepdims, initial)
     29 
     30 def _amin(a, axis=None, out=None, keepdims=False,

TypeError: cannot perform reduce with flexible type
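A possible way to sidestep the failure above is to name the positive class explicitly. The sketch below does not call scikit-learn; it reimplements by hand the two lines visible in the traceback (binarise `y_true` against `pos_label`, then average the squared differences). The helper name `brier_score_manual` is hypothetical and sample weights are left out for brevity.

```python
import numpy as np


def brier_score_manual(y_true, y_prob, pos_label):
    """Hand-rolled Brier score: mean squared gap between 0/1 target and probability.

    Hypothetical helper for illustration only, not the scikit-learn function.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    # Binarise against the explicitly chosen positive class,
    # so no ordering of the labels is ever needed.
    target = (y_true == pos_label).astype(int)
    return np.average((target - y_prob) ** 2)


y_true = np.array(["neg", "pos", "pos", "neg"])
y_pred = np.array([0.8, 0.6, 0.4, 0.2])
score = brier_score_manual(y_true, y_pred, pos_label="pos")
print(score)
```

With `pos_label="pos"` the targets become `[0, 1, 1, 0]`, so the squared errors are 0.64, 0.16, 0.36 and 0.04 and their mean is 0.3.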

jnothman · 2019-10-31T06:12:40Z

sklearn/metrics/tests/test_classification.py

@@ -2180,6 +2180,16 @@ def test_brier_score_loss():
    assert_almost_equal(
        brier_score_loss(['foo'], [0.4], pos_label='foo'), 0.36)

+    # correctly infer pos_label


Should this be tested in test_common.py?

I guess not, because we only infer pos_label in this way in brier_score_loss.

jnothman

Does this need a what's new entry?

glemaitre

We should make the documentation more specific about how pos_label is inferred for string labels.

sklearn/metrics/tests/test_classification.py

glemaitre · 2019-11-07T12:41:06Z

sklearn/metrics/_classification.py

@@ -2486,6 +2486,6 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
                np.array_equal(labels, [-1])):
            pos_label = 1
        else:
-            pos_label = y_true.max()
+            pos_label = labels[-1]


We need to update the comment at the top of the if statement.
We also need to make the docstring more specific.

The comment is still valid, and we already noted down the expected behaviour: Defaults to the greater label unless y_true is all 0 or all -1 in which case pos_label defaults to 1.

What I meant is to move this comment into the first branch (if np.array_equal ...) of the if and describe the other behavior in the second one.

I think the comment is OK since "otherwise pos_label is set to the greater label" is still true here. But yeah, the docstring could be clarified to say that strings are also ordered lexicographically (which makes absolutely no sense to me).
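To make the "greater label" rule concrete for strings, here is a small sketch. It assumes `labels` in the diff is the result of `np.unique(y_true)`, which is not shown in the hunk; `np.unique` returns sorted values, so `labels[-1]` is the lexicographically greatest label.

```python
import numpy as np

y_true = np.array(["neg", "pos", "pos", "neg"])

# np.unique returns the unique values in sorted order,
# so the last element is the "greatest" label.
labels = np.unique(y_true)
inferred = labels[-1]

# The ordering is purely lexicographic: with these labels
# "spam" sorts after "ham" and would be picked as positive.
other = np.unique(np.array(["spam", "ham", "spam"]))[-1]

print(labels, inferred, other)
```

The second case illustrates the objection raised in the thread: which class ends up positive depends only on spelling, not on meaning.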

doc/whats_new/v0.22.rst

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

qinhanmin2014 · 2019-11-07T14:14:52Z

assert score1 == pytest.approx(score2)

should we deprecate assert_almost_equal?

glemaitre · 2019-11-07T14:24:39Z

should we deprecate assert_almost_equal?

This is still a non-deprecated NumPy function, so I would say no.

qinhanmin2014 · 2019-11-07T14:32:09Z

This is still a non-deprecated NumPy function, so I would say no.

This seems confusing. Which one should we encourage contributors to use?

glemaitre · 2019-11-07T14:39:11Z

We encourage using pytest. Only when we want to check something on an array do we use `assert_***`. Deprecating would be annoying because we would need to change our tests. This is a similar problem to the PEP8 issue.


qinhanmin2014 · 2019-11-07T14:44:15Z

We encourage using pytest. Only when we want to check something on an array
do we use assert_***.

Actually we've already deprecated lots of things, e.g., assert_equal, assert_not_equal.
And I can't understand what you mean by "check something on array".

glemaitre · 2019-11-07T14:52:52Z

`assert_array_equal`, `assert_allclose`, ... these functions work on arrays, while we do not use the numpy asserts for scalars: `assert_equal`, `assert_greater`, ...
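As a rough illustration of the split described here (plain `assert` with `pytest.approx` for scalars, NumPy testing helpers for arrays), consider the sketch below. The values are made up for the example and are not taken from the scikit-learn test suite.

```python
import numpy as np
import pytest
from numpy.testing import assert_allclose

# Scalar comparison: a bare assert plus pytest.approx handles
# floating-point round-off (0.1 + 0.2 is not exactly 0.3).
score1 = 0.1 + 0.2
score2 = 0.3
scalar_ok = score1 == pytest.approx(score2)
assert scalar_ok

# Array comparison: the NumPy helper checks element-wise closeness
# and reports the mismatching entries if the check fails.
probs = np.array([0.1 + 0.2, 0.5])
assert_allclose(probs, [0.3, 0.5])
```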


qinhanmin2014 · 2019-11-07T15:02:12Z

assert_array_equal, assert_allclose, ... these functions work on
arrays, while we do not use the numpy asserts for scalars: assert_equal,
assert_greater, ...

thanks, that's strange but I'm now able to understand. Is it the final decision in scikit-learn?

glemaitre · 2019-11-07T15:22:50Z

I think that #14222 was going towards this direction.


NicolasHug

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

I think that 'pos_label' should be specified if strings are passed. There is no sensible way to infer the positive labels when targets are strings (the lexicographic order is, IMHO, completely arbitrary).

NicolasHug · 2019-11-08T14:23:21Z

doc/whats_new/v0.22.rst

@@ -604,6 +604,10 @@ Changelog
  used as the :term:`scoring` parameter of model-selection tools.
  :pr:`14417` by `Thomas Fan`_.

+- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will raise an error


Suggested change

- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will raise an error

- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` would raise an error

NicolasHug · 2019-11-08T14:29:03Z

sklearn/metrics/_classification.py

@@ -2486,6 +2486,6 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
                np.array_equal(labels, [-1])):
            pos_label = 1
        else:
-            pos_label = y_true.max()
+            pos_label = labels[-1]


I think the comment is OK since "otherwise pos_label is set to the greater label" is still true here. But yeah, the docstring could be clarified to say that strings are also ordered lexicographically (which makes absolutely no sense to me).

qinhanmin2014 · 2019-11-08T14:55:02Z

I think that #14222 was going towards this direction.

Yes, we've already deprecated things like assert_equal because of pytest, so I want to ask whether we should also deprecate assert_almost_equal.

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

Actually we do so in plot_roc_curve and plot_precision_recall_curve :)
But personally I agree. What we need to decide is whether we want to keep pos_label=None and infer it when y_true is in {0, 1} / {-1, 1}. I prefer to deprecate pos_label=None and always use pos_label=1.

ogrisel · 2019-11-08T15:06:23Z

Always using pos_label=1 is going to cause weird issues when y_true takes values in [1, 2], for instance. Dealing with backward compatibility is likely to be very hard in this case.
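The concern can be seen with a small hand-computed example. The sketch below uses a hypothetical NumPy helper rather than scikit-learn itself, and made-up probabilities: for labels in {1, 2}, a fixed `pos_label=1` treats the lower label as positive, whereas the "greater label" rule would pick 2, and the two choices give very different scores.

```python
import numpy as np


def brier(y_true, y_prob, pos_label):
    # Hypothetical helper: binarise against pos_label, then average squared errors.
    target = (np.asarray(y_true) == pos_label).astype(int)
    return np.average((target - np.asarray(y_prob, dtype=float)) ** 2)


y_true = [1, 2, 2, 1]
y_prob = [0.1, 0.9, 0.8, 0.3]  # intended as P(label == 2)

score_greater = brier(y_true, y_prob, pos_label=2)  # "greater label" inference
score_fixed = brier(y_true, y_prob, pos_label=1)    # always pos_label=1

print(score_greater, score_fixed)
```

Here the score is 0.0375 when 2 is the positive class and 0.6875 when 1 is, so silently switching the default would change results without raising any error.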

ogrisel · 2019-11-08T15:14:30Z

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

I agree. I am not sure we should merge this PR as it is because we don't have a consensus. For now I think it's fine to raise an error when the users use string labels and the brier score together without explicit pos_label, as long as the error message is explicit enough.
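One way the behaviour proposed here could look is sketched below. The function name `infer_pos_label`, the choice of `ValueError` and the wording of the message are assumptions made for illustration; they are not taken from the scikit-learn code base.

```python
import numpy as np


def infer_pos_label(y_true, pos_label=None):
    """Hypothetical sketch: infer pos_label, refusing to guess for strings."""
    if pos_label is not None:
        return pos_label

    labels = np.unique(y_true)
    if any(isinstance(label, str) for label in labels):
        # Lexicographic order says nothing about which class is positive,
        # so ask the caller instead of guessing.
        raise ValueError(
            "y_true contains string labels; pos_label must be specified "
            f"explicitly. Found labels: {labels.tolist()}"
        )
    # Existing rule from the diff: all-0 or all-(-1) targets default to 1.
    if np.array_equal(labels, [0]) or np.array_equal(labels, [-1]):
        return 1
    # Otherwise fall back to the greater label.
    return labels[-1]


print(infer_pos_label([0, 1, 1, 0]))
print(infer_pos_label(["neg", "pos"], pos_label="pos"))
```

Calling `infer_pos_label(["neg", "pos"])` without `pos_label` raises the explicit error instead of the opaque `TypeError` from the original report.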

qinhanmin2014 · 2019-11-09T07:52:26Z

I agree. I am not sure we should merge this PR as it is because we don't have a consensus.

@ogrisel the aim of this PR is to fix a bug in the current behaviour, not to introduce new behaviour.

NicolasHug · 2019-11-10T14:11:09Z

I agree with @ogrisel's suggestion. This is not a new behavior; this is a bug fix.

qinhanmin2014 · 2019-11-10T15:04:47Z

I agree with @ogrisel's suggestion. This is not a new behavior; this is a bug fix.

But "This is not a new behavior; this is a bug fix." is actually my suggestion?

NicolasHug · 2019-11-10T16:04:01Z

No, the bug fix is to raise a proper error, which is also what I proposed.

jnothman · 2019-11-10T21:55:48Z

Sorry if I'm misunderstanding something, but I think we should stop treating strings as something that can be ordered.

Although it's dealing with scorers rather than underlying metrics, the usability issues could be partially addressed by a solution to #12385, since this "scorer builder" could construct a Brier scorer for each class, and only consider a single one positive if requested explicitly by the user.... That tool would explicitly have the job of "describe your task, and I'll give you appropriate scorers"

NicolasHug · 2019-11-17T13:25:17Z

Yes, it fixes the incorrect inference of pos_label for strings. It made no sense before; now we raise an error with a proper message. This is a changed behavior (as a bug fix) and it's worth a what's new.

qinhanmin2014 · 2019-11-17T13:44:54Z

This is a changed behavior (as a bug fix) and it's worth a what's new.

We also raised an error before; this PR only improves the error message. Do you still want a what's new? @NicolasHug

NicolasHug · 2019-11-17T13:55:10Z

right, sorry

no need for a whatsnew

qinhanmin2014 · 2019-11-22T03:41:47Z

ping @NicolasHug @ogrisel let's merge? The doc is wrong, so perhaps we should put this into 0.22.

NicolasHug · 2019-11-22T12:08:49Z

Why put back the 'O' check?

qinhanmin2014 · 2019-11-22T12:12:56Z

@NicolasHug
See #15562 (comment)
@ogrisel merged his PR himself so I think he's confident.

NicolasHug · 2019-11-22T12:16:36Z

@ogrisel what about #15412 (comment)

qinhanmin2014 · 2019-11-26T12:48:13Z

@ogrisel what about #15412 (comment)

ping @ogrisel ? thanks

adrinjalali · 2020-04-22T10:47:55Z

removing from the milestone, happy for it to be back when we get back to it.

jnothman

Otherwise lgtm

jnothman · 2020-04-27T13:56:23Z

sklearn/metrics/_classification.py

    if pos_label is None:
-        if (np.array_equal(labels, [0]) or
+        if labels.dtype.kind in ('O', 'U', 'S'):


Suggested change

if labels.dtype.kind in ('O', 'U', 'S'):

if any(isinstance(label, str) for label in labels):
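The practical difference between the two checks in this suggestion shows up for object arrays that hold only integers. The sketch below compares them; the helper names `kind_check` and `str_check` are hypothetical wrappers around the two conditions.

```python
import numpy as np


def kind_check(labels):
    # Original condition: flags every object, unicode or bytes array.
    return labels.dtype.kind in ("O", "U", "S")


def str_check(labels):
    # Suggested condition: flags only arrays that actually contain str labels.
    return any(isinstance(label, str) for label in labels)


str_labels = np.unique(np.array(["neg", "pos", "pos", "neg"]))
int_obj_labels = np.unique(np.array([0, 1, 1, 0], dtype=object))

print(kind_check(str_labels), str_check(str_labels))
print(kind_check(int_obj_labels), str_check(int_obj_labels))
```

Both checks flag the string labels, but only the dtype-based one flags the all-integer object array, which is the case the suggested `isinstance` version lets through to normal inference.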

thomasjpfan · 2020-04-27T23:38:07Z

sklearn/metrics/tests/test_classification.py

+    assert score1 == pytest.approx(score2)
+
+    # positive class if correctly inferred an object array with all ints
+    y_pred_num_obj = np.array([0, 1, 1, 0], dtype=object)


This test case was added; it was enabled by #15412 (comment) together with the fb199bd diff.

cmarmo · 2020-08-15T21:04:18Z

@lucyleeow @glemaitre, another interesting PR about pos_label and scores?

FIX Correctly infer pos_label in brier_score_loss

840adc0

jnothman reviewed Oct 31, 2019

jnothman approved these changes Nov 2, 2019

whats new

b9d750c

jnothman added Bug Waiting for Reviewer labels Nov 2, 2019

jnothman added this to the 0.22 milestone Nov 2, 2019

qinhanmin2014 closed this Nov 7, 2019

qinhanmin2014 reopened this Nov 7, 2019

glemaitre reviewed Nov 7, 2019

qinhanmin2014 and others added 2 commits November 7, 2019 08:10

Apply suggestions from code review

538317e

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Merge branch 'master' into brier_score_loss

a8af41f

NicolasHug reviewed Nov 8, 2019

review

bf45832

qinhanmin2014 mentioned this pull request Nov 19, 2019

[MRG] Improve error message with implicit pos_label in _binary_clf_curve #15562

Merged

qinhanmin2014 added 2 commits November 22, 2019 11:19

Merge remote-tracking branch 'upstream/master' into brier_score_loss

9adb3a5

consistency with merged PR

a9d84be

qinhanmin2014 mentioned this pull request Nov 27, 2019

Release 0.22rc3 #15715

Merged

jnothman modified the milestones: 0.22, 0.23 Dec 5, 2019

github-actions bot added the module:metrics label Mar 2, 2020

adrinjalali removed this from the 0.23 milestone Apr 22, 2020

jnothman approved these changes Apr 27, 2020

thomasjpfan added 3 commits April 27, 2020 18:27

Merge remote-tracking branch 'upstream/master' into pr/15412

8ba6f45

CLN Address comments

fb199bd

DOC Adds comment

8635fd8

thomasjpfan reviewed Apr 27, 2020

cmarmo added help wanted Stalled and removed Waiting for Reviewer labels Aug 15, 2020

glemaitre self-assigned this Aug 18, 2020

glemaitre mentioned this pull request Aug 18, 2020

MNT make error message consistent in brier_score_loss #18183

Merged

cmarmo removed the help wanted label Aug 18, 2020

thomasjpfan closed this in #18183 Sep 29, 2020
