roc_auc_score fails if dtype is object #2723

Closed
pprett opened this issue Jan 6, 2014 · 10 comments · Fixed by #9828
Labels
Easy (well-defined and straightforward way to resolve), Enhancement

Comments

@pprett
Member

pprett commented Jan 6, 2014

To reproduce:

import numpy as np
from sklearn.metrics import roc_auc_score

np.random.seed(13)
classes = np.array(['yes', 'no'])
y = classes[np.random.randint(2, size=10)]   # predictions as string labels
t = classes[np.random.randint(2, size=10)]   # ground truth as string labels
roc_auc_score(t, y)

which raises:

ValueError: Data is not binary and pos_label is not specified
@pprett
Member Author

pprett commented Jan 6, 2014

I'm not sure how this should be dealt with: either rely on lexical ordering and treat the last label as the positive class, or expose the pos_label argument. In any case, the error message is misleading, because the data is binary and pos_label cannot be specified.
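
A minimal sketch of what the lexical-ordering option would mean for the labels above (illustrative only, not the current behaviour):

import numpy as np

# np.unique sorts lexically, so 'yes' comes last and would be treated as
# the positive class under a "greatest label is positive" rule.
print(np.unique(np.array(['yes', 'no'])))   # ['no' 'yes']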

@amueller
Member

amueller commented Jan 7, 2014

Why can pos_label not be specified? Shouldn't it be 'yes'?

@jnothman
Member

jnothman commented Jan 7, 2014

The greater-is-positive approach is used elsewhere, but is awkward (e.g. hinge_loss, IIRC). And there's a PR to add pos_label here (#2616). Though I suggest that, in order not to make a mess of the interaction between multiclass averages and pos_label, we consider something more like #2610, which allows you to specify a set of labels to average over, so that binary is no longer a special case (indeed, the lack of a labels parameter in multiclass roc_auc_score is already a bug, because it means a missing class in one CV fold could give a wildly different macro-average).
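
To make the macro-average point concrete, here is a minimal sketch with made-up fold data and scores; the per-class averaging is done by hand via label_binarize, since roc_auc_score has no labels parameter:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

labels = [0, 1, 2]
# Hypothetical CV fold in which class 2 never appears.
y_true_fold = np.array([0, 1, 0, 1, 0, 1])
y_score = np.random.RandomState(0).rand(6, 3)   # made-up per-class scores

Y = label_binarize(y_true_fold, classes=labels)
present = np.unique(y_true_fold)

# Averaging only over the classes present in this fold uses two classes ...
auc_present = np.mean([roc_auc_score(Y[:, c], y_score[:, c]) for c in present])

# ... whereas a fold containing all three classes averages over three.
# The missing class cannot even be scored here: Y[:, 2] is all zeros, so
# roc_auc_score raises a ValueError (only one class present in y_true).
# Without a labels parameter, each fold silently averages over a different
# set of classes, so the macro-averages are not comparable across folds.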

@arjoly
Member

arjoly commented Jan 28, 2014

Related: #2616

@ogrisel removed this from the 0.15 milestone on Jun 9, 2014
@Nikeshbajaj

As far as I can tell, roc_auc_score works for binary 0/1 labels but does not understand 'yes' and 'no' as the negative and positive classes, so it cannot tell which label to treat as positive when counting true/false positives and negatives.

Try this instead (it works):

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

np.random.seed(13)
# classes = np.array(['yes', 'no'])
# y = classes[np.random.randint(2, size=10)]
# t = classes[np.random.randint(2, size=10)]
y = np.random.randint(2, size=10)   # integer 0/1 labels instead of strings
t = np.random.randint(2, size=10)
accuracy_score(t, y)
roc_auc_score(t, y)

@jnothman
Member

We should not be able to provide pos_label to roc_auc_score: it is invariant to swapping the classes as long as the score is inverted accordingly (which I hope is tested, but probably isn't...?). Rather, we should make sure that roc_auc_score assumes the greater of the two binary labels is the one for which the score is reported.
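
A minimal sketch of the invariance being described, using integer labels (which already work) and made-up scores:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.9, 0.35, 0.8, 0.2])

auc = roc_auc_score(y_true, y_score)
# Swap the classes and invert the scores: the AUC is unchanged.
assert np.isclose(auc, roc_auc_score(1 - y_true, 1 - y_score))
# Swap the classes but keep the scores: the AUC is inverted (absent ties).
assert np.isclose(auc, 1 - roc_auc_score(1 - y_true, y_score))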

@qinhanmin2014
Member

@jnothman Just to make sure: this will only work for binary y_true, not for multilabel-indicator y_true, right?

@qinhanmin2014
Member

qinhanmin2014 commented Sep 21, 2017

@jnothman I'm still wondering how you want to solve the problem.
For example, if I have y_true = ['good', 'not-good', 'good', 'not-good'] and y_score = [0.9, 0.1, 0.2, 0.7], where y_score represents the probability of 'good', are you expecting users to do something like roc_auc_score(y_true, 1 - y_score) or 1 - roc_auc_score(y_true, y_score) instead of roc_auc_score(y_true, y_score, pos_label='good')? Thanks.
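
For what it's worth, the two workarounds give the same value (absent ties). A minimal sketch, encoding the labels by hand since string labels are exactly what fails here, with the lexically greater label 'not-good' as the assumed positive class:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array(['good', 'not-good', 'good', 'not-good'])
y_score = np.array([0.9, 0.1, 0.2, 0.7])        # probability of 'good'

# Hand-encode with the greater label, 'not-good', as the positive class.
y_bin = (y_true == 'not-good').astype(int)

auc_flipped_scores = roc_auc_score(y_bin, 1 - y_score)
auc_flipped_result = 1 - roc_auc_score(y_bin, y_score)
assert np.isclose(auc_flipped_scores, auc_flipped_result)   # both 0.75 here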

@jnothman
Member

Why would y_score represent the probability of 'good'? If someone trains a LogisticRegression model with y = ['good', 'not-good', 'good', 'not-good'], it will automatically treat 'not-good' as the positive class. When the user provides the output of LogisticRegression.predict_proba (or does so via the scorer) as y_score, a high value will correspond to a likely 'not-good' prediction.
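
A minimal sketch of that class ordering, using made-up features just so the model can be fitted:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(20, 3)                 # made-up features
y = np.array(['good', 'not-good'] * 10)

clf = LogisticRegression().fit(X, y)
print(clf.classes_)                  # ['good' 'not-good'], sorted lexically

# predict_proba columns follow classes_, so column 1 is P('not-good'):
# a high value in that column means a likely 'not-good' prediction.
proba_not_good = clf.predict_proba(X)[:, 1]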

@qinhanmin2014
Member

@jnothman Thanks. I opened #9828 to implement your suggestion. Please have a look when you have time :)
