
[MRG] Makes roc_auc_score and average_precision_score docstrings more explicit #9557


Merged · 4 commits · Aug 22, 2017

Conversation

jrbourbeau (Contributor)

Reference Issue

Fixes issue #9554

What does this implement/fix? Explain your changes.

Changes the description of the y_true parameter for metrics.roc_auc_score
and metrics.average_precision_score to be slightly more explicit about
the allowed values. This is more in line with the metrics.roc_curve and
metrics.precision_recall_curve description of y_true. Specifically, this PR
makes the following modification.

Current description:
True binary labels in binary label indicators.

Modified description:
True binary labels (either {0, 1} or {-1, 1}) or binary label indicators.

This could help avoid confusion in situations like the following:

import numpy as np
from sklearn.metrics import roc_auc_score

# Only two unique classes, but {-1, 0} is not one of the accepted label sets
y_true = np.array([-1, -1, 0, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)

which yields

ValueError: Data is not binary and pos_label is not specified

even though there are only two unique classes in y_true.
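
For contrast, a quick check (not part of this PR) that both accepted encodings of the same labels work and agree:

import numpy as np
from sklearn.metrics import roc_auc_score

y_scores = np.array([0.1, 0.4, 0.35, 0.8])
# {0, 1} and {-1, 1} are the two label sets accepted without pos_label;
# both encodings give AUC = 0.75 here
roc_auc_score(np.array([0, 0, 1, 1]), y_scores)
roc_auc_score(np.array([-1, -1, 1, 1]), y_scores)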

Changes the description of the y_true parameter to be slightly
more explicit about the allowed values.
@jrbourbeau changed the title from "Makes roc_auc_score and average_precision_score docstrings more explicit" to "[MRG] Makes roc_auc_score and average_precision_score docstrings more explicit" on Aug 15, 2017
@amueller (Member)

Thanks! Can you also change the entry in roc_curve? From the issue, it sounds like that one was a bit ambiguous, too.

@jrbourbeau (Contributor, Author)

I changed the y_true description for roc_curve to be:

True binary labels (either {0, 1} or {-1, 1}). If labels are not binary, pos_label should be explicitly given.

roc_curve is a little strange because it allows labels other than {-1, 1} or {0, 1}, but then the user has to specify a positive label. I think the accepted values are (hopefully) clear in the above description.

@amueller (Member)

I'm not sure I understand If labels are not binary, pos_label should be explicitly given. (I haven't checked the code).

@jrbourbeau (Contributor, Author)

Yeah, it's kind of weird how there are two modes that roc_curve operates in, depending on whether pos_label is given. Without pos_label, {0, 1} and {-1, 1} are the only label sets allowed in y_true. But if pos_label is specified, then any two values can appear in y_true. For example, from the roc_curve doc (http://scikit-learn.org/dev/modules/generated/sklearn.metrics.roc_curve.html):

import numpy as np
from sklearn import metrics

# {1, 2} labels are accepted here because pos_label=2 is given explicitly
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)

This is totally fine because pos_label=2 is specified. If that were left out, then the error

ValueError: Data is not binary and pos_label is not specified

would be raised.
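
For reference, the failing variant of the same call:

import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
# pos_label omitted: {1, 2} is neither {0, 1} nor {-1, 1}, so this raises
# ValueError: Data is not binary and pos_label is not specified
metrics.roc_curve(y, scores)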

@jrbourbeau (Contributor, Author)

Here is where the values in y_true get explicitly checked when pos_label isn't given:

https://github.com/scikit-learn/scikit-learn/blob/baa20488/sklearn/metrics/ranking.py#L311
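
Roughly, that check amounts to something like this (a simplified paraphrase; the function name here is made up, see the linked source for the real code):

import numpy as np

def check_pos_label(y_true, pos_label=None):
    # Without pos_label, only the label sets {0, 1} and {-1, 1}
    # (or a single class drawn from them) are accepted.
    classes = np.unique(y_true)
    if pos_label is None and not (
            np.array_equal(classes, [0, 1]) or
            np.array_equal(classes, [-1, 1]) or
            np.array_equal(classes, [0]) or
            np.array_equal(classes, [-1]) or
            np.array_equal(classes, [1])):
        raise ValueError("Data is not binary and pos_label is not specified")
    elif pos_label is None:
        pos_label = 1.0
    return pos_label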

@amueller (Member)

I would clarify "labels are not binary" as "labels are not {-1, 1} or {0, 1}"; {2, 5} are also binary labels.

@jrbourbeau (Contributor, Author)

Thanks for the comment @amueller. I changed the y_true description in roc_curve to:

y_true : array, shape = [n_samples]
    True binary labels. If labels are not either {-1, 1} or {0, 1},
    then pos_label should be explicitly given.
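
And a quick sanity check of both modes described by the new wording (example added for illustration):

import numpy as np
from sklearn.metrics import roc_curve

scores = np.array([0.1, 0.4, 0.35, 0.8])

# {0, 1} labels: the positive label is inferred automatically
fpr, tpr, thresholds = roc_curve(np.array([0, 0, 1, 1]), scores)

# {2, 5} labels are binary too, but pos_label must be given explicitly
fpr, tpr, thresholds = roc_curve(np.array([2, 2, 5, 5]), scores, pos_label=5)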

@@ -116,7 +116,8 @@ def average_precision_score(y_true, y_score, average="macro",
     Parameters
     ----------
     y_true : array, shape = [n_samples] or [n_samples, n_classes]
-        True binary labels in binary label indicators.
+        True binary labels (either {0, 1} or {-1, 1}) or binary label
+        indicators.
@jnothman (Member) commented on the diff:

These are binary label indicators. Just drop "or binary label indicators" here.

@jrbourbeau (Contributor, Author)

Good point @jnothman, thanks

@jnothman (Member)

LGTM

@jnothman jnothman merged commit 6f42105 into scikit-learn:master Aug 22, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017