[MRG+2] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score #9980
Conversation
ping @jnothman Could you please take some time to review it or at least judge whether this is the right way to go? Thanks a lot :)
I've been taking a little break from review to work on other things, while still doing triage. This PR is not lost, merely waiting.
ping @jnothman for a review. Thanks :)
Otherwise LGTM
sklearn/metrics/ranking.py (Outdated)

    y_type = type_of_target(y_true)
    if y_type == "binary":
        _partial_binary_uninterpolated_average_precision = partial(
This is used once. It can have a short, uninformative name...
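For readers following along, here is a minimal, self-contained sketch of the pattern in the excerpt above, not the PR's exact code: `functools.partial` binds `pos_label` up front so the remaining callable only needs `(y_true, y_score, sample_weight)`, the shape an averaging helper expects. The helper name below is made up for illustration; the AP formula mirrors the uninterpolated definition, and only public API is used.

```python
from functools import partial

import numpy as np
from sklearn.metrics import precision_recall_curve


def binary_uninterpolated_ap(y_true, y_score, pos_label=1, sample_weight=None):
    # Uninterpolated average precision: sum over thresholds of
    # (increase in recall) * (precision at that threshold).
    precision, recall, _ = precision_recall_curve(
        y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
    return -np.sum(np.diff(recall) * np.array(precision)[:-1])


# Bind pos_label once; the result can be passed around like a plain
# (y_true, y_score, sample_weight) metric.
ap_for_label_0 = partial(binary_uninterpolated_ap, pos_label=0)

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
print(ap_for_label_0(y_true, y_score))
```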
sklearn/metrics/ranking.py (Outdated)

@@ -173,6 +174,10 @@ def average_precision_score(y_true, y_score, average="macro",
    ``'samples'``:
        Calculate metrics for each instance, and find their average.

    pos_label : int or str (default=1)
        The label of the positive class. For multilabel-indicator y_true,
Perhaps clarify that this is only applied to binary classification. We do not, for instance, binarize a multiclass problem using pos_label...
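As a usage-level illustration of that point, here is a short sketch assuming the behaviour this PR adds: pos_label only names the positive class for binary y_true, it does not binarize multiclass targets, and for multilabel indicators it stays fixed at 1.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Binary y_true with string labels: pos_label says which class the scores
# are ranking highly (higher y_score = more likely "spam").
y_true = np.array(["spam", "ham", "spam", "ham"])
y_score = np.array([0.9, 0.2, 0.6, 0.4])
print(average_precision_score(y_true, y_score, pos_label="spam"))

# Multiclass y_true is NOT binarized via pos_label (it is simply unsupported),
# and multilabel-indicator y_true keeps pos_label fixed to 1.
```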
sklearn/metrics/ranking.py
Outdated
return _average_binary_score( | ||
_partial_binary_uninterpolated_average_precision, y_true, | ||
y_score, average, sample_weight=sample_weight) | ||
else: |
are we sure at this stage that the y_type is not multiclass?
sklearn/metrics/ranking.py
Outdated
raise ValueError("Parameter pos_label is fixed to 1 for " | ||
"multilabel-indicator y_true. Do not set " | ||
"pos_label or set pos_label to 1.") | ||
return _average_binary_score( |
Why don't you outdent this and use it in all cases?
@jnothman Thanks a lot for the review :) Comments addressed. I also simplified the code.
At this point, no. But we do the check once we finish the preparation and enter
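(For context, and as a guess at the kind of check meant here since the comment above is truncated: scikit-learn's shared averaging helper re-derives the target type and only accepts binary or multilabel-indicator targets, so a multiclass y_true would be rejected at that point. A minimal sketch using only documented utilities:)

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

for y_true in (np.array([0, 1, 1, 0]),        # binary
               np.array([[1, 0], [0, 1]]),    # multilabel-indicator
               np.array([0, 1, 2, 1])):       # multiclass
    y_type = type_of_target(y_true)
    supported = y_type in ("binary", "multilabel-indicator")
    print(y_type, "-> supported" if supported else "-> rejected")
```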
LGTM. Please add what's new
@@ -150,7 +151,7 @@ def average_precision_score(y_true, y_score, average="macro",
    Parameters
    ----------
    y_true : array, shape = [n_samples] or [n_samples, n_classes]
-       True binary labels (either {0, 1} or {-1, 1}).
+       True binary labels or binary label indicators.
label indicators -> multilabel indicators ??
@jnothman Thanks for the review :) I updated what's new (not sure whether we need two entries).
I'm not creating a new term, just reverting the previous change (#9557) since we now extend the function. We are using the term
Sounds alright. I'm just trying to help users new to the terminology. They are label indicators, used for multilabel representation. Some of the terminology is confused for historical reasons: we used to have an alternative multilabel format.
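A tiny illustration of the "label indicator" format being discussed, with made-up data: one column per label, and a 1 marking that the sample carries that label.

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1],     # sample 0 has labels 0 and 2
                   [0, 1, 0],     # sample 1 has label 1
                   [1, 1, 0]])    # sample 2 has labels 0 and 1
y_score = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.7, 0.3],
                    [0.8, 0.6, 0.4]])
# Macro average of the per-label (per-column) average precisions.
print(average_precision_score(y_true, y_score, average="macro"))
```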
This looks good aside from the cosmetic comment.
sklearn/metrics/ranking.py (Outdated)

@@ -23,6 +23,7 @@
import numpy as np
from scipy.sparse import csr_matrix
from scipy.stats import rankdata
from functools import partial
Cosmetic: you should move this import to the top of the imports, as it is an import from the standard library.
I am addressing the cosmetic comment myself and merging.
LGTM.
Merging when CI is green.
Thanks a lot @GaelVaroquaux :)
Reference Issues/PRs
part of #9829
What does this implement/fix? Explain your changes.
(1) Add a pos_label parameter to average_precision_score. Although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here: there is no relationship between the results if we reverse the true labels (see the sketch after this list), and precision/recall all support pos_label.
(2) Fix a bug where average_precision_score would sometimes return nan when sample_weight contains 0.
I fix it here because of (3).
(3) Move the average_precision scores out of METRIC_UNDEFINED_BINARY (this should provide the regression tests for (1) and (2)).
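A small illustrative sketch of the point in (1), using made-up data: flipping which class counts as positive relates ROC AUC by a simple symmetry (it becomes 1 - AUC), while average precision has no such relationship, which is why naming the positive class matters here.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])

# ROC AUC: treating class 0 as positive just mirrors the score (1 - AUC),
# so both printed values are equal.
print(roc_auc_score(y_true, y_score), 1 - roc_auc_score(1 - y_true, y_score))

# Average precision: reversing the labels gives an unrelated value,
# which cannot be recovered from the pos_label=1 result.
print(average_precision_score(y_true, y_score),
      average_precision_score(1 - y_true, y_score))
```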
Some comments:
(1) For the underlying function (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 here because that is what P/R/F do (see the sketch after this list). What's more, the meaning of pos_label=None is not clear even within scikit-learn itself (see #10010).
(2) I slightly modified the common test. Currently, the part I modified only applies to brier_score_loss (I'm doing the same thing in #9562). I think this is right because, as a common test, it does not seem reasonable to force metrics to accept str y_true without pos_label.
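For reference, a small sketch of the defaults being matched, with made-up data: precision/recall/F-score already default to pos_label=1, while precision_recall_curve defaults to pos_label=None.

```python
import numpy as np
from sklearn.metrics import precision_score, precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

print(precision_score(y_true, y_pred))               # implicit pos_label=1
print(precision_score(y_true, y_pred, pos_label=0))  # class 0 as positive

y_score = np.array([0.2, 0.7, 0.4, 0.3, 0.9])
precision, recall, _ = precision_recall_curve(y_true, y_score)  # pos_label=None
print(precision, recall)
```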
Any other comments?
cc @jnothman Could you please take some time to review or at least judge whether this is the right way to go? Thanks a lot :)