[MRG+2] ENH labels parameter in P/R/F may extend or reduce label set #4287
Conversation
Ahh, it seems rebase incorporated a failing test due to #4147. See https://github.com/scikit-learn/scikit-learn/pull/4147/files#r25564051
I've changed that test, hopefully not weakening it substantially in the multiclass case.
(Travis had failed due to a heisenbug and I hadn't noticed for a while...)
Rebased.
This is now deprecated. We can dig that out of the codebase.
Thanks for the reminder. I've cleared this out, I think.
@@ -498,8 +498,13 @@ def f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary',
     y_pred : 1d array-like, or label indicator array / sparse matrix
         Estimated targets as returned by a classifier.

-    labels : array
-        Integer array of labels.
+    labels : array-like, optional
It's more a list-like, isn't it?
I don't think "list-like" is used elsewhere in the codebase for lists of labels/classes... I'm fine for this to be `list`.
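For context, a minimal sketch of how the documented parameter is meant to be used; the data and label values here are illustrative, not from the PR:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 0, 2]

# A plain Python list works for ``labels``, per the discussion above;
# only labels 1 and 2 contribute to the macro average.
print(f1_score(y_true, y_pred, labels=[1, 2], average='macro'))
```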
Rebased.
Rebased, and Gaël's doc comment addressed.
@@ -315,7 +382,7 @@ def test_precision_refcall_f1_score_multilabel_unordered_labels():
    y_pred = np.array([[0, 0, 1, 1]])
    for average in ['samples', 'micro', 'macro', 'weighted', None]:
        p, r, f, s = precision_recall_fscore_support(
            y_true, y_pred, labels=[4, 1, 2, 3], warn_for=[], average=average)
huh the previous test seems odd...
indeed
LGTM apart from minor comments.
Thanks for the review @amueller!
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. By default, all labels in
``y_true`` and ``y_pred`` are used in sorted order.
Could you add a comment about the `labels` in multi-label classification?
Otherwise thanks, the docstring is a lot better!
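A hedged sketch of both behaviours that docstring passage describes (the data here is a toy example chosen for illustration):

```python
from sklearn.metrics import precision_recall_fscore_support

# Multiclass problem where 0 is a majority "negative" class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 2, 0]

# Excluding the negative class from a macro average:
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average='macro')

# A label absent from the data (3 here) contributes a 0 component;
# warn_for=() silences the ill-defined-metric warning, as in the
# test diff above.
p3, r3, f3, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2, 3], average='macro', warn_for=())
```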
Can you add a test ensuring that specifying labels not present in the data works? edit: I mean for micro, weighted and sample averaging, as macro is already covered by your test.
assert_array_equal([.5, 1.], recall_13(average=None))
assert_equal((.5 + 1.) / 2, recall_13(average='macro'))
assert_equal((.5 * 2 + 1. * 1) / 3, recall_13(average='weighted'))
assert_equal(2. / 3, recall_13(average='micro'))
`assert_array_almost_equal` and `assert_almost_equal`?
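For readers following along, here is a self-contained reconstruction of that assertion block; the fixture data and the `recall_13` helper are my guesses at what the test defines, chosen to match the asserted values:

```python
from functools import partial

import numpy as np
from numpy.testing import assert_almost_equal, assert_array_almost_equal

from sklearn.metrics import recall_score

# Label 1 has support 2 and recall .5, label 3 has support 1 and
# recall 1; label 2 is excluded via labels=[1, 3].
y_true = np.array([1, 1, 2, 3])
y_pred = np.array([1, 2, 2, 3])
recall_13 = partial(recall_score, y_true, y_pred, labels=[1, 3])

assert_array_almost_equal([.5, 1.], recall_13(average=None))
assert_almost_equal((.5 + 1.) / 2, recall_13(average='macro'))
assert_almost_equal((.5 * 2 + 1. * 1) / 3, recall_13(average='weighted'))
assert_almost_equal(2. / 3, recall_13(average='micro'))  # (1 + 1) / (2 + 1)
```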
Do we still support the custom label order?
Thanks for the review. Is there a reason we shouldn't still support custom label order? Also not sure what you want me to say about multilabel in the documentation.
I was unclear. Could you document what should be specified if multilabel data is passed? I would add something like: "In multi-label classification, labels are the column indices."
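A brief sketch of that column-index semantics, assuming the behaviour this PR introduces (the indicator matrices are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# With multilabel indicator input, labels name column indices.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 1, 0]])
y_pred = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 0]])

# Restrict the macro average to columns 0 and 3:
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 3], average='macro')
```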
You realise that for micro, weighted and sample averaging, the result would be identical were labels added with no instances?
Your comments have been addressed, @arjoly. Thanks!
Travis error appears unrelated.
Yes, however I would expect that all labels are specified given the indicator matrix encoding.
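A quick hedged check of that invariance on toy multiclass data (mine, not the PR's): micro and weighted averages ignore a label with no true or predicted instances, while macro gains a 0 component. Samples averaging, which applies to multilabel input, behaves like micro and weighted here.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 1, 2, 3]
y_pred = [1, 2, 2, 3]

def recall(labels, average):
    # warn_for=() silences the ill-defined-metric warning for the
    # zero-support label 42.
    _, r, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, average=average, warn_for=())
    return r

for average in ('micro', 'weighted'):
    assert np.isclose(recall([1, 2, 3], average),
                      recall([1, 2, 3, 42], average))

# macro, by contrast, shrinks when the empty label 42 is added:
assert recall([1, 2, 3], 'macro') > recall([1, 2, 3, 42], 'macro')
```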
LGTM! Thanks @jnothman!
Throwing in a what's new entry.
And squashed.
Shall I merge, then?
You can!! Thanks @jnothman
Thanks @jnothman!
Thanks!
You're welcome; I'm glad to have this support a meaningful multiclass micro-average.
This PR replaces #2610, making the `labels` parameter to `precision_recall_fscore_support` more functional (and better documented and tested), but in accordance with #4192 not deprecating `pos_label`; instead, `labels` is restricted to the `average != 'binary'` case.

A common use of the micro-average is to extend the notion of binary P/R/F to the case where there is a frequent "negative class" and multiple classes of interest. Following this PR, explicitly listing the labels of interest allows the negative class to be excluded from a multiclass problem. (The same result can be achieved by transforming a multiclass problem into a multilabel problem excluding one label, but in the model evaluation API that would necessitate a custom and tricky `scoring` object.)
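As a hedged sketch of that use case (the data and label values are illustrative, not from the PR):

```python
from sklearn.metrics import precision_recall_fscore_support

# Multiclass problem where 0 is a frequent "negative" class and
# 1 and 2 are the classes of interest.
y_true = [0, 0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 2, 1, 0, 2]

# Micro-averaged P/R/F over the positive classes only, generalising
# binary P/R/F against a negative class to the multiclass case:
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average='micro')
```

In the model-evaluation API this composes with `make_scorer`, e.g. `make_scorer(f1_score, labels=[1, 2], average='micro')`, avoiding the custom `scoring` object mentioned above.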