[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label #4192

jnothman · 2015-02-01T13:23:28Z

As per my comment elsewhere, this attempts to complete #2679 by implementing the converse in the precision-recall-fscore family: binary data should not be handled specially when average != 'binary'. This PR fixes a bug where pos_label=None -- which makes binary data not be handled specially -- was not being treated appropriately.
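To make the distinction concrete, here is a minimal pure-Python sketch of the two behaviours being separated. It is not scikit-learn's implementation; the helper names `f1_for_label`, `binary_f1` and `macro_f1` are hypothetical and exist only for illustration. On binary data, the "binary" result reports the positive label alone, whereas an explicit macro average should cover both labels.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score obtained by treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true, y_pred, pos_label=1):
    """'binary' behaviour: report the score of pos_label only."""
    return f1_for_label(y_true, y_pred, pos_label)


def macro_f1(y_true, y_pred):
    """'macro' behaviour: unweighted mean over every label present."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

print(binary_f1(y_true, y_pred))  # score for label 1 only
print(macro_f1(y_true, y_pred))   # mean of the scores for labels 0 and 1
```

The two numbers differ on this data, which is why silently falling back to the positive-label score when a user asked for a macro average is surprising.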


arjoly · 2015-02-03T09:42:41Z

LGTM!

arjoly · 2015-02-03T09:43:00Z

Thanks @jnothman !

jnothman · 2015-02-03T11:23:45Z

Thanks for the straightforward review, @arjoly.

jnothman · 2015-02-04T10:41:09Z

I know I've not been especially active of late, but I'd like to finish up the changes that replace #2610 before release. If you could take a look at this, @amueller (or someone else), then I can move on to the next bit more easily.

jnothman · 2015-02-15T09:38:54Z

This would love another review... It deserves to be released at the same time as #2679 and I can pursue the next piece of the puzzle if it's merged soon.

amueller · 2015-02-25T18:45:14Z

Sorry, I haven't really been following the "puzzle" closely enough to have a good understanding of what is happening. Trying to catch up.

amueller · 2015-03-02T17:27:43Z

So what will pos_label do if average != "binary"? Raise an exception?
I realize that is tangential to this PR, but what was the reason for not having a default weighting scheme? Forcing the user to think? Usually we try to do "sensible defaults" which might be either "micro" or "macro".

amueller · 2015-03-02T17:29:39Z

sklearn/metrics/tests/test_classification.py

+    assert_dep_warning = partial(assert_warns, DeprecationWarning)
+    for kwargs, my_assert in [({}, assert_no_warnings),
+                              ({'average': 'binary'}, assert_no_warnings),
+                              ({'average': 'micro'}, assert_dep_warning)]:


Sorry, maybe I'm a bit slow, but why is this supposed to give a deprecation warning?

I think the line before saying "this is deprecated for average != binary" is quite clear.

The point is that if I explicitly say "average='macro'", the score shouldn't automatically ignore one label, though that is the current behaviour when there are fewer than 3 labels exhibited.

Ah. So what would a user do now that wants average precision over two classes?

For the moment, that functionality continues to be provided through pos_label=None....

ahh... I was thrown off by pos_label=1 by default. Never mind then, all looks good :)
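The transition discussed in this exchange can be sketched with the standard-library `warnings` module. This is an illustrative assumption about the logic, not the code merged in the PR: `scores_pos_label_only` and `emits_deprecation` are hypothetical names, and the warning text is invented. The sketch warns when binary data would be scored on `pos_label` alone even though an average other than 'binary' was requested, and stays silent when `pos_label=None`.

```python
import warnings


def scores_pos_label_only(y, average="binary", pos_label=1):
    """Hypothetical check: would only pos_label be scored for these targets?"""
    if len(set(y)) > 2 or pos_label is None:
        # Three or more labels, or pos_label=None: every label is averaged.
        return False
    if average != "binary":
        # Old behaviour is kept for now, but flagged as deprecated.
        warnings.warn(
            "scoring only pos_label when average != 'binary' is deprecated",
            DeprecationWarning,
        )
    return True


def emits_deprecation(**kwargs):
    """Run the check on binary targets and report whether it warned."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        scores_pos_label_only([0, 1, 1, 0], **kwargs)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(emits_deprecation(average="binary"))                  # no warning expected
print(emits_deprecation(average="micro"))                   # warning expected
print(emits_deprecation(average="micro", pos_label=None))   # no warning expected
```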

jnothman · 2015-03-02T21:34:41Z

> So what will pos_label do if average != "binary"? Raise an exception?

Currently I don't think there's special handling for when the user provides pos_label and average != 'binary'. We could add that, but it's neater to leave pos_label=1 in the code than pos_label='unspecified'. WDYT?

> I realize that is tangential to this PR, but what was the reason for not having a default weighting scheme? Forcing the user to think? Usually we try to do "sensible defaults" which might be either "micro" or "macro".

Ideally, that would be the case. Firstly, the incumbent default was obscure rather than sensible. I agree that macro could be justified, but only for a function specific to multiclass problems, rather than one that sniffs the problem type and ends up with issues like #2094 (because the binary case discards one label). In any case, I think it's a bad idea to encourage users to report "F1" for a multiclass problem when that has many meanings, and certainly not if what they used was the "weighted macro-average", which is rarely described in the literature, merely because it was a default. So I think the sensible default is to work for binary problems.

Most users now will hopefully be accessing this through the scoring interface, where accuracy is a sensible default for classification, and where now they have their choice of basic averaging scheme for multiclass/multilabel P/R/F.
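The point about "F1" having many meanings for a multiclass problem can be illustrated with a small pure-Python sketch. The function names (`counts`, `f1`, `averaged_f1`) are hypothetical and this is not scikit-learn's code; it only follows the usual definitions of micro, macro and support-weighted averaging.

```python
def counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f1(y_true, y_pred, average):
    labels = sorted(set(y_true) | set(y_pred))
    per_label = [counts(y_true, y_pred, l) for l in labels]
    if average == "micro":
        # Pool the counts over all labels, then compute a single F1.
        return f1(*(sum(c) for c in zip(*per_label)))
    scores = [f1(*c) for c in per_label]
    if average == "macro":
        # Unweighted mean of the per-label scores.
        return sum(scores) / len(scores)
    if average == "weighted":
        # Mean of per-label scores weighted by each label's true count.
        support = [y_true.count(l) for l in labels]
        return sum(s * w for s, w in zip(scores, support)) / sum(support)
    raise ValueError("unknown average: %r" % average)


y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

for avg in ("micro", "macro", "weighted"):
    print(avg, averaged_f1(y_true, y_pred, avg))
```

All three averages disagree on this data, so a bare "F1" reported for a multiclass problem is ambiguous unless the averaging scheme is stated.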

jnothman · 2015-03-02T21:36:37Z

IIRC, you had written tests that weren't doing what they intended because you added or removed a class and that changed the function's behaviour entirely.

amueller · 2015-03-02T21:41:49Z

That is quite possible ;) And I'm all for explicitness. Apart from my other comment +1 for merge.

amueller · 2015-03-02T22:38:21Z

@ogrisel we might want that one in the beta, too ;) [and has +2]

ogrisel · 2015-03-03T08:21:33Z

LGTM as well. Merging. Sorry for the slow response time.


jnothman · 2015-03-03T09:50:36Z

No problem! Thanks for merging.


TST/FIX in future, average='binary' iff 2 labels in y one of which is pos_label

4fcf20a

jnothman changed the title ~~TST/FIX in future, average='binary' iff 2 labels in y one of which is pos_label~~ [MRG] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Feb 1, 2015

jnothman added this to the 0.16 milestone Feb 1, 2015

arjoly changed the title ~~[MRG] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label~~ [MRG+1] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Feb 15, 2015

jnothman mentioned this pull request Feb 24, 2015

[MRG+2] ENH labels parameter in P/R/F may extend or reduce label set #4287

Merged

amueller reviewed Mar 2, 2015

amueller changed the title ~~[MRG+1] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label~~ [MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Mar 2, 2015

ogrisel added a commit that referenced this pull request Mar 3, 2015

Merge pull request #4192 from jnothman/binary_iff_binary

6813528

[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label

ogrisel merged commit 6813528 into scikit-learn:master Mar 3, 2015

dan-blanchard mentioned this pull request Mar 3, 2015

When scikit-learn 0.18 comes out, we need to update our F1 metrics in __init__.py EducationalTestingService/skll#231

Closed

jbschiratti mentioned this pull request Jul 17, 2018

Obscure remark about removing pos_label from weighted f1 score #11148

Closed
