Skip to content

[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label #4192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 3, 2015

Conversation

jnothman
Copy link
Member

@jnothman jnothman commented Feb 1, 2015

As per my comment elsewhere, this attempts to complete #2679 by implementing the converse in the precision-recall-fscore family: binary data should not be handled specially when average != 'binary'. This PR fixes a bug where pos_label=None -- which makes binary data not be handled specially -- was not being treated appropriately.

@jnothman jnothman changed the title TST/FIX in future, average='binary' iff 2 labels in y one of which is pos_label [MRG] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Feb 1, 2015
@jnothman jnothman added this to the 0.16 milestone Feb 1, 2015
@arjoly
Copy link
Member

arjoly commented Feb 3, 2015

LGTM!

@arjoly
Copy link
Member

arjoly commented Feb 3, 2015

Thanks @jnothman !

@jnothman
Copy link
Member Author

jnothman commented Feb 3, 2015

Thanks for the straightfoward review, @arjoly.

@jnothman
Copy link
Member Author

jnothman commented Feb 4, 2015

I know I've not been especially active of late, but I'd like to finish up the changes that replace #2610 before release. If you could take a look at this @amueller (or someone else), then I can move onto the next bit more easily.

@jnothman
Copy link
Member Author

This would love another review... It deserves to be released at the same time as #2679 and I can pursue the next piece of the puzzle if it's merged soon.

@arjoly arjoly changed the title [MRG] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label [MRG+1] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Feb 15, 2015
@amueller
Copy link
Member

Sorry, I haven't really been following the "puzzle" closely enough to have a good understanding of what is happening. Trying to catch up.

@amueller
Copy link
Member

amueller commented Mar 2, 2015

So what will pos_label do if average!= "binary"? Raise an exception?
I realize that is tangential to this PR, but what was the reason for not having a default weighting scheme? Forcing the user to think? Usually we try to do "sensible defaults" which might be either "micro" or "macro".

assert_dep_warning = partial(assert_warns, DeprecationWarning)
for kwargs, my_assert in [({}, assert_no_warnings),
({'average': 'binary'}, assert_no_warnings),
({'average': 'micro'}, assert_dep_warning)]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe I'm a bit slow, but why is this supposed to give a deprecation warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the line before saying "this is deprecated for average != binary" is quite clear.

The point is that if I explicitly say "average='macro'", the score shouldn't automatically ignore one label, though that is the current behaviour when there are fewer than 3 labels exhibited.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So what would a user do now that wants average precision over two classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment, that functionality continues to be provided through pos_label=None....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh... I was thrown off by pos_label=1 by default. Never mind then, all looks good :)

@jnothman
Copy link
Member Author

jnothman commented Mar 2, 2015

So what will pos_label do if average!= "binary"? Raise an exception?

Currently I don't think there's special handling of when the user provides pos_label and average != 'binary'. We could do that, but it's neater to leave pos_label=1 in the code than pos_label='unspecified'. WDYT?

I realize that is tangential to this PR, but what was the reason for not having a default weighting scheme? Forcing the user to think? Usually we try to do "sensible defaults" which might be either "micro" or "macro".

Ideally, that would be the case. Firstly the incumbent default was obscure, rather than sensible. I agree, macro could be justified, but for a function specific to multiclass problems, rather than sniffing the problem type and landing up with issues like #2094 (because the binary case discards one label). In any case I think it's a bad idea to encourage users to report "F1" for a multiclass problem when there are many meanings of that, and certainly if what they used is "weighted macro-average" that is rarely described in the literature merely because it was a default. So I think the sensible default is to work for binary problems.

Most users now will hopefully be accessing this through the scoring interface, where accuracy is a sensible default for classification, and where now they have their choice of basic averaging scheme for multiclass/multilabel P/R/F.

@jnothman
Copy link
Member Author

jnothman commented Mar 2, 2015

IIRC, you had written tests that weren't doing what they intended because you added or removed a class and that changed the function's behaviour entirely.

@amueller
Copy link
Member

amueller commented Mar 2, 2015

That is quite possible ;) And I'm all for explicitness. Apart from my other comment +1 for merge.

@amueller
Copy link
Member

amueller commented Mar 2, 2015

@ogrisel we might want that one in the beta, too ;) [and has +2]

@amueller amueller changed the title [MRG+1] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label [MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label Mar 2, 2015
@ogrisel
Copy link
Member

ogrisel commented Mar 3, 2015

LGTM as well. Merging. Sorry for the slow response time.

ogrisel added a commit that referenced this pull request Mar 3, 2015
[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label
@ogrisel ogrisel merged commit 6813528 into scikit-learn:master Mar 3, 2015
@jnothman
Copy link
Member Author

jnothman commented Mar 3, 2015

No problem! Thanks for merging.

On 3 March 2015 at 19:21, Olivier Grisel notifications@github.com wrote:

LGTM as well. Merging. Sorry for the slow response time.


Reply to this email directly or view it on GitHub
#4192 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants