-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label #4192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM! |
Thanks @jnothman ! |
Thanks for the straightfoward review, @arjoly. |
This would love another review... It deserves to be released at the same time as #2679 and I can pursue the next piece of the puzzle if it's merged soon. |
Sorry, I haven't really been following the "puzzle" closely enough to have a good understanding of what is happening. Trying to catch up. |
So what will |
assert_dep_warning = partial(assert_warns, DeprecationWarning) | ||
for kwargs, my_assert in [({}, assert_no_warnings), | ||
({'average': 'binary'}, assert_no_warnings), | ||
({'average': 'micro'}, assert_dep_warning)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe I'm a bit slow, but why is this supposed to give a deprecation warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the line before saying "this is deprecated for average != binary" is quite clear.
The point is that if I explicitly say "average='macro'", the score shouldn't automatically ignore one label, though that is the current behaviour when there are fewer than 3 labels exhibited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. So what would a user do now that wants average precision over two classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment, that functionality continues to be provided through pos_label=None
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh... I was thrown off by pos_label=1
by default. Never mind then, all looks good :)
Currently I don't think there's special handling of when the user provides
Ideally, that would be the case. Firstly the incumbent default was obscure, rather than sensible. I agree, macro could be justified, but for a function specific to multiclass problems, rather than sniffing the problem type and landing up with issues like #2094 (because the binary case discards one label). In any case I think it's a bad idea to encourage users to report "F1" for a multiclass problem when there are many meanings of that, and certainly if what they used is "weighted macro-average" that is rarely described in the literature merely because it was a default. So I think the sensible default is to work for binary problems. Most users now will hopefully be accessing this through the scoring interface, where accuracy is a sensible default for classification, and where now they have their choice of basic averaging scheme for multiclass/multilabel P/R/F. |
IIRC, you had written tests that weren't doing what they intended because you added or removed a class and that changed the function's behaviour entirely. |
That is quite possible ;) And I'm all for explicitness. Apart from my other comment +1 for merge. |
@ogrisel we might want that one in the beta, too ;) [and has +2] |
LGTM as well. Merging. Sorry for the slow response time. |
[MRG+2] P/R/F: in future, average='binary' iff 2 labels in y one of which is pos_label
No problem! Thanks for merging. On 3 March 2015 at 19:21, Olivier Grisel notifications@github.com wrote:
|
As per my comment elsewhere, this attempts to complete #2679 by implementing the converse in the precision-recall-fscore family: binary data should not be handled specially when
average != 'binary'
. This PR fixes a bug wherepos_label=None
-- which makes binary data not be handled specially -- was not being treated appropriately.