[MRG+2] Require explicit average arg for multiclass/label P/R/F metrics and scorers #2679
Conversation
I think it is a bit weird that the …
Can you briefly explain why this change is necessary after #2610 is merged?
Thanks for looking at this, Andy. Responses:
@arjoly, I'd like it if you could review or comment on this at some point.
Do you plan to move the averaging keyword to the third argument? Do you want to remove the default value or set the default value to None?
It is essential that we stop the default being 'weighted', and anything …
I don't mind changing average to None by default... but I don't think that …

On Fri, Jan 3, 2014 at 1:00 AM, Arnaud Joly <notifications@github.com> wrote:
Looks good to merge!
Thanks for the review, @arjoly
In benchmarks/bench_multilabel_metrics.py:

@@ -20,7 +20,7 @@
 METRICS = {
-    'f1': f1_score,
+    'f1': partial(f1_score, average='micro'),
I think that the docs (the part that describes the different scoring options http://scikit-learn.org/dev/modules/model_evaluation.html#common-cases-predefined-values ) should be updated to stress this.
I think I should just merge this PR with #2676, which focuses on that specifically. It doesn't entirely make sense without it.
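As an aside for readers, here is a minimal sketch (my own illustration, not code from the PR) of the `functools.partial` pattern used in the benchmark diff above: binding `average` up front yields a fully specified metric callable.

```python
from functools import partial

from sklearn.metrics import f1_score

# partial() pins average='micro', returning a callable that behaves like
# f1_score with that keyword already filled in.
micro_f1 = partial(f1_score, average='micro')

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 1]
assert micro_f1(y_true, y_pred) == f1_score(y_true, y_pred, average='micro')
```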
Note to self (and other reviewers): this merge has been done.
I've rebased this on #2676, so that both the metrics and scorers are explicit.
And that rebase means @arjoly's LGTM no longer applies. If you'd like to review the whole PR, Arnaud, that would be nice ;)
Is there a need to make […]? There are also several new constants such as […]. It would be nice to add an […].
"on your data; '{0}_macro', '{0}_micro' and '{0}_samples' provide " | ||
"alternative multiclass/multilabel averaging.") | ||
for name, metric in [('precision', precision_score), | ||
('recall', recall_score), ('f1', f1_score)]: |
pep8?
Yes, IMO. If someone wrote their own CV utility, they should be using […]. I should note that […].
I agree that there's a lot of unnecessary mess in the module namespace, but I think that's out of this PR's scope.
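For context, a hedged reconstruction of what the scorer-sharding loop shown above plausibly does: register one scorer per averaging strategy under names like 'f1_macro'. The registry name `SCORERS` and the exact suffix set are my assumptions based on the strings in the diff.

```python
from sklearn.metrics import (f1_score, make_scorer, precision_score,
                             recall_score)

SCORERS = {}  # stand-in for the module-level scorer registry
for name, metric in [('precision', precision_score),
                     ('recall', recall_score), ('f1', f1_score)]:
    for average in ['macro', 'micro', 'samples', 'weighted']:
        # make_scorer forwards extra keyword arguments to the metric,
        # so each registered scorer has its averaging strategy baked in.
        SCORERS['{0}_{1}'.format(name, average)] = make_scorer(
            metric, average=average)
```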
Rebased and addressed @arjoly's comments.
@@ -185,6 +185,17 @@ API changes summary
   of length greater than one.
   By `Manoj Kumar`_.

- `scoring` parameter for cross validatiokn now accepts `'f1_binary'`,
validation
LGTM
Thanks for the feedback! I hope overall that it's the right thing to be …

On 21 January 2014 21:05, Arnaud Joly <notifications@github.com> wrote:
Force-pushed from 2672e68 to c628ac8.
Rebased again. I'd really like to see this merged, finally. @mblondel, you expressed distaste for the default […]
Thank you!
    'well as `average` will average over those '
    'labels. For now, please use `labels=None` with '
    '`pos_label` to evaluate precision, recall and '
    'well as `average!=\'binary\'` will average over '
I find that it is more elegant to simply use a double tick (") when the string is defined with simple ticks, and vice versa, rather than protecting the ticks.
Not that it really matters, so don't change anything.
Even I think the same way ;)
Looks good to me! 👍 for merge. Thanks a lot.
I can have a look at this tomorrow (if it hasn't been merged by then already).
Thanks Gaël! Now get back to your research proposals! ;)

On 8 December 2014 at 04:21, Manoj Kumar <notifications@github.com> wrote:
the set of classes, each of which may be useful in some scenario.
Where available, you should select among these using the ``average`` parameter.

* ``"macro"`` simply calculates calculates the mean of the binary metrics,
Why does it have to calculate twice? ;)
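To make the doc's wording concrete, a tiny check (toy data of my own, not from the PR) that 'macro' is the unweighted mean of the per-class binary scores:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

per_class = f1_score(y_true, y_pred, average=None)   # one F1 per class
macro = f1_score(y_true, y_pred, average='macro')    # their unweighted mean
assert abs(macro - per_class.mean()) < 1e-12
```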
@jnothman I'm still at a point where I learn more from Pull Requests than Pull Requests learn from me, so sorry if my comments caused more pain (in explaining) than pleasure. ;)
Force-pushed from c628ac8 to 83d9e4f.
There was no problem with the comments. Thanks for catching some small …

On 9 December 2014 at 01:36, Manoj Kumar <notifications@github.com> wrote:
Force-pushed from 83d9e4f to f0d2881.
I'll merge after travis confirms that I haven't done anything silly.
…ault Scorers for different average parameters have been added.
Force-pushed from f0d2881 to 081a554.
Thanks @jnothman!
Sorry, I am at a conference and didn't have time to review. Glad that this is finally in. Regarding 'weighted', my main concern was just that we should really try to avoid using non-standard stuff as a default. On the other hand, there is the question of backward compatibility, and it's hard to tell which would be a better default between micro and macro averaging. When the default is used (weighted), we could potentially raise a warning and tell the user to explicitly specify the averaging option. This way, users will be able to correctly report the results they got when writing a paper.
In order to avoid problems like #2094, and to avoid people unwittingly reporting a weighted average, this goes towards making `average` a required parameter for multiclass/multilabel precision, recall and F-score. Closely related to #2676.

After a deprecation cycle, we can turn the warning into an error, or make macro/micro the default.

This PR also shards the builtin scorers to make the averaging explicit. This avoids users getting binary behaviour when they shouldn't (cf. #2094, where `scoring` isn't used). I think this is extra important because "weighted" F1 isn't especially common in the literature, and having people report it without realising that's what it is is unhelpful to the applied ML community. This helps, IMO, towards a more explicit and robust API for binary classification metrics (cf. #2610). It also entails a deprecation procedure for scorers, and more API there: public `get_scorer` and `list_scorers`.
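And a usage illustration of the sharded scorers (modern import paths; `get_scorer` is the public accessor mentioned above, and `'f1_macro'` one of the new explicit names):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import get_scorer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# The averaging strategy is spelled out in the scorer name rather than
# hidden behind an implicit 'weighted' default.
print(cross_val_score(LinearSVC(), X, y, scoring='f1_macro').mean())

# get_scorer resolves a string name to the corresponding scorer object.
scorer = get_scorer('f1_macro')
```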