
[WIP] enhance labels and deprecate pos_label in PRF metrics #2610


Closed
jnothman wants to merge 5 commits into scikit-learn:master from jnothman:deprecate_pos_label

Conversation

jnothman
Member

This PR intends to make the labels parameter clearly defined for the precision_recall_fscore_support family of metrics. This amounts to a (partial) fix for #1983, #1989, #2029, #2094 and #3122. As implied by the comment I have committed, labels will:

  • determine the sort order when returning a result for each class
  • determine which labels will be included in an average (allowing one or more negative/ignored classes in multiclass classification)
  • by default, all labels are used, with no special handling of binary ~~the current special handling of binary classification is preserved, but pos_label is determined implicitly.~~
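To make the intended semantics concrete, here is a rough sketch of how such calls might look; the example data and label values are illustrative assumptions, not taken from this PR:

```python
# Illustrative sketch only; the data below are made up for this example.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]

# labels fixes the order of the per-class results when average=None...
p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 1, 0], average=None)

# ...and restricts which classes contribute to an average: class 0 is left
# out here, so it is effectively treated as an ignored/negative class.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average='macro')
```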

There are some potential issues in the deprecation process, and in making binary classifier metrics available as Scorers...

  • Define and document intended behaviour
  • Tests for new labels functionality
  • Implement new labels functionality
  • Ensure labels is set correctly for legacy functionality
  • Deprecation warnings for pos_label
  • Some solution for scorers, at least handling the binary case
  • Copy documentation to derived functions
  • Update narrative documentation
  • Update what's new
  • Open issue for similar behaviour in average_precision_score, roc_auc_score and any other binary-averaged metrics.

This should not be merged until after the 0.15 release, to allow at least one release with the warning merged from #2952.

@coveralls

Coverage Status

Coverage remained the same when pulling a852787 on jnothman:deprecate_pos_label into f63e5fd on scikit-learn:master.

@arjoly
Member

arjoly commented Nov 25, 2013

Thanks for working on those issues !!!

@jnothman
Member Author

Thanks for working on those issues !!!

np! The fiddliness of pos_label and average has annoyed me for a long time.

But there is at least one problem. Consider the following parameters in the current code:

  • a binary classification dataset (e.g. y_true=[-1, 1, 1], y_pred=[1, 1, -1])
  • pos_label=1 by default
  • average='macro' (or at least not None)
  • labels=[-1, 1] is set explicitly

In the current code, this is treated as the special binary case, i.e. the metric is returned only for the positive label (P,R,F=1/3). Because labels is set explicitly, this PR would instead want to return the macro-average over all labels (P,R,F=1/6). However, it would be unusual for labels to have been set explicitly, because it does not affect the binary special case.
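For concreteness, the ambiguous combination looks roughly like this (a sketch; the behaviours noted in the comments are as described in this thread, not verified output):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [-1, 1, 1]
y_pred = [1, 1, -1]

# Current behaviour (as described above): the binary special case, reporting
# only the positive label's precision/recall/F-score.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average='macro')

# Under this PR, the explicit labels list would instead request a
# macro-average over both classes, changing the returned values.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[-1, 1], average='macro')
```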

I can think of the following deprecation options for this weird case:

  1. delay this PR for a couple of versions, in the meantime warning that behaviour will change in the future
  2. provide only the new functionality, with a warning that the behaviour has changed
  3. provide the old functionality for a couple of versions, but it's not clear how to allow the user to override this behaviour
  4. change the labels parameter name in the new version, and deprecate the old parameter.

Also, I'm considering changing it so that, by default, binary targets get the automatic handling they currently do (but with an inferred pos_label). The main benefit would be that we don't need to change the current scorers, but this still has some problems with implicit behaviour.

@coveralls

Coverage Status

Coverage remained the same when pulling 31eafdd on jnothman:deprecate_pos_label into f63e5fd on scikit-learn:master.

@jnothman
Member Author

If someone has a moment to review this -- as an idea if not code -- I am very eager to hear comments on the deprecation edge case I described in my previous comment.

I'm also interested in whether we should maintain special handling of binary targets as a default (not just during deprecation), even if it provides confusing implicit behaviour (e.g. #2094), for convenience.

@arjoly
Member

arjoly commented Dec 17, 2013

I don't have any strong opinion. My only fear is how to keep the behaviour simple while remaining compatible with both binary and multi-class scorers.

  1. delay this PR for a couple of versions, in the meantime warning that behaviour will change in the future

I have seen numpy using this strategy.
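Roughly what such a transition warning could look like; this is a hypothetical helper, not code from this PR:

```python
import warnings

def _warn_future_labels_change(y_true, labels, average):
    # Hypothetical guard for the ambiguous case above: binary targets with an
    # explicit labels list and an averaged result. Warn now, change behaviour
    # in a later release.
    if len(set(y_true)) == 2 and labels is not None and average is not None:
        warnings.warn(
            "An explicit labels argument with an averaged result will, in a "
            "future release, average over all listed labels instead of "
            "reporting only the positive class.",
            FutureWarning)
```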

@jnothman
Member Author

I've decided that to make things more explicit, and to enable some features of this PR, we should provide scorers named 'binary_f1', 'micro_f1', 'macro_f1', 'weighted_f1'. I might throw together a PR shortly.
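A rough sketch of how such scorers could be built with the existing make_scorer helper; the names are only the proposed ones, not existing scorer strings:

```python
from sklearn.metrics import f1_score, make_scorer

# Possible definitions for the proposed scorers: the binary variant reports
# F1 for the positive class only, the others average across classes.
binary_f1 = make_scorer(f1_score, average='binary', pos_label=1)
micro_f1 = make_scorer(f1_score, average='micro')
macro_f1 = make_scorer(f1_score, average='macro')
weighted_f1 = make_scorer(f1_score, average='weighted')
```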


jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
@jnothman jnothman added this to the 0.16 milestone Mar 11, 2014
jnothman added a commit that referenced this pull request Apr 7, 2014
MAINT warn of future behaviour change proposed in #2610
maheshakya pushed a commit to maheshakya/scikit-learn that referenced this pull request Apr 15, 2014
maheshakya pushed a commit to maheshakya/scikit-learn that referenced this pull request Apr 15, 2014
@amueller
Member

Will this fix #3123? Also: can I help?

@jnothman
Member Author

I'll try to get this to MRG soon. Should not be a big deal. I'm not sure it will intentionally fix #3123, but I can make a point of doing so, so as not to create conflicts.

@amueller
Member

Thanks, let me know if you feel ready for review.

@jnothman
Member Author

jnothman commented Feb 1, 2015

@amueller, @arjoly and anyone else interested:

I've recalled that this was somewhat muddied by #2679 being altered to have an average='binary' mode. While that mode is intended, in the future, to raise an error on multiclass/multilabel data, the opposite is not true: ultimately the number of distinct class labels in y_true/y_pred still switches the mode of operation (i.e. with pos_label not None and average not None, binary targets always get the measure of a single positive class), which I think is pretty bad. In the future, average=something-other-than-binary should apply that averaging strictly, without reference to the number of input classes.

I think that stopping special handling of binary data needs to happen before pos_label can be deprecated: I don't have a nice way of asking someone who otherwise provides binary targets, pos_label=None and average=macro (with or without a list of labels) to correct their code to make it future-compatible.

So I propose that, at least for now, we keep pos_label. This should disrupt fewer users, although it leaves the API a bit messy for the near future.

At the same time, recall that this is not just an API fix: it also supports an additional (useful) case wherein labels can specify a strict subset of the labels present, treating those left out as an ignored class.

Rather, this PR or its successor should:

  • deprecate towards banning pos_label=None when average='binary'
  • deprecate towards stopping special handling of binary data when average != 'binary', pos_label != None
  • extend the functionality of labels when average != 'binary'

Thus in the future, pos_label will be exclusively used when average='binary', and labels will be used otherwise.
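In code, the proposed end state would look roughly like this for callers (a sketch with made-up data; the semantics shown are the proposal, not a description of any released version):

```python
from sklearn.metrics import precision_recall_fscore_support

# Binary problem: pos_label picks the single class to report on.
yb_true = [0, 1, 1, 0]
yb_pred = [1, 1, 0, 0]
p, r, f, _ = precision_recall_fscore_support(
    yb_true, yb_pred, average='binary', pos_label=1)

# Multiclass problem: labels selects the classes to score; anything left out
# of labels (class 0 here) is excluded from the average.
ym_true = [0, 0, 1, 2, 2]
ym_pred = [0, 1, 1, 2, 0]
p, r, f, _ = precision_recall_fscore_support(
    ym_true, ym_pred, labels=[1, 2], average='macro')
```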

(This has taken an alarming effort to reason through, so I hope I've come to correct conclusions! At some point soon I'll try to implement them, although I don't look forward to testing all these possibilities and their backwards-compatibilities.)

@jnothman
Member Author

jnothman commented Feb 1, 2015

That leads me to split this into two PRs: one that basically finishes up the work of #2679 to handle the converse case; the other to extend and fix the handling of labels in the non-binary case.

@jnothman
Member Author

jnothman commented Feb 1, 2015

(The other reason I abandoned this PR was the splitting of metrics.py, which made the rebase a pain!)

@jnothman
Member Author

jnothman commented Feb 1, 2015

Closing this incarnation.
