
[WIP] enhance labels and deprecate pos_label in PRF metrics #2610


Closed
jnothman wants to merge 5 commits into scikit-learn:master from jnothman:deprecate_pos_label

Conversation

jnothman
Member

This PR intends to make the labels parameter clearly defined for the precision_recall_fscore_support family of metrics. This amounts to a (partial) fix for #1983, #1989, #2029, #2094 and #3122. As implied by the comment I have committed, labels will:

  • determine the sort order when returning a result for each class
  • determine which labels will be included in an average (allowing one or more negative/ignored classes in multiclass classification)
  • by default, all labels are used, with no special handling of binary ~~the current special handling of binary classification is preserved, but pos_label is determined implicitly.~~
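To make the intended semantics concrete, here is a rough sketch of how such calls might look; the example data and label values are illustrative assumptions, not taken from this PR:

```python
# Illustrative sketch only; the data below are made up for this example.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]

# labels fixes the order of the per-class results when average=None...
p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 1, 0], average=None)

# ...and restricts which classes contribute to an average: class 0 is left
# out here, so it is effectively treated as an ignored/negative class.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average='macro')
```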

There are some potential issues in the deprecation process, and in making binary classifier metrics available as Scorers...

  • Define and document intended behaviour
  • Tests for new labels functionality
  • Implement new labels functionality
  • Ensure labels is set correctly for legacy functionality
  • Deprecation warnings for pos_label
  • Some solution for scorers, at least handling the binary case
  • Copy documentation to derived functions
  • Update narrative documentation
  • Update what's new
  • Open issue for similar behaviour in average_precision_score, roc_auc_score and any other binary-averaged metrics.

This should not be merged until after the 0.15 release, to allow at least one release with the warning merged from #2952.

@coveralls

Coverage Status

Coverage remained the same when pulling a852787 on jnothman:deprecate_pos_label into f63e5fd on scikit-learn:master.

@arjoly
Member

arjoly commented Nov 25, 2013

Thanks for working on those issues !!!

@jnothman
Member Author

Thanks for working on those issues !!!

np! The fiddliness of pos_label and average has annoyed me for a long time.

But there is at least one problem. Consider the following parameters in the current code:

  • a binary classification dataset (e.g. y_true=[-1, 1, 1], y_pred=[1, 1, -1])
  • pos_label=1 by default
  • average='macro' (or at least not None)
  • labels=[-1, 1] is set explicitly

In the current code, this is treated as the special binary case, i.e. the metric is returned only for the positive label (P,R,F=1/3). Because labels is set explicitly, this PR would instead want to return the macro-average over all labels (P,R,F=1/6). However, it would be unusual for labels to have been set explicitly, because it does not affect the binary special case.
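For concreteness, the ambiguous combination looks roughly like this (a sketch; the behaviours noted in the comments are as described in this thread, not verified output):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [-1, 1, 1]
y_pred = [1, 1, -1]

# Current behaviour (as described above): the binary special case, reporting
# only the positive label's precision/recall/F-score.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average='macro')

# Under this PR, the explicit labels list would instead request a
# macro-average over both classes, changing the returned values.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[-1, 1], average='macro')
```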

I can think of the following deprecation options for this weird case:

  1. delay this PR for a couple of versions, in the meantime warning that behaviour will change in the future
  2. provide only the new functionality, with a warning that the behaviour has changed
  3. provide the old functionality for a couple of versions, but it's not clear how to allow the user to override this behaviour
  4. change the labels parameter name in the new version, and deprecate the old parameter.

Also, I'm considering changing it so that, by default, binary targets get the automatic handling they currently do (but with an inferred pos_label). The main benefit would be that we don't need to change the current scorers, but this still has some problems with implicit behaviour.

@coveralls

Coverage Status

Coverage remained the same when pulling 31eafdd on jnothman:deprecate_pos_label into f63e5fd on scikit-learn:master.

@jnothman
Member Author

If someone has a moment to review this -- as an idea if not code -- I am very eager to hear comments on the deprecation edge case I described in my previous comment.

I'm also interested in whether we should maintain special handling of binary targets as a default (not just during deprecation), even if it provides confusing implicit behaviour (e.g. #2094), for convenience.

@arjoly
Member

arjoly commented Dec 17, 2013

I don't have any strong opinion. My only fear is how to keep the behaviour simple while remaining compatible with both binary and multi-class scorers.

  1. delay this PR for a couple of versions, in the meantime warning that behaviour will change in the future

I have seen numpy using this strategy.
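Roughly what such a transition warning could look like; this is a hypothetical helper, not code from this PR:

```python
import warnings

def _warn_future_labels_change(y_true, labels, average):
    # Hypothetical guard for the ambiguous case above: binary targets with an
    # explicit labels list and an averaged result. Warn now, change behaviour
    # in a later release.
    if len(set(y_true)) == 2 and labels is not None and average is not None:
        warnings.warn(
            "An explicit labels argument with an averaged result will, in a "
            "future release, average over all listed labels instead of "
            "reporting only the positive class.",
            FutureWarning)
```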

@jnothman
Member Author

I've decided that to make things more explicit, and to enable some features of this PR, we should provide scorers named 'binary_f1', 'micro_f1', 'macro_f1', 'weighted_f1'. I might throw together a PR shortly.
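A rough sketch of how such scorers could be built with the existing make_scorer helper; the names are only the proposed ones, not existing scorer strings:

```python
from sklearn.metrics import f1_score, make_scorer

# Possible definitions for the proposed scorers: the binary variant reports
# F1 for the positive class only, the others average across classes.
binary_f1 = make_scorer(f1_score, average='binary', pos_label=1)
micro_f1 = make_scorer(f1_score, average='micro')
macro_f1 = make_scorer(f1_score, average='macro')
weighted_f1 = make_scorer(f1_score, average='weighted')
```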


jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Mar 9, 2014
@jnothman jnothman added this to the 0.16 milestone Mar 11, 2014
jnothman added a commit that referenced this pull request Apr 7, 2014
MAINT warn of future behaviour change proposed in #2610
maheshakya pushed a commit to maheshakya/scikit-learn that referenced this pull request Apr 15, 2014
maheshakya pushed a commit to maheshakya/scikit-learn that referenced this pull request Apr 15, 2014
@amueller
Member

Will this fix #3123? Also: can I help?

@jnothman
Member Author

I'll try to get this to MRG soon. Should not be a big deal. I'm not sure it will intentionally fix #3123, but I can make a point of doing so, so as not to create conflicts.

@amueller
Member

Thanks, let me know if you feel ready for review.

@jnothman
Member Author

jnothman commented Feb 1, 2015

@amueller, @arjoly and anyone else interested:

I've recalled that this was somewhat muddied by #2679 being altered to have an average='binary' mode. While that mode is intended, in the future, to raise an error on multiclass/multilabel data, the opposite is not true: ultimately the number of distinct class labels in y_true/y_pred still switches the mode of operation (i.e. with pos_label not None and average not None, binary targets always get the measure of a single positive class), which I think is pretty bad. In the future, average=something-other-than-binary should apply that averaging strictly, without reference to the number of input classes.

I think that stopping special handling of binary data needs to happen before pos_label can be deprecated: I don't have a nice way of asking someone who otherwise provides binary targets, pos_label=None and average=macro (with or without a list of labels) to correct their code to make it future-compatible.

So I propose that, at least for now, we keep pos_label. This should disrupt fewer users, although it leaves the API a bit messy for the near future.

At the same time, recall that this is not just an API fix: it also supports an additional (useful) case wherein labels can specify a strict subset of the labels present, treating those left out as an ignored class.

Rather, this PR or its successor should:

  • deprecate towards banning pos_label=None when average='binary'
  • deprecate towards stopping special handling of binary data when average != 'binary', pos_label != None
  • extend the functionality of labels when average != 'binary'

Thus in the future, pos_label will be exclusively used when average='binary', and labels will be used otherwise.
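In code, the proposed end state would look roughly like this for callers (a sketch with made-up data; the semantics shown are the proposal, not a description of any released version):

```python
from sklearn.metrics import precision_recall_fscore_support

# Binary problem: pos_label picks the single class to report on.
yb_true = [0, 1, 1, 0]
yb_pred = [1, 1, 0, 0]
p, r, f, _ = precision_recall_fscore_support(
    yb_true, yb_pred, average='binary', pos_label=1)

# Multiclass problem: labels selects the classes to score; anything left out
# of labels (class 0 here) is excluded from the average.
ym_true = [0, 0, 1, 2, 2]
ym_pred = [0, 1, 1, 2, 0]
p, r, f, _ = precision_recall_fscore_support(
    ym_true, ym_pred, labels=[1, 2], average='macro')
```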

(This has taken an alarming effort to reason through, so I hope I've come to correct conclusions! At some point soon I'll try to implement them, although I don't look forward to testing all these possibilities and their backwards-compatibilities.)

@jnothman
Member Author

jnothman commented Feb 1, 2015

That leads me to split this into two PRs: one that basically finishes up the work of #2679 to handle the converse case; the other to extend and fix the handling of labels in the non-binary case.

@jnothman
Member Author

jnothman commented Feb 1, 2015

(The other reason I abandoned this PR was the splitting of metrics.py, which made the rebase a pain!)

@jnothman
Member Author

jnothman commented Feb 1, 2015

Closing this incarnation.
