ENH P/R/F should be able to ignore a majority class in the multiclass case #1983


Closed
jnothman opened this issue May 21, 2013 · 8 comments

@jnothman
Member

P/R/F are famous for handling class imbalance in the binary classification case. Correct me if I'm wrong (@arjoly?), but imbalance against a majority negative class should also be handled in the multiclass case. In particular, while the documentation currently states that micro-averaged P = R = F, this is not true of the case where a negative class is ignored; but it should be possible to ignore a negative class for any of the average settings.

Indeed, I think the pos_label argument is a mistake (except in that you can more reliably provide a default value than for neg_label): it only applies to the binary case and overrides the average setting; neg_label would apply to all multiclass averaging methods.

It should be easy to implement: treat the problem as multilabel and delete the neg_label column from the label indicator matrix. I.e. it is the case where each instance is assigned 0 or 1 label.

The tricky part is the interface: should pos_label be deprecated? Deprecation makes sense as pos_label and neg_label should not be necessary together. But if so, how do we ensure the binary case works by default?
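The implementation idea above (treat the problem as multilabel and delete the neg_label column from the label indicator matrix) can be sketched in plain NumPy. The function name and signature here are illustrative only, not sklearn API:

```python
import numpy as np

def prf_micro_ignoring(y_true, y_pred, neg_label=0):
    """Micro-averaged P/R/F for multiclass input, ignoring neg_label.

    Sketch of the proposal: binarize to a label indicator matrix, drop
    the negative class's column, then count tp/fp/fn over the rest.
    """
    labels = sorted(set(y_true) | set(y_pred))
    labels = [l for l in labels if l != neg_label]  # delete neg_label column
    t = np.array([[yt == l for l in labels] for yt in y_true])
    p = np.array([[yp == l for l in labels] for yp in y_pred])
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With the negative column removed, a sample whose true and predicted labels are both neg_label contributes nothing, so micro P, R and F can differ.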

@jnothman
Member Author

Say we add neg_label and deprecate pos_label. The default value of neg_label has to act like that of pos_label:

  • in non-binary classification it does nothing
  • in binary classification if average is not None it guesses and ignores the negative label
  • in binary classification if average is None, its value is ignored

and if neg_label is set to None, averaging is performed even in binary classification.

So it needs some special values like:

  • None: all labels are positive classes
  • 'auto': guesses the negative label even in multi-class classification
  • 'binary' (default): guesses the negative label in binary classification only if average is not None
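These defaulting rules could be resolved with a small dispatcher. A minimal sketch, assuming a hypothetical `resolve_neg_label` helper (not sklearn API); `min(labels)` stands in for whatever heuristic would actually guess the negative label:

```python
def resolve_neg_label(neg_label, labels, average):
    """Return the label to exclude from averaging, or None to keep all.

    Hypothetical helper illustrating the proposed special values;
    min(labels) is only a placeholder for the real guessing heuristic.
    """
    if neg_label is None:
        return None                    # all labels are positive classes
    if neg_label == 'auto':
        return min(labels)             # guess even in multiclass
    if neg_label == 'binary':
        if len(labels) == 2 and average is not None:
            return min(labels)         # guess only in the binary case
        return None                    # multiclass or average=None: no-op
    return neg_label                   # an explicit label to ignore
```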

WDYT?

@jnothman
Member Author

Also, @arjoly, can we not assume as a rule that a label indicator matrix consists of 0s and 1s, or Falses and Trues? Do we really need to use pos_label there?

@arjoly
Member

arjoly commented May 23, 2013

Also, @arjoly, can we not assume as a rule that a label indicator matrix consists of 0s and 1s, or Falses and Trues? Do we really need to use pos_label there?

This feature could be removed.

P/R/F are famous for handling class imbalance in the binary classification case. Correct me if I'm wrong (@arjoly?), but imbalance against a majority negative class should also be handled in the multiclass case.

Do you have some references about this?

@jnothman
Member Author

This feature could be removed.

Even if we're not assured the others are zeros, I think pos_label should always be assumed to be 1. Certainly, the positive label indicator should not be confused with the positive class, which is its meaning in precision_recall_fscore_support and other metrics.

Do you have some references about this?

I haven't gone looking for them, but for example, see the last comment at http://metaoptimize.com/qa/questions/8284/does-precision-equal-to-recall-for-micro-averaging which suggests the case where you "have some samples which are not classified to belong to any of known classes" in an otherwise single-label multiclass task. It's easy to come up with such classification tasks, such as classifying Wikipedia articles into non-overlapping named entity categories: the vast majority of articles are non-entities, and a micro-average or a macro-average would be a good measure of performance as long as that non-entity class is not taken into account.

There's nothing special about the binary case: it's just interpreted as if it's multilabel with a single class. We should also be able to treat multiclass as if it's zero-or-one label.

@arjoly
Member

arjoly commented May 24, 2013

Sorry for the dumb questions, but what is a named entity category? And what is a non-entity in this context?

http://metaoptimize.com/qa/questions/8284/does-precision-equal-to-recall-for-micro-averaging

The last comment simply states that, in the multilabel case, having at least one sample with no true label and no predicted label yields different micro-precision, micro-recall and micro-F1 scores.

while the documentation currently states that micro-averaged P = R = F, this is not true of the case where a negative class is ignored

Before you rewrote the average description, there was a comment about this in the docstring
(see d33634d#sklearn).

but it should be possible to ignore a negative class for any of the average settings.

What if your classifier is not able to identify the negative class?

I am not against the idea of implementing this functionality, but I wouldn't want to see this without some good references.

@jnothman
Member Author

but what is a named entity category? And what is a non-entity in this context?

I'm using terms from Named Entity Recognition. Basically, let's say we want to decide if a Wikipedia topic is a Person, Location or Organisation, and the rest is noise. That is a multiclass classification problem with a vast negative class. I have published work on an expanded version of this classification task, but do not explicitly describe my calculation of micro-F. I in fact report micro-F excluding multiple negative classes output by my classifier. Perhaps I should support that too.

there was a comment about this in the docstring (see d33634d#sklearn).

It said "In multilabel classification, this is true only if every sample has a label." I didn't find this very precise language. For example, I consider the following to be a case where every sample has a label:

>>> import sklearn.metrics
>>> print(sklearn.metrics.precision_recall_fscore_support([[0, 1]], [[0]], average='micro'))
(1.0, 0.5, 0.667, None)

[Actually, that's not what it outputs at master. Apparently there's a bug in your implementation -- one I haven't yet investigated but we need to test -- which returns (0.0, 0.0, 0.0, 1). But the above is what it outputs in my rewrite, as it should, because tp, fp, fn = 1, 0, 1.]
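The expected output above can be checked by hand with a set-based micro computation, independent of sklearn (a quick sketch, not library code):

```python
# One multilabel sample: true labels {0, 1}, predicted only {0}.
y_true = [{0, 1}]
y_pred = [{0}]

tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # label 0 matched
fp = sum(len(p - t) for t, p in zip(y_true, y_pred))  # no spurious labels
fn = sum(len(t - p) for t, p in zip(y_true, y_pred))  # label 1 missed

precision = tp / (tp + fp)   # 1.0
recall = tp / (tp + fn)      # 0.5
f1 = 2 * precision * recall / (precision + recall)
```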

And often people wouldn't consider multiclass classification with a negative class as multilabel classification, just as binary classification isn't considered multilabel classification. Multilabel implies the system may output [0 .. n_labels] outputs per sample. These cases are {0, 1}.

What if your classifier is not able to identify the negative class?

I don't know what this means.

I still haven't got my hands on any references. However, for the multiclass case that you speak of, where every sample is assigned one meaningful class, you get fn == fp. This is why P == R == F. But you also get tp + fn == n_samples in such a case, so P == R == F == accuracy, if I'm not mistaken. So why bother calculating it at all in such a case? And there is no doubt that considering one class as negative but still otherwise requiring at most one classification decision per sample corresponds to real-world tasks; and that in such a task micro P, R and F will differ...
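The fn == fp observation above can be verified with a few lines of plain Python (a sketch, not sklearn code): in single-label multiclass with every class counted, each error is simultaneously one false negative for the true class and one false positive for the predicted class, so micro P, R, F and accuracy all coincide.

```python
def micro_counts(y_true, y_pred):
    """Micro tp/fp/fn for single-label multiclass, counting every class."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    # each error is one fn (true class missed) and one fp (wrong class given)
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return tp, errors, errors  # tp, fp, fn

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 1, 1]
tp, fp, fn = micro_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = tp / len(y_true)
```

Here precision == recall == accuracy, which is why micro-averaging adds nothing over accuracy unless some class is excluded.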

@jnothman
Member Author

(And the bug in your implementation is that you treat [[0, 1]], [[0]] as a binary classification task, when it's multilabel...)

@jnothman
Member Author

Fixed in #4287
