[WIP] balanced accuracy score #3511

Closed
wants to merge 6 commits into scikit-learn:master from lazywei:balanced_accuracy_score

Conversation

lazywei
Contributor

@lazywei lazywei commented Jul 31, 2014

As mentioned in #3506, I'm trying to implement balanced accuracy score.
Now it only supports binary classification.
Also, it is currently a prototype, which means I haven't worked on documentation and tests yet. However, since this is the first time I'm contributing to scikit-learn, I'd like to open the pull request early so that you can point out my mistakes as soon as possible. I also want to make sure my understanding of this issue is correct.

I'll keep working on this branch. Thanks.
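
For readers of this thread: the binary metric under discussion is the mean of sensitivity and specificity. A minimal sketch of that definition (illustration only, not the PR's actual code; the function name is chosen here for clarity, and the edge case of a class with no samples is not handled):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def balanced_accuracy_binary(y_true, y_pred):
    # Mean of sensitivity (recall on the positive class) and
    # specificity (recall on the negative class).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / float(tp + fn)
    specificity = tn / float(tn + fp)
    return 0.5 * (sensitivity + specificity)
```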

@lazywei
Contributor Author

lazywei commented Jul 31, 2014

For edge cases, I'm considering using the strategy described in this question and answer.

@jnothman
Copy link
Member

Thanks. We mark such pull requests WIP (work in progress).

@jnothman jnothman changed the title from "Prototype for balanced accuracy score." to "[WIP] balanced accuracy score" on Jul 31, 2014
@arjoly
Member

arjoly commented Jul 31, 2014

Whenever you think that your work is ready to be reviewed, you can prefix your pull request title with the MRG tag (instead of WIP) and tell us in the pull request discussion.

MRG means ready to be reviewed.

@lazywei
Contributor Author

lazywei commented Jul 31, 2014

I think the binary part is ready to be reviewed, but we may need to discuss how to implement the multi-class case. In that situation, should I still change WIP to MRG even though there is work left to be done?

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling e7160a5 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 80f1048 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@lazywei
Contributor Author

lazywei commented Aug 1, 2014

As suggested by @larsmans in #3506 (comment), I introduced a new parameter neg_class to let the user indicate which class should be regarded as negative, treating all others as positive.
Based on this, I think we can also support multi-label cases by letting the user specify such a "negative class". For binary or multiclass input, neg_class can only be an integer, while for multi-label input we can allow neg_class to be an indicator array, e.g. neg_class=[0, 1, 0, 1, 1]. Any thoughts?
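
A hypothetical sketch of how the proposed neg_class collapsing could work for the multiclass case (this parameter was never part of the merged API; the function name and behaviour below are illustrative only):

```python
import numpy as np

def balanced_accuracy_neg_class(y_true, y_pred, neg_class=0):
    # Collapse every label other than neg_class into a single positive
    # class, then score the resulting binary problem.
    y_true_bin = (np.asarray(y_true) != neg_class).astype(int)
    y_pred_bin = (np.asarray(y_pred) != neg_class).astype(int)
    pos_recall = np.mean(y_pred_bin[y_true_bin == 1] == 1)
    neg_recall = np.mean(y_pred_bin[y_true_bin == 0] == 0)
    return 0.5 * (pos_recall + neg_recall)
```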

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling bcd5963 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@jnothman
Member

jnothman commented Aug 1, 2014

It's fine to have only a binary implementation. Averaging is something we already have functions for.

But you do need tests before it's worthy of MRG: a test covering sanity and tricky cases specific to this metric's implementation, plus a presence in the common metrics tests.

neg_class has heretofore been represented in pos_label (see e.g. precision_recall_fscore_support), and I have proposed in #2610 to get rid of that parameter, in favour of making the include-ignore distinction a function of the labels parameter. In many cases neg_class (or neg_label) is easier to state, but in the case of P/R/F, labels also controls the ordering of the per-class results in a multiclass classification context. I would really like to see this feature be both expressive and consistent across metrics.
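
For reference, this is roughly how pos_label is used in precision_recall_fscore_support (the example follows the current scikit-learn signature, which may differ in details from the 2014 API):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# pos_label selects which class counts as "positive" when scoring
# a single binary problem.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average='binary')
```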

@jnothman
Member

jnothman commented Aug 1, 2014

It is also easier to provide a good default for pos_label (i.e. 1) than for neg_label which is frequently 0 or -1.

@jnothman
Member

jnothman commented Aug 1, 2014

For averaging, see _average_binary_score

@lazywei
Contributor Author

lazywei commented Aug 2, 2014

neg_class has heretofore been represented in pos_label (see e.g. precision_recall_fscore_support), and I have proposed in #2610 to get rid of that parameter, in favour of making the include-ignore distinction a function of the labels parameter

I could change neg_class to pos_label. However, I don't really understand the labels parameter. Should it be an array of all possible labels? Why do we need that?
Do I need to implement both the pos_label and labels parameters, or should I only implement the latter since we are going to deprecate pos_label?

@jnothman
Member

jnothman commented Aug 3, 2014

labels is used where average=None in precision_recall_fscore_support to define an order for the array of per-label scores. Because our definition of macro-average can include labels not seen in a particular sample, it also enables those unseen labels to be accounted for.

I would like to see it also used to exclude one or more negative classes in the multiclass case (like pos_label but with multiple positive labels), i.e. to indicate the positive label or labels, such that the default setting would be labels=1.
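
A small illustration of the labels behaviour described above, using precision_recall_fscore_support from current scikit-learn (the exact warnings emitted for unseen labels may vary by version):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 2, 2, 1, 0]

# With average=None, `labels` fixes the order of the per-class scores
# and can include a class (here 3) that never appears in the data.
p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 1, 0, 3], average=None)
```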


@lazywei
Contributor Author

lazywei commented Aug 4, 2014

@jnothman Thanks. I replaced pos_label with labels, but I'm not sure if this is what you meant.
Please let me know if I misunderstood anything. Thanks.

@arjoly
Member

arjoly commented Oct 23, 2014

Can you add tests for the new metric to ensure its correctness (see sklearn/metrics/tests/test_classification.py)? Some tests shared by all metrics are already written in sklearn/metrics/tests/test_common.py and can be readily reused with minimal effort, as explained in that file.

You also need to add narrative documentation in doc/modules/model_evaluation.rst to highlight your work and show everybody your useful contribution. It should give the appropriate definition in mathematical terms and also give insight into when you would (or would not) want to use balanced_accuracy_score. Lastly, could you add an entry for balanced_accuracy_score in doc/modules/classes.rst?
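
A hypothetical example of the kind of sanity test being requested here (not the PR's actual test; the expected value is computed from per-class recalls so the snippet runs without the new function, but with the PR applied it would compare against balanced_accuracy_score directly):

```python
import numpy as np
from sklearn.metrics import recall_score

def test_balanced_accuracy_binary_sanity():
    y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
    y_pred = np.array([0, 1, 1, 1, 1, 1, 0, 0])
    # For binary data, balanced accuracy should equal the mean of the
    # per-class recalls (sensitivity and specificity).
    expected = 0.5 * (recall_score(y_true, y_pred, pos_label=1)
                      + recall_score(y_true, y_pred, pos_label=0))
    # With this PR applied, the test would assert:
    #   assert_almost_equal(balanced_accuracy_score(y_true, y_pred), expected)
    assert np.isclose(expected, 0.5 * (3 / 5.0 + 1 / 3.0))
```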


return score

return _average_binary_score(
Member

You don't need to use this function, especially if we don't add support for multi-class and multi-label classification.

Contributor Author

So we only need to support the binary case, right? That should make things much simpler.
I'll remove all multilabel- and multiclass-related code then.
However, even if we only support the binary case, wouldn't using _average_binary_score make things easier? I mean, we need to handle both cases:

  • y_true is a 1-by-n array
  • y_true is a k-by-n ndarray

_average_binary_score seems to handle this gracefully.

Thanks.
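
For context, a rough self-contained sketch of the dispatch pattern _average_binary_score provides (an illustration of the idea only, not the private helper's real code or signature):

```python
import numpy as np

def average_binary_score_sketch(binary_metric, y_true, y_pred):
    # Illustration only: for multilabel indicator input, score each label
    # column with the binary metric and macro-average the results; for
    # plain binary input, fall back to a single call.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.ndim == 1:
        return binary_metric(y_true, y_pred)
    scores = [binary_metric(y_true[:, k], y_pred[:, k])
              for k in range(y_true.shape[1])]
    return np.mean(scores)
```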

Member

That's supporting binary and multilabel, which I can easily motivate in terms of this metric. I don't know about multiclass; is there a reference paper describing balanced accuracy in a multiclass context?

@arjoly
Member

arjoly commented Oct 24, 2015

closed in favor of #5588

@arjoly arjoly closed this Oct 24, 2015