[WIP] balanced accuracy score #3511

Closed
wants to merge 6 commits into scikit-learn:master from lazywei:balanced_accuracy_score

Conversation

lazywei
Contributor

@lazywei lazywei commented Jul 31, 2014

As mentioned in #3506, I'm trying to implement balanced accuracy score.
Now it only supports binary classification.
Also, it is currently a prototype, which means I haven't worked on documentation and tests yet. However, since this is the first time I'm contributing to scikit-learn, I'd like to open the pull request early so that you can point out my mistakes as soon as possible. I also want to make sure my understanding of this issue is correct.

I'll keep working on this branch. Thanks.
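
For readers of this thread: the binary metric under discussion is the mean of sensitivity and specificity. A minimal sketch of that definition (illustration only, not the PR's actual code; the function name is chosen here for clarity, and the edge case of a class with no samples is not handled):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def balanced_accuracy_binary(y_true, y_pred):
    # Mean of sensitivity (recall on the positive class) and
    # specificity (recall on the negative class).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / float(tp + fn)
    specificity = tn / float(tn + fp)
    return 0.5 * (sensitivity + specificity)
```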

@lazywei
Contributor Author

lazywei commented Jul 31, 2014

For edge cases, I'm considering using the strategy described in this question and answer.

@jnothman
Copy link
Member

Thanks. We mark such pull requests WIP (work in progress).

@jnothman jnothman changed the title from "Prototype for balanced accuracy score." to "[WIP] balanced accuracy score" on Jul 31, 2014
@arjoly
Member

arjoly commented Jul 31, 2014

Whenever you think that your work is ready to be reviewed, you can prefix your pull request title with the MRG tag (instead of WIP) and tell us in the pull request discussion.

MRG means ready to be reviewed.

@lazywei
Contributor Author

lazywei commented Jul 31, 2014

I think the binary part is ready to be reviewed, but we may need to discuss how to implement the multi-class case. In that situation, should I still change WIP to MRG even though there is work left to be done?

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling e7160a5 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 80f1048 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@lazywei
Contributor Author

lazywei commented Aug 1, 2014

As suggested by @larsmans in #3506 (comment), I introduced a new parameter neg_class to let the user indicate which class should be regarded as negative, treating all others as positive.
Based on this, I think we can also support multi-label cases by letting the user specify such a "negative class". For binary or multiclass input, neg_class can only be an integer, while for multi-label input we can allow neg_class to be an indicator array, e.g. neg_class=[0, 1, 0, 1, 1]. Any thoughts?
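
A hypothetical sketch of how the proposed neg_class collapsing could work for the multiclass case (this parameter was never part of the merged API; the function name and behaviour below are illustrative only):

```python
import numpy as np

def balanced_accuracy_neg_class(y_true, y_pred, neg_class=0):
    # Collapse every label other than neg_class into a single positive
    # class, then score the resulting binary problem.
    y_true_bin = (np.asarray(y_true) != neg_class).astype(int)
    y_pred_bin = (np.asarray(y_pred) != neg_class).astype(int)
    pos_recall = np.mean(y_pred_bin[y_true_bin == 1] == 1)
    neg_recall = np.mean(y_pred_bin[y_true_bin == 0] == 0)
    return 0.5 * (pos_recall + neg_recall)
```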

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling bcd5963 on lazywei:balanced_accuracy_score into 1b2833a on scikit-learn:master.

@jnothman
Member

jnothman commented Aug 1, 2014

It's fine to have only a binary implementation. Averaging is something we already have functions for.

But you do need tests before it's worthy of MRG: a test covering sanity and tricky cases specific to this metric's implementation, plus a presence in the common metrics tests.

neg_class has heretofore been represented in pos_label (see e.g. precision_recall_fscore_support), and I have proposed in #2610 to get rid of that parameter, in favour of making the include-ignore distinction a function of the labels parameter. In many cases neg_class (or neg_label) is easier to state, but in the case of P/R/F, labels also controls the ordering of the per-class results in a multiclass classification context. I would really like to see this feature be both expressive and consistent across metrics.
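
For reference, this is roughly how pos_label is used in precision_recall_fscore_support (the example follows the current scikit-learn signature, which may differ in details from the 2014 API):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# pos_label selects which class counts as "positive" when scoring
# a single binary problem.
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average='binary')
```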

@jnothman
Member

jnothman commented Aug 1, 2014

It is also easier to provide a good default for pos_label (i.e. 1) than for neg_label which is frequently 0 or -1.

@jnothman
Member

jnothman commented Aug 1, 2014

For averaging, see _average_binary_score

@lazywei
Contributor Author

lazywei commented Aug 2, 2014

neg_class has heretofore been represented in pos_label (see e.g. precision_recall_fscore_support), and I have proposed in #2610 to get rid of that parameter, in favour of making the include-ignore distinction a function of the labels parameter

I could change neg_class to pos_label. However, I don't really understand the labels parameter. Should it be an array of all possible labels? Why do we need that?
Do I need to implement both the pos_label and labels parameters, or should I only implement the latter since we are going to deprecate pos_label?

@jnothman
Member

jnothman commented Aug 3, 2014

labels is used where average=None in precision_recall_fscore_support to define an order for the array of per-label scores. Because our definition of macro-average can include labels not seen in a particular sample, it also enables those unseen labels to be accounted for.

I would like to see it also used to exclude one or more negative classes in the multiclass case (like pos_label but with multiple positive labels), i.e. to indicate the positive label or labels, such that the default setting would be labels=1.
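
A small illustration of the labels behaviour described above, using precision_recall_fscore_support from current scikit-learn (the exact warnings emitted for unseen labels may vary by version):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 2, 2, 1, 0]

# With average=None, `labels` fixes the order of the per-class scores
# and can include a class (here 3) that never appears in the data.
p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 1, 0, 3], average=None)
```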


@lazywei
Contributor Author

lazywei commented Aug 4, 2014

@jnothman Thanks. I replaced pos_label with labels, but I'm not sure if this is what you meant.
Please let me know if I misunderstood anything. Thanks.

@arjoly
Member

arjoly commented Oct 23, 2014

Can you add tests for the new metric to ensure its correctness (see sklearn/metrics/tests/test_classification.py)? Some tests shared by all metrics are already written in sklearn/metrics/tests/test_common.py and can be readily reused with minimal effort, as explained in that file.

You also need to add narrative documentation in doc/modules/model_evaluation.rst to highlight your work and show everybody your useful contribution. It should give the appropriate definition in mathematical terms and also give insight into when you would (or would not) want to use balanced_accuracy_score. Lastly, could you add an entry for balanced_accuracy_score in doc/modules/classes.rst?
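
A hypothetical example of the kind of sanity test being requested here (not the PR's actual test; the expected value is computed from per-class recalls so the snippet runs without the new function, but with the PR applied it would compare against balanced_accuracy_score directly):

```python
import numpy as np
from sklearn.metrics import recall_score

def test_balanced_accuracy_binary_sanity():
    y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
    y_pred = np.array([0, 1, 1, 1, 1, 1, 0, 0])
    # For binary data, balanced accuracy should equal the mean of the
    # per-class recalls (sensitivity and specificity).
    expected = 0.5 * (recall_score(y_true, y_pred, pos_label=1)
                      + recall_score(y_true, y_pred, pos_label=0))
    # With this PR applied, the test would assert:
    #   assert_almost_equal(balanced_accuracy_score(y_true, y_pred), expected)
    assert np.isclose(expected, 0.5 * (3 / 5.0 + 1 / 3.0))
```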


return score

return _average_binary_score(
Member

You don't need to use this function, especially if we don't add support for multi-class and multi-label classification.

Contributor Author

So we only need to support the binary case, right? That should make things much simpler.
I'll remove all multilabel- and multiclass-related code then.
However, even if we only support the binary case, wouldn't using _average_binary_score make things easier? I mean, we need to handle both cases:

  • y_true is a 1-by-n array
  • y_true is a k-by-n ndarray

_average_binary_score seems to handle this gracefully.

Thanks.
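
For context, a rough self-contained sketch of the dispatch pattern _average_binary_score provides (an illustration of the idea only, not the private helper's real code or signature):

```python
import numpy as np

def average_binary_score_sketch(binary_metric, y_true, y_pred):
    # Illustration only: for multilabel indicator input, score each label
    # column with the binary metric and macro-average the results; for
    # plain binary input, fall back to a single call.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.ndim == 1:
        return binary_metric(y_true, y_pred)
    scores = [binary_metric(y_true[:, k], y_pred[:, k])
              for k in range(y_true.shape[1])]
    return np.mean(scores)
```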

Member

That's supporting binary and multilabel, which I can easily motivate in terms of this metric. I don't know about multiclass; is there a reference paper describing balanced accuracy in a multiclass context?

@arjoly
Member

arjoly commented Oct 24, 2015

closed in favor of #5588

@arjoly arjoly closed this Oct 24, 2015