
[WIP] score function computing balanced accuracy #6752


Closed
wants to merge 12 commits into from

Conversation

xyguo
Contributor

@xyguo xyguo commented May 3, 2016

Reference Issue

This PR addresses issue #6747, which suggests implementing a score function that computes the balanced accuracy.

What does this implement/fix? Explain your changes.

The balanced accuracy is actually the unweighted average of the per-class recall scores, and this functionality is already provided by sklearn.metrics.recall_score -- just pass average='macro' (and pos_label=None for versions before 0.18).

So the balanced_accuracy_score in this PR is a simple wrapper around recall_score.
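
For reference, a minimal sketch of that wrapper idea (the name and signature mirror what this PR proposes and are not the final API):

    import numpy as np
    from sklearn.metrics import recall_score

    def balanced_accuracy_score(y_true, y_pred, sample_weight=None):
        # Balanced accuracy as the unweighted (macro) average of per-class recall.
        return recall_score(y_true, y_pred, average='macro',
                            sample_weight=sample_weight)

    # Imbalanced binary example: recall is 6/6 on class 0 and 1/2 on class 1.
    y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
    y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1])
    print(balanced_accuracy_score(y_true, y_pred))  # 0.75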

Any other comments?

I'm not sure whether there should be a test case for this function, since the corresponding scenario is already tested for recall_score.

@xyguo
Contributor Author

xyguo commented May 3, 2016

According to the latest comment under issue #6747, the balanced accuracy should only be computed for binary classification problems as well as multi-label problems.

Fixing the implementation.

@jnothman
Member

jnothman commented May 3, 2016

You don't necessarily need to support multilabel initially. You do need to ensure this has:

  • narrative documentation in model_evaluation.rst
  • metrics common tests applied
  • unit tests, perhaps
  • a scorer

@xyguo
Contributor Author

xyguo commented May 4, 2016

@jnothman I see. By the way, I think it'd be better not to accept multilabel input, because this is essentially not a metric for multilabel problems. Maybe we should leave that to the user.

@jnothman
Member

jnothman commented May 4, 2016

You may be right that it's not often reported for multilabel problems, but any metric applicable to binary problems is applicable to each label of a multilabel problem: a multilabel problem can be seen as multiple binary tasks. But as I said, we can leave multilabel support out for the moment.

@xyguo
Contributor Author

xyguo commented May 5, 2016

Now I've made a preliminary version of the metric function, with corresponding documentation, and all tests pass.
It needs some review; any comments are welcome. Thanks.

The balanced accuracy is used in binary classification problems to deal
with imbalanced datasets. It is defined as the arithmetic mean of sensitivity
(true positive rate) and specificity (true negative rate), or the average
accuracy obtained on either class.
Member

Either use "recall" here or be explicit that it's "on either class's gold standard instances" or something.

@xyguo
Contributor Author

xyguo commented May 6, 2016

@jnothman I have updated the doc as you suggested.

What should I do next? (This is the first time I've contributed code to an open source project >_<)

Currently I'm trying to extend it to multilabel problems. As your comments mentioned, this quantity is equivalent to roc_auc_score with binary inputs, which already handles multilabel inputs via an average parameter. So the most straightforward approach is to simply wrap roc_auc_score, and on my machine the implementation based on roc_auc_score is roughly 2x faster than the one based on recall_score. But this requires importing a function from ranking.py, which may not be desired. Any suggestions?
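
As a quick sanity check of that equivalence (illustrative only, not code from this PR), one can compare the two metrics on hard 0/1 predictions, where the ROC AUC collapses to (sensitivity + specificity) / 2:

    import numpy as np
    from sklearn.metrics import recall_score, roc_auc_score

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, size=100)
    y_pred = rng.randint(0, 2, size=100)

    # With binary-valued "scores", the ROC curve has a single interior point,
    # and its area equals the mean of TPR and TNR, i.e. the macro recall.
    assert np.isclose(recall_score(y_true, y_pred, average='macro'),
                      roc_auc_score(y_true, y_pred))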

@jnothman
Member

jnothman commented May 6, 2016

Yes, I think wrapping roc_auc_score is the way to go... I don't think the module coupling is an issue in this case...

conventional accuracy (i.e., the number of correct predictions divided by the total
number of predictions). In contrast, if the conventional accuracy is above chance only
because the classifier takes advantage of an imbalanced test set, then the balanced
accuracy, as appropriate, will drop to chance.
Member

0.5 or 50% is clearer than 'chance'
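
To make that passage concrete, here is a small illustrative example (hypothetical numbers, not from the PR): a classifier that always predicts the majority class on a 90/10 test set gets 0.9 conventional accuracy but only 0.5 balanced accuracy.

    import numpy as np
    from sklearn.metrics import accuracy_score, recall_score

    y_true = np.array([0] * 90 + [1] * 10)   # imbalanced test set
    y_pred = np.zeros(100, dtype=int)        # always predict the majority class

    print(accuracy_score(y_true, y_pred))                 # 0.9
    print(recall_score(y_true, y_pred, average='macro'))  # 0.5 (chance level)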

@jnothman
Member

jnothman commented May 8, 2016

For the most part, this looks great. I'm not sure if more specific tests for balanced accuracy are needed, or whether the doctests + common tests suffice.

One problem with using roc_auc_score, I've realised, is that it doesn't already support a sparse matrix for y_pred in the multilabel case.

@xyguo
Contributor Author

xyguo commented May 8, 2016

Yes, I've noticed the problem caused by sparse input.
Actually, there are more problems than I expected when extending to the multilabel setting, and I need some suggestions on the following:

  1. if y_true contains only one class, for example

    y_true = np.array([1, 1, 1, 1])
    y_pred = np.array([1, 1, 0, 0])
    

    then recall_score(y_true, y_pred, average='macro', pos_label=None) will issue a warning and return 0.25, i.e., the recall on the absent class defaults to zero; but roc_auc_score would raise an exception. I'm not sure which behavior balanced_accuracy should follow. Personally I prefer the recall_score behavior, since most classification metrics seem to act like that (see the sketch after this list).

  2. To deal with multilabel inputs, I mimic other classification metrics and add an average parameter to the function definition. But in tests/test_common.py, the case test_no_averaging_labels assumes that classification metrics with an average parameter also accept a labels parameter, while roc_auc_score doesn't support that.

Maybe I'll have to reimplement it from scratch. -_- ||
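
For reference, a small sketch contrasting the two behaviours described in item 1 (the exact warning and exception text may vary across scikit-learn versions):

    import numpy as np
    from sklearn.metrics import recall_score, roc_auc_score

    # Degenerate case: y_true contains only one class.
    y_true = np.array([1, 1, 1, 1])
    y_pred = np.array([1, 1, 0, 0])

    # recall_score warns that class 0 has no true samples, treats its recall
    # as 0, and returns (0.0 + 0.5) / 2 = 0.25.
    print(recall_score(y_true, y_pred, average='macro'))

    # roc_auc_score raises instead, since AUC is undefined with a single class.
    try:
        roc_auc_score(y_true, y_pred)
    except ValueError as exc:
        print(exc)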

@jnothman
Member

jnothman commented May 8, 2016

y_true containing only one class is a pretty special case. In terms of giving meaningful numbers, I think it makes sense just to report the recall of that one class, but it's hard to justify directly from the definition of balanced accuracy. Note that your proposal means that if y_pred == y_true == np.ones(n) the score is 0.5.

The labels argument is only actually necessary for the multiclass case (and it's a bit weird that we support it for the multilabel case, but it harks back to when we had a different format for multilabel). In any case, you could certainly argue for skipping any of the METRIC_UNDEFINED_MULTICLASS metrics from that test.

@xyguo
Contributor Author

xyguo commented May 9, 2016

The labels argument may make sense if we are going to support the multilabel setting: say you want the macro-averaged metric on all labels except some undesired ones; then the labels argument could help exclude those unwanted labels (see the docs of recall_score for an example).

Maybe we should also support multi-class problems; the definition of balanced accuracy generalizes to multi-class settings naturally (although it may not be so useful when the number of classes exceeds two).

@jnothman
Member

jnothman commented May 9, 2016

How does it generalise to multiclass naturally? I don't think it's obvious.

I don't think the need to exclude labels is important for multilabel; it is important for multiclass which is why it is supported in recall_score (I would know; I proposed and implemented it).

@jnothman
Member

FYI, #5588 was an existing PR attempting this enhancement. I don't know why we didn't just continue on that one... but between these two PRs we should attempt some convergence...

@xyguo
Contributor Author

xyguo commented May 12, 2016

I have finished the support for multilabel, but several tests fail in test_common.py. For example, if the metric function accepts an average argument, then the case test_averaging_multiclass implicitly assumes the metric can deal with multiclass problems, while my implementation would raise an exception...

And several other cases fail due to similar problems. Maybe we should clarify the interface for the different types of metrics ...

@jnothman
Member

So you need to get those tests to check for METRIC_UNDEFINED_MULTICLASS, no?

@jnothman
Member

@rhiever has convinced me over at #6747 that we should indeed be supporting multiclass as just a macro-average over binarised problems (i.e. the same as calculating the multilabel balanced accuracy after LabelBinarizer).
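
A rough sketch of those semantics (illustrative only, not the merged implementation; it would still hit the single-class edge case discussed above if some class never appears in y_true):

    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.preprocessing import LabelBinarizer

    def multiclass_balanced_accuracy(y_true, y_pred):
        # Binarise one-vs-rest, score each column as a binary problem, then
        # take the unweighted mean over classes. With hard 0/1 predictions,
        # roc_auc_score per column equals (sensitivity + specificity) / 2.
        lb = LabelBinarizer()
        Y_true = lb.fit_transform(y_true)
        Y_pred = lb.transform(y_pred)
        return np.mean([roc_auc_score(Y_true[:, i], Y_pred[:, i])
                        for i in range(Y_true.shape[1])])

    print(multiclass_balanced_accuracy([0, 1, 2, 2, 1, 0],
                                       [0, 2, 2, 1, 1, 0]))  # 0.75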

@xyguo
Contributor Author

xyguo commented Jun 17, 2016

@jnothman got it, I will work on it soon

@amueller
Member

@xyguo are you still working on this?

@xyguo
Contributor Author

xyguo commented Oct 11, 2016

@amueller Yes. I have been writing my thesis and don't have much time for this project. I plan to resume it later this month.

@dalmia
Contributor

dalmia commented Dec 5, 2016

@xyguo Are you still working on this or can I take this up?

@xyguo
Contributor Author

xyguo commented Dec 5, 2016

@dalmia Please take this up; I've just been too busy to work on it recently. Thanks!

@dalmia
Contributor

dalmia commented Dec 5, 2016

Thanks @xyguo

dalmia added a commit to dalmia/scikit-learn that referenced this pull request Dec 16, 2016
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Oct 9, 2017
amueller pushed a commit that referenced this pull request Oct 17, 2017
* add function computing balanced accuracy

* documentation for the balanced_accuracy_score

* apply common tests to balanced_accuracy_score

* constrained to binary classification problems only

* add balanced_accuracy_score for CLF test

* add scorer for balanced_accuracy

* reorder the place of importing balanced_accuracy_score to be consistent with others

* eliminate an accidentally added non-ascii character

* remove balanced_accuracy_score from METRICS_WITH_LABELS

* eliminate all non-ascii characters in the doc of balanced_accuracy_score

* fix doctest for nonexistent scoring function

* fix documentation, clarify linkages to recall and auc

* FIX: added changes as per last review See #6752, fixes #6747

* FIX: fix typo

* FIX: remove flake8 errors

* DOC: merge fixes

* DOC: remove unwanted files

* DOC update what's new
@lesteve
Member

lesteve commented Oct 18, 2017

Closed by #8066.

@lesteve lesteve closed this Oct 18, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017