[WIP] score function computing balanced accuracy #6752
Conversation
According to the latest comment under issue #6747, the balanced accuracy should only be computed for binary classification problems as well as multi-label problems. Fixing the implementation.
You don't necessarily need to support multilabel initially. You do need to ensure this has: …
@jnothman I see. By the way, I think it'd be better not to accept multilabel input, because this is essentially not a metric for multilabel problems. Maybe we should leave it to the user.
You may be right that it's not often reported for multilabel problems, but any metric applicable to binary problems is applicable to each label of a multilabel problem: a multilabel problem can be seen as multiple binary tasks. But as I said, we can leave multilabel support out for the moment.
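To illustrate that framing, here is a small sketch (my own illustration, not code from this PR) that treats each column of a multilabel indicator matrix as an independent binary task and computes balanced accuracy per label via macro-averaged recall:

```python
# Illustrative only: apply a binary metric to each label column of a
# multilabel indicator matrix independently.
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y_pred = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Balanced accuracy of one binary column == macro-averaged recall on it.
per_label = [recall_score(y_true[:, j], y_pred[:, j], average='macro')
             for j in range(y_true.shape[1])]
print(per_label)  # per-label balanced accuracies: 0.5 and 1.0
```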
Now I've made a preliminary version of the metric function, with corresponding documentation; all tests pass.
The balanced accuracy is used in binary classification problems to deal with imbalanced datasets. It is defined as the arithmetic mean of sensitivity (true positive rate) and specificity (true negative rate), or the average accuracy obtained on either class.
Either use "recall" here or be explicit that it's "on either class's gold standard instances" or something.
@jnothman I have updated the doc as you commented. What should I do next? (This is the first time I've contributed code to an open source project >_<) Currently I'm trying to extend it to multilabel problems. As your comments mentioned, this quantity is equivalent to macro-averaged recall.
Yes, I think wrapping `recall_score` is the way to go.
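For the record, a quick numeric check (illustrative, assuming current `recall_score` semantics) that the arithmetic mean of sensitivity and specificity equals macro-averaged recall on a binary problem:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate: 1/2
specificity = tn / (tn + fp)  # true negative rate: 2/4

print((sensitivity + specificity) / 2)                # 0.5
print(recall_score(y_true, y_pred, average='macro'))  # 0.5
```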
…conventional accuracy (i.e., the number of correct predictions divided by the total number of predictions). In contrast, if the conventional accuracy is above chance only because the classifier takes advantage of an imbalanced test set, then the balanced accuracy, as appropriate, will drop to chance.
0.5 or 50% is clearer than 'chance'
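To make the "0.5" point concrete, here is an illustrative example (not from the PR) where a classifier that always predicts the majority class scores high conventional accuracy on a 95/5 imbalanced test set but exactly 0.5 balanced accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predict the majority class

print(accuracy_score(y_true, y_pred))                 # 0.95
print(recall_score(y_true, y_pred, average='macro'))  # 0.5
```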
For the most part, this looks great. I'm not sure if more specific tests for balanced accuracy are needed, or whether the doctests + common tests suffice. One problem with using `recall_score` is its behaviour on sparse input.
Yes, I've noticed the problem caused by sparse input. Maybe I'll have to reimplement it from scratch. -_-
Maybe we should also support multi-class problems; the definition of balanced accuracy generalizes to multi-class settings naturally (although it may not be so useful when the number of classes exceeds two).
How does it generalise to multiclass naturally? I don't think it's obvious. I don't think the need to exclude labels is important for multilabel; it is important for multiclass, which is why it is supported in `recall_score`.
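One candidate multiclass generalization, averaging per-class recall over all classes, would look like the following sketch (an assumption on my part, not something settled in this thread):

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Per-class recalls: 1/2, 2/2, 1/2 -> mean = 2/3
print(recall_score(y_true, y_pred, average='macro'))  # ~0.667
```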
FYI, #5588 was an existing PR attempting this enhancement. I don't know why we didn't just continue on that one... but between these two PRs we should attempt some convergence...
I have finished the support for multilabel, but several tests fail, and there are several other cases failing due to similar problems. I suspect we may need to clarify the interface for different types of metrics...
So you need to get those tests to check for …
@jnothman got it, I will work on it soon
@xyguo are you still working on this?
@amueller Yes. I have been writing my thesis and don't have much time for this project. I plan to resume it later this month.
@xyguo Are you still working on this or can I take this up?
@dalmia Please take this up, I've just been too busy to work on it recently. Thanks!
Thanks @xyguo
* add function computing balanced accuracy
* documentation for the balanced_accuracy_score
* apply common tests to balanced_accuracy_score
* constrained to binary classification problems only
* add balanced_accuracy_score for CLF test
* add scorer for balanced_accuracy
* reorder the place of importing balanced_accuracy_score to be consistent with others
* eliminate an accidentally added non-ascii character
* remove balanced_accuracy_score from METRICS_WITH_LABELS
* eliminate all non-ascii characters in the doc of balanced_accuracy_score
* fix doctest for nonexistent scoring function
* fix documentation, clarify linkages to recall and auc
* FIX: added changes as per last review (see #6752, fixes #6747)
* FIX: fix typo
* FIX: remove flake8 errors
* DOC: merge fixes
* DOC: remove unwanted files
* DOC: update what's new
Closed by #8066.
Reference Issue
This PR addresses issue #6747, which suggests implementing a score function computing the balanced accuracy.
What does this implement/fix? Explain your changes.
The balanced accuracy is actually an unweighted average of recall scores for each class, and the functionality is already provided by `sklearn.metrics.recall_score` -- just pass the argument `average='macro'` (and `pos_label=None` for versions before 0.18). So the `balanced_accuracy_score` in this PR is a simple wrapper around `recall_score`.
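A minimal sketch of that wrapper (a simplification for illustration; per the commit log, the actual PR also constrains input to binary targets, adds documentation, and registers a scorer):

```python
from sklearn.metrics import recall_score

def balanced_accuracy_score(y_true, y_pred, sample_weight=None):
    """Balanced accuracy computed as the macro-average of per-class recall."""
    return recall_score(y_true, y_pred, average='macro',
                        sample_weight=sample_weight)
```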
Any other comments?
I'm not sure if there should be a test case for this function, since the corresponding scenario is already tested for `recall_score`.