[WIP] balanced accuracy score #3511
Conversation
For edge cases, I am considering using the strategy described in this question and answer.
Thanks. We mark such pull requests WIP (work in progress).
Whenever you think your work is ready to be reviewed, you can prefix your pull request title with the MRG tag (instead of WIP) and tell us in the pull request discussion. MRG means ready to be reviewed.
I think the binary part is ready to be reviewed, but we may need to discuss how to implement the multi-class case. So in that case, should I change WIP to MRG (even though there is still work to be done)?
Inspired by @larsmans in #3506 (comment), I introduced a new parameter …
It's fine to have only a binary implementation. Averaging is something we already have functions for. But you do need tests before it's worthy of MRG: a test for sanity and tricky cases for this particular metric implementation, plus a presence in the common metrics tests.
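A sanity test might look roughly like the sketch below; it assumes the function from this PR is importable as `balanced_accuracy_score` (the exact module path on the branch may differ), and the expected values are just (sensitivity + specificity) / 2 worked out by hand:

```python
import numpy as np
# Assumption: the function proposed in this PR is importable like this.
from sklearn.metrics import balanced_accuracy_score

def test_balanced_accuracy_binary_sanity():
    y_true = np.array([0, 0, 0, 1, 1, 1])
    y_pred = np.array([0, 0, 1, 1, 1, 1])
    # sensitivity = 3/3, specificity = 2/3 -> balanced accuracy = (1 + 2/3) / 2 = 5/6
    assert np.isclose(balanced_accuracy_score(y_true, y_pred), 5.0 / 6.0)

def test_balanced_accuracy_binary_edge_cases():
    y_true = np.array([0, 1, 0, 1])
    # Perfect predictions score 1.0; fully inverted predictions score 0.0.
    assert balanced_accuracy_score(y_true, y_true) == 1.0
    assert balanced_accuracy_score(y_true, 1 - y_true) == 0.0
```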
It is also easier to provide a good default for …
For averaging, see …
I could change from …
`labels` is used where `average=None` in `precision_recall_fscore_support` to …; I would like to see it also used to exclude one or more negative classes in …
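For context, this is roughly how `labels` already behaves together with `average=None` in `precision_recall_fscore_support` (the data here is only illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 1, 2, 1, 2])

# With average=None, one value is returned per requested label, in the order
# given by `labels`; label 0 is simply excluded here.
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average=None)
print(recall)  # recall for label 1 and label 2 only
```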
@jnothman Thanks. I replaced the …
Can you add tests for the new metric to ensure its correctness (see …)? You also need to add narrative documentation in …
On the diff lines `return score` … `return _average_binary_score(`:
You don't need to use this function, especially if we don't add support for multi-class and multi-label classification.
So we only need to support binary, right? That should make things much simpler. I'll remove all multilabel- and multiclass-related code then.
However, even if we only support the binary case, wouldn't using `_average_binary_score` make things easier? I mean, we need to handle two cases:

- `y_true` is a 1-by-n array
- `y_true` is a k-by-n ndarray

`_average_binary_score` seems to handle this problem gracefully.
Thanks.
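Purely to illustrate the shape handling under discussion, a toy stand-in (not the actual `_average_binary_score` helper) could look like this:

```python
import numpy as np

def average_binary_columns(binary_metric, y_true, y_pred):
    """Toy stand-in: apply ``binary_metric`` directly to 1-d targets, or column
    by column to a 2-d label-indicator matrix, then average the column scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if y_true.ndim == 1:
        return binary_metric(y_true, y_pred)
    scores = [binary_metric(y_true[:, j], y_pred[:, j])
              for j in range(y_true.shape[1])]
    return np.mean(scores)

# A trivial binary metric (plain accuracy), only to demonstrate the dispatch:
acc = lambda t, p: np.mean(t == p)
print(average_binary_columns(acc, np.array([0, 1, 1]), np.array([0, 0, 1])))           # 1-d case
print(average_binary_columns(acc, np.eye(3, dtype=int), np.ones((3, 3), dtype=int)))   # 2-d case
```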
That's supporting binary and multilabel, which I can easily motivate in terms of this metric. I don't know about multiclass; is there a reference paper describing balanced accuracy in a multiclass context?
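For what it's worth, one generalization that existing functions can already express is the macro-average of per-class recall; whether that matches any published definition of multiclass balanced accuracy is exactly the open question:

```python
from sklearn.metrics import recall_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 2]
# Per-class recall averaged with equal class weight (macro averaging).
print(recall_score(y_true, y_pred, average='macro'))  # (0.5 + 1.0 + 0.5) / 3
```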
Closed in favor of #5588.
As mentioned in #3506, I'm trying to implement a balanced accuracy score.
For now it only supports binary classification.
It is also still a prototype, which means I haven't worked on documentation and tests yet. However, since this is the first time I've contributed to scikit-learn, I'd like to open the pull request early so that you can point out my mistakes as soon as possible. I also want to make sure my understanding of this issue is correct.
I'll keep working on this branch. Thanks.
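For reference, a minimal sketch of the binary metric as described above, i.e. the mean of sensitivity and specificity; the function name here is only a placeholder, not the final API:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def binary_balanced_accuracy(y_true, y_pred):
    """Placeholder sketch: (sensitivity + specificity) / 2 for binary labels {0, 1}."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / float(tp + fn)  # recall of the positive class
    specificity = tn / float(tn + fp)  # recall of the negative class
    return (sensitivity + specificity) / 2.0

print(binary_balanced_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # 0.8333...
```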