-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Multi-class roc_auc_score #10481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Multi-class roc_auc_score #10481
Conversation
# Hand & Till (2001) implementation | ||
return _average_multiclass_ovo_score( | ||
_binary_roc_auc_score, y_true, y_score, average) | ||
elif multiclass == "ovr" and average == "weighted": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the best way to use the P&D definition?
Should we state in the docstring that if one want to use the P&D implementation, he/she should set the parameters multiclass == "ovr"
and average == "weighted"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if someone sets multiclass='ovr' and average='macro' right now?
You have test failures. |
TL;DR: one test fails because of The only failing test is What happens is that in the @jnothman any thoughts on how could I handle this case? |
Is it somehow related to this issue? |
sklearn/metrics/base.py
Outdated
pair_scores = np.empty(n_pairs) | ||
|
||
ix = 0 | ||
for a, b in itertools.combinations(range(n_classes), 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using enumerate to avoid the manual ix and increment?
Have you tried just using dtype=None in that check_array call |
@maskani-moh any progress on this? Do you need my help? |
still erroring ;) (just lines too long) |
Flake issue ... fixing that now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I can review this all right now, but surely you should be modifying metrics/tests/test_common.py to run common tests on the multiclass variants?
y_score.shape[1] > 2): | ||
# validation of the input y_score | ||
if not np.allclose(1, y_score.sum(axis=1)): | ||
raise ValueError("Target scores should sum up to 1.0 for all" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space missing between "all" and "samples"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need this for OvO, not for OvR, right?
# do not support partial ROC computation for multiclass | ||
if max_fpr is not None and max_fpr != 1.: | ||
raise ValueError("Partial AUC computation not available in " | ||
"multiclass setting. Parameter 'max_fpr' must be" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent within a string about whether white space appears at the end or start of a line
Hi ! |
common metric tests aren't currently testing this case, so no, it needs work
|
@jnothman I am happy to do some work on this PR. I just forked this branch and pulled master into it, now 4 common test cases fail on my side ( Can you please explain what you mean with common metric tests? When I remove |
We haven't changed the common metric tests to use pytest collection, so yes, it's hard to tell if tests are being run. The issue here, iirc, is that we don't currently have common metric tests designed for score-based multiclass metrics. |
So if I understand the todo for this PR correct, it should add common metric tests for score-based multi-class metrics? |
You should double check what code paths a multiclass roc_auc_score would go through in test_common.py (unfortunately) and make sure that it's actually covered. |
Really test_common should be better at telling us when something is not
fully tested.
|
if average == "weighted" else np.average(pair_scores)) | ||
|
||
|
||
def _average_multiclass_ovr_score(binary_metric, y_true, y_score, average): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not the same as _average_binary_score
?
fixed in #12789 |
Reference Issues/PRs
Fixes #7663
See also 3298
What does this implement/fix? Explain your changes.
This PR takes over the work initiated in PR #7663 and complements it given the comments on the same thread.
This PR incorporates ROC AUC computations as defined by:
It also:
np.allclose(1, y_score.sum(axis=1))
.Any other comments?
Due to rebase issues, I had to create a new PR from scratch including the work previously done.