
[MRG] Multi-class roc_auc_score #10481


Closed

Conversation

maskani-moh
Contributor

Reference Issues/PRs

Fixes #7663
See also #3298

What does this implement/fix? Explain your changes.

This PR takes over the work initiated in PR #7663 and complements it given the comments on the same thread.

This PR incorporates ROC AUC computations as defined by:

  • Hand & Till (2001), one vs one
  • Provost & Domingos (2000), one vs rest

It also:

  • Adds tests for OvO, OvR, and invariance under label permutation.
  • Validates the input in the multiclass case: np.allclose(1, y_score.sum(axis=1)).
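The row-sum validation mentioned above can be illustrated with a small standalone sketch (`validate_multiclass_scores` is a hypothetical name, not the PR's actual code):

```python
import numpy as np

def validate_multiclass_scores(y_score):
    """Reject score matrices whose rows do not form probability
    distributions, i.e. do not sum to 1.0 (illustrates the
    validation this PR adds)."""
    y_score = np.asarray(y_score, dtype=float)
    if not np.allclose(1, y_score.sum(axis=1)):
        raise ValueError(
            "Target scores should sum up to 1.0 for all samples.")
    return y_score

# Rows that sum to 1 pass; anything else raises ValueError.
validate_multiclass_scores([[0.2, 0.3, 0.5], [0.1, 0.8, 0.1]])
```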

Any other comments?

Due to rebase issues, I had to create a new PR from scratch including the work previously done.

# Hand & Till (2001) implementation
return _average_multiclass_ovo_score(
_binary_roc_auc_score, y_true, y_score, average)
elif multiclass == "ovr" and average == "weighted":
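The one-vs-one averaging the snippet delegates to can be sketched roughly as follows (a simplified illustration of the Hand & Till (2001) measure; `ovo_roc_auc` is a hypothetical name, and the real `_average_multiclass_ovo_score` helper additionally takes the binary metric and an `average` argument):

```python
import itertools

import numpy as np
from sklearn.metrics import roc_auc_score

def ovo_roc_auc(y_true, y_score):
    """Average the binary AUC over every unordered pair of classes,
    using only the samples belonging to those two classes.
    Assumes classes are labelled 0..K-1 so that they index the
    columns of y_score directly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n_classes = y_score.shape[1]
    pair_scores = []
    for a, b in itertools.combinations(range(n_classes), 2):
        mask = np.isin(y_true, [a, b])
        a_true = (y_true[mask] == a).astype(int)
        # AUC of a vs b scored by column a, and of b vs a by column b
        a_auc = roc_auc_score(a_true, y_score[mask, a])
        b_auc = roc_auc_score(1 - a_true, y_score[mask, b])
        pair_scores.append((a_auc + b_auc) / 2)
    return float(np.mean(pair_scores))
```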
Contributor Author


Is it the best way to use the P&D definition?
Should we state in the docstring that if one wants to use the P&D implementation, they should set the parameters multiclass="ovr" and average="weighted"?

Member


What happens if someone sets multiclass='ovr' and average='macro' right now?

@jnothman
Member

You have test failures.

@maskani-moh
Contributor Author

TL;DR: one test fails because of check_array when dtype='object'. How should this case be handled?


The only failing test is test_invariance_string_vs_numbers_labels (under metrics/tests/test_common.py), because of measure_with_strobj = metric(y1_str.astype('O'), y2) (here).

What happens is that in the roc_auc_score function (ranking.py) I check the array with y_true = check_array(y_true, ensure_2d=False) (here), which fails when the dtype of the array is object (while it does not for other types: str, float, ...). The line in question in check_array is this one.

@jnothman any thoughts on how could I handle this case?

@maskani-moh
Contributor Author

Is it somehow related to this issue?

pair_scores = np.empty(n_pairs)

ix = 0
for a, b in itertools.combinations(range(n_classes), 2):

Consider using enumerate to avoid the manual ix and increment?
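The suggested refactor might look like this (a hypothetical sketch built around the snippet above; the pairwise score is a placeholder):

```python
import itertools

import numpy as np

n_classes = 4
pairs = list(itertools.combinations(range(n_classes), 2))
pair_scores = np.empty(len(pairs))

# enumerate supplies the running index, replacing the manual
# `ix = 0` before the loop and the `ix += 1` inside it
for ix, (a, b) in enumerate(pairs):
    pair_scores[ix] = a + b  # placeholder for the real pairwise AUC
```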

@jnothman
Member

Have you tried just using dtype=None in that check_array call?
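For illustration, dtype=None tells check_array to keep the input's own dtype instead of coercing to numeric, which lets object arrays of string labels through (a small sketch, not the PR's final code; the label values are made up):

```python
import numpy as np
from sklearn.utils import check_array

y_true = np.array(['spam', 'ham', 'eggs'], dtype=object)

# With the default dtype='numeric', check_array tries to cast the
# labels to float and fails; dtype=None preserves the object dtype.
checked = check_array(y_true, ensure_2d=False, dtype=None)
```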

@amueller
Member

@maskani-moh any progress on this? Do you need my help?

@amueller
Member

amueller commented Mar 26, 2018

still erroring ;) (just lines too long)

@maskani-moh
Contributor Author

Flake8 issue ... fixing that now

Member

@jnothman jnothman left a comment


I'm not sure I can review this all right now, but surely you should be modifying metrics/tests/test_common.py to run common tests on the multiclass variants?

y_score.shape[1] > 2):
# validation of the input y_score
if not np.allclose(1, y_score.sum(axis=1)):
raise ValueError("Target scores should sum up to 1.0 for all"
Member


space missing between "all" and "samples"

Member


We only need this for OvO, not for OvR, right?

# do not support partial ROC computation for multiclass
if max_fpr is not None and max_fpr != 1.:
raise ValueError("Partial AUC computation not available in "
"multiclass setting. Parameter 'max_fpr' must be"
Member


Please be consistent within a string about whether white space appears at the end or start of a line

@GTimothy

Hi !
Would love to see multi-class ROC AUC capability !
I think that @maskani-moh has addressed the change requests on his branch, am I wrong?
(thanks for this great library!)

@jnothman
Member

jnothman commented May 15, 2018 via email

@janvanrijn
Contributor

@jnothman I am happy to do some work on this PR.

I just forked this branch and pulled master into it; now 4 common test cases fail on my side (test_root_import_all_completeness, test_non_meta_estimators[GaussianProcess-GaussianProcess-check_fit2d_1sample], test_non_meta_estimators[GaussianProcess-GaussianProcess-check_supervised_y_2d], test_non_meta_estimators[GaussianProcess-GaussianProcess-check_estimators_overwrite_params]). This is unexpected behavior, right? (I conclude this because both the unit tests in this branch and those in master seem to pass. I'll spare you further details and stack traces.)

Can you please explain what you mean by common metric tests? When I remove roc_auc_score from METRIC_UNDEFINED_MULTICLASS, no additional tests are invoked (i.e., the number of tests is still 5158). Is there documentation somewhere that I could read into?

@jnothman
Member

We haven't changed the common metric tests to use pytest collection, so yes, it's hard to tell if tests are being run. The issue here, iirc, is that we don't currently have common metric tests designed for score-based multiclass metrics.

@janvanrijn
Contributor

So if I understand the TODO for this PR correctly, it should add common metric tests for score-based multi-class metrics?

@jnothman
Member

You should double check what code paths a multiclass roc_auc_score would go through in test_common.py (unfortunately) and make sure that it's actually covered.

@jnothman
Member

jnothman commented Jul 29, 2018 via email

if average == "weighted" else np.average(pair_scores))


def _average_multiclass_ovr_score(binary_metric, y_true, y_score, average):
Member


is this not the same as _average_binary_score?
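For comparison, a one-vs-rest average can be sketched with label binarization plus a per-class binary AUC, which is essentially the pattern that scikit-learn's `_average_binary_score` generalizes (hypothetical `ovr_roc_auc`, not the PR's helper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def ovr_roc_auc(y_true, y_score, average="macro"):
    """Binarize the labels, score each class column against its
    one-vs-rest indicator, then average (optionally weighted by
    class prevalence). Illustration only."""
    classes = np.unique(y_true)
    y_bin = label_binarize(y_true, classes=classes)
    per_class = np.array([
        roc_auc_score(y_bin[:, k], y_score[:, k])
        for k in range(len(classes))
    ])
    if average == "weighted":
        weights = y_bin.sum(axis=0) / y_bin.sum()
        return float(per_class @ weights)
    return float(per_class.mean())
```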

@amueller
Member

fixed in #12789

@amueller amueller closed this Jul 17, 2019