Skip to content

[MRG] Adds multiclass ROC AUC #12789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 123 commits into from
Jul 17, 2019

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Dec 14, 2018

Reference Issues/PRs

Resolves #3298
Continues #7663, #10481, #12311

Adds common test for multiclass label permutations, addressing some of #12309

What does this implement/fix? Explain your changes.

  • The One-vs-One implementation is 100% based on Hand and Till: https://link.springer.com/article/10.1023/A:1010920819831. The weighting by prevalence does not seem to be in literature, but it looks like a simple extension of Hand and Till.

  • The One-vs-Rest implementation is functionally the same as the multi-label case.

  • The plot_roc.py example has been refactored to bring the descriptions closer to the code, allowing a user to read through it without losing too much context.

  • y_score in test_multiclass_sample_weight_invariance was normalized, because roc_auc_score checks for this condition.

Any other comments?

Currently, roc_auc_score does not support sample_weight with multiclass="ovo". This comes from the fact that the binary_metric calls in _average_multiclass_ovo_score become dependent on each other when the masked sample weights are passed to binary_metric.

maskani-moh and others added 30 commits January 16, 2018 11:23
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from that API decision, I'm quite happy with where this, especially the example, has come, and would be happy to Approve.

should be either equal to ``None`` or ``1.0`` as AUC ROC partial
computation currently is not supported for multiclass.

multiclass : string, 'ovr' or 'ovo', optional(default='ovr')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
multiclass : string, 'ovr' or 'ovo', optional(default='ovr')
multiclass : string, 'ovr' or 'ovo', optional (default='ovr')

I do wonder if we should make the user intentionally specify ovo or ovr. Unlike say the average parameter of P/R/F, we don't need the user to tell us that the data is to be treated as multiclass (we can use the number of columns in y_score for that) but I suspect we should be trying to encourage literacy in the idea that there are several ways to extend roc_auc to multiclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with this. We can set the default to None and raising an error before going down the multi-class code path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set the default to None and raising an error before going down the multi-class code path.

Do others have opinions on this? @amueller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are moving to strings as keywords more, multiclass would most likely be better as 'error' for raising an exception, or 'warn' if we decide on just warning.

Also to be consistent with the rest of sklearn, we should use multi_class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do it. We can always change it later to have a default value.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jun 24, 2019

How did we pick the default again? I usually feel OVR is easier to grasp and it would make the default multi-class behavior be consistent with the multi-label behavior.

@amueller The default is ovr in roc_auc_score. This PR defines both ovo and ovr in SCORERS, do you want to remove ovo from SCORERS?

@jnothman
Copy link
Member

jnothman commented Jul 2, 2019

Change of heart? CI is unhappy.

@thomasjpfan
Copy link
Member Author

Change of heart? CI is unhappy.

It should be happy now. Had to add roc_auc_score to METRIC_UNDEFINED_MULTICLASS, since it does not support multiclass by default. (Also added ovr_roc_auc and weighted_ovr_roc_auc to the common test)

@jnothman
Copy link
Member

jnothman commented Jul 3, 2019 via email

@thomasjpfan
Copy link
Member Author

test_explained_variance_components_10_20 fail is unrelated.

@amueller
Copy link
Member

is there an issue for the test failures?

Copy link
Member

@amueller amueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good apart from nitpicks

return _average_binary_score(
_binary_roc_auc_score, y_true, y_score, average,
sample_weight=sample_weight)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is this else active? can you add a comment maybe (unless I'm just being very slow)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multilabel-indicator

sample_weight=sample_weight)


def _multiclass_roc_auc_score(binary_metric, y_true, y_score, labels,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't binary_metric alwasy _binary_roc_auc_score?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_binary_roc_auc_score is only defined in the scope of roc_auc_score


labels : array, shape = [n_classes] or None, optional (default=None)
List of labels to index ``y_score`` used for multiclass. If ``None``,
the lexicon order of ``y_true`` is used to index ``y_score``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the lexicon order of ``y_true`` is used to index ``y_score``.
the lexical order of ``y_true`` is used to index ``y_score``.

@amueller amueller merged commit dc9955b into scikit-learn:master Jul 17, 2019
@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jul 17, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multi-class roc_auc scores
7 participants