Check predict_proba shape in ThresholdScorer #12486


Merged: 19 commits into scikit-learn:master on Nov 14, 2018

Conversation

@reshamas (Member) commented Oct 30, 2018

Reference Issues/PRs

Continues and resolves #12221, fixes #7598

What does this implement/fix? Explain your changes.

Any other comments?

This time I ran flake8 and fixed the formatting.

cc: @AMDonati

@amueller Can you point me to the GMM test? I did not see one in here:
scikit-learn/sklearn/tests

Looks good in general, but you need to add a regression test (you could use the GMM one or just a classification one with a single class maybe)
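A minimal sketch of the single-class situation the suggested regression test would cover, assuming any classifier fit on data containing only one class; the estimator choice here is illustrative, not the merged test:

```python
# Sketch: fitting a classifier on data with a single class makes
# predict_proba return one column of probabilities, all equal to 1.
# Illustrative only; not the regression test actually merged in this PR.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(10, 2)
y = np.zeros(10)  # only one class present

proba = DecisionTreeClassifier().fit(X, y).predict_proba(X)
print(proba.shape)  # (10, 1)
```

This one-column output is what later trips the `y_pred[:, 1]` indexing discussed below.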

@sklearn-lgtm

This pull request introduces 1 alert when merging 73489b3 into f6b0c67 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

Comment posted by LGTM.com

@amueller (Member)

gmm tests are in sklearn/mixture/tests

@reshamas (Member Author)

@AMDonati all tests passed!

@jnothman (Member) left a comment

I'm confused. The test change doesn't match the code change, and neither matches the title of the PR.

@amueller (Member) commented Nov 5, 2018

I'm also confused by the test?

@amueller amueller changed the title Check if classifier Check predict_proba shape in roc_auc_score Nov 5, 2018
@amueller amueller changed the title Check predict_proba shape in roc_auc_score Check predict_proba shape in ThresholdScorer Nov 5, 2018
@amueller (Member) commented Nov 5, 2018

Renamed to something more concrete (hopefully).

@amueller (Member) commented Nov 5, 2018

is this also an issue with _ProbaScorer?

@reshamas (Member Author) commented Nov 5, 2018

is this also an issue with _ProbaScorer?

We created our own test, which failed. That's when I went over and used the mixture test which passed.
cc: @AMDonati

@amueller (Member) commented Nov 9, 2018

OK, I added the appropriate test.

if y_pred.shape[1] == 2:
    y_pred = y_pred[:, 1]
else:
    raise ValueError('Must use classifier with two classes')
Member

should we say "got predict_proba of shape"?

Member Author

That seems like a good idea, because that is what is actually happening.

But, will the user understand that? Is there some place in the documentation where we can write:

When obtaining predictions from a classifier, if there is only one class, the ValueError raised will say "got predict_proba of shape". That means predict_proba returned a single column, with every probability equal to 1.
Q: How to fix this?
A: Must use a classifier with 2 classes.

Or will they be able to see it from the test below:
with pytest.raises(ValueError, match="use classifier with two classes")
I would feel better if the user had explicit feedback on how to solve the error.

cc: @AMDonati
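A hedged sketch of what such a test could look like end to end; the estimator choice and the exact match string are assumptions for illustration, not the merged test verbatim:

```python
# Sketch of a regression test along the lines quoted above. The
# DecisionTreeClassifier and the match string are illustrative choices;
# the merged test may differ.
import numpy as np
import pytest
from sklearn.metrics import get_scorer
from sklearn.tree import DecisionTreeClassifier


def test_roc_auc_single_class_raises():
    X = np.random.RandomState(0).rand(10, 2)
    y = np.zeros(10)  # single class: predict_proba has shape (10, 1)
    clf = DecisionTreeClassifier().fit(X, y)
    with pytest.raises(ValueError, match="classifier with two classes"):
        get_scorer("roc_auc")(clf, X, y)
```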

Member

Are you saying the current error message is not informative enough? The test is just to make sure that the error that is raised is actually informative.
I suggested adding the predict_proba shape to the error message, not replacing the message that's there now.

Member Author

That works.
I tried making the update myself but ran into multiple problems: errors list

Member Author

ok, making progress. Please let me know next steps.

cc: @AMDonati

if y_pred.shape[1] == 2:
    y_pred = y_pred[:, 1]
else:
    raise ValueError('got predict_proba of shape;')
Member

you're not actually providing the shape.

@reshamas (Member Author) commented Nov 12, 2018

OK, here is what is happening:

  1. Previously, when there was only one classifier but two were expected, y_pred could not be indexed. This was the error: IndexError: index 1 is out of bounds for axis 1 with size 1
  2. So we added a check: if predict_proba returned only a single column of probabilities (all 1's), we raise a ValueError instead.
  3. What should we call it? I don't think returning the shape (even if we did that) is informative. How about:
  • "vector of probabilities returned (predict_proba) is all 1's because only one classifier is in the model."

Does that work?
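The IndexError in step 1 can be reproduced directly on such a one-column array; this sketch uses a plain NumPy array to stand in for the predict_proba output:

```python
# Reproducing the original failure: a single-class fit yields a
# predict_proba output with one column, so indexing column 1 fails.
import numpy as np

y_pred = np.ones((5, 1))  # stand-in for predict_proba of a one-class fit
try:
    positive_proba = y_pred[:, 1]
except IndexError as exc:
    print(exc)  # index 1 is out of bounds for axis 1 with size 1
```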

if y_pred.shape[1] == 2:
    y_pred = y_pred[:, 1]
else:
    raise ValueError('got predict_proba of shape;')
Member

aren't you meant to output the shape here?

Member Author

Why is it a good idea to output the shape here? Isn't this more informative:
"vector of probabilities returned (predict_proba) is all 1's because only one classifier is in the model."

Member

do you mean "only one class" rather than "only one classifier"?

Either way, the proposed error message here says "got predict_proba of shape; ..." which simply doesn't make sense.

@reshamas (Member Author) commented Nov 13, 2018

How about I change the error message to:
"vector of probabilities returned (predict_proba) is all 1's because there is only one class in the input to the model."

Member

It can also fail with GMMs where the shape would be wrong but the entries wouldn't be all ones.

Member

Yes, but you should use the actual y_pred.shape[1], not hard-code 1, as there might be code paths that have >2 (unless we can make sure that's not happening).
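To illustrate the >2-column case mentioned here (a sketch; LogisticRegression is just an example estimator): a three-class fit gives a three-column predict_proba, so the message should report the real shape rather than assume one column.

```python
# Sketch: with three classes, predict_proba has three columns, so an
# error message should report y_pred.shape rather than hard-coding it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(30, 2)
y = np.arange(30) % 3  # three classes: 0, 1, 2
proba = LogisticRegression().fit(X, y).predict_proba(X)
print(proba.shape)  # (30, 3)
```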

Member Author

you mean like this?

            if y_pred.shape[1] == 2:
                y_pred = y_pred[:, 1]
            elif y_pred.shape[1] == 1:
                raise ValueError('got predict_proba of shape;')

Member Author

Or, I think you mean this:

predict_proba has shape (n_samples, y_pred.shape[1]) which returns only 1 vector of probabilities (because it is a single class) but 2 classes in the data are required.

Member

No I mean like

ValueError("got predict_proba of shape {}, but need classifier with two classes for {} scoring".format(y_pred.shape, self._score_func.__name__))
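As a standalone sketch of this suggestion (the helper name check_proba_shape is made up for illustration; in the PR the check lives inside _ThresholdScorer, where score_func is self._score_func):

```python
# Sketch of the suggested check as a free function; in the actual PR
# this logic sits inside _ThresholdScorer, not a standalone helper.
import numpy as np
from sklearn.metrics import roc_auc_score


def check_proba_shape(y_pred, score_func=roc_auc_score):
    if y_pred.shape[1] == 2:
        return y_pred[:, 1]  # keep the positive-class column
    raise ValueError("got predict_proba of shape {}, but need classifier "
                     "with two classes for {} scoring"
                     .format(y_pred.shape, score_func.__name__))


print(check_proba_shape(np.array([[0.3, 0.7], [0.9, 0.1]])))  # [0.7 0.1]
```

A one-column input now produces an error naming both the offending shape and the scoring function, e.g. "got predict_proba of shape (4, 1), but need classifier with two classes for roc_auc_score scoring".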

Member Author

ok, I have made the updates.

@amueller (Member)

thanks.

@jnothman (Member) left a comment

Thanks!

@jnothman jnothman merged commit 94db3d9 into scikit-learn:master Nov 14, 2018
@reshamas (Member Author)

Congratulations @AMDonati. We closed our first PR on scikit-learn! 🎆
Thanks @amueller for all your help!

@amueller (Member)

yay!

@amueller amueller mentioned this pull request Nov 20, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 20, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

Successfully merging this pull request may close these issues.

BUG: Using GridSearchCV with scoring='roc_auc' and GMM as classifier gives IndexError