Check predict_proba shape in ThresholdScorer #12486
Conversation
This pull request introduces 1 alert when merging 73489b3 into f6b0c67 - view on LGTM.com new alerts:
Comment posted by LGTM.com
gmm tests are in
@AMDonati all tests passed!
I'm confused. The test change doesn't match the code change, and neither matches the title of the PR.
I'm also confused by the test.
Renamed to something more concrete (hopefully).
is this also an issue with
We created our own test, which failed. That's when I went over and used the mixture test which passed. |
ok I added the appropriate test |
sklearn/metrics/scorer.py
Outdated
    if y_pred.shape[1] == 2:
        y_pred = y_pred[:, 1]
    else:
        raise ValueError('Must use classifier with two classes'
should we say "got predict_proba of shape"?
That seems like a good idea, because that is what is actually happening.
But will the user understand that? Is there some place in the documentation where we can write:
When obtaining predictions from a classifier, if there is only one class, the ValueError raised will say "got predict_proba of shape". That means predict_proba returned one column where all the probabilities are the same (equal to 1).
Q: How to fix this?
A: Must use a classifier with 2 classes.
Or will they be able to see it from the test below:
with pytest.raises(ValueError, match="use classifier with two classes")
I would feel better if the user had explicit feedback on how to solve the error.
cc: @AMDonati
Are you saying the current error message is not informative enough? The test is just to make sure that the error that is raised is actually informative.
I suggested adding the predict_proba shape to the error message, not replace the message that's there now.
That works.
I tried making the update myself but ran into multiple problems: errors list
ok, making progress. Please let me know next steps.
cc: @AMDonati
…into check_if_classifier
sklearn/metrics/scorer.py
Outdated
    if y_pred.shape[1] == 2:
        y_pred = y_pred[:, 1]
    else:
        raise ValueError('got predict_proba of shape;'
you're not actually providing the shape.
ok, what is happening is this:
- previously, when there was only one classifier but two were expected, the y_pred was not resolving. This was the error:
IndexError: index 1 is out of bounds for axis 1 with size 1
- so, we made a check and an adjustment: if it only returned one column vector of probabilities (which are all 1's), we want to raise a ValueError.
- What should we call it? I don't think returning the shape (even if we did that) is informative. How about:
- "vector of probabilities returned by predict_proba is all 1's because only one classifier is in the model."
Does that work?
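For reference, the IndexError mentioned above is easy to reproduce in isolation (this is a standalone sketch, not the actual scorer code):

```python
import numpy as np

# predict_proba output when the training data contained only one class:
# a single column where every probability is 1.0
y_pred = np.ones((5, 1))

try:
    positive_class = y_pred[:, 1]  # the scorer assumes a second column exists
except IndexError as exc:
    print(exc)  # index 1 is out of bounds for axis 1 with size 1
```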
sklearn/metrics/scorer.py
Outdated
    if y_pred.shape[1] == 2:
        y_pred = y_pred[:, 1]
    else:
        raise ValueError('got predict_proba of shape;'
aren't you meant to output the shape here?
Why is it a good idea to output the shape here? Isn't this more informative:
"vector of probabilities returned by predict_proba is all 1's because only one classifier is in the model."
do you mean "only one class" rather than "only one classifier"?
Either way, the proposed error message here says "got predict_proba of shape; ..." which simply doesn't make sense.
How about I change the error message to:
"vector of probabilities returned (predict_proba) is all 1's because there is only one class in the input to the model."
It can also fail with GMMs where the shape would be wrong but the entries wouldn't be all ones.
yes, but you should use the actual y_pred.shape[1], not hard-code 1, as there might be code paths that have >2 (unless we can make sure that's not happening).
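For context, predict_proba returns one probability column per class seen during fit, so shapes other than (n_samples, 2) really do occur. A minimal sketch with a hypothetical three-class dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.arange(6).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 2, 2])  # three classes seen at fit time

proba = DecisionTreeClassifier().fit(X, y).predict_proba(X)
print(proba.shape)  # (6, 3): one probability column per class
```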
you mean like this?
if y_pred.shape[1] == 2:
    y_pred = y_pred[:, 1]
elif y_pred.shape[1] == 1:
    raise ValueError('got predict_proba of shape;'
Or, I think you mean this:
predict_proba has shape (n_samples, y_pred.shape[1]) which returns only 1 vector of probabilities (because it is a single class) but 2 classes in the data are required.
No I mean like
ValueError("got predict_proba of shape {}, but need classifier with two classes for {} scoring".format(y_pred.shape, self._score_func.__name__))
ok, I have made the updates.
thanks.
Thanks!
yay!
…orer (scikit-learn#12486) Continues and resolves scikit-learn#12221, fixes scikit-learn#7598
…esholdScorer (scikit-learn#12486)" This reverts commit e930402.
…orer (scikit-learn#12486) Continues and resolves scikit-learn#12221, fixes scikit-learn#7598
Reference Issues/PRs
Continues and resolves #12221, fixes #7598
What does this implement/fix? Explain your changes.
Any other comments?
This time, I ran flake8 and fixed the formatting.
cc: @AMDonati
@amueller Can you point me to the GMM test? I did not see one in here:
scikit-learn/sklearn/tests