implemented predict_proba for OneVsRestClassifier #1416
Conversation
This also involved writing a function, `predict_proba_ovr`, to mimic the methodology of the existing code.
ping @mblondel (who has probably seen it already anyway)
Am I reading the Travis traceback wrong? It looks as though it failed on something unrelated to my changes.
I was just looking at it ;)
```python
if not multilabel:
    # then probabilities should be normalized to 1.
    Y /= np.sum(Y, 1)[:, np.newaxis].repeat(Y.shape[1], 1)
```
Why do you need `repeat`?
Since normalization just divides the probabilities by a constant, I wonder if we could normalize even in the multilabel case?
I used `repeat` to make the row sums into a matrix of the same shape as Y. I'm still unfamiliar with all of the available numpy methods, so if there's a better way I'm all ears.
I don't think we should normalize in the multilabel case, since these probabilities should not sum to one. I think these should be the marginal probability that the given sample has the label in question. It is entirely consistent that two labels both have a 90% probability of applying to a given sample.
There is "broadcasting" along axis. You don't need to repeat. Y /= np.sum(Y,1)
should work I think.
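(For later readers: a minimal numpy sketch of the normalization under discussion. Plain `Y /= np.sum(Y, 1)` actually needs the summed axis kept, e.g. via `keepdims=True` or `np.newaxis`, for the broadcast to line up, but `repeat` is indeed unnecessary.)

```python
import numpy as np

Y = np.array([[0.2, 0.6],
              [0.3, 0.3]])

# keepdims=True keeps the row sums as an (n_samples, 1) column,
# which broadcasts against Y's (n_samples, n_classes) shape.
Y /= Y.sum(axis=1, keepdims=True)

print(Y.sum(axis=1))  # [1. 1.] -- each row now sums to one
```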
+1 for not normalizing
> It is entirely consistent that two labels both have a 90% probability of applying to a given sample.

Yes, makes perfect sense!
`Y /= np.sum(Y, axis=1)` is better for readability :)
true :)
Can you add tests to …?
@mblondel I was going to start by testing that probabilities add to one, that the prediction is the argmax, and that errors are raised on malformed input. Are there any other tests I should write?
@AWinterman Sounds good. You also need to check the multilabel case.
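(A sketch of what such tests might look like against today's scikit-learn API; the dataset and base estimator, `make_classification` and `LogisticRegression`, are illustrative choices, not taken from this thread.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=100, n_classes=3,
                           n_informative=5, random_state=0)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
proba = clf.predict_proba(X)

# single-label case: each row of predict_proba should sum to one
assert np.allclose(proba.sum(axis=1), 1.0)

# predict should agree with the argmax of predict_proba
assert np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X))
```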
```diff
@@ -12,7 +12,7 @@
 use these estimators to turn a binary classifier or a regressor into a
 multiclass classifier. It is also possible to use these estimators with
 multiclass estimators in the hope that their accuracy or runtime performance
-improves.
+improves. It is also possible to do multilabel classification.
```
A note should definitely be added somewhere around here, maybe even more prominently. (Also, the formulation is a bit awkward at the moment.)
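(For context, a minimal multilabel usage sketch; it uses the present-day binary indicator-matrix target format rather than the label-tuple format of this era, and the base estimator is an illustrative choice.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# each row of Y flags which of the two labels apply to that sample
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
proba = clf.predict_proba(X)  # shape (4, 2); rows need not sum to one
```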
Do we want `predict_proba` for estimators that only have `decision_function`? In the multi-class case, if we renormalize, the "probabilities" are not calibrated anyway, right?
I wasn't sure how to go about implementing that. I'm not sure what you mean by that second sentence. Is that an argument for implementing it?
If anyone has any suggestions for additional testing methods, it would be much appreciated. Currently implemented:

- Do probabilities sum to one in the single-label case?
- Is a ValueError raised for a base classifier with no `predict_proba` method?
- Do you arrive at the same predictions from `predict_proba` and `predict`?
Estimators that don't implement …

Regarding the motivation for normalizing multiclass probabilities, you can cite "Transforming Classifier Scores into Accurate Multiclass Probability Estimates" (KDD 2002).
@AWinterman Yes, it was supposed to be an argument. On the other hand, I realized that we don't implement …
@amueller But `decision_function` can return negative values, unlike `predict_proba`.
(I would have used softmax but let's forget about it)
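(For reference, the softmax idea mentioned above would look roughly like this; `softmax_rows` is a hypothetical helper, not what this PR implements.)

```python
import numpy as np

def softmax_rows(scores):
    # shift each row for numerical stability, then exponentiate:
    # the results are positive and each row sums to one
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[-2.0, 0.5], [1.0, -1.0]])  # decision_function-style scores
print(softmax_rows(scores).sum(axis=1))        # [1. 1.]
```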
```python
n_samples = 100
n_classes = 5
for multilabel in (False, True):
    for au in (False, True):
```
Can you implement the tests for the multiclass and multilabel cases in two different functions? `au` is not used in the multiclass case.
sure!
Sounds like I should just remove the …
It should just raise an attribute error if it shows.
`test_ovr_single_label_predict_proba` wasn't checking consistency between `predict_proba` and `predict` correctly. Now it is. Nose tests pass except for one concerning PIL.
```python
# predict assigns a label if the probability that the
# sample has the label is greater than 0.5.
pred = np.array([l.argmax() for l in Y_proba])
assert_true(not (pred - Y_pred).any())
```
You can use `assert_false` for that :) Can you spellcheck the comments? There are many typos. Then run pep8 and we are good to merge.
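(The suggested change, assuming the nose-era `sklearn.utils.testing` helpers in use at the time; `pred` and `Y_pred` are stand-ins for the arrays from the test above.)

```python
import numpy as np
from sklearn.utils.testing import assert_false

pred = np.array([0, 1, 2])
Y_pred = np.array([0, 1, 2])
assert_false((pred - Y_pred).any())  # passes only when the arrays match
```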
```
labels both have a 90% probability of applying to a given sample.

In the single label multiclass case, the rows of the returned matrix
should sum to unity.
```
I'd say "one" but I guess "unity" is ok.
LGTM apart from the Sphinx thing.
Thanks, merged (by rebase).
Note that in the multilabel case, the marginal probability of the sample having the given label is returned. These probabilities do not sum to unity, since the set of such probabilities over all labels does not partition the sample space.