
implemented predict_proba for OneVsRestClassifier #1416

Closed · wants to merge 13 commits

Conversation

AWinterman
Contributor

This also involved writing function predict_proba_ovr to mimic the
methodology of existing code.

Note that in the multilabel case, the marginal probability that the sample has the given label is returned. These probabilities do not sum to unity, since the corresponding label events do not partition the sample space.

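For orientation, here is a rough sketch of the approach described above. The helper name mirrors the one mentioned in the PR, but the signature and details are assumptions made for illustration, not the merged code:

```python
import numpy as np


def predict_proba_ovr(estimators, X, is_multilabel):
    """Rough sketch of one-vs-rest probability prediction.

    `estimators` is one fitted binary classifier per label; column k of
    the result holds the positive-class probability from estimator k.
    """
    Y = np.array([e.predict_proba(X)[:, 1] for e in estimators]).T

    if not is_multilabel:
        # Single-label case: renormalize so each row sums to one.
        Y /= np.sum(Y, axis=1)[:, np.newaxis]
    return Y
```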
@amueller
Member

ping @mblondel (who has probably seen it already anyway)

@AWinterman
Contributor Author

Am I reading the Travis traceback wrong? It looks as though it failed on something unrelated to my changes.

@amueller
Member

I was just looking at it ;)
Actually, I have this problem a lot. It seems that mldata.org doesn't like Travis or something, and downloading data from there randomly gives server errors.
The Travis integration is just two days old... With your next commit it should hopefully work.


if not multilabel:
    # then probabilities should be normalized to 1.
    Y /= np.sum(Y, 1)[:, np.newaxis].repeat(Y.shape[1], 1)
Member

Why do you need repeat?

Since normalization just divides the probabilities by a constant, I wonder if we can normalize even in the multilabel case?

Contributor Author

I used repeat to make the row sums into a matrix of the same shape as Y. I'm still unfamiliar with all of the available numpy methods, so if there's a better way I'm all ears.

I don't think we should normalize in the multilabel case, since these probabilities should not sum to one. I think these should be the marginal probability that the given sample has the label in question. It is entirely consistent that two labels both have a 90% probability of applying to a given sample.

Member

There is "broadcasting" along axis. You don't need to repeat. Y /= np.sum(Y,1) should work I think.
+1 for not normalizing

Member

> It is entirely consistent that two labels both have a 90% probability of applying to a given sample.

Yes, makes perfect sense!

Member

`Y /= np.sum(Y, axis=1)` is better for readability :)

Member

true :)
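To make the broadcasting point concrete, a small self-contained illustration (the array names here are hypothetical): dividing by the row sums works without `repeat` as long as the sums keep a column shape.

```python
import numpy as np

Y = np.random.rand(4, 3)  # stand-in for the stacked per-label probabilities

# repeat-based version from the diff above
Y1 = Y / np.sum(Y, 1)[:, np.newaxis].repeat(Y.shape[1], 1)

# broadcasting version: the (n_samples, 1) column divides every column of Y;
# without [:, np.newaxis], the (n_samples,) shape would not broadcast in general
Y2 = Y / np.sum(Y, axis=1)[:, np.newaxis]

assert np.allclose(Y1, Y2)
assert np.allclose(Y2.sum(axis=1), 1.0)
```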

@mblondel
Member

Can you add tests to sklearn/tests/test_multiclass.py? Thanks!

@AWinterman
Contributor Author

@mblondel I was going to start by testing that probabilities add to one, that the prediction is the argmax and that errors are raised on malformed input. Are there any other tests I should write?

@mblondel
Member

@AWinterman Sounds good. You also need to check the multilabel case.

@@ -12,7 +12,7 @@
 use these estimators to turn a binary classifier or a regressor into a
 multiclass classifier. It is also possible to use these estimators with
 multiclass estimators in the hope that their accuracy or runtime performance
-improves.
+improves. It is also possible to do multilabel classification.
Member

A note should definitely be added somewhere around here, maybe even more prominently. (Also, the formulation is a bit awkward at the moment.)

@amueller
Member

Do we want "predict_proba" for estimators that only have decision function? In the multi-class case, if we renormalize, the "probabilities" are not calibrated anyway, right?

@AWinterman
Contributor Author

I wasn't sure how to go about implementing `predict_proba` for estimators that only have a decision function, so I figured I'd hold off for wiser heads. I'd be glad to add it to my todo list if somebody can point me to some resources on how to do it.

I'm not sure what you mean by that second sentence. Is that an argument for implementing `predict_proba` for estimators which only have a decision function?

If anyone has any suggestions for additional testing methods, it would be much
appreciated. Currently implemented:
- Do probabilities sum to one in single label case?
- Is a ValueError raised for base classifier with no predict_proba method?
- Do you arrive at the same predictions from predict_proba and predict?
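A hedged sketch of what such tests might look like; the function names, dataset helper, and pytest usage here are illustrative placeholders, not the tests that were actually merged:

```python
import numpy as np
import pytest
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def test_ovr_single_label_predict_proba_sketch():
    X, y = datasets.make_classification(n_samples=100, n_classes=3,
                                         n_informative=5, random_state=0)
    clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
    Y_proba = clf.predict_proba(X)

    # rows sum to one in the single-label case
    np.testing.assert_allclose(Y_proba.sum(axis=1), 1.0)

    # predict() agrees with the argmax of predict_proba()
    # (labels are 0..n_classes-1 here, so argmax indexes line up)
    np.testing.assert_array_equal(Y_proba.argmax(axis=1), clf.predict(X))


def test_ovr_no_predict_proba_sketch():
    X, y = datasets.make_classification(random_state=0)
    clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
    # base estimators without predict_proba should raise an AttributeError
    with pytest.raises(AttributeError):
        clf.predict_proba(X)
```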
@mblondel
Member

Estimators that don't implement `predict_proba` will raise an AttributeError. This is the expected behavior in my opinion (can you add an `assert_raises` test for that?). Calibration is currently being implemented in PR #1176.

Regarding the motivation for normalizing multiclass probabilities, you can cite "Transforming Classifier Scores into Accurate Multiclass Probability Estimates" (KDD 2002).

@amueller
Member

@AWinterman Yes, it was supposed to be an argument.
@mblondel Ok, I'll have to look into your reference. My point was that I didn't see a reason not to implement predict_proba from decision function, as it seems to be as valid as what happens now - maybe your reference will convince me otherwise.

On the other hand, I realized that we don't implement predict_proba just because it is possible, so maybe leave it out.

@mblondel
Member

@amueller But `decision_function` can return negative values, unlike `predict_proba`. So I don't think it is wise to use its output without calibration and call it a probability.

@amueller
Member

(I would have used softmax but let's forget about it)
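For context, the softmax mapping mentioned above (and not adopted here) would turn decision_function scores into row-normalized values that sum to one, while still not being calibrated probabilities. A minimal sketch, not part of the merged PR:

```python
import numpy as np


def softmax_rows(scores):
    """Map an (n_samples, n_classes) matrix of decision scores to
    pseudo-probabilities whose rows sum to one. A monotone rescaling,
    not a calibrated probability estimate."""
    z = scores - scores.max(axis=1)[:, np.newaxis]  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1)[:, np.newaxis]
```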

n_samples = 100
n_classes = 5
for multilabel in (False, True):
    for au in (False, True):
Member

Can you implement the tests for the multiclass and multilabel cases in two different functions? au is not used in the multiclass case.

Contributor Author

sure!

@AWinterman
Contributor Author

Sounds like I should just remove the `_check_has_proba` function, and let the people working on calibration expose a `predict_proba` method if it's appropriate.

It should just raise an attr. error if it shows
`test_ovr_single_label_predict_proba` wasn't checking consistency between
`predict_proba` and `predict` correctly. Now it is. Nose tests pass except for
one concerning PIL.
#predict assigns a label if the probability that the
#sample has the label is greater than than 0.5.
pred = np.array([l.argmax() for l in Y_proba])
assert_true(not (pred-Y_pred).any())
Member

You can use `assert_false` for that :) Can you spellcheck the comments? There are many typos. Then run pep8 and we are good to merge.
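For reference, the suggested change could look roughly like this (variable names taken from the quoted hunk; `assert_false` is the test helper the reviewer refers to):

```python
# instead of assert_true(not (pred - Y_pred).any()):
assert_false((pred - Y_pred).any())

# or, equivalently, compare the arrays outright:
from numpy.testing import assert_array_equal
assert_array_equal(pred, Y_pred)
```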

labels both have a 90% probability of applying to a given sample.

In the single label multiclass case, the rows of the returned matrix
should sum to unity.
Member

I'd say "one" but I guess "unity" is ok.

@amueller
Member

amueller commented Dec 1, 2012

LGTM apart from the sphinx thing.

@amueller
Member

amueller commented Dec 2, 2012

Thanks, merged (by rebase).

@amueller amueller closed this Dec 2, 2012