
[MRG] FIX solve consistency between predict and predict_proba in AdaBoost #14114


Merged: 10 commits merged into scikit-learn:master on Jul 16, 2019

Conversation

@glemaitre (Member) commented on Jun 18, 2019

closes #14084
closes #2974

Compute the probabilities in AdaBoostClassifier as specified in "Multi-class AdaBoost" (Zhu et al., 2009).
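For context, the referenced paper maps the aggregated decision values f_k(x) to class probabilities through a softmax scaled by 1/(K-1). A minimal sketch of that mapping, with illustrative names only (the two-class case needs the symmetric handling discussed further down in this thread):

    from scipy.special import softmax

    def proba_from_decision(decision, n_classes):
        # decision: array of shape (n_samples, n_classes) with the aggregated
        # decision values f_k(x); the probability of class k is
        # exp(f_k(x) / (K - 1)) / sum_j exp(f_j(x) / (K - 1))
        return softmax(decision / (n_classes - 1), axis=1)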

@glemaitre changed the title from "[WIP] FIX solve consistency between predict and predict_proba in AdaBoost" to "[MRG] FIX solve consistency between predict and predict_proba in AdaBoost" on Jun 19, 2019
@glemaitre (Member Author)

@NicolasHug @amueller I am playing with something that is not my strong suit. Could you have a look at the PR and check whether the proposed fix seems theoretically reasonable?

@NicolasHug (Member) left a comment


As far as I can tell the changes are correct.

What is strange to me is that the previous implementation is also a softmax of the decision function, so other than the fact that the new version is much clearer, I don't understand where the fix is.

(to be honest this isn't really my specialty either)

           2009.
        """
        if n_classes == 2:
            decision = np.vstack([-decision, decision]).T
Member:


can you explain this? is it in the paper too?

glemaitre (Member Author):


My original thought was that we were keeping one of the 2 columns when computing the decision function. Actually, we do something a bit different:

        if n_classes == 2:
            pred[:, 0] *= -1
            return pred.sum(axis=1)

You originally have symmetry between both classes. However, I should divide decision by 2, since we are summing over both columns.
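As a small illustration of that two-class handling (made-up decision values; scipy's softmax is used only for the check): stacking [-decision, decision] and dividing by 2 keeps the symmetry between the classes while the predicted class still follows the sign of the original decision.

    import numpy as np
    from scipy.special import softmax

    decision = np.array([-2.0, 0.5, 3.0])   # made-up 1-D binary decision values

    # rebuild symmetric two-column scores; divide by 2 because the original
    # decision was the sum over the two (opposite-sign) columns
    sym = np.vstack([-decision, decision]).T / 2
    proba = softmax(sym, axis=1)

    assert np.allclose(proba.sum(axis=1), 1.0)             # rows are valid probabilities
    assert (proba.argmax(axis=1) == (decision > 0)).all()  # argmax follows the sign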

@glemaitre (Member Author)

> What is strange to me is that the previous implementation is also a softmax of the decision function, so other than the fact that the new version is much clearer, I don't understand where the fix is.

The previous implementation uses the predict_proba of the underlying classifiers instead of their predict (which is what the decision function uses). That is the main difference.
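In other words, after this change both predict and predict_proba derive from the same decision function, so they should agree on the predicted class. A quick illustrative check of that consistency (dataset and settings picked arbitrarily here):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier

    X, y = load_iris(return_X_y=True)
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

    # the class returned by predict matches the argmax of predict_proba
    assert np.array_equal(
        clf.predict(X),
        clf.classes_[np.argmax(clf.predict_proba(X), axis=1)],
    )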

@glemaitre (Member Author)

@NicolasHug do you have any other comments?

Any second reviewer? @rth @thomasjpfan

@thomasjpfan self-assigned this on Jul 4, 2019
@thomasjpfan (Member) left a comment


Strange how master's implementation used predict_proba instead of predict, which is more in line with the paper's use of the misclassification error rate.

@glemaitre (Member Author)

@thomasjpfan I addressed the comments.

@thomasjpfan (Member) left a comment


LGTM

@agramfort merged commit c0c5313 into scikit-learn:master on Jul 16, 2019
@agramfort (Member)

thx @glemaitre
