Enabling Grid Search using AUC #450

Closed
amueller opened this issue Nov 24, 2011 · 6 comments

@amueller
Member

Grid search with the AUC score is currently not possible, because fit_grid_point calls "predict" on the classifier, and AUC needs continuous scores rather than hard class labels.
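To illustrate why hard labels are a problem, here is a minimal, library-free sketch. The helper `pairwise_auc` and the toy data are assumptions made up for this example; it computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, which is not necessarily how scikit-learn computes it internally.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for illustration.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up binary problem: the continuous scores rank every positive
# above every negative.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.8]

# What a predict-style call would hand to the score function:
# labels obtained by thresholding the scores at 0.5.
labels = [1 if s >= 0.5 else 0 for s in scores]

auc_from_scores = pairwise_auc(y_true, scores)
auc_from_labels = pairwise_auc(y_true, labels)
print(auc_from_scores, auc_from_labels)
```

With the continuous scores the ranking is perfect, while the thresholded labels tie one positive with both negatives and the value drops.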

@ogrisel
Member

ogrisel commented Nov 24, 2011

I know, I had the same issue. As a workaround I did:

    ...

    class AUCSGDClassifier(linear_model.SGDClassifier):

        def score(self, X, y):
            probas = self.predict_proba(X)
            fpr, tpr, thresholds = roc_curve(y, probas)
            return auc(fpr, tpr)


    print "Fitting a model on the training set"
    clf = AUCSGDClassifier(loss='log', alpha=1e-4, n_iter=5)

    # check the cross validated score on the development set
    print cross_val_score(clf, X_train_transformed, y_train).mean()

I wonder how we could extend / refactor the scoring or grid search API to allow for "clf-aware" score functions.

@amueller
Member Author

Thanks for your code. I also used a workaround.
What do you mean by clf-aware?

I would like the fit_grid_point to use the decision_function instead of predict.
I just had a look into cross_val_score and I think I haven't really understood the API.
This is not so important for me and I have enough other things going on at the moment ;)

@ogrisel
Member

ogrisel commented Nov 24, 2011

The current score_func for the grid search can only be a function of the y_true and y_pred output arrays. To compute the AUC score, we have two options. Either we write an auc_score that accepts y_true as its first argument and y_probas as an array of shape (n_samples, n_classes), and make GridSearchCV and cross_val_score handle this special case by calling clf.predict_proba instead of clf.predict in the inner loop. Or we make it possible to pass score functions that take (clf, X, y_true) as arguments instead of (y_true, y_pred).
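The second option could look roughly like the sketch below. Everything in it is hypothetical and library-free: `ThresholdClassifier`, `evaluate`, `auc_scorer` and `accuracy_scorer` are names invented for this example, and `evaluate` is only a stand-in for the inner loop of a grid search, not scikit-learn code.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


class ThresholdClassifier:
    """Hypothetical 1-D toy model: the decision value is x - threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def decision_function(self, X):
        return [x - self.threshold for x in X]

    def predict(self, X):
        return [1 if s > 0 else 0 for s in self.decision_function(X)]


def auc_scorer(clf, X, y_true):
    """Scorer that needs continuous outputs, so it calls decision_function."""
    return pairwise_auc(y_true, clf.decision_function(X))


def accuracy_scorer(clf, X, y_true):
    """Scorer that only needs hard labels, so it calls predict."""
    y_pred = clf.predict(X)
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def evaluate(clf, X, y, scorer):
    """Stand-in for the grid search inner loop: it never calls predict itself."""
    return scorer(clf.fit(X, y), X, y)


X = [0.1, 0.3, 0.4, 0.8]
y = [0, 0, 1, 1]
clf = ThresholdClassifier(threshold=0.5)

auc_value = evaluate(clf, X, y, auc_scorer)
accuracy_value = evaluate(clf, X, y, accuracy_scorer)
print(auc_value, accuracy_value)
```

The point of this design is that the evaluation loop stays generic: each score function decides for itself whether it needs predict, decision_function or probabilities.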

@amueller
Member Author

I am just considering doing this. But it seems ugly :-/

Also, current AUC methods only support the binary case. So should we start implementing the multi-class and multi-label cases first?

I think I might do a hack that allows us to do this for the binary case.

@amueller
Member Author

Hm, are you sure the code you posted above does what it is supposed to do?
predict_proba should produce something of shape [n_samples, n_classes], while roc_curve only handles the binary case and expects [n_samples].
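The shape mismatch can be shown without any library. In this sketch, `positive_class_scores` is a hypothetical helper and the probabilities are made up; it assumes column 1 of the [n_samples, n_classes] table holds the positive-class probability, which is exactly the kind of assumption a binary-only hack would have to make.

```python
def positive_class_scores(probas, positive_index=1):
    """Reduce an [n_samples, n_classes] table to one score per sample.

    Assumption: column `positive_index` holds the positive-class probability.
    """
    return [row[positive_index] for row in probas]


# Made-up probabilities for four samples and two classes.
probas = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.6, 0.4],
    [0.2, 0.8],
]

# A binary ROC routine expects this flat list, not the 2-D table.
scores = positive_class_scores(probas)
print(scores)
```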

@amueller
Member Author

amueller commented Feb 3, 2013

Closed via #1381.

amueller closed this as completed Feb 3, 2013