Skip to content

[MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) #9326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

raghavrv
Copy link
Member

Attempt to memoize the prediction result as currently it calls predict for each scorer.

Also address previous comments of @jnothman at #7388

cc: @amueller @jnothman @agramfort

@raghavrv
Copy link
Member Author

raghavrv commented Jul 16, 2017

Memoization done. Speedups are quite good when the prediction is time consuming.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [4, 5, 6],
                              'p': [2, 3],
                              'algorithm': ['kd_tree', 'ball_tree']},
                  scoring=['accuracy', 'precision'], refit='accuracy')

Master

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 31.5 s per loop

this branch

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 15.6 s per loop

def _multimetric_score(estimator, X_test, y_test, scorers):
"""Return a dict of score for multimetric scoring"""
scores = {}

# Try wrapping the estimator in _MemoizedPredictEstimator if we don't use
# the pass-through scorer
uses_score_method = any([scorer is _passthrough_scorer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not do it in the passthrough case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For passthrough it uses the score method, which can't be (trivially) memoized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but is there any harm? Why is this any, not all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yea if the score function is going to be used, then it can be wrapped by this _MemoizedPredictEstimator safely.

if not hasattr(self, '_decisions'):
self._decisions = self.estimator.decision_function(X)
return self._decisions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No predict_proba or predict_log_proba? Are you these call decidion_function internally? Probabilistic non-linear predictors will be among those most benefiting from memoization, and I don't think they tend to implement decision_function.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well your memoized predictor then needs to delegate score. In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with __getattr__ magic.

In a similar vein you will need to support kwargs like predict_std in delegating.

@amueller
Copy link
Member

If there are any unrelated things inhere as you indicate, please put them in a separate PR.

@raghavrv
Copy link
Member Author

@amueller This is ready for merge as such. No unrelated stuff. Just a few comments from last PR addressed.

@raghavrv raghavrv changed the title [WIP] Multimetric GridSearch - Memoize prediction results (and address some previous comments) [MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) Jul 24, 2017
@@ -242,14 +242,14 @@ permitted and will require a wrapper to return a single metric::
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> svm = LinearSVC(random_state=0)
>>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point was that if you want to change this for 0.19, make it a separate PR (or just commit to master).

The multi-metric stuff won't go into 0.19 as features are frozen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this need not go into 0.19. We can release this for 0.20?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but isn't it unrelated to the rest of the pr?

Copy link
Member Author

@raghavrv raghavrv Jul 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise as separate PR then.

@raghavrv
Copy link
Member Author

In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with getattr magic. In a similar vein you will need to support kwargs like predict_std in delegating.

this is in addition to memoizing the results for predict_proba, predict_log_proba, predict, decision_function correct?

@jnothman
Copy link
Member

I suppose so!

@amueller amueller added the Superseded PR has been replace by a newer PR label Aug 5, 2019
@amueller
Copy link
Member

amueller commented Aug 5, 2019

referencing #10802 for posterity

@amueller
Copy link
Member

fixed in #14593.

@amueller amueller closed this Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants