[MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) #9326

raghavrv · 2017-07-11T15:55:22Z

Attempt to memoize the prediction results, since multimetric scoring currently calls predict once for each scorer.

Also address previous comments of @jnothman at #7388

raghavrv · 2017-07-16T17:10:02Z

Memoization done. Speedups are quite good when prediction is time-consuming.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [4, 5, 6],
                              'p': [2, 3],
                              'algorithm': ['kd_tree', 'ball_tree']},
                  scoring=['accuracy', 'precision'], refit='accuracy')

Master

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 31.5 s per loop

this branch

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 15.6 s per loop
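
A roughly 2x gain is about what one would expect with two scorers, since without memoization each scorer triggers its own predict call. Below is a minimal, scikit-learn-free sketch of that effect. `SlowModel`, `accuracy`, `precision`, `score_naive`, and `score_shared` are toy stand-ins invented for illustration, not the code in this branch.

```python
class SlowModel:
    """Toy stand-in for an estimator with an expensive predict."""

    def __init__(self):
        self.n_predict_calls = 0

    def predict(self, X):
        self.n_predict_calls += 1
        # Pretend this is costly (e.g. a neighbour lookup per sample).
        return [1 if x > 0 else 0 for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    predicted_pos = sum(p == 1 for p in y_pred)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / predicted_pos if predicted_pos else 0.0


def score_naive(model, X, y, metrics):
    # One predict call per metric, as in the un-memoized path.
    return {name: fn(y, model.predict(X)) for name, fn in metrics.items()}


def score_shared(model, X, y, metrics):
    # Predict once and reuse the result for every metric.
    y_pred = model.predict(X)
    return {name: fn(y, y_pred) for name, fn in metrics.items()}


X = [-2, -1, 1, 2, 3]
y = [0, 1, 1, 1, 0]
metrics = {"accuracy": accuracy, "precision": precision}

naive_model, shared_model = SlowModel(), SlowModel()
naive_scores = score_naive(naive_model, X, y, metrics)
shared_scores = score_shared(shared_model, X, y, metrics)

print(naive_model.n_predict_calls, shared_model.n_predict_calls)
```

Both paths return the same scores; only the number of predict calls differs, which is where the saving comes from when prediction dominates the runtime.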

jnothman · 2017-07-16T23:01:41Z

sklearn/model_selection/_validation.py

 def _multimetric_score(estimator, X_test, y_test, scorers):
    """Return a dict of score for multimetric scoring"""
    scores = {}

+    # Try wrapping the estimator in _MemoizedPredictEstimator if we don't use
+    # the pass-through scorer
+    uses_score_method = any([scorer is _passthrough_scorer


Why do we not do it in the passthrough case?

For passthrough it uses the score method, which can't be (trivially) memoized.

Yes, but is there any harm? Why is this any, not all?

Oh yeah, even if the score method is going to be used, the estimator can still be wrapped safely by this _MemoizedPredictEstimator.

jnothman · 2017-07-16T23:07:23Z

sklearn/model_selection/_validation.py

+        if not hasattr(self, '_decisions'):
+            self._decisions = self.estimator.decision_function(X)
+        return self._decisions
+


No predict_proba or predict_log_proba? Are you assuming these call decision_function internally? Probabilistic non-linear predictors will be among those benefiting most from memoization, and I don't think they tend to implement decision_function.
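
To make the concern concrete: a cache that only knows about decision_function does nothing for a model that exposes predict_proba alone. Here is a rough sketch of a cache keyed on the method name instead. The `ProbaOnlyModel` class, the `cached_call` helper, and the two toy metrics are hypothetical and not taken from this PR.

```python
import math


class ProbaOnlyModel:
    """Toy probabilistic model with no decision_function."""

    def __init__(self):
        self.calls = []

    def predict_proba(self, X):
        self.calls.append("predict_proba")
        # Logistic squashing of a 1-D input, as [P(class 0), P(class 1)].
        p1 = [1.0 / (1.0 + math.exp(-x)) for x in X]
        return [[1.0 - p, p] for p in p1]


def cached_call(cache, estimator, method, X):
    """Call estimator.<method>(X) at most once per cache dict."""
    if method not in cache:
        cache[method] = getattr(estimator, method)(X)
    return cache[method]


def log_loss(y_true, proba):
    return -sum(math.log(row[t]) for t, row in zip(y_true, proba)) / len(y_true)


def brier(y_true, proba):
    return sum((row[1] - t) ** 2 for t, row in zip(y_true, proba)) / len(y_true)


model = ProbaOnlyModel()
X, y = [-1.0, 0.5, 2.0], [0, 1, 1]
cache = {}  # assumption: one cache per (fitted model, test set) pair
scores = {
    "neg_log_loss": -log_loss(y, cached_call(cache, model, "predict_proba", X)),
    "brier": brier(y, cached_call(cache, model, "predict_proba", X)),
}
```

Two probability-based metrics are computed here from a single predict_proba call. A cache restricted to decision_function would never be hit for this model.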

jnothman

Well your memoized predictor then needs to delegate score. In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with __getattr__ magic.

In a similar vein you will need to support kwargs like predict_std in delegating.
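
One possible shape for such a wrapper, as a sketch only: `MemoizingProxy` and `ToyRegressor` are made-up names, and the `return_std` flag on the toy regressor stands in for the kind of kwarg mentioned above. Keying the cache on `id(X)` is a simplification that assumes the proxy lives no longer than the test set it is scored on.

```python
class MemoizingProxy:
    """Hypothetical wrapper: caches response methods, delegates the rest."""

    _CACHED = ("predict", "predict_proba", "predict_log_proba",
               "decision_function")

    def __init__(self, estimator):
        self._estimator = estimator
        self._cache = {}

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if name.startswith("_"):
            # Avoid infinite recursion when _estimator is not set yet
            # (e.g. while the proxy is being copied or unpickled).
            raise AttributeError(name)
        attr = getattr(self._estimator, name)  # AttributeError propagates
        if name not in self._CACHED:
            return attr

        def cached(X, **kwargs):
            # Key on the method, the identity of X, and any kwargs, so
            # predict(X) and predict(X, return_std=True) are kept apart.
            key = (name, id(X), tuple(sorted(kwargs.items())))
            if key not in self._cache:
                self._cache[key] = attr(X, **kwargs)
            return self._cache[key]

        return cached


class ToyRegressor:
    def __init__(self):
        self.n_predict_calls = 0

    def predict(self, X, return_std=False):
        self.n_predict_calls += 1
        mean = [2.0 * x for x in X]
        if return_std:
            return mean, [0.1] * len(X)
        return mean

    def score(self, X, y):
        # Negative mean absolute error; calls the raw predict directly.
        pred = self.predict(X)
        return -sum(abs(p - t) for p, t in zip(pred, y)) / len(y)


X, y = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
proxy = MemoizingProxy(ToyRegressor())
mean_a = proxy.predict(X)
mean_b = proxy.predict(X)                          # served from the cache
mean_c, std_c = proxy.predict(X, return_std=True)  # different key, new call
delegated_score = proxy.score(X, y)                # delegated, not memoized
```

Because missing attributes raise AttributeError through the proxy, `hasattr(proxy, "predict_proba")` still reflects what the wrapped estimator supports. Note that `score` runs on the underlying estimator and therefore bypasses the cache, which matches the earlier remark that the score method cannot be trivially memoized.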

amueller · 2017-07-21T20:00:09Z

If there are any unrelated things in here, as you indicate, please put them in a separate PR.

raghavrv · 2017-07-24T10:30:04Z

@amueller This is ready for merge as such. No unrelated stuff. Just a few comments from last PR addressed.

jnothman · 2017-07-24T10:32:31Z

doc/modules/model_evaluation.rst

@@ -242,14 +242,14 @@ permitted and will require a wrapper to return a single metric::
    >>> # A sample toy binary classification dataset
    >>> X, y = datasets.make_classification(n_classes=2, random_state=0)
    >>> svm = LinearSVC(random_state=0)
-    >>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0]


I think the point was that if you want to change this for 0.19, make it a separate PR (or just commit to master).

The multi-metric stuff won't go into 0.19 as features are frozen.

But this need not go into 0.19. We can release this for 0.20?

Sure, but isn't it unrelated to the rest of the PR?

I'll raise as separate PR then.

raghavrv · 2017-07-24T11:22:47Z

In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with __getattr__ magic. In a similar vein you will need to support kwargs like predict_std in delegating.

This is in addition to memoizing the results of predict_proba, predict_log_proba, predict, and decision_function, correct?

jnothman · 2017-07-24T12:56:14Z

I suppose so!

amueller · 2019-08-05T17:47:47Z

referencing #10802 for posterity

amueller · 2019-09-11T19:49:24Z

fixed in #14593.

raghavrv added 5 commits July 10, 2017 16:58

Use a def as it is pickle-able

d3ce2ad

Address Joel's comments

54d5341

Merge branch 'master' into multimetric_part2

e7e9125

Try memoizing predict / decicion_function calls

2e5b729

TST memoizing of the predictions when non-default scoring is used

18d3549

jnothman reviewed Jul 16, 2017

Add predict_proba, predict_log_proba and wrap all estimators

e45b154

jnothman reviewed Jul 19, 2017

Fix travis

b6b3994

raghavrv changed the title ~~[WIP] Multimetric GridSearch - Memoize prediction results (and address some previous comments)~~ [MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) Jul 24, 2017

jnothman reviewed Jul 24, 2017

Undo the changes in model_evaluation.rst

7c0ed54

raghavrv mentioned this pull request Jul 24, 2017

[MRG] DOC use def instead of lambda in the multimetric example at model_evaluation.rst #9442

Merged

amueller added the Superseded label (PR has been replaced by a newer PR) Aug 5, 2019

amueller mentioned this pull request Aug 26, 2019

[MRG] Adds _MultimetricScorer for Optimized Scoring #14593

Merged

amueller closed this Sep 11, 2019
