-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) #9326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Memoization done. Speedups are quite good when the prediction is time consuming. from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, random_state=0)
gs = GridSearchCV(KNeighborsClassifier(),
param_grid={'n_neighbors': [4, 5, 6],
'p': [2, 3],
'algorithm': ['kd_tree', 'ball_tree']},
scoring=['accuracy', 'precision'], refit='accuracy') Master
this branch
|
def _multimetric_score(estimator, X_test, y_test, scorers): | ||
"""Return a dict of score for multimetric scoring""" | ||
scores = {} | ||
|
||
# Try wrapping the estimator in _MemoizedPredictEstimator if we don't use | ||
# the pass-through scorer | ||
uses_score_method = any([scorer is _passthrough_scorer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not do it in the passthrough case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For passthrough it uses the score
method, which can't be (trivially) memoized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but is there any harm? Why is this any, not all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea if the score function is going to be used, then it can be wrapped by this _MemoizedPredictEstimator
safely.
if not hasattr(self, '_decisions'): | ||
self._decisions = self.estimator.decision_function(X) | ||
return self._decisions | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No predict_proba or predict_log_proba? Are you these call decidion_function internally? Probabilistic non-linear predictors will be among those most benefiting from memoization, and I don't think they tend to implement decision_function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well your memoized predictor then needs to delegate score. In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with __getattr__
magic.
In a similar vein you will need to support kwargs like predict_std in delegating.
If there are any unrelated things inhere as you indicate, please put them in a separate PR. |
@amueller This is ready for merge as such. No unrelated stuff. Just a few comments from last PR addressed. |
doc/modules/model_evaluation.rst
Outdated
@@ -242,14 +242,14 @@ permitted and will require a wrapper to return a single metric:: | |||
>>> # A sample toy binary classification dataset | |||
>>> X, y = datasets.make_classification(n_classes=2, random_state=0) | |||
>>> svm = LinearSVC(random_state=0) | |||
>>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the point was that if you want to change this for 0.19, make it a separate PR (or just commit to master).
The multi-metric stuff won't go into 0.19 as features are frozen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this need not go into 0.19. We can release this for 0.20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but isn't it unrelated to the rest of the pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll raise as separate PR then.
this is in addition to memoizing the results for |
I suppose so! |
referencing #10802 for posterity |
fixed in #14593. |
Attempt to memoize the prediction result as currently it calls predict for each scorer.
Also address previous comments of @jnothman at #7388
cc: @amueller @jnothman @agramfort