Timing and training score in *SearchCV.results_ #7026
Conversation
Now *SearchCV.results_ includes both timing and training scores,
and a new doctest (sklearn/model_selection/_search.py).
        'test_std_score' : [0.02, 0.01, 0.03, 0.03],
        'test_rank_score' : [2, 4, 3, 1],
        'params' : [{'kernel': 'poly', 'degree': 2}, ...],
        }

    NOTE that the key ``'params'`` is used to store a list of parameter
-   settings dict for all the parameter candidates.
+   settings dict for all the parameter candidates. Besides,
+   'train_mean_score', 'train_split*_score', ... will be present when
Please follow the style of using double backticks to indicate code.
Just noting that it should be backticks surrounding the single-quoted key: ``'train_mean_score'`` (which will display as 'train_mean_score').
Thanks for the PR!!
                np.average((time - time_means[:, np.newaxis]) ** 2,
                           axis=1, weights=weights))
        if self.return_train_score:
            train_means = np.average(train_scores, axis=1, weights=weights)
It doesn't make sense to weight the train scores by the number of test samples. Perhaps we should weight by the number of training samples when ``iid=True``, or report an unweighted average. For times, weighting by training instances is more appropriate than by test, but leaving it unweighted might be best.
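For illustration, a minimal numpy sketch of the two averaging options being discussed (the array names here are made up for the example, not the PR's actual variables):

```python
import numpy as np

# Hypothetical per-candidate, per-split training scores (2 candidates, 3 splits)
train_scores = np.array([[0.91, 0.91, 0.95],
                         [0.84, 0.86, 0.88]])
# Number of test samples in each split -- the weights currently under discussion
n_test_samples = np.array([34, 33, 33])

# Current behaviour being questioned: weight by test-fold sizes (as iid=True does for test scores)
weighted = np.average(train_scores, axis=1, weights=n_test_samples)

# Alternative raised above: a plain unweighted mean across splits
unweighted = train_scores.mean(axis=1)

print(weighted, unweighted)
```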
I agree. Also, I think ``iid`` is a bad hack and we need to fix the scorers to accept both training and test data to solve this properly :-/
Sorry if I'm a bit slow here, but please elucidate at some point. What's ``iid`` trying to hack? Why does accepting training data in the scorer help?
It's trying to hack the R^2, IIRC. @agramfort can say more about this, I think.
The way we compute R^2 is weird because we are using the test set mean, not the training set mean. Though actually, after thinking about it more, maybe accepting training data is too big a change just for that. I thought it would also resolve some unsupervised issues, but maybe it doesn't.
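For context, a rough sketch of the distinction being raised: R^2 normalises the residual error by the variance around some baseline mean, and the question is which mean to use. This is illustrative only, not the scorer's actual code:

```python
import numpy as np

def r2_with_baseline_mean(y_true, y_pred, baseline_mean):
    """R^2 where the 'null model' predicts baseline_mean for every sample."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - baseline_mean) ** 2)
    return 1.0 - ss_res / ss_tot

y_train = np.array([1.0, 2.0, 3.0, 4.0])
y_test = np.array([2.5, 3.5])
y_pred = np.array([2.4, 3.7])

# Behaviour described above: baseline is the test-set mean
r2_test_mean = r2_with_baseline_mean(y_test, y_pred, y_test.mean())
# Alternative the comment alludes to: baseline is the training-set mean
r2_train_mean = r2_with_baseline_mean(y_test, y_pred, y_train.mean())
print(r2_test_mean, r2_train_mean)
```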
Perhaps we should avoid code duplication by defining
@@ -1104,4 +1155,4 @@ def fit(self, X, y=None, labels=None):
         sampled_params = ParameterSampler(self.param_distributions,
                                           self.n_iter,
                                           random_state=self.random_state)
-        return self._fit(X, y, labels, sampled_params)
+        return self._fit(X, y, labels, sampled_params)
You have strangely removed the newline character from the end of this file.
I've renamed the PR to something clearer.
@@ -371,7 +371,7 @@ class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator,
     def __init__(self, estimator, scoring=None,
                  fit_params=None, n_jobs=1, iid=True,
                  refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs',
-                 error_score='raise'):
+                 error_score='raise', return_train_score=False):
I know it was in the issue, but I'm not sure why we want to make that optional / not the default. Computing the training scores is really cheap compared to everything else that goes on. There are already a lot of parameters, not sure if this one is particularly helpful. @jnothman ?
I guess making it optional is helpful if you have a very large training set on which evaluation takes a long time because of a very expensive evaluation procedure. But I think we should default to ``True`` both for convenience and discoverability.
It would be nice to add the example of #1742 but I think that's not that important. LGTM.
Wait, sorry, I thought this was only the timing. For the training scores, we should really add an example and tests.
@eyc88 I think you said you might not have time to work on this now. Should someone else take over?
@amueller If you just want to test that the training score is less than or equal to 1, I can deliver it soon (one thing I would like to respond to @ogrisel about is that the training score can be exactly 1). I am travelling now, so there could be some delay. All that being said, I don't mind if someone wants to take it over.
1. Check that test_rank_score is always >= 1.
2. Check that all regular scores (test/train_mean/std_score) and timing are >= 0.
3. Check that all regular scores are <= 1.

Note that timing can be greater than 1 in general, and the std of regular scores is always <= 1 because the scores are bounded between 0 and 1.
@amueller Check this out. Let me know if this is what you had in mind. We can discuss.
A test for training score would usually require a more explicit test that we're reporting the right number. We want, for instance, the tests to be fairly robust to this all being reimplemented.
Yeah, so a good test would be to instantiate a cross-validation object, do the calculations "by hand" for some simple example, and check that the reported values match.
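Something along these lines, perhaps (a sketch only; the attribute and key names follow this PR's proposed ``results_`` layout and may differ in the final implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, random_state=0)
cv = KFold(n_splits=3)

search = GridSearchCV(SVC(), {'C': [0.1, 1.0]}, cv=cv,
                      return_train_score=True)
search.fit(X, y)

# Recompute the training score of the first candidate (C=0.1) "by hand".
expected = []
for train_idx, _ in cv.split(X, y):
    clf = SVC(C=0.1).fit(X[train_idx], y[train_idx])
    expected.append(clf.score(X[train_idx], y[train_idx]))

# Folds are equal-sized here, so weighted and unweighted means agree.
assert np.isclose(search.results_['train_mean_score'][0], np.mean(expected))
```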
Sounds good. I will take a closer look and write a new version of the test once I get some free time.
@jnothman When are you guys rolling out 0.18? I can squeeze out some bandwidth if the deadline is approaching...
Within the next week or two.
I think tests were my only issue, and I'd love to see this in 0.18.
@eyc88 Do you mind if I take this over?
@raghavrv No.
Closing in favour of #7325.
Reference Issue
Resolves #6894 (Timing in *SearchCV.results_) and
#6895 (Training Score in *SearchCV.results_).
What does this implement/fix? Explain your changes.
The _fit method in BaseSearchCV calls the _fit_and_score function from the _validation module.
_fit_and_score is able to report train_score, but does not do so by default. On the other hand,
the __init__ of BaseSearchCV did not accept return_train_score as an optional keyword. As a result, there was no way to have the results_ attribute receive the training scores.
The fix is to add the keyword return_train_score to the __init__ of BaseSearchCV (which is inherited by both GridSearchCV and RandomizedSearchCV). The default is still False. When return_train_score is set to True, a set of additional steps puts train_score into results_.
The timing information is always available but was not previously written into results_. This patch also writes the timing info into results_ by default.
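As a rough usage sketch of what this enables (attribute and key names as proposed in this PR; the exact naming may still change):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()

search = GridSearchCV(SVC(), {'kernel': ['linear', 'rbf'], 'C': [1, 10]},
                      return_train_score=True)
search.fit(iris.data, iris.target)

# Timing is reported unconditionally; train_* keys appear only with return_train_score=True.
print(sorted(search.results_.keys()))
```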
Any other comments?
The test is also rewritten. In the test I explicitly choose return_train_score=True for better coverage.