
Timing and training score in *SearchCV.results_ #7026


Closed
eyc88 wants to merge 5 commits

Conversation


@eyc88 eyc88 commented Jul 17, 2016

Reference Issue

Resolves #6894 (Timing in *SearchCV.results_) and #6895 (Training Score in *SearchCV.results_).

What does this implement/fix? Explain your changes.

The `_fit` method in `BaseSearchCV` calls the `_fit_and_score` function from the `_validation` module.
`_fit_and_score` can report the train score, but does not do so by default. On the other hand,
the `__init__` of `BaseSearchCV` does not accept `return_train_score` as an optional keyword. As a result, there was no way for the `results_`
attribute to receive the train score.

The fix is to add the keyword `return_train_score` to the `__init__` of `BaseSearchCV` (which is inherited by both `GridSearchCV` and `RandomizedSearchCV`). The default is still `False`. When `return_train_score` is set to `True`, additional steps are taken to put the train score into `results_`.

The timing information is always available but was not written into `results_`. This patch also writes the timing info into `results_` by default.
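
A minimal sketch of the shape of the change, assuming simplified names (the real patch touches `BaseSearchCV._fit` in `sklearn/model_selection/_search.py`; this class is illustrative only):

    # Illustrative sketch only, not the actual diff: the new keyword is stored
    # on the estimator and forwarded to _fit_and_score, whose train scores are
    # then copied into results_.
    class BaseSearchCVSketch(object):
        def __init__(self, estimator, scoring=None, iid=True,
                     error_score='raise', return_train_score=False):
            self.estimator = estimator
            self.scoring = scoring
            self.iid = iid
            self.error_score = error_score
            # New keyword added by this patch (default keeps the old behaviour).
            self.return_train_score = return_train_score

        def _fit(self, X, y):
            # Here the real code calls, per (candidate, split):
            #   _fit_and_score(..., return_train_score=self.return_train_score)
            # and writes the returned train scores into self.results_ alongside
            # the test scores and the fit/score times.
            raise NotImplementedError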

Any other comments?

The test is also rewritten. In the test I explicitly choose `return_train_score=True` for better coverage.

Eugene Chen added 3 commits July 16, 2016 17:01
    Now *SearchCV.results_ includes both timing and training scores.
and new doctest (sklearn/model_selection/_search.py)
'test_std_score' : [0.02, 0.01, 0.03, 0.03],
'test_rank_score' : [2, 4, 3, 1],
'params' : [{'kernel': 'poly', 'degree': 2}, ...],
}

NOTE that the key ``'params'`` is used to store a list of parameter
settings dicts for all the parameter candidates. Besides,
'train_mean_score', 'train_split*_score', ... will be present when
``return_train_score`` is set to ``True``.

Please follow the style of using double backticks to indicate code.

Just noting that it should be backticks surrounding the single quoted key.

``'train_mean_score'`` (which will display as 'train_mean_score')
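
For reference, the docstring excerpt above corresponds to a `results_` dict shaped roughly as follows. The values below are made up and only the keys shown in the excerpt are assumed; the timing entries added by this patch follow the same per-candidate layout:

    # Made-up values; key names follow the docstring excerpt above.
    results_ = {
        'test_mean_score':    [0.81, 0.60, 0.75, 0.82],
        'test_std_score':     [0.02, 0.01, 0.03, 0.03],
        'test_rank_score':    [2, 4, 3, 1],
        'train_mean_score':   [0.93, 0.78, 0.88, 0.94],   # with return_train_score=True
        'train_split0_score': [0.94, 0.79, 0.89, 0.95],   # one such key per CV split
        'params': [{'kernel': 'poly', 'degree': 2}, ...],
    }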

@raghavrv

Thanks for the PR!!

                np.average((time - time_means[:, np.newaxis]) ** 2,
                           axis=1, weights=weights))
            if self.return_train_score:
                train_means = np.average(train_scores, axis=1, weights=weights)

It doesn't make sense to weight the train scores by the number of test samples. Perhaps we should weight by the number of training samples when iid=True, or report an unweighted average. For times, weighting by training instances is more appropriate than by test, but leaving it unweighted might be best.
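
For what it's worth, a tiny self-contained illustration of the two options (the score arrays and fold sizes are made up):

    import numpy as np

    # One row per parameter candidate, one column per CV split (made-up numbers).
    train_scores = np.array([[0.95, 0.90, 0.92],
                             [0.88, 0.85, 0.86]])
    test_fold_sizes = np.array([50, 30, 20])  # what iid=True currently weights by

    weighted = np.average(train_scores, axis=1, weights=test_fold_sizes)
    unweighted = train_scores.mean(axis=1)

    print(weighted)    # [0.929 0.867] -- shifts with the (test) fold sizes
    print(unweighted)  # [0.92333333 0.86333333] -- independent of fold sizes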

I agree. Also, I think iid is a bad hack and we need to fix the scorers to accept both training and test data to solve this properly :-/

Sorry if I'm a bit slow here, but please elucidate at some point. What's iid trying to hack? Why does accepting training data in the scorer help?

It's trying to hack the R^2, IIRC. @agramfort can say more about this, I think.
The way we compute R^2 is weird because we are using the test set mean, not the training set mean. Though actually, after thinking about it more, maybe accepting training data is too big a change just for that. I thought it would also resolve some unsupervised issues, but maybe it doesn't.
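
A small illustration of the R^2 point, in case it helps (made-up numbers; `r2_score` normalises by the variance of the test targets around their own mean):

    import numpy as np
    from sklearn.metrics import r2_score

    y_train = np.array([0.0, 1.0, 2.0, 3.0])   # training targets (mean 1.5)
    y_test = np.array([2.0, 3.0])              # test targets (mean 2.5)
    y_pred = np.array([2.1, 2.9])

    # Current behaviour: denominator uses the variance around the *test* mean.
    print(r2_score(y_test, y_pred))            # 0.96

    # The alternative being discussed: normalise around the *training* mean.
    ss_res = np.sum((y_test - y_pred) ** 2)
    ss_tot = np.sum((y_test - y_train.mean()) ** 2)
    print(1.0 - ss_res / ss_tot)               # ~0.99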

@jnothman

Perhaps we should avoid code duplication by defining _result_scores and reusing it for train and test, e.g. self.results_.update(_result_scores('test_', test_scores, weights)).
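
Something along those lines could look like this (the helper name and the `update` call come from the suggestion above; the body is a hypothetical sketch):

    import numpy as np

    def _result_scores(prefix, scores, weights=None):
        """Hypothetical helper building the per-split, mean and std entries
        for one kind of score ('test_' or 'train_')."""
        scores = np.asarray(scores)  # shape (n_candidates, n_splits)
        out = {}
        for split_i in range(scores.shape[1]):
            out['%ssplit%d_score' % (prefix, split_i)] = scores[:, split_i]
        means = np.average(scores, axis=1, weights=weights)
        out['%smean_score' % prefix] = means
        out['%sstd_score' % prefix] = np.sqrt(
            np.average((scores - means[:, np.newaxis]) ** 2,
                       axis=1, weights=weights))
        return out

    # Then both blocks collapse to:
    #   self.results_.update(_result_scores('test_', test_scores, weights))
    #   self.results_.update(_result_scores('train_', train_scores))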

@@ -1104,4 +1155,4 @@ def fit(self, X, y=None, labels=None):
         sampled_params = ParameterSampler(self.param_distributions,
                                           self.n_iter,
                                           random_state=self.random_state)
-        return self._fit(X, y, labels, sampled_params)
+        return self._fit(X, y, labels, sampled_params)
\ No newline at end of file

You have strangely removed the newline character from the end of this file.

@jnothman jnothman changed the title from "Resolving issue #6894 and #6895" to "Timing and training score in *SearchCV.results_" on Jul 17, 2016
@jnothman

I've renamed the PR to something clearer.

@@ -371,7 +371,7 @@ class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator,
     def __init__(self, estimator, scoring=None,
                  fit_params=None, n_jobs=1, iid=True,
                  refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs',
-                 error_score='raise'):
+                 error_score='raise', return_train_score=False):

I know it was in the issue, but I'm not sure why we want to make that optional / not the default. Computing the training scores is really cheap compared to everything else that goes on. There are already a lot of parameters, not sure if this one is particularly helpful. @jnothman ?

I guess making it optional is helpful if you have a very large training set on which evaluation takes a long time because of a very expensive evaluation procedure. But I think we should default to True, both for convenience and discoverability.

@amueller

It would be nice to add the example of #1742 but I think that's not that important. LGTM.

@amueller amueller changed the title from "Timing and training score in *SearchCV.results_" to "[MRG + 1] Timing and training score in *SearchCV.results_" on Jul 17, 2016
@amueller

Wait, sorry, I thought this was only the timing. For the training scores, we should really add an example and tests.

@amueller amueller changed the title from "[MRG + 1] Timing and training score in *SearchCV.results_" to "Timing and training score in *SearchCV.results_" on Jul 17, 2016
@amueller

@eyc88 I think you said you might not have time to work on this now. Should someone else take over?

@eyc88

eyc88 commented Jul 17, 2016

@amueller If you just want to test that the training score is less than or equal to 1, I can deliver it soon (one thing I would like to respond to @ogrisel about is that the training score can be exactly 1). I am now travelling, so there could be some delay. All that being said, I don't mind if someone wants to take it over.

    1. check test_rank_score always >= 1
    2. check all regular scores (test/train_mean/std_score) and timing >= 0
    3. check all regular scores <= 1
Note that timing can be greater than 1 in general, and std of regular scores
always <= 1 because the scores are bounded between 0 and 1.
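
Roughly, the checks listed above amount to something like this (key names follow this PR's proposed `results_` layout and assume an accuracy-like scorer bounded by [0, 1], so treat it as a sketch):

    def check_results_bounds(results):
        # 1. ranks start at 1
        assert all(rank >= 1 for rank in results['test_rank_score'])
        # 2. & 3. regular scores lie in [0, 1] for an accuracy-like scorer
        for key in ('test_mean_score', 'test_std_score',
                    'train_mean_score', 'train_std_score'):
            assert all(0.0 <= value <= 1.0 for value in results[key])
        # Timings only need to be non-negative; they can exceed 1 (seconds).
        for key in results:
            if 'time' in key:
                assert all(t >= 0.0 for t in results[key])
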
@eyc88

eyc88 commented Jul 18, 2016

@amueller Check this out. Let me know if this is what you had in mind. We can discuss.

@jnothman

A test for the training score would usually need to check more explicitly that we're reporting the right number. We want, for instance, the tests to be fairly robust to this all being reimplemented.

@amueller amueller added this to the 0.18 milestone Jul 28, 2016
@amueller

Yeah so a good test would be to instantiate a cross-validation object and do the calculations "by hand" for some simple example and check that GridSearchCV does them correctly.
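
For example, something along these lines (the attribute and key names follow this PR's proposed `results_`, so treat it as a sketch rather than the final test):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    cv = KFold(n_splits=3, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), {'C': [0.1, 1.0]}, cv=cv,
                          return_train_score=True)
    search.fit(X, y)

    # Redo the cross-validation "by hand" for the first candidate (C=0.1)
    # and compare against what the search reports.
    expected_train, expected_test = [], []
    for train_idx, test_idx in cv.split(X, y):
        clf = SVC(C=0.1).fit(X[train_idx], y[train_idx])
        expected_train.append(clf.score(X[train_idx], y[train_idx]))
        expected_test.append(clf.score(X[test_idx], y[test_idx]))

    # Equal-sized folds, so the iid weighting does not change the mean here.
    np.testing.assert_allclose(search.results_['train_mean_score'][0],
                               np.mean(expected_train))
    np.testing.assert_allclose(search.results_['test_mean_score'][0],
                               np.mean(expected_test))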

@eyc88

eyc88 commented Aug 3, 2016

Sounds good. I will take a closer look and write a new version of the test once I get some free time.

@jnothman

@amueller, if tests are the main issue here, is it worth hurrying this into 0.18 as part of the grid search results overhaul? @eyc88, have you found some time?

@eyc88

eyc88 commented Aug 30, 2016

@jnothman When are you guys rolling out 0.18? I can squeeze some bandwidth if the deadline is approaching...

@jnothman

Within the next week or two.

@amueller

I think tests were my only issue, and I'd love to see this in 0.18.

@raghavrv

@eyc88 Do you mind if I take this over?

@eyc88

eyc88 commented Aug 31, 2016

@raghavrv No.

@lesteve

lesteve commented Sep 26, 2016

Closing in favour of #7325.
