[MRG+2] Timing and training score in GridSearchCV #7325

Merged
merged 3 commits into scikit-learn:master from raghavrv:timing_training_score on Sep 27, 2016

Conversation

raghavrv
Member

@raghavrv raghavrv commented Sep 1, 2016

Finishing up @eyc88's #7026

Resolves #6894 and #6895

TODO

Have an example under examples/? To be addressed in a separate PR.

@jnothman @amueller @vene @ogrisel (To be reviewed after merge of #7324)

NOTE: Make sure @eyc88 is credited when squashing and merging.

@raghavrv
Member Author

raghavrv commented Sep 1, 2016

@amueller @jnothman Do we have to take the weighted average for mean_train_score across splits if i.i.d.?

@jnothman
Member

jnothman commented Sep 1, 2016

It doesn't really make sense to weight it by test set size. Perhaps by training set size. I'd leave it unweighted.

@GaelVaroquaux
Member

It doesn't really make sense to weight it by test set size. Perhaps by
training set size. I'd leave it unweighted.

+1.

@raghavrv
Member Author

raghavrv commented Sep 1, 2016

Yes, I was asking about weighting it by the training set size only... Okay! Let's leave it unweighted...
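
For illustration, a toy sketch of the two options discussed here, unweighted versus weighted by training-set size (the scores and split sizes below are made up; this is not project code):

```python
# Toy comparison of an unweighted mean of per-split train scores
# versus a mean weighted by training-set size. Made-up numbers.
import numpy as np

train_scores = np.array([0.81, 0.79, 0.84])  # one train score per CV split
train_sizes = np.array([80, 80, 40])         # hypothetical n_train per split

unweighted_mean = train_scores.mean()
weighted_mean = np.average(train_scores, weights=train_sizes)
print(unweighted_mean, weighted_mean)
```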

@amueller amueller added this to the 0.18 milestone Sep 6, 2016
@amueller
Member

amueller commented Sep 6, 2016

are you working on this?

@raghavrv
Member Author

raghavrv commented Sep 7, 2016

@amueller yes. I was waiting for #7324 to get merged. I'll push in a while...

@raghavrv raghavrv force-pushed the timing_training_score branch from a7d8c32 to 28290a9 on September 8, 2016 13:08
@raghavrv
Member Author

I've named the keys "mean_time" and "std_time", since the word test/train is irrelevant here...

@raghavrv raghavrv force-pushed the timing_training_score branch from 28290a9 to 5d4813c on September 11, 2016 11:41
@raghavrv
Member Author

@amueller @jnothman added some basic tests for training scores... Reviews please...

@raghavrv
Member Author

About the example, do we want one that compares the train and test scores and also plots the training time for each candidate? (Also ping @vene for suggestions)
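
Something along these lines, perhaps (a rough sketch only: the SVC/C grid is a placeholder, and the key names assume the final cv_results_ layout with mean_train_score, mean_test_score and mean_fit_time):

```python
# Sketch of an example comparing train/test scores and fit times per candidate.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
C_values = [0.01, 0.1, 1, 10, 100]
search = GridSearchCV(SVC(kernel='linear'), {'C': C_values}, cv=3,
                      return_train_score=True)
search.fit(X, y)
results = search.cv_results_

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.semilogx(C_values, results['mean_train_score'], 'o-', label='train')
ax1.semilogx(C_values, results['mean_test_score'], 'o-', label='test')
ax1.set_xlabel('C')
ax1.set_ylabel('mean score')
ax1.legend()
ax2.bar(range(len(C_values)), results['mean_fit_time'])
ax2.set_xlabel('candidate index')
ax2.set_ylabel('mean fit time (s)')
plt.tight_layout()
plt.show()
```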

@@ -746,6 +784,10 @@ class GridSearchCV(BaseSearchCV):
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error.

return_train_score: boolean, default=True
If ``'False'``, the results_ attribute will not include training

cv_

space before colon

@jnothman
Member

I'm not altogether happy that in _fit_and_score, scoring of training data is included within the timing. Should we consider pulling it out?

@jnothman
Member

Regarding the example: @amueller plotted train score and time in a variant of plot_rbf_parameters at #1742.

@jnothman
Member

I.e. that was @amueller's attempt at this PR's feature, never merged.

@jnothman
Member

Otherwise, re the example: I think an example of train scores explicitly introducing over/under-fitting would be interesting, but adding it to the wishlist for a later PR is fine.

@raghavrv
Member Author

raghavrv commented Sep 11, 2016

I'm not altogether happy that in _fit_and_score, scoring of training data is included within the timing. Should we consider pulling it out?

I think if we want to... then mean_test_time would make sense... WDYT?

EDIT: Or maybe not. I feel test still does not have a place here... We are timing fit + score (on testing set).

@raghavrv raghavrv changed the title from [WIP] Timing and training score in GridSearchCV to [MRG] Timing and training score in GridSearchCV on Sep 11, 2016
@raghavrv
Member Author

raghavrv commented Sep 23, 2016

Apologies for the delay. I got busy configuring my laptop ^_^.

And

oh, you're saying it registered 0. Of course >= 0 is what you want.

No, I speculate that to be the reason...

I tried running on my Windows machine and found that not to be the case... time.time is as accurate there as it is on Linux. I don't know how older versions of Windows handle it, though...

I still haven't managed to install scikit-learn on Windows ;(

And if we go with >= 0, that would mean we are allowing fit_time to be 0 on Linux too. Such a test would mask a bug where fit_time is always 0...

In the next couple of hours I'll somehow figure out a way to install scikit-learn on my Windows machine and see whether this test fails...
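
For context, a minimal sketch of the kind of assertion being debated (not the actual project test; key names assume the final cv_results_ layout). On Windows, time.time() can have roughly 16 ms resolution, so a trivial fit may legitimately report exactly 0.0 seconds:

```python
# '>= 0' rather than '> 0': coarse timer resolution on Windows can round
# tiny durations down to exactly zero for trivial fits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=50, random_state=0)
search = GridSearchCV(SVC(), {'C': [0.1, 1]}, cv=3, return_train_score=True)
search.fit(X, y)

for key in ('mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time'):
    assert np.all(np.asarray(search.cv_results_[key]) >= 0)
```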

@raghavrv raghavrv force-pushed the timing_training_score branch from e4246dc to 07e1e34 on September 23, 2016 23:29
@raghavrv
Member Author

I did >= 0 and the AppVeyor build passes. I am still unable to install on Windows (I set too small a partition size and it bit me: there is no room for the whopping 8 GB of Visual Studio libraries. I don't understand why MinGW requires VS13).

Anyway, @jnothman @amueller what do you think we should do? Is the current test setup fine?

@jnothman
Member

You can use a less trivial fit, then? But I'm okay with this being written off as "hard to test".

@raghavrv
Member Author

Finally I succeeded in installing it on Windows (I had to resort to a 32-bit Windows in VirtualBox). Now I can confirm that it is the timer precision that makes the test fail.

This section of timeit explains it clearly. I would say let's leave it as is, like you suggested before. The alternative would be a utils.fixes function safe_time that always gives us the highest-precision clock (which could mean checking the platform and choosing between time.clock, time.time, or the newer time.perf_counter).

@amueller @jnothman Final reviews and merge?
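
A hypothetical sketch of the safe_time helper floated above; no such function exists in sklearn.utils.fixes, this is only what it might look like:

```python
# Pick the highest-resolution wall clock available on this platform/Python.
import sys
import time

if hasattr(time, 'perf_counter'):
    # Python >= 3.3: monotonic, highest available resolution
    safe_time = time.perf_counter
elif sys.platform == 'win32':
    # Old Python on Windows: time.clock is much finer-grained than time.time
    safe_time = time.clock
else:
    # Elsewhere, time.time is already precise enough
    safe_time = time.time

start = safe_time()
sum(range(100000))  # stand-in for a trivial fit
elapsed = safe_time() - start
print(elapsed)
```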

@jnothman jnothman left a comment

Otherwise LGTM

'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score' : [0.81, 0.7, 0.7],
'std_train_score' : [0.00073, 0.00063, 0.00043],

Can we make these numbers more readable? Pretend it's slower.

@jnothman jnothman changed the title from [MRG] Timing and training score in GridSearchCV to [MRG+1] Timing and training score in GridSearchCV on Sep 24, 2016
@@ -249,19 +258,22 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose,

else:
test_score = _score(estimator, X_test, y_test, scorer)
score_time = time.time() - start_time - fit_time

Ah, OK, that makes more sense. Before, this was just time.time() - start_time or something like that, and I was really confused.
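
For readers following along, a simplified, self-contained sketch of the timing arithmetic in question (not the literal _fit_and_score code):

```python
# The fit is timed on its own; score_time is the remaining elapsed time,
# i.e. it covers only the scoring step.
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
estimator = LogisticRegression()

start_time = time.time()
estimator.fit(X_train, y_train)
fit_time = time.time() - start_time

test_score = estimator.score(X_test, y_test)
score_time = time.time() - start_time - fit_time  # only the scoring call

print(fit_time, score_time, test_score)
```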

@amueller
Member

lgtm.

@amueller amueller changed the title from [MRG+1] Timing and training score in GridSearchCV to [MRG+2] Timing and training score in GridSearchCV on Sep 25, 2016
@raghavrv
Member Author

Thanks everyone for the reviews. @jnothman Does it look fine now? Merge?

* ENH separate fit / score times
* Make score_time=0 if errored; Ignore warnings in test
* Cleanup docstrings
* ENH Use helper to store the results
* Move fit time computation to else of try...except...else
* DOC readable sample scores
* COSMIT Add a comment on why the time test is >= 0 instead of > 0
  (Windows time.time precision is not accurate enough to be non-zero
   for trivial fits)
@raghavrv
Member Author

@jnothman bump :) Anything more to be done here?

@jnothman
Member

Thanks for asking. The last thing to do: mention somewhere that timings are in seconds in cv_results_.

@raghavrv
Member Author

Done :)

@jnothman jnothman merged commit b444cc9 into scikit-learn:master Sep 27, 2016
@jnothman
Member

And done. Backport please, @amueller...

@raghavrv raghavrv deleted the timing_training_score branch September 27, 2016 10:50
amueller added a commit to amueller/scikit-learn that referenced this pull request Sep 27, 2016
* Resolved issue scikit-learn#6894 and scikit-learn#6895:
    Now *SearchCV.results_ includes both timing and training scores.

wrote new test (sklearn/model_selection/test_search.py)
and new doctest (sklearn/model_selection/_search.py)
added a few more lines in the docstring of GridSearchCV and RandomizedSearchCV.
Revised code according to suggestions.
Add a few more lines to test_grid_search_results():
    1. check test_rank_score always >= 1
    2. check all regular scores (test/train_mean/std_score) and timing >= 0
    3. check all regular scores <= 1
Note that timing can be greater than 1 in general, and std of regular scores
always <= 1 because the scores are bounded between 0 and 1.

* ENH/FIX timing and training score.

* ENH separate fit / score times
* Make score_time=0 if errored; Ignore warnings in test
* Cleanup docstrings
* ENH Use helper to store the results
* Move fit time computation to else of try...except...else
* DOC readable sample scores
* COSMIT Add a comment on why the time test is >= 0 instead of > 0
  (Windows time.time precision is not accurate enough to be non-zero
   for trivial fits)

* Convey that times are in seconds

# Conflicts:
#	doc/whats_new.rst
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 10, 2016
* tag '0.18': (1286 commits)
  [MRG + 1] More versionadded everywhere! (scikit-learn#7403)
  minor doc fixes
  fix lbfgs rename (scikit-learn#7503)
  minor fixes to whatsnew
  fix scoring function table
  fix rebase messup
  DOC more what's new subdivision
  DOC Attempt to impose some order on What's New 0.18
  no fixed width within bold
  REL changes for release in 0.18.X branch (scikit-learn#7414)
  [MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)
  DOC: Added Nested Cross Validation Example (scikit-learn#7111)
  Sync docstring and definition default argument in kneighbors (scikit-learn#7476)
  added contributors for 0.18, minor formatting fixes.
  Fix typo in whats_new.rst
  [MRG+2] FIX adaboost estimators not randomising correctly (scikit-learn#7411)
  Addressing issue scikit-learn#7468. (scikit-learn#7472)
  Reorganize README
  clean up deprecation warning stuff in common tests
  [MRG+1] Fix regression in silhouette_score for clusters of size 1 (scikit-learn#7438)
  ...
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017