[MRG] Refactor CV and grid search #2736

AlexanderFabisch · 2014-01-09T22:44:47Z

While implementing #2701 I noticed some duplicate code in grid_search and cross_validation. Before I implement the rest of issue #2584, I would like to clean that up a bit.

Todo:

merge cross_validation.cross_val_score and grid_search.fit_grid_point

jnothman · 2014-01-09T23:09:37Z

Thank you! I've been trying to get to another attempt at this. I can't review right away, but I'll get there soon.

coveralls · 2014-01-09T23:11:02Z

Coverage remained the same when pulling c4d6278 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

AlexanderFabisch · 2014-01-10T00:20:15Z

Are there other related pull requests like #2079?

jnothman · 2014-01-10T01:05:40Z

> Are there other related pull requests like #2079?

Its predecessor, #1787, also did this refactoring. They also hoped to make a consistent extensible output from cross_val_score and grid_search; and in #2079 to refactor their parallelism. Keeping the scope of this PR small is wise :)

I also consider this related to some of the CV API consistency stuff @eshilts is proposing in #2733.

jnothman · 2014-01-10T02:14:56Z

sklearn/cross_validation.py

+
+
+def _fit(fit_function, X_train, y_train, **fit_params):
+    """Fit and estimator on a given training set."""


"and" -> "an"

coveralls · 2014-01-10T08:53:07Z

Coverage remained the same when pulling 5e52031 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

coveralls · 2014-01-10T10:02:07Z

Coverage remained the same when pulling 4b5f468 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

AlexanderFabisch · 2014-01-10T23:23:14Z

Now, fit_grid_point calls _cross_val_score.

coveralls · 2014-01-10T23:40:29Z

Coverage remained the same when pulling 30c86ea on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

coveralls · 2014-01-10T23:43:23Z

Coverage remained the same when pulling 30c86ea on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

jnothman · 2014-01-11T10:22:26Z

sklearn/cross_validation.py

+    score = _score(estimator, X_test, y_test, scorer)
+
+    scoring_time = time.time() - start_time
+


You appear to have dropped the verbose output here. Why not report time and score in _cross_val_score?

coveralls · 2014-01-11T15:44:23Z

Coverage remained the same when pulling 389ed8d on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

coveralls · 2014-01-11T23:55:16Z

Coverage remained the same when pulling 1fa3ec3 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

jnothman · 2014-01-12T00:36:55Z

I think this is looking very good, so I'll just do some nitpicking :P

jnothman · 2014-01-12T00:38:00Z

sklearn/metrics/scorer.py

@@ -198,6 +198,69 @@ def get_scorer(scoring):
    return scorer


+class _passthrough_scorer(object):


I know this is how I named it, but I think others will prefer camel-case: _PassthroughScorer

I thought that would be a convention. :) I will change that.

ogrisel · 2014-01-15T09:57:22Z

Apart from the previous comments on error-message checks in the tests and on the usage and naming of private helpers, this PR LGTM.

GaelVaroquaux · 2014-01-15T10:00:31Z

sklearn/cross_validation.py

+    return np.array(scores)[:, 0]
+
+
+def _cross_val_score(estimator, X, y, scorer, train, test,


Maybe this function needs renaming. I called it originally "_cross_val_score" because it was the inner loop of the cross_val_score function, but now that it is used in a different setting, I find that the name is confusing, as it doesn't really do a cross-validation. Maybe something like "_fit_and_score"?

PS: Sorry for nitpicking about the names, but they are important as they control the readability of a codebase.

As I said before: I like the nitpicking. It is especially helpful for finding meaningful names, and naming is the most important part of writing readable software. I think you do a great job. :)

I renamed it to fit_and_score.

coveralls · 2014-01-15T21:51:12Z

Coverage remained the same when pulling 33994f0 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

jnothman · 2014-01-15T22:22:57Z

sklearn/cross_validation.py

+    return np.array(scores)[:, 0]
+
+
+def fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,


@GaelVaroquaux, do you think fit_and_score should be public? Personally, I don't think either it or fit_grid_point should be public (and while not _-prefixed, fit_grid_point isn't in classes.rst, nor is it any longer the right name for that function). It makes it much harder to add functionality to grid search etc. since every modification to its output is an API change.

Also, I don't understand what the objection is to calling this cross-validation... I think that's what fitting on one chunk of data and scoring on another is.

+1 on using private functions
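To make "fitting on one chunk of data and scoring on another" concrete, here is a rough sketch of such a helper, also reporting score and time as suggested earlier in the review. It is an illustration under assumptions, not the merged implementation: the signature is simplified (no `parameters` or `fit_params`), the returned tuple is an assumed shape, and `MeanRegressor` and `default_scorer` are hypothetical names.

```python
import time


class MeanRegressor(object):
    """Hypothetical toy estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / float(len(y))
        return self

    def score(self, X, y):
        # Negative mean squared error, so that higher is better.
        return -sum((t - self.mean_) ** 2 for t in y) / float(len(y))


def default_scorer(estimator, X, y):
    """Hypothetical scorer that defers to the estimator's own score."""
    return estimator.score(X, y)


def _fit_and_score(estimator, X, y, scorer, train, test, verbose=0):
    """Fit on the ``train`` indices and score on the ``test`` indices."""
    start_time = time.time()
    X_train, y_train = [X[i] for i in train], [y[i] for i in train]
    X_test, y_test = [X[i] for i in test], [y[i] for i in test]

    estimator.fit(X_train, y_train)
    test_score = scorer(estimator, X_test, y_test)
    scoring_time = time.time() - start_time

    if verbose > 0:
        # Report score and elapsed time for this single split.
        print("score=%f, time=%.3fs" % (test_score, scoring_time))
    return test_score, len(test), scoring_time


X = [[0], [1], [2], [3]]
y = [1.0, 3.0, 2.0, 6.0]
test_score, n_test, elapsed = _fit_and_score(
    MeanRegressor(), X, y, default_scorer, train=[0, 1], test=[2, 3])
```

Both `cross_val_score` and a grid search can then loop over (train, test) splits and call the same helper, which is the duplication this PR removes. Keeping the helper private leaves its return value free to change.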

coveralls · 2014-01-15T22:34:12Z

Coverage remained the same when pulling 8bc79b3 on AlexanderFabisch:refactor_cv into b3a9da9 on scikit-learn:master.

mblondel · 2014-01-16T02:47:38Z

I will work on addressing the final comments and merging.

mblondel · 2014-01-16T03:54:21Z

I merged by rebase.

I haven't made fit_grid_point private yet. It has been there for a long time so I would prefer to wait for other opinions before changing anything. If we decide to make it private, we can keep the public one for two releases.

jnothman · 2014-01-16T04:00:42Z

Hurrah! Thank you @AlexanderFabisch for some long-needed maintenance, and @mblondel for pushing it through. I look forward to your and others' extensions to this to get more returned from _fit_and_score.

mblondel · 2014-01-16T04:43:12Z

Indeed, thanks for your recent contributions, @AlexanderFabisch.

mblondel · 2014-01-16T05:13:23Z

I removed the _score utility function in my multiple_grid_search branch:
mblondel@a756083

I think this makes the code better.

AlexanderFabisch · 2014-01-16T07:03:41Z

sklearn/cross_validation.py

+
+    Returns
+    -------
+    test_score : float


I messed this up. The order of test_score and train_score is wrong in the docstring.

AlexanderFabisch · 2014-01-16T07:16:08Z

Oh, I did not expect the PR to get merged so soon after all the difficulties. Unfortunately I found the bug in the documentation too late. @mblondel could you fix this?

Anyway, thanks for the compliments. :)

mblondel · 2014-01-16T08:20:51Z

I just realized that fit_grid_point is no longer used anywhere in the code base. Shall we remove it then?

jnothman · 2014-01-16T08:53:43Z

+1 to deprecate in case it's in private use

ogrisel · 2014-01-16T11:30:19Z

Please deprecate it as this method had a public name (no leading underscore).
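A deprecation along these lines could be sketched with the standard `warnings` module. This is only an illustration: the simplified signatures, the warning text, and the dummy `_fit_and_score` body are assumptions, and the project may prefer its own deprecation utilities.

```python
import warnings


def _fit_and_score(X, y):
    """Hypothetical stand-in for the private helper; returns a dummy score."""
    return 1.0


def fit_grid_point(X, y):
    """Deprecated public alias kept so that external callers keep working."""
    warnings.warn(
        "fit_grid_point is deprecated and will be removed in a future "
        "release.", DeprecationWarning, stacklevel=2)
    return _fit_and_score(X, y)


# Record warnings so the deprecation can be checked programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fit_grid_point([[0], [1]], [0, 1])
```

The public name keeps working for a transition period (two releases was the suggestion above) while callers are told to move away from it.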

AlexanderFabisch added 2 commits January 9, 2014 09:18

Refactor cv code

b217697

Clean up

c4d6278

jnothman reviewed Jan 10, 2014

AlexanderFabisch added 2 commits January 10, 2014 09:27

Refactor RFE and add _check_scorable

1599952

FIX typo in docstring

5e52031

Merge fit_grid_point into _cross_val_score

4b5f468

Return time

38081fd

Move set_params back to fit_grid_point

30c86ea

jnothman reviewed Jan 11, 2014

Log score and time in 'cross_val_score'

389ed8d

check_scorable returns scorer

1fa3ec3

jnothman reviewed Jan 12, 2014

GaelVaroquaux reviewed Jan 15, 2014

AlexanderFabisch added 4 commits January 15, 2014 22:05

Remove 'fit_grid_point' from 'BaseSearchCV'

2330ebe

Check substrings of error messages

f4aa5ca

Rename '_split' to '_split_with_kernel'

33994f0

_passthrough_scorer is a function

4494f15

AlexanderFabisch added 2 commits January 15, 2014 23:14

Remove '_deprecate_loss_and_score_funcs'

062d400

Check error message

8bc79b3

jnothman reviewed Jan 15, 2014

mblondel closed this Jan 16, 2014

AlexanderFabisch reviewed Jan 16, 2014

rmcgibbo mentioned this pull request May 2, 2014

[DOC] Make _fit_and_score return value docstring consistent with code. #3127

Merged


