-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Refactor CV and grid search #2736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Refactor CV and grid search #2736
Conversation
Thank you! I've been trying to get to another attempt at this. I can't review right away, but I'll get there soon. |
Are there other related pull requests like #2079? |
Its predecessor, #1787, also did this refactoring. They also hoped to make a consistent extensible output from I also consider this related to some of the CV API consistency stuff @eshilts is propsing in #2733. |
|
||
|
||
def _fit(fit_function, X_train, y_train, **fit_params): | ||
"""Fit and estimator on a given training set.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and" -> "an"
Now, |
score = _score(estimator, X_test, y_test, scorer) | ||
|
||
scoring_time = time.time() - start_time | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You appear to have dropped the verbose output here. Why not report time and score in _cross_val_score
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I think this is looking very good, so I'll just do some nitpicking :P |
@@ -198,6 +198,69 @@ def get_scorer(scoring): | |||
return scorer | |||
|
|||
|
|||
class _passthrough_scorer(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is how I named it, but I think others will prefer camel-case: _PassthroughScorer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that would be a convention. :) I will change that.
Apart from the previous comments on error messages checks in the tests and private helpers usage and naming, this PR LGTM. |
return np.array(scores)[:, 0] | ||
|
||
|
||
def _cross_val_score(estimator, X, y, scorer, train, test, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this function needs renaming. I called it originally "_cross_val_score" because it was the inner loop of the cross_val_score function, but now that it is used in a different setting, I find that the name is confusing, as it doesn't really do a cross-validation. Maybe something like "_fit_and_score"?
PS: Sorry for nitpicking about the names, but they are important as they control the readability of a codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before: I like the nitpicking. It is especially helpful to find meaningful names and naming is the most important thing to create readable software. I think you do a great job. :)
I renamed it to fit_and_score
.
return np.array(scores)[:, 0] | ||
|
||
|
||
def fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GaelVaroquaux, do you think fit_and_score
should be public? Personally, I don't think either it or fit_grid_point
should be public (and while not _
-prefixed, fit_grid_point
isn't in classes.rst, nor is it any longer the right name for that function). It makes it much harder to add functionality to grid search etc. since every modification to its output is an API change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't understand what the objection is to calling this cross-validation... I think that's what fitting on one chunk of data and scoring on another is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on using private functions
I will work on addressing the final comments and merging. |
I merged by rebase. I haven't made fit_grid_point private yet. It has been there for a long time so I would prefer to wait for other opinions before changing anything. If we decide to make it private, we can keep the public one for two releases. |
Hurrah! Thank you @AlexanderFabisch for some long-needed maintenance, and @mblondel for pushing it through. I look forward to your and other extensions to this to get more returned from |
Indeed, thanks for your recent contributions, @AlexanderFabisch. |
I removed the I think this makes the code better. |
|
||
Returns | ||
------- | ||
test_score : float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I messed this up. The order of test_score
and train_score
is wrong in the docstring.
Oh, I did not expect the PR to get merged so soon after all the difficulties. Unfortunately I found the bug in the documentation too late. @mblondel could you fix this? Anyway, thanks for the compliments. :) |
I just realized that |
+1 to deprecate in case it's in private use On 16 January 2014 19:20, Mathieu Blondel notifications@github.com wrote:
|
Please deprecate it as this method had a public name (no leading underscore). |
While implementing #2701 I have seen some duplicate code in
grid_search
andcross_validation
. Before I will implement the rest of issue #2584 I would like to clean that up a little bit.Todo:
cross_validation.cross_val_score
andgrid_search.fit_grid_point