Rename grid_search module #1848
Generally I agree. Maybe also? I am always in favor of renaming sooner rather than later. Only when necessary, though ;) Any thoughts by the others?
+1 for model_selection.
Also like model_selection. 👍 |
I like param_search: it's understandable by everybody. However, we want a really long deprecation procedure with this.
Only problem with
The nice thing about
So do we want to see this in the next release?
+1 for having it in the next release.
So let's say we rename the module to
Moving to 0.15 because we need to pick a name, and 0.14 is due in less than a day. |
Agreed; I think model_selection was favoured, but I wonder if other stuff should go in there too.
Yes, that was exactly our thoughts: we want to change a few more things.
I would retag this. We want to fix the return of multiple scores, but I don't see that happening for this release. |
Agreed |
Retagged.
How about:
Sounds like a good module structure, but what would the imports look like? I.e. what's the public interface, the whole package or separate modules? |
I think most objects will be available at the top level, i.e. from sklearn.model_selection import GridSearchCV, Bootstrap, make_scorer for instance. |
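To make the import question concrete, here is a minimal sketch of how a placeholder module with a deprecation warning could look; the file layout, the warning text, and the re-exported names are assumptions for illustration, not a plan agreed on in this thread.

```python
# Hypothetical contents of sklearn/grid_search.py once the code has moved:
# the old import path keeps working for a few releases but warns on import.
import warnings

warnings.warn(
    "sklearn.grid_search has been renamed; import from sklearn.model_selection "
    "instead. This alias will be removed in a future release.",
    DeprecationWarning,
)

# Re-export the public names so existing imports keep resolving.
from sklearn.model_selection import GridSearchCV, ParameterGrid  # noqa: F401,E402
```

With such a shim, old code doing `from sklearn.grid_search import GridSearchCV` keeps running with a warning, while new code imports everything from the `sklearn.model_selection` top level.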
From a usage perspective, it would be nice to have pipeline in here as well, but it might be harder to justify from a naming perspective. |
I think that this would be an interesting refactor, and I think that the new layout is nicer. The bad thing about it is that it is a break of compatibility for cosmetic reasons. I would like to suggest that if we are going to break compatibility, we fix a real design problem at the same time. The CV objects need the data to be created: for instance, they need the number of samples, or the labels for the stratified variants. A change of API could fix this.
The refactor that you suggest would change the import path and thus already asks users to update their code, so it seems like the right occasion to make such a change. What do people think? |
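As a rough sketch of the constructor-without-data design under discussion (the class and method names below are illustrative assumptions, not an interface that exists in the library at this point):

```python
import numpy as np

class KFoldLike:
    """Illustrative k-fold splitter that is configured without seeing the data."""

    def __init__(self, n_folds=3, shuffle=False, random_state=None):
        # Only data-independent parameters are fixed at construction time.
        self.n_folds = n_folds
        self.shuffle = shuffle
        self.random_state = random_state

    def split(self, X, y=None):
        # Data-dependent quantities (here, the number of samples) are derived
        # when the splits are generated, so the same object can be reused on
        # any dataset, e.g. on the inner training folds of a nested search.
        n_samples = len(X)
        indices = np.arange(n_samples)
        if self.shuffle:
            np.random.RandomState(self.random_state).shuffle(indices)
        for fold in range(self.n_folds):
            test = indices[fold::self.n_folds]
            train = np.setdiff1d(indices, test)
            yield train, test
```

With such an object, `cv = KFoldLike(5)` can be created up front and passed around, and only `split(X, y)` ever touches the data.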
I don't think there's that great a problem moving things around at this stage in development; we'd leave placeholder modules where they were, with deprecation warnings, for a few releases. But I have separately disliked the inconsistency of interface of the cross-validation generators. I would prefer it if data-dependent parameters weren't part of the constructor, and the generator parameters are too variable and un-memorable.
One option is to just make the data-dependent parameters arguments of the call that generates the splits, rather than of the constructor. One shortcoming is the handling of CV generators with grouping or sampling information such as labels or sample weights: are these constructor or call parameters?
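The two options for grouping information could be contrasted roughly like this; both classes are hypothetical and exist only to compare the signatures.

```python
import numpy as np

# Option A: grouping information is fixed in the constructor (like the current
# generators), so the object is tied to one particular dataset.
class LeaveOneLabelOutCtor:
    def __init__(self, labels):
        self.labels = np.asarray(labels)

    def __iter__(self):
        for label in np.unique(self.labels):
            yield (np.where(self.labels != label)[0],
                   np.where(self.labels == label)[0])

# Option B: grouping information is supplied at call time, next to the data,
# so the object itself stays data-free.
class LeaveOneLabelOutCall:
    def split(self, X, labels):
        labels = np.asarray(labels)
        for label in np.unique(labels):
            yield (np.where(labels != label)[0],
                   np.where(labels == label)[0])
```

Option A mirrors what exists today; Option B keeps the object reusable but means the grouping must travel with the data through any resampling.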
scikit-learn has been in intensive development for 4 years. It is used by a lot of people.
I tend to prefer methods to callable objects: it makes for more explicit code.
Certainly not call parameters, as in a subsampling setting they would vary.
It's a deprecation, not a backward-incompatible change, for as long as the old module keeps working. Which means existing code keeps running in the meantime.
How serious is this problem? It sounds like you have a very advanced use case in mind. While I sometimes do nested model selection, I typically do so in contexts where the pipeline/grid search workflow breaks down anyway (e.g. custom semi-supervised learning), so I personally wouldn't care much for in-library support. Also, would this break custom CV objects such as my repeated CV for sequence data?
Nested model selection is not an advanced use case. It is the only proper way of both selecting and evaluating your model without bias, which is what basically every ML paper should have done when reporting results for an algorithm whose hyper-parameters were tuned.
It's the nesting that I consider advanced, not the idea of model selection by cross-validation. I guess I'm missing something here...
The CV scores that you optimize (e.g., with a grid-search procedure) are not an unbiased estimate of the generalization error of your model. If, as is often the case in papers, you want to both i) find the best parameters and ii) estimate the generalization error without bias, then you should run a train/valid/test protocol or do nested model selection.
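For concreteness, a minimal sketch of nested model selection with the current module names (the ones this issue proposes to rename); the dataset and parameter values are placeholders.

```python
# The inner CV, run by GridSearchCV, selects C; the outer CV scores the whole
# selection procedure, which is what gives an (almost) unbiased estimate of
# generalization error.
from sklearn.cross_validation import cross_val_score
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)  # inner loop
outer_scores = cross_val_score(search, X, y, cv=5)                  # outer loop
print(outer_scores.mean())
```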
Ok, that's what you mean. I tend to use held-out test sets for final evaluation; I was thinking of nesting, e.g., one search inside another. Coming back to the discussion then, do any of you have an example of something that is not possible in the current API?
You can't currently use a CV object that needs the data at construction time (e.g. a stratified or label-based one) as the inner cv of a nested search, because the inner training set isn't known when the object is created.
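A sketch of that limitation, assuming the current signatures in which the stratified splitter takes y in its constructor:

```python
# Why a data-dependent CV object cannot be nested: the stratified splitter is
# built from the full y, so its fold indices refer to the whole dataset, while
# the inner search only ever sees a training subset.
from sklearn.cross_validation import StratifiedKFold, cross_val_score
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

inner_cv = StratifiedKFold(y, n_folds=3)            # tied to the full y up front
search = GridSearchCV(SVC(), {"C": [1, 10]}, cv=inner_cv)

# Each outer split hands search.fit() a subset of X and y, but inner_cv still
# yields indices into the full dataset, so the two no longer line up:
# cross_val_score(search, X, y, cv=5)               # fails / uses the wrong rows
```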
Alright, +1 for a careful API change. It would be nicest if the old API just keeps working. |
I agree that we should strive to have it working for quite a while. I just want to sort out these issues before 1.0. |
Actually I'd like it most if the old API just kept working forever... |
It's broken. Really. It was a bad design: it makes it impossible to do things like nested cross-validation cleanly. We will need to stop supporting it at some point.
The `grid_search` module now supports a list of grids, and a random-sampled parameter space, and may in the future support other search algorithms. The shared purpose is: tuning (or exploring) hyper-parameters under cross-validation. So perhaps the `grid_search` name should be deprecated and replaced with something like:

- `cv_search` (or `search_cv`)
- `hyperparams`
- `model_selection` (thanks @amueller)
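For reference, the two capabilities mentioned above, sketched with the module's current name; the dataset and parameter values are placeholders.

```python
# The two search styles the grid_search module already covers: an explicit list
# of grids searched exhaustively, and a random-sampled parameter space.
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# A list of grids: each dict is searched exhaustively, then results are pooled.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
]
GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# A random-sampled parameter space: distributions are sampled n_iter times.
param_dist = {"C": expon(scale=10), "gamma": expon(scale=0.1)}
RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=3, random_state=0).fit(X, y)
```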