Gaussian Process-based hyper-parameter optimizer #5185
Conversation
This is cool! A few questions.
Thanks for working on this.
doctests are failing.
Are you using this in practice and does it help?
"bootstrap": ['cat', [True, False]], | ||
"criterion": ['cat', ["gini", "entropy"]]} | ||
|
||
search = GPSearchCV(parameters, estimator=clf, X=X, y=y, n_iter=20) |
`X` and `y` should be passed to `fit()`, not to `__init__()`.
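A minimal sketch of the suggested API change, with the data moved from `__init__()` to `fit()` as in scikit-learn's other `*SearchCV` classes. The `GPSearchCV` class below is a hypothetical stand-in: a random choice replaces the actual GP-guided search loop.

```python
import random

class GPSearchCV:
    """Hypothetical sketch: configuration in __init__, data in fit()."""

    def __init__(self, parameters, estimator=None, n_iter=20):
        # configuration only -- no data here
        self.parameters = parameters
        self.estimator = estimator
        self.n_iter = n_iter

    def fit(self, X, y):
        # data arrives at fit time; a real implementation would run the
        # GP-guided search loop here (random choice is a placeholder)
        self.best_params_ = {
            name: random.choice(values)
            for name, (kind, values) in self.parameters.items()
        }
        return self

parameters = {"bootstrap": ['cat', [True, False]],
              "criterion": ['cat', ["gini", "entropy"]]}
search = GPSearchCV(parameters, estimator=None, n_iter=20)
search.fit([[0.0], [1.0]], [0, 1])
```

The estimator and search settings are fixed at construction, so the same search object can be refit on different data, which is the convention the comment above asks for.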
I'd like to look into this during the Paris sprint (October 19-23).
That's nice! As @craffel pointed out, I think it would be nice to handle cases where a single hyper-parameter test takes a long time. My idea for that is to sub-sample the dataset before performing a test. This introduces some noise, but the process stays reliable if the sub-sample size is not too small.
Subsampling is a good idea, but I would make it optional and use the same subsample for all test runs. I think otherwise things will get too noisy.
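The fixed-subsample variant suggested here could look like the following sketch: draw one subsample up front and score every hyper-parameter candidate on that same subset, so the subsampling noise is shared across runs (`make_fixed_subsample` is a hypothetical helper, not part of the PR).

```python
import numpy as np

def make_fixed_subsample(X, y, size, seed=0):
    # draw the subsample ONCE, before the search starts, so every
    # candidate is scored on identical data
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(X), size=size, replace=False)
    return X[idx], y[idx]

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
X_sub, y_sub = make_fixed_subsample(X, y, size=4)
# every candidate evaluation would reuse (X_sub, y_sub) from here on
```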
@fabianp I just stumbled upon https://github.com/mpearmain/BayesBoost maybe it can be of help. Also, do you plan to make use of the new GP implementation? #4270 |
Sorry for the late reply (for some reason I didn't get the GitHub notification). @sds-dubois my plan is to start from your code and take it from there. My opinion is that data subsampling is an interesting idea, but it is not specific to this PR; basically any grid search method could use this technique, so it should not be part of this PR. Thanks @glouppe for the links, I'll definitely look into BayesBoost. For simplicity I'll stick with the current GP implementation, since the GP is accessed only through its
Ok @fabianp, so let me know if you have any questions.
@fabianp I talked to Jasper Snoek and the auto-sklearn guys quite a bit ;) What will be your approach? |
Spearmint seeds on a grid and does local gradient descent on the EI.
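Roughly, that strategy can be sketched as follows: evaluate the expected improvement (EI) on a coarse grid of seed points, then run a local optimizer from the best seed. The posterior mean and standard deviation below are toy stand-ins rather than a real GP, and the EI expression is the standard textbook form.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def ei(x, best=0.0):
    # toy posterior: mu peaks at x = 0.3, sigma grows away from 0
    x = np.atleast_1d(x)[0]
    mu, sigma = -(x - 0.3) ** 2, 0.1 + 0.05 * abs(x)
    z = (mu - best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

seeds = np.linspace(-1.0, 1.0, 11)                 # coarse grid of seeds
best_seed = seeds[np.argmax([ei(s) for s in seeds])]
res = minimize(lambda x: -ei(x), x0=[best_seed])   # local descent on -EI
```

The grid keeps the local optimizer out of poor basins, while the descent step refines the seed beyond the grid resolution.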
Thanks all for your feedback :-). I've been working on this today with @Djabbz, adapting it to the new Gaussian process and adding some tests. I hope we soon make a PR with these additions. @sds-dubois I hope you don't feel offended if we modify your code rather than give you feedback; I think this way things get done faster (and in any case you will be able to comment on our modifications real soon). @amueller right now we are focusing on getting the tests and the API right, i.e. not changing the logic of this code. Hopefully we will compare with the spearmint approach as soon as possible, and get to implement those tricks.
@fabianp of course not! I'm glad to see this continuing, and you should take full advantage of the sprint! Also, this is my first attempt at contributing, so I'm looking forward to seeing the results and what improvements you made, and learning that way. By the way, I don't know if you can push your work to this PR or if I should give you write access.
@sds-dubois if you can give me access to your branch then that would be easier.
I opened a new pull request with the code that we built upon this: #5491
@fabianp you now have access to my branch |
Thanks. In the end we thought it would be easier to open a new PR, but I've given you write access to my repo. Feel free to comment on that or take one of the TODO items.
I would also appreciate your opinion on some of the issues I commented on, in particular the format of the hyper-parameters. I was quite surprised that `{'foo_param': [1, 2, 3]}` is accepted and works as in `GridSearchCV`. Is this intended?
closing, this is in scikit-optimize now. |
Following discussion on Issue #474, this PR aims at implementing a GP-based hyper-parameter optimizer.
This is based on sklearn's Gaussian Process and randomized-search optimizer implementations.
Given a budget (a number of iterations), at each step the optimizer fits a GP to the hyper-parameter scores observed so far, maximizes an acquisition function to choose the next candidate, and evaluates it.
Current acquisition functions implemented are the Expected Improvement (EI) and the Upper Confidence Bound (UCB).
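For reference, the standard textbook forms of these two acquisition functions can be written directly from a GP posterior mean `mu` and standard deviation `sigma` at a candidate point; this is a sketch of the usual definitions, not the PR's exact code.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    # EI(x) = sigma * (z * Phi(z) + phi(z)), with z = (mu - best) / sigma
    z = (mu - best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # UCB(x) = mu + kappa * sigma; kappa trades off exploitation
    # (high mean) against exploration (high uncertainty)
    return mu + kappa * sigma

mu, sigma = np.array([0.8, 0.5]), np.array([0.1, 0.4])
ei_vals = expected_improvement(mu, sigma, best=0.7)
ucb_vals = upper_confidence_bound(mu, sigma)
```

Note how UCB prefers the second (more uncertain) point here even though its mean is lower, while EI weights the chance of actually beating the current best score.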
Examples are provided in `examples/model_selection/gp_search.py`. Results obtained with a simple pipeline on the Iris dataset (comparison: random search in green vs. GP-based search in blue; the first 20 iterations are random):
