
Gaussian Process-based hyper-parameter optimizer #5185


Closed
sds-dubois wants to merge 9 commits

Conversation

sds-dubois
Contributor

Following the discussion on Issue #474, this PR aims to implement a GP-based hyper-parameter optimizer.
This is based on sklearn's Gaussian Process and randomized-search optimizer implementations.

Given a budget (a number of iterations), at each step:

  • model y = f(X) through a GP, where X holds the hyper-parameter settings tested so far and y the corresponding scores
  • randomly sample candidates within the hyper-parameter space
  • compute the value of the acquisition function for each candidate from the GP's predictions (e.g. the expected improvement or the upper confidence bound)
  • select the candidate that maximizes the acquisition function as the next point to test

Current acquisition functions implemented are the Expected Improvement (EI) and the Upper Confidence Bound (UCB).
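The loop described above can be sketched as follows. This is an illustrative toy, not the PR's code: the 1-d objective, the bounds, and the GP settings are my own choices, and it uses scikit-learn's later `GaussianProcessRegressor` API together with the EI criterion.

```python
# Toy sketch of the GP-based search loop: fit a GP to the observed
# (parameter, score) pairs, sample random candidates, and pick the one
# that maximizes Expected Improvement (EI). All names are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)

def objective(x):
    # 1-d toy objective standing in for a cross-validation score
    return -(x - 0.6) ** 2

def expected_improvement(mu, sigma, best):
    # EI for maximization; guard against sigma == 0
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# a few initial random evaluations
X = rng.uniform(0, 1, size=(5, 1))
y = np.array([objective(x[0]) for x in X])

for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(100, 1))         # random candidates
    mu, sigma = gp.predict(cand, return_std=True)   # GP predictions
    nxt = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, nxt])
    y = np.append(y, objective(nxt[0]))

best_x = X[np.argmax(y), 0]  # should land near the optimum at 0.6
```

Swapping `expected_improvement` for `mu + kappa * sigma` would give the UCB variant.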

Examples are provided in examples/model_selection/gp_search.py.

Results obtained with a simple pipeline on the Iris dataset (random search in green vs. GP-based search in blue; the first 20 iterations are random):
[image: iris_results]

@craffel

craffel commented Aug 31, 2015

This is cool! A few questions.

  1. It appears that when a hyperparameter setting has already been tested, the previous objective value is reused. I think this is fine if the objective is noiseless, but when it's noisy it needs to be handled carefully.
  2. How are you converting int and cat parameters to an optimizable parameter space? If you can point me to where in the code you do this, that would be cool.
  3. I believe Spearmint does a simple hack where they try out points near the best point so far. It seems to help a lot in practice. https://github.com/HIPS/Spearmint/blob/c97959ae0449a6594226fe7d18d1384d9463d215/spearmint/choosers/default_chooser.py#L337 You might consider adding that.
  4. Sometimes people want to optimize things where each trial takes hours, or days. Is there a way to do a partial fit?
  5. It'd be interesting to compare this to existing packages, which at their core do the same thing but may contain useful hacks/different parameters which help in practice.

@amueller
Member

Thanks for working on this.
It would be great to have an example where this is actually practical, that is, where it improves over GridSearchCV.

@amueller
Member

doctests are failing.

@amueller
Member

Are you using this in practice and does it help?
I think the way spearmint gets a new point is to evaluate the GP on a grid and then do a local search.
It would be interesting to compare against that.

"bootstrap": ['cat', [True, False]],
"criterion": ['cat', ["gini", "entropy"]]}

search = GPSearchCV(parameters, estimator=clf, X=X, y=y, n_iter=20)

X and y should be passed to fit() not to init()
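The convention the reviewer is referring to is scikit-learn's estimator API: the constructor only stores configuration, and the data is supplied to `fit()`. `GridSearchCV` illustrates the shape the `GPSearchCV` call above would take after the change (the parameter grid here is illustrative):

```python
# scikit-learn convention: configuration in __init__, data in fit().
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"max_depth": [1, 3, 5]}, cv=3)
search.fit(X, y)  # X and y go to fit(), not to the constructor
```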

@fabianp
Member

fabianp commented Oct 12, 2015

I'd like to look into this during the Paris sprint (October 19-23)

@sds-dubois
Contributor Author

That's nice!
Do you plan on starting from my code? If so, I can spend some time on it this week to fix some issues.

As @craffel pointed out, I think it would be nice to handle cases where a single hyper-parameter test takes a long time. My idea for that is to sub-sample the dataset before performing a test. This introduces some noise but the process stays reliable if the sub-sample size is not too small.

@amueller
Member

Subsampling is a good idea, but I would make it optional and use the same subsample for all test runs; otherwise I think things will get too noisy.
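The fixed-subsample idea above can be sketched as follows. This is a minimal illustration, not the PR's code: the fraction, the toy estimator, and the `evaluate` helper are all hypothetical names.

```python
# Sketch of the fixed-subsample idea: draw ONE subsample up front and
# score every hyper-parameter setting on it, so all trials see the same
# data and resampling noise does not differ between trials.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = load_iris(return_X_y=True)

subsample_fraction = 0.5  # illustrative value
idx = rng.choice(len(X), size=int(subsample_fraction * len(X)),
                 replace=False)
X_sub, y_sub = X[idx], y[idx]  # reused for every trial

def evaluate(max_depth):
    # every candidate setting is scored on the SAME subsample
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    return cross_val_score(clf, X_sub, y_sub, cv=3).mean()

scores = {d: evaluate(d) for d in (1, 3, 5)}
```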

@glouppe
Contributor

glouppe commented Oct 15, 2015

@fabianp I just stumbled upon https://github.com/mpearmain/BayesBoost maybe it can be of help. Also, do you plan to make use of the new GP implementation? #4270

@fabianp
Member

fabianp commented Oct 19, 2015

Sorry for the late reply (for some reason I didn't get the GitHub notification). @sds-dubois my plan is to start from your code and take it from there.

My opinion is that data subsampling is an interesting idea but it is not specific to this PR, i.e., basically any grid search method could use this technique, so it should not be part of this PR.

Thanks @glouppe for the links, I'll definitely look into BayesBoost. For simplicity I'll stick with the current GP implementation. Since the GP is accessed only through its fit and predict interface, I suppose this implementation will not pose a compatibility problem to the new GP implementation.

@glouppe glouppe added the New Feature and Moderate (anything that requires some knowledge of conventions and best practices) labels Oct 19, 2015
@sds-dubois
Contributor Author

Ok @fabianp, let me know if you have any questions.
I think you're right regarding the data subsampling, but we may want to think about that. In particular, you should fix a mistake in my code at line 402, where we update the score of a hyper-parameter setting that has already been tested: we should keep track of the number of times it was tested so as to properly weight each score (old and new ones). Of course this does not matter if the score is always the same for a given setting, but with noisy evaluations it is a problem.
Good luck for the week!
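The fix described above amounts to keeping a per-setting count and folding each new noisy score into a running mean instead of overwriting the old value. A minimal sketch (the dict-based bookkeeping and `update_score` name are mine, not the PR's):

```python
# Running-mean update for repeated noisy evaluations of the same
# hyper-parameter setting: track the evaluation count and use the
# incremental mean formula mean_{n+1} = mean_n + (x - mean_n) / (n + 1).
counts = {}  # hashable parameter key -> number of evaluations so far
means = {}   # hashable parameter key -> running mean of observed scores

def update_score(key, new_score):
    n = counts.get(key, 0)
    counts[key] = n + 1
    old = means.get(key, 0.0)
    means[key] = old + (new_score - old) / (n + 1)
    return means[key]
```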

@amueller
Member

@fabianp I talked to Jasper Snoek and the auto-sklearn guys quite a bit ;) What will be your approach?

@amueller
Member

spearmint seeds on a grid and does local gradient descent on the EI

@fabianp
Member

fabianp commented Oct 19, 2015

Thanks all for your feedback :-). I've been working on this today with @Djabbz, adapting it to the new Gaussian process and adding some tests. I hope to make a PR with these additions soon.

@sds-dubois I hope you don't feel offended if we modify your code rather than give you feedback, I think this way things get done faster (and in any case you will be able to comment on our modifications real soon).

@amueller right now we are focusing on getting the tests and the API right, i.e. not changing the logic of this code. Hopefully we will compare with the spearmint approach as soon as possible, and get to implement those tricks.

@sds-dubois
Contributor Author

@fabianp of course not! I'm glad to see this continuing, and you should make the most of the sprint! This is also my first attempt at contributing, so I'm looking forward to seeing the results and what improvements you make, and to learning from them. By the way, I don't know if you can push your work to this PR or if I should give you write access.

@fabianp
Member

fabianp commented Oct 20, 2015

@sds-dubois if you can give me access to your branch then that would be easier

@fabianp
Member

fabianp commented Oct 20, 2015

I opened a new pull request with the code that we built on top of this one: #5491

@sds-dubois
Contributor Author

@fabianp you now have access to my branch

@fabianp
Member

fabianp commented Oct 20, 2015

Thanks. In the end we thought it would be easier to open a new PR, but I've given you write access to my repo. Feel free to comment on it or take one of the TODO items.

@fabianp
Member

fabianp commented Oct 20, 2015

I would also appreciate your opinion on some of the issues I commented on, in particular the format of the hyperparameters. I was quite surprised that {'foo_param': [1, 2, 3]} is accepted and works as in GridSearchCV. Is this intended?

@sds-dubois
Contributor Author

@fabianp I answered in PR #5491

@amueller
Member

Closing, this is in scikit-optimize now.

@amueller amueller closed this Sep 27, 2018