
Gaussian Process-based hyper-parameter optimizer #5185


Closed
sds-dubois wants to merge 9 commits

Conversation

sds-dubois
Contributor

Following the discussion on Issue #474, this PR aims to implement a GP-based hyper-parameter optimizer.
This is based on sklearn's Gaussian Process and randomized-search optimizer implementations.

Given a budget (a number of iterations), at each step:

  • model y = f(X) through a GP, where X holds the hyper-parameter settings tested so far and y the corresponding scores
  • randomly sample candidates within the hyper-parameter space
  • compute the value of the acquisition function for each candidate from the GP's predictions (e.g. the expected improvement or the upper confidence bound)
  • select the candidate that maximizes the acquisition function as the next point to test

Current acquisition functions implemented are the Expected Improvement (EI) and the Upper Confidence Bound (UCB).
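The loop described above can be sketched as follows. This is an illustrative toy, not the PR's code: the 1-d objective, the bounds, and the GP settings are my own choices, and it uses scikit-learn's later `GaussianProcessRegressor` API together with the EI criterion.

```python
# Toy sketch of the GP-based search loop: fit a GP to the observed
# (parameter, score) pairs, sample random candidates, and pick the one
# that maximizes Expected Improvement (EI). All names are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)

def objective(x):
    # 1-d toy objective standing in for a cross-validation score
    return -(x - 0.6) ** 2

def expected_improvement(mu, sigma, best):
    # EI for maximization; guard against sigma == 0
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# a few initial random evaluations
X = rng.uniform(0, 1, size=(5, 1))
y = np.array([objective(x[0]) for x in X])

for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(100, 1))         # random candidates
    mu, sigma = gp.predict(cand, return_std=True)   # GP predictions
    nxt = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, nxt])
    y = np.append(y, objective(nxt[0]))

best_x = X[np.argmax(y), 0]  # should land near the optimum at 0.6
```

Swapping `expected_improvement` for `mu + kappa * sigma` would give the UCB variant.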

Examples are provided in examples/model_selection/gp_search.py.

Results obtained with a simple pipeline on the Iris dataset (random search in green vs. GP-based search in blue; the first 20 iterations are random):
[image: iris_results]

@craffel

craffel commented Aug 31, 2015

This is cool! A few questions.

  1. It appears that when a hyperparameter setting has already been tested, the previous objective value is reused. I think this is fine if the objective is noiseless, but when it's noisy it needs to be handled carefully.
  2. How are you converting int and cat parameters to an optimizable parameter space? If you can point me to where in the code you do this, that would be cool.
  3. I believe Spearmint does a simple hack where they try out points near the best point so far. It seems to help a lot in practice. https://github.com/HIPS/Spearmint/blob/c97959ae0449a6594226fe7d18d1384d9463d215/spearmint/choosers/default_chooser.py#L337 You might consider adding that.
  4. Sometimes people want to optimize things where each trial takes hours, or days. Is there a way to do a partial fit?
  5. It'd be interesting to compare this to existing packages, which at their core do the same thing but may contain useful hacks/different parameters which help in practice.

@amueller
Member

Thanks for working on this.
It would be great to have an example where this is actually practical, that is, where it improves over GridSearchCV.

@amueller
Member

doctests are failing.

@amueller
Member

Are you using this in practice and does it help?
I think the way spearmint gets a new point is to evaluate the GP on a grid and then do a local search.
It would be interesting to compare against that.

"bootstrap": ['cat', [True, False]],
"criterion": ['cat', ["gini", "entropy"]]}

search = GPSearchCV(parameters, estimator=clf, X=X, y=y, n_iter=20)

X and y should be passed to fit() not to init()
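The convention the reviewer is referring to is scikit-learn's estimator API: the constructor only stores configuration, and the data is supplied to `fit()`. `GridSearchCV` illustrates the shape the `GPSearchCV` call above would take after the change (the parameter grid here is illustrative):

```python
# scikit-learn convention: configuration in __init__, data in fit().
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"max_depth": [1, 3, 5]}, cv=3)
search.fit(X, y)  # X and y go to fit(), not to the constructor
```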

@fabianp
Member

fabianp commented Oct 12, 2015

I'd like to look into this during the Paris sprint (October 19-23)

@sds-dubois
Contributor Author

That's nice!
Do you plan on starting from my code? If so, I can spend some time on it this week to fix some issues.

As @craffel pointed out, I think it would be nice to handle cases where a single hyper-parameter test takes a long time. My idea for that is to sub-sample the dataset before performing a test. This introduces some noise but the process stays reliable if the sub-sample size is not too small.

@amueller
Member

Subsampling is a good idea, but I would make it optional and use the same subsample for all test runs; otherwise I think things will get too noisy.
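The fixed-subsample idea above can be sketched as follows. This is a minimal illustration, not the PR's code: the fraction, the toy estimator, and the `evaluate` helper are all hypothetical names.

```python
# Sketch of the fixed-subsample idea: draw ONE subsample up front and
# score every hyper-parameter setting on it, so all trials see the same
# data and resampling noise does not differ between trials.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = load_iris(return_X_y=True)

subsample_fraction = 0.5  # illustrative value
idx = rng.choice(len(X), size=int(subsample_fraction * len(X)),
                 replace=False)
X_sub, y_sub = X[idx], y[idx]  # reused for every trial

def evaluate(max_depth):
    # every candidate setting is scored on the SAME subsample
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    return cross_val_score(clf, X_sub, y_sub, cv=3).mean()

scores = {d: evaluate(d) for d in (1, 3, 5)}
```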

@glouppe
Contributor

glouppe commented Oct 15, 2015

@fabianp I just stumbled upon https://github.com/mpearmain/BayesBoost maybe it can be of help. Also, do you plan to make use of the new GP implementation? #4270

@fabianp
Member

fabianp commented Oct 19, 2015

Sorry for the late reply (for some reason I didn't get the GitHub notification). @sds-dubois my plan is to start from your code and take it from there.

My opinion is that data subsampling is an interesting idea but it is not specific to this PR, i.e., basically any grid search method could use this technique, so it should not be part of this PR.

Thanks @glouppe for the links, I'll definitely look into BayesBoost. For simplicity I'll stick with the current GP implementation. Since the GP is accessed only through its fit and predict interface, I suppose this implementation will not pose a compatibility problem to the new GP implementation.

@glouppe glouppe added the New Feature and Moderate (anything that requires some knowledge of conventions and best practices) labels Oct 19, 2015
@sds-dubois
Contributor Author

Ok @fabianp, let me know if you have any questions.
I think you're right regarding the data subsampling, but we may want to think about that. In particular, you should fix a mistake in my code at line 402, where we update the score of a hyper-parameter setting that has already been tested: we should keep track of the number of times it was tested so as to properly weight each score (old and new ones). Of course this does not matter if the score is always the same for a given setting, but with noisy evaluations it is a problem.
Good luck for the week!
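The fix described above amounts to keeping a per-setting count and folding each new noisy score into a running mean instead of overwriting the old value. A minimal sketch (the dict-based bookkeeping and `update_score` name are mine, not the PR's):

```python
# Running-mean update for repeated noisy evaluations of the same
# hyper-parameter setting: track the evaluation count and use the
# incremental mean formula mean_{n+1} = mean_n + (x - mean_n) / (n + 1).
counts = {}  # hashable parameter key -> number of evaluations so far
means = {}   # hashable parameter key -> running mean of observed scores

def update_score(key, new_score):
    n = counts.get(key, 0)
    counts[key] = n + 1
    old = means.get(key, 0.0)
    means[key] = old + (new_score - old) / (n + 1)
    return means[key]
```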

@amueller
Member

@fabianp I talked to Jasper Snoek and the auto-sklearn guys quite a bit ;) What will be your approach?

@amueller
Member

spearmint seeds on a grid and does local gradient descent on the EI

@fabianp
Member

fabianp commented Oct 19, 2015

Thanks all for your feedback :-). I've been working on this today with @Djabbz, adapting it to the new Gaussian process and adding some tests. I hope to make a PR with these additions soon.

@sds-dubois I hope you don't feel offended if we modify your code rather than give you feedback, I think this way things get done faster (and in any case you will be able to comment on our modifications real soon).

@amueller right now we are focusing on getting the tests and the API right, i.e. not changing the logic of this code. Hopefully we will compare with the spearmint approach as soon as possible, and get to implement those tricks.

@sds-dubois
Contributor Author

@fabianp of course not! I'm glad to see this continuing, and you should make the most of the sprint! This is also my first attempt at contributing, so I'm looking forward to seeing the results and what improvements you make, and to learning from them. By the way, I don't know if you can push your work to this PR or if I should give you write access.

@fabianp
Member

fabianp commented Oct 20, 2015

@sds-dubois if you can give me access to your branch then that would be easier

@fabianp
Member

fabianp commented Oct 20, 2015

I opened a new pull request with the code that we built on top of this one: #5491

@sds-dubois
Contributor Author

@fabianp you now have access to my branch

@fabianp
Member

fabianp commented Oct 20, 2015

Thanks. In the end we thought it would be easier to open a new PR, but I've given you write access to my repo. Feel free to comment on it or take one of the TODO items.

@fabianp
Member

fabianp commented Oct 20, 2015

I would also appreciate your opinion on some of the issues I commented on, in particular the format of the hyperparameters. I was quite surprised that {'foo_param': [1, 2, 3]} is accepted and works as in GridSearchCV. Is this intended?

@sds-dubois
Contributor Author

@fabianp I answered in PR #5491

@amueller
Member

Closing, this is in scikit-optimize now.

@amueller amueller closed this Sep 27, 2018