[WIP] Gaussian Process revisited (refactoring of correlation models + some new features/bugfixes) #3388
…lation (work-in-progress). Advantages:
* GPs and their correlation models are more strictly separated
* Allow non-stationary correlation models in the future
* Reduce code redundancy
…, self.corr to class)
…g likelihood over several random starts. The sign of optimal_rlf_value was set wrongly in _arg_max_reduced_likelihood_function(), since it was set to the negative of the return value of reduced_likelihood_function(). This error was probably caused by confusing reduced_likelihood_function(theta) with minus_reduced_likelihood_function(log10t). As a result, _arg_max_reduced_likelihood_function() returned the worst local maximum instead of the best one when performing several random starts.
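The effect of the sign error can be illustrated with a minimal sketch (all names here are illustrative stand-ins, not the actual scikit-learn internals): when the value of a maximization objective is negated before comparing candidates, the comparison selects the worst local optimum instead of the best one.

```python
import numpy as np

def reduced_likelihood_function(theta):
    # Hypothetical stand-in for the GP's reduced likelihood (to be MAXIMIZED);
    # its maximum is at theta = 2.
    return -(theta - 2.0) ** 2

def argmax_over_random_starts(candidates, buggy=False):
    """Pick the best theta among several random starts.

    With buggy=True the objective value is negated before comparison,
    mirroring the sign error described above: the WORST local optimum
    then wins the comparison.
    """
    best_theta, best_value = None, -np.inf
    for theta in candidates:
        value = reduced_likelihood_function(theta)
        if buggy:
            value = -value  # the sign error
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

starts = [0.0, 2.0, 5.0]
print(argmax_over_random_starts(starts))              # correct: picks 2.0
print(argmax_over_random_starts(starts, buggy=True))  # buggy: picks 5.0
```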
…s callables. In addition to 'fmin_cobyla' and 'Welch', GaussianProcess now allows passing other optimizers directly as callables via the optimizer parameter.
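As a rough sketch of what a callable optimizer could look like (the exact signature expected by this PR's optimizer parameter is not shown here, so the shape below is an assumption for illustration only):

```python
import numpy as np
from scipy.optimize import fmin  # Nelder-Mead simplex minimizer

def my_optimizer(objective, x0):
    """Hypothetical optimizer callable: minimizes `objective` starting at x0.

    The actual calling convention expected by GaussianProcess(optimizer=...)
    is defined by the PR; this only illustrates the general idea of passing
    an optimizer as a callable instead of a string.
    """
    return fmin(objective, x0, disp=False)

# Minimizing a simple quadratic as a stand-in for the (negated)
# reduced likelihood; the minimum is at t = 3.
result = my_optimizer(lambda t: float((t[0] - 3.0) ** 2), x0=[0.0])
print(np.round(result[0], 2))
```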
…owing its advantages
This allows calling __init__ of more complex correlation models outside of the GP, before the training data X used in fit() is known.
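A minimal sketch of this separation, with purely illustrative names (not the PR's actual classes): the correlation model is configured up front, and only bound to the training data later, inside fit().

```python
class SquaredExponential:
    """Hypothetical correlation model: configured before any data exists."""

    def __init__(self, length_scale=1.0):  # no training data X needed here
        self.length_scale = length_scale
        self.X = None

    def fit(self, X):  # called later, e.g. from inside GaussianProcess.fit()
        self.X = X     # data-dependent state is bound at training time
        return self

corr = SquaredExponential(length_scale=0.5)  # constructed outside the GP
# ... later, hypothetically: gp = GaussianProcess(corr=corr); gp.fit(X, y)
```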
…ead of ML. Instead of choosing the hyperparameters that maximize the likelihood, the maximum a posteriori (MAP) estimate is used. While the prior is assumed to be uniform for the implemented stationary correlation models, more complex correlation models with more hyperparameters may use an appropriate prior to reduce the risk of overfitting when the training set is small.
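The ML-vs-MAP distinction can be sketched with toy objectives (both functions below are illustrative stand-ins, not the GP's actual likelihood): MAP maximizes the log-likelihood plus a log-prior, and a uniform (constant) prior makes the two estimates coincide.

```python
import numpy as np

def log_likelihood(theta):
    # Toy stand-in for the reduced log-likelihood; maximum at theta = 2.
    return -(theta - 2.0) ** 2

def log_prior(theta):
    # Hypothetical log-prior over the hyperparameter; a non-uniform prior
    # (here, Gaussian-shaped around 0) pulls the estimate towards 0.
    return -0.1 * theta ** 2

def log_posterior(theta):
    # MAP objective = log-likelihood + log-prior (up to an additive constant).
    return log_likelihood(theta) + log_prior(theta)

thetas = np.linspace(-5, 5, 1001)
theta_ml = thetas[np.argmax(log_likelihood(thetas))]
theta_map = thetas[np.argmax(log_posterior(thetas))]
print(theta_ml, theta_map)  # the prior shrinks the MAP estimate towards 0
```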
This PR might be interesting for @dubourg, @agramfort, @larsmans, @JohnFNovak, and @AlexanderFabisch
    raise Exception("Multiple input features cannot have the same"
                    " value.")

    def __call__(self, theta, X=None):
Is there any way we could avoid overriding the call method, and call transform() explicitly instead by setting properties on calls to fit()?
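The alternative being suggested could look roughly like the following sketch (names and method bodies are illustrative, not the PR's actual API): data-dependent state is set in fit(), and the correlation is computed via an explicit transform() rather than by overriding __call__.

```python
class Correlation:
    """Hypothetical sketch of an explicit fit()/transform() correlation API."""

    def fit(self, X):
        self.X_ = X  # state set on fit, sklearn-style (trailing underscore)
        return self

    def transform(self, theta):
        # Would compute the correlation matrix for hyperparameters theta
        # using the stored self.X_; left unimplemented in this sketch.
        ...
```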
Several things here:
Actually, I am not 100% sure why these classes need all the fit() machinery/stateful behavior at all. It seems like there isn't much need for the class structure, and they could be implemented as pure functions that also use some shared functions from the module level. Maybe I am missing something crucial?
Regarding 1): I also dislike nugget, but it is part of the external GaussianProcess API, so I didn't change it.
On the first, I am glad to see that I am not the only person who is bugged by the name :) . It would be a solid opportunity to deprecate it. I will try to work up a skeleton example off your branch to show what I mean on the second count. Thanks for the work so far - GP definitely needs some love!
Regarding the fit() machinery/stateful behavior: the l1_cross_differences stored in StationaryCorrelation.D need to be computed only once. Formerly, they have been stored in the GP itself (where they are not really required). The stateful behavior would also be required for implementing some non-stationary correlation models from the literature (not part of this PR).
On the l1_cross_differences, it would be pretty easy to do an "if None" check to avoid recomputing. I would actually prefer storing state in the GP itself - if that was the case, we could still use the state stored in the GP to implement non-stationary correlation models, right? It seems more explicit to me to keep the state in the class itself, and let the correlations stay as simple as possible. In any case, I will try and work out an example of "stupid" correlations off your branch to see if it is a reasonable approach.
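The "if None" caching pattern being proposed can be sketched as follows (class and helper names are illustrative, and the difference computation is deliberately simplified relative to the real l1_cross_differences):

```python
import numpy as np

def l1_cross_differences(X):
    """Pairwise componentwise differences X[i] - X[j] for i < j
    (a simplified stand-in for the quantity cached as D above)."""
    n = X.shape[0]
    return np.array([X[i] - X[j] for i in range(n) for j in range(i + 1, n)])

class GP:
    """Minimal sketch of the 'keep state in the GP' variant, with an
    'if None' guard so the differences are computed at most once per fit."""

    def __init__(self):
        self._D = None

    def fit(self, X):
        self.X_ = np.asarray(X)
        self._D = None  # invalidate the cache when new data arrives
        return self

    @property
    def D(self):
        if self._D is None:  # the "if None" check: compute only once
            self._D = l1_cross_differences(self.X_)
        return self._D
```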
Ultimately, I think my comments are better addressed by a larger refactor of the GP class. Ignore me for now, and we can analyze this PR on its own merits.

In general we should work toward making the GP class more transparent and accessible - I currently find it quite complicated for (possibly) no reason, and much of the class does not adhere to modern sklearn API standards. Adding a class hierarchy to the kernel methods which is not shared with other kernel classes, pairwise_metrics, etc. doesn't really help this. If we decide to merge this PR, I would prefer to see these implementation details (the two metaclasses and their subclasses) private rather than public, in order to make a larger refactor easier in the future. Making the correlations/kernels/covariances very dumb and keeping as much state contained to the GP class as possible is also a step in this direction.

That said, I think this PR can definitely help people work in the GP class now, which is very important. There is a lot in common between kernel ridge regression and GP/kriging - I do not advocate writing "the class to end all classes" but it is definitely worth seeing if we can merge some common functionality now and in the future.
This is a big PR; let's not start changing the API ("nugget" vs. whatever) if we don't need to. Too many good PRs go stale because too many changes are introduced at the same time and the review process becomes a nightmare.
Good point. If it helps get this merged, let's minimize the surface area.
+1 that the GP class needs to become more transparent and accessible.
Is there any way we could separate the bugfix stuff and the refactoring/API change proposal into two PRs? I really, really like the bugfixes, but we will need to have a lot of thought and input about the API stuff. If we can split this into two smaller PRs, I think it is more likely to get looked at.
I will create a separate PR for the bugfixes and corresponding tests.
Great - there are a lot of fixes here :)
Oops, seems that we fixed the same issue twice. In order to get PR #2632 finally merged, I added a proposal for a regression test in the comments.
Now that the bugfixes are merged via PR #3545, what is the general opinion regarding the new features? And using classes for the correlation_models? If this PR is too big, I can focus on a subset of the features/refactoring and create a smaller PR.
I have created a separate PR (PR #3885) for adding the Matern kernel to pairwise.py. Once that is done, I will try refactoring correlation_models.py such that it uses as much from pairwise.py as possible and close this PR. What is your opinion, @kastnerkyle?
That sounds awesome! Thanks for all your work on this, it is really nice to …
@jmetzen you closed this because you want to break it up? Sorry for the lack of feedback before.
I plan to split the contents of this PR into several smaller PRs. The first step would be to refactor correlation_models.py such that it reuses the kernels from pairwise.py.
Btw, did you follow the (somewhat) recent discussion on the mailing list on whether it would be worth having a language for specifying kernels in sklearn?
I suspect that we'll need kernel classes anyway for kernel parameter …
So if we need kernel classes, I'm not sure the proposed refactoring is the best idea.
My advice would be to worry about code factorization after the GP module gets rewritten.
Makes sense to defer refactoring until the GP module is rewritten. I also checked how pairwise's kernels could be reused in the GP context, and there are at least two drawbacks: pairwise's kernels are restricted to the isotropic case (scalar gamma), and they don't exploit that k(X, X) is symmetric - in contrast to the current GP implementation. Thus, code reuse between pairwise and correlation_models would probably also require some non-trivial refactoring of pairwise. BTW: Is anyone currently working on the GP refactoring?
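The isotropic limitation can be illustrated as follows: pairwise's rbf_kernel takes a single scalar gamma, whereas an anisotropic variant (one scale per feature, as the GP's correlation models support) needs a vector. The anisotropic helper below is a sketch written for this comparison, not an existing pairwise.py API; the two agree exactly when all per-feature gammas equal the scalar gamma.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0],
              [1.0, 2.0]])

# pairwise's RBF kernel: one scalar gamma shared by all features (isotropic)
K_iso = rbf_kernel(X, gamma=0.5)

def rbf_anisotropic(X, gamma):
    """Sketch of an anisotropic RBF: one gamma per feature.

    k(x, y) = exp(-sum_d gamma_d * (x_d - y_d)^2)
    """
    gamma = np.asarray(gamma)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * gamma).sum(axis=-1)
    return np.exp(-d2)

K_aniso = rbf_anisotropic(X, gamma=[0.5, 0.5])
print(np.allclose(K_iso, K_aniso))  # equal when all per-feature gammas match
```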
Unfortunately not.
This PR contains a proposal for a refactoring of the gaussian_process subpackage (mainly the correlation_models module) and some novel features which become possible because of this refactoring. Moreover, some subtle bugs are fixed. In general, I tried to keep the external interface of GaussianProcess fixed and to change only internals. The main purpose of this PR is to begin a discussion on the future of the GP subpackage and revive its further development.
CHANGES:
NEW FEATURES:
BUGFIXES:
To be discussed: