-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Kriging model class #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'd like to commit a new feature to the scikits-learn project. I implemented a kriging model class which is able to perform both regression and probabilistic classification. I will send an e-mail with a script that performs a demo. Since I have no clue where to put my contribution I simply created a single file named kriging.py with the class and other functions such as correlation and regression models, and I added an import instruction to the main __init__.py module of scikit.learn. I hope you'll enjoy this contribution. Vincent
…orrelation parameters.
Hi all, I am glad to read your reactions and answer some of them in the following:
Thanks for welcoming me and my contribution. For the sake of being honnest, this implementation is essentially inspired from the DACE toolbox (implemented in Matlab and Scilab, http://www2.imm.dtu.dk/~hbn/dace/) that I am using for my PhD. I just changed the score function into its opposite so that it matches my understanding of the scikits-learn coding guidelines : the higher the score, the better. By the way, should I put the DACE website as a reference in the RST page? Finally, I'd like to add another precision about this implementation of kriging regarding the classification problem. Classification is only a postprocessing there, meaning you first need to solve a regression (interpolation) problem by providing the complete scalar ouptut of the experiment you attempt to surrogate. You can't specify binary labels as input -- or at least you can bu it won't be as efficient. Indeed, you use the Gaussian property of the prediction to perform probabilistic classification. I hope it will become clearer with the examples I implemented... |
As for coding guidelines, also please follow http://www.python.org/dev/peps/pep-0257 (this is not just for Vincent but for everybody :) |
Also when you say "inspired by DACE", I hope this is not a language matlab to numpy direct transliteration. Since SAVE license is not opensource hence would not allow us to accept your contribution (or at least we should ask the permission of the authors). |
@ogrisel: Well I am afraid that this is nothing more than a "direct transliteration" for the moment, out of a few modifications in an attempt to match scikits-learn's philosophy (which is not that far from DACE's). I don't want you guys (or even me) to have any trouble with copyrights so please let me know if that's a problem. I apologize if this has been nothing but a loss of time for you. |
Ideally it would have been best to read the algorithm from the paper and come up with your own scipy / numpy implementation yourself without literally translating the matlab code. Since this is done I think to integrate your code in scikit-learn (which is BSD licensed) it would be better to explicitly ask (by email) the DACE authors whether they don't see any objection that we distribute your python implementation under the BSD license. Of course, if they accept they would get due credits with a reference to their paper and website in the documentation of the module (both in the source code and on the public scikit-learn HTML website). |
I agree with Olivier |
@dubourg I let you contact them and let us know on the mailing list what is their answer? |
Sure, I have just sent an e-mail to them. I'll let you know as soon as I get a reply. I precised that scikit-learn is distributed under BSD license and gave them the link to the scikit-learn website. |
@dubourg I think you can call the module "gaussian_processes" and your class "GaussianProcess". from scikits.learn.gaussian_processes import GaussianProcess Not sure if the module name should be singular or plural. |
@dubourg: In case the DACE people do not agree with BSD licencing I think it would good for you to keep this code open in a separate github project while keeping the scikit-learn API. The code would then be limited to personal use. |
Hi all, I got response from Pr Nielsen, the corresponding author of the Matlab's kriging toolbox. It seems to me that he accepts this implementation provided I refer to the original Matlab toolbox. Here are his reply and my initial request : Dear Vincent Dubourg, You are very wellcome to make a Python version with due references of the DACE toolbox. Sincerely, Hans Bruun Nielsen Vincent Dubourg wrote:
Hans Bruun Nielsen Informatics and Mathematical Modelling (IMM) Richard Petersens Plads |
this is great news ! i think you're ready to add tests, documentation, examples and ask for a pull request :) |
That sounds like a go to me :) It was better to ask explicitly. It's always a pain to spend days working on a piece of code only to discover later that there is a legal issue that would prevent you to distribute it. |
I completely agree with the fact that it was better to ask. scikits/learn/gpml/init.py # The main file where I would implement my present 'KrigingModel' class turned into a 'GaussianProcessModel' class so that mblondel's use case holds: "from scikits.learn.gpml import GaussianProcessModel as gpm" GPML stands for 'Gaussian Processes for Machine Learning' (named after the book by Williams & Rasmussen that you can download for free on the website of the authors). |
I would go for:
where classification.py contains GaussianProcessClassification Then you can add a 1D example for regression model ok with that folks? |
+1! |
…l directory. The module implement a class named GaussianProcessModel. I also add doc, examples and tests (involving a coupling with the cross_val module).
…. I just don't know how to involve this test within the whole scikit testing procedure (nosetests). Also add a modification of the TOC in doc.
Hi list, I added a few commits among which:
One of the test involves a coupling with your REALLY GREAT cross_val and metrics modules !!! I like them. I just don't know why I get a Pickling Error when I ask for n_jobs > 1... Which is quite a shame... But I'm sure you'll be able to help me with this latter point ! |
joblib is using the multiprocessing module to compute in // . To pass parameters and models results across processes the models need to be picklable. BTW, when you are commenting on the issue, you are not talking to the project mailing list as a whole, just the people registered on the issue. |
…the deviation between the predicted targets and the true ones. This is for convenience only because it allows then to use the distributing capacity of the cross_val module. The old score function is renamed with the more explicit name: `reduced_likelihood_function` (see eg the DACE documentation).
…e in order to load the gpml module and tests.
A few quick "mess of" remarks:
A comment on this sentence (in the doc):
I've seen Gaussian Processes applied with thousands of features that's it for this sunday morning :) |
Thank you Olivier I understood my mistake about the PicklingError: I declared a score function on the go right before I run the cross validation. Thus, the score function was not pickable. |
I agree with Alex: I'd like the package to be named 'gaussian_process' (I hate acronyms, I am not too happy with the various gmm, glm, sgd that we already have). For the name of the example files (not the example directory), I think using 'gp' instead of the full 'gaussian_process' should be fine, as it is explicit because of the context. +1 also with the fact that predict shouldn't output more than y: it can easily break all our model selection machinery. Thanks a lot, this is starting to look very good! |
Is it still planned to merge my branch despite the more recent discussions on the forthcoming release? |
Do you have any idea on how I could prevent the QR decomposition function from scipy from sending this iritating DeprecationWarning?
And maybe there is a better work around than a try/except to ensure this kwarg backward compatibility (Lines 561--569 of gaussian_process.py)?... |
it should be. Give a bit of time to review all the code. I'll get back to you ASAP |
@dubourg: you now have write access on the git repo, you should be able commit it now by yourself :-) |
Also please remove the comments blocks such as: ############### Dependencies############### They don't bring any additional information and break the beautiful flow a you python program :) |
@dubourg: if you have pbs with git to commit to the main repo please do not use -f to force the commit :) |
Removed useless comment blocks in the code.
I will merge my pull request tonight following the guidelines that appeared at the bottom of my pull request. |
Don't hesitate to ask questions if you run into trouble (but I'll probably reply tomorrow :-) |
Conflicts: scikits/learn/__init__.py
…score so that I needed to modify this example.
As you can see on the GitHub pull request I was able to merge the new features (since my latest pull in... September!), but I can't find the command I should run to pull my merged branch into the trunk...
sounds good? |
git push master origin/master |
Well I don't have any remote named master, I only have "origin"... |
well it depends on what is origin for you. for me origin is scikit-learn/scikit-learn so I would do git push alex master # to push to my fork hope this helps |
Not really...
And following your examples, I'd be tempted to try:
But it fails with error:
|
git push main master, I believe |
FIX transform tests
Feature/agent
Hello list,
I'd like to commit a new feature to the scikits-learn project.
I implemented a kriging model class which is able to perform both regression and probabilistic classification.
I will send an e-mail with a script that performs a demo.
Since I have no clue where to put my contribution I simply created a single
file named kriging.py with the class and other functions such as correlation
and regression models, and I added an import instruction to the main
init.py module of scikit.learn.
I hope you'll enjoy this contribution.
Vincent