-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] Allowing Gaussian process kernels on structured data - updated #15557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Allowing Gaussian process kernels on structured data - updated #15557
Conversation
…een kernels that operate on explicit feature vectors or structured objects.
updated corresponding tests
…n variable-length sequences
…base `Kernel` class. Removed `VectorKernelMixin` which is redundant in presence of a default `requires_vector_input()` method.
Why do you regard this as WIP? What work remains other than review? |
I have one more commit that updates the changelog that has not yet been pushed : ) Should be there in a few minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is looking good to me
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
@jnothman How could I restart a failed pipeline caused by network error? |
@jnothman Could you please let me know the next steps to do? |
Await another reviewer... unfortunately, this may miss the cutoff for 0.22. |
I see... Do I need to update the PR name to MRG + 1 etc.? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nitpicks, thanks for this nice contribution !
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Thanks for reviewing! |
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
@adrinjalali are you keeping track of things that will either need their change log entries moved to 0.23 or will need to be picked into 0.22 final? (You don't necessarily need to keep track, just to be aware when selecting commits for release) |
@NicolasHug Hi Nicolas, the primary motivation that I brought up the PR was due to a need to perform Gaussian process regressions on an ensemble of graphs. While the PR intends to bridge scikit-learn's GPR module to data including and beyond variable-length sequences, the actual changes involves no more than allowing non-vectorial data to be passed through the GP regressors/classifiers --- without being touched at all. Thanks to the kernel trick, as discussed in the original issue, the logic of computing the kernel matrix from samples is delegated to a kernel, which will be user-supplied for sequence and generic data. As such, I would argue that this introduces a very minimal amount of burden to the development and maintenance of the GP module and does not disrupt the API and/or future development, while at the same time greatly extends the applicability of the module. |
Fixes #13936.
This PR is a from-scratch refresh of #13954 (with all previous modifications incorporated) based on the latest master branch.
Made the Gaussian process regressor and classifier compatible with generic kernels on arbitrary data. Specifically:
requires_vector_input
property to all the built-in GP kernels ingaussian_process/kernels.py
.GenericKernelMixin
base classes to overload therequires_vector_input
property for kernels that can work on structured and/or generic data.check_array
andcheck_X_y
parameters for vector/generic input. I.e., do not force the X and Y array to be at least 2D and numeric if the kernel can operate on generic input. Updated docstrings accordingly.plot_gpr_on_structured_data.py
.