Feature request: forcing a Gram matrix to be positive definite #8590
Comments
Which kernel methods in sklearn don't accept precomputed kernels? I'm not sold on this because it's really hacky. I have not heard of anyone doing that. Can you give an example of when you'd do that?
I do this because the objects in my x space are molecules. My kernel is the Tanimoto similarity between molecule fingerprints. The point of kernel methods is that devising an appropriate kernel function is an alternative to feature extraction. Mercer's theorem says that a kernel induces a vector space over the input space, possibly an infinite-dimensional one, so this approach is inherently more powerful than identifying features. Reducing the input space to a vector of features and then choosing a kernel function is missing the point.
To repeat: which kernel methods in sklearn don't accept precomputed kernels?
Tanimoto similarity is the Jaccard similarity, right? Doesn't that lead to a positive definite kernel?
Hm, I guess it does not, but wouldn't exp(similarity) lead to one?
Gaussian Process Regression. For Kernel Ridge Regression, just a documentation change is needed: the documentation for Kernel Ridge Regression does not mention it, but in fact it supports precomputed kernels.
"It is well known that the Jaccard–Tanimoto similarity measure satisfies the Mercer's conditions and, hence, yields a Mercer kernel."
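The claim above can be checked empirically. Below is a minimal sketch (assuming NumPy; `tanimoto_kernel` is a hypothetical helper, not a scikit-learn function) that computes a Tanimoto Gram matrix on random binary fingerprints and inspects its smallest eigenvalue:

```python
import numpy as np

def tanimoto_kernel(X, Y=None):
    """Tanimoto (Jaccard) similarity between rows of binary fingerprint matrices."""
    if Y is None:
        Y = X
    inner = X @ Y.T                       # |A ∩ B| for 0/1 vectors
    norms_x = X.sum(axis=1)               # |A|
    norms_y = Y.sum(axis=1)               # |B|
    return inner / (norms_x[:, None] + norms_y[None, :] - inner)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 128)).astype(float)  # toy binary fingerprints
K = tanimoto_kernel(X)

# For a Mercer kernel the Gram matrix is symmetric PSD, so the smallest
# eigenvalue should be >= 0 up to floating-point rounding.
print(np.linalg.eigvalsh(K).min())
```

On random fingerprints the smallest eigenvalue comes out non-negative (up to rounding), consistent with the quoted result.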
Ok, so you don't need to "coerce the Gram matrix". I had asked when you need to coerce the Gram matrix and you gave that as an example.
Hi Chris, a bigger (but, I think, related) problem is that scikit-learn currently doesn't implement the Tanimoto kernel. This kernel is super useful in cheminformatics and information retrieval, and not everybody is skilled enough to contribute to scikit-learn. Cf. #29507. I am unsure whether the Tanimoto kernel would be unsuited to kernel-based methods, but @jpvert might know.
Description
Kernel methods like SVR use a similarity measure rather than a vector of features. The prior knowledge is encoded by devising a domain-appropriate similarity measure. However, there is a technical requirement that is sometimes hard to meet: a valid kernel must be symmetric and positive semidefinite.
Suggested solution
In some circumstances, it works well to coerce the Gram matrix into a symmetric matrix by averaging it with its transpose; and then coercing it into a PSD matrix by discarding negative eigenvalues. For R users, the latter operation is available here: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/nearPD.html.
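The two coercion steps described above can be sketched in a few lines of NumPy (`coerce_kernel` is a hypothetical helper name, analogous in spirit to R's `Matrix::nearPD`, not an existing scikit-learn API):

```python
import numpy as np

def coerce_kernel(K, tol=0.0):
    """Symmetrize K by averaging with its transpose, then project onto
    the PSD cone by discarding negative eigenvalues."""
    K = (K + K.T) / 2.0                     # force symmetry
    eigvals, eigvecs = np.linalg.eigh(K)    # eigendecomposition of symmetric matrix
    eigvals = np.clip(eigvals, tol, None)   # clip negative eigenvalues to tol
    return (eigvecs * eigvals) @ eigvecs.T  # rebuild the PSD matrix

# Toy example: a slightly asymmetric, indefinite "similarity" matrix.
K = np.array([[ 1.0, 0.9, -0.4],
              [ 0.8, 1.0,  0.3],
              [-0.4, 0.2,  1.0]])
K_psd = coerce_kernel(K)
```

The result is symmetric and positive semidefinite by construction, at the cost of a full eigendecomposition (O(n³) in the number of samples).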
The user can do these calculations themselves for SVR, which supports precomputed kernels, but not for the other kernel methods in sklearn, which do not.
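For reference, the precomputed-kernel workflow with SVR looks like this (a sketch with a plain linear Gram matrix standing in for a domain-specific one such as Tanimoto):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] + 0.1 * rng.normal(size=40)

# Train on an n_train x n_train Gram matrix instead of raw features.
K_train = X @ X.T
model = SVR(kernel="precomputed")
model.fit(K_train, y)

# Predict with an n_test x n_train matrix of similarities to training points.
X_new = rng.normal(size=(5, 5))
K_test = X_new @ X.T
pred = model.predict(K_test)
```

Any coercion of the Gram matrix must currently happen in user code before `fit`, which is exactly the gap this issue is about.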
Even for SVR, the second transformation would be more efficient if integrated into the Cholesky decomposition already performed by the sklearn package.
I therefore propose a coerce_kernel=False option on the fit method of SVR.