FIX Use cho_solve when return_std=True for GaussianProcessRegressor #19939
Conversation
Thanks for the PR. I can see that your PR is also fixing the inconsistency that we saw there: #19703 (comment). I am going to finish the work on this PR and have a look at your PR.

```python
@pytest.mark.parametrize("target", [y, np.ones(X.shape[0], dtype=np.float64)])
def test_predict_cov_vs_std(kernel, target):
```

In addition, it could be nice to add your minimal example as a non-regression test as well.
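A minimal sketch of what such a parametrized consistency test could look like; the data `X`, `X_test`, `y`, and the single kernel used here are stand-ins invented for illustration, not the fixtures from scikit-learn's test suite:

```python
import numpy as np
import pytest
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Stand-in data; the real test uses the suite's module-level X, y and kernels.
rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, size=(10, 2))
X_test = rng.uniform(-3.0, 3.0, size=(5, 2))
y = np.sin(X).sum(axis=1)


@pytest.mark.parametrize("kernel", [RBF(length_scale=1.0)])
@pytest.mark.parametrize("target", [y, np.ones(X.shape[0], dtype=np.float64)])
def test_predict_cov_vs_std(kernel, target):
    # The std path and the cov path should report the same marginal uncertainty.
    gpr = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, target)
    _, y_std = gpr.predict(X_test, return_std=True)
    _, y_cov = gpr.predict(X_test, return_cov=True)
    assert np.allclose(y_std, np.sqrt(np.diag(y_cov)))
```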
Please add an entry to the change log; you can use 0.24.2 for adding an entry.
(First contribution to this project, let me know if I have messed up procedures.) Resolved PR comments. Test cases existed in test_gpr.py aimed at testing K_inv; since K_inv is no longer computed, they have been removed.
Mainly some style comments, but otherwise everything seems correct.
LGTM. I am happy with this PR. I did a quick test regarding speed and could not detect a performance regression. It looks like a win-win.

We need a second review to be able to merge this, but I would like to have it in the upcoming bug-fix release 0.24.2.
LGTM, thank you @iwhalvic.
I have included minor suggestions.
```
Non-regression test for:
https://github.com/scikit-learn/scikit-learn/issues/19936
```
Suggested change:

```
Non-regression test for issue #19936.
```
To be honest I prefer to have the URL :)
Ah yes, I have just seen that both styles exist.
@iwhalvic: you can ignore this comment.
```python
kernel = (C(8.98576054e+05, (1e-12, 1e12)) *
          RBF([5.91326520e+02, 1.32584051e+03], (1e-12, 1e12)) +
          WhiteKernel(noise_level=1e-5))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0, optimizer=None)
X_train = np.array([[0., 0.], [1.54919334, -0.77459667], [-1.54919334, 0.],
                    [0., -1.54919334], [0.77459667, 0.77459667],
                    [-0.77459667, 1.54919334]])
y_train = np.array([[-2.14882017e-10], [-4.66975823e+00], [4.01823986e+00],
                    [-1.30303674e+00], [-1.35760156e+00],
                    [3.31215668e+00]])
gpr.fit(X_train, y_train)
X_test = np.array([[-1.93649167, -1.93649167], [1.93649167, -1.93649167],
                   [-1.93649167, 1.93649167], [1.93649167, 1.93649167]])
```
Are those values the ones for the case you are describing? Inconsistencies were observed when the kernel matrix cannot be inverted (or the inversion is not numerically stable).
Yes. This kernel is nearly singular: #19939 (comment). I assume that this is fine for a non-regression test.
If the problem is about diagonal elements of `gpr.L_` being close to zero, maybe we could make that explicit, e.g. with:

```python
assert np.diag(gpr.L_).min() < 0.01
```

although I am not 100% sure this is the true cause of the problem.
Or maybe:

```python
assert np.linalg.eigvalsh(kernel(X_train)).min() < 1e-4
```
The condition number of `gpr.L_` might also be a good proxy to assert whether solving this system comes with numerical instability. In this case, numerical instability is prone to be present, as cond(`gpr.L_`) >> 1:
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C, WhiteKernel

kernel = (C(8.98576054e+05, (1e-12, 1e12)) *
          RBF([5.91326520e+02, 1.32584051e+03], (1e-12, 1e12)) +
          WhiteKernel(noise_level=1e-5))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0, optimizer=None)
X_train = np.array([[0., 0.], [1.54919334, -0.77459667], [-1.54919334, 0.],
                    [0., -1.54919334], [0.77459667, 0.77459667],
                    [-0.77459667, 1.54919334]])
y_train = np.array([[-2.14882017e-10], [-4.66975823e+00], [4.01823986e+00],
                    [-1.30303674e+00], [-1.35760156e+00],
                    [3.31215668e+00]])
gpr.fit(X_train, y_train)
print(np.linalg.cond(gpr.L_))
# 720955.9866810681
```
Yep, this is exactly what we can observe in the branch main when we try to invert `L_.T` with a constant target. I printed the condition number of `L_`: when the test fails for the constant target, it is because `L_` is ill-conditioned. When the condition number >> 1, the inversion is numerically unstable, but `L_` is not singular.
```
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel0] 3.6020373678666835 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel1] 4.000792132354216 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel2] 3.6020373678666835 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel3] 20.254629250748767 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel4] 20.254602147788138 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel5] 1.0000016459179157 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel0] 244949.11539454578 FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel1] 4.000792132354216 PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel2] 244948.4153273236 FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel3] 165965.62854978096 FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel4] 165956.90214640685 FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel5] 165975.30738621883 FAILED
```
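To make the contrast between the two computation paths concrete, here is a small self-contained sketch; the triangular factor `L` is synthetic, chosen only to have a large condition number, and is not taken from the PR:

```python
import numpy as np
from scipy.linalg import cho_solve, solve_triangular

# A synthetic ill-conditioned lower-triangular factor (cond(L) >> 1).
n = 6
L = np.tril(np.ones((n, n)))
L[np.diag_indices(n)] = np.logspace(0, -5, n)
print("cond(L) =", np.linalg.cond(L))

b = np.ones(n)

# Old path: explicitly form K_inv from the inverse of L.T, then multiply.
L_inv = solve_triangular(L.T, np.eye(n))
K_inv = L_inv @ L_inv.T
x_old = K_inv @ b

# New path: two triangular solves against b directly, no explicit inverse.
x_new = cho_solve((L, True), b)

# Mathematically identical; numerically they can drift apart as cond(L) grows.
print("max |x_old - x_new| =", np.abs(x_old - x_new).max())
```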
Just adding my comments on what is occurring here:

- Covariance matrices must be positive semi-definite, so all eigenvalues >= 0.
- Inversion of a matrix requires all eigenvalues != 0.
- So theoretically we could have a covariance matrix with a 0 eigenvalue, which is non-invertible.

Minimal example: if I ask you to invert A = [[1, 1], [1, 1]], you will likely not have an answer. In analogy to the previous code using `solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))`, we are asking: "Solve Ax = [1, 0]". Once again, there is no solution for x.

However, the fixed code uses `cho_solve((self.L_, True), K_trans.T)`, and `L_` and `K_trans` are both generated from the same kernel function, so we would not expect to ever be posed with such a problem. Instead we pose a problem such as "Solve Ax = [1, 1]", at which point you can probably provide an answer (if not quite a few).
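A tiny numpy illustration of that analogy (my own example, not from the PR):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # singular: eigenvalues are 2 and 0

# "Solve Ax = [1, 0]": [1, 0] is not in the range of A, and np.linalg.solve
# refuses because A is exactly singular.
try:
    np.linalg.solve(A, np.array([1.0, 0.0]))
except np.linalg.LinAlgError as exc:
    print("Ax = [1, 0]:", exc)

# "Solve Ax = [1, 1]": consistent system with infinitely many solutions,
# e.g. x = [0.5, 0.5]; least squares picks one of them.
x, *_ = np.linalg.lstsq(A, np.array([1.0, 1.0]), rcond=None)
print("Ax = [1, 1]: x =", x)
```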
Kernel functions such as the one in this test case form matrices like A = [[1, 1], [1, 1]] because the length scales of the RBF are quite large, so all points become highly correlated with all other points (almost regardless of distance in the X space). I believe a true linear fit using a GP with an RBF kernel would have a correlation length of inf.
The `alpha` parameter in GaussianProcessRegressor attempts to avoid this, and with a sufficiently large `alpha` value I predict it would for this case. But regardless, `return_cov=True` and `return_std=True` should be as consistent as possible.
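A rough illustration of the effect of `alpha` on conditioning, reusing the training data from this thread; the `alpha` values tried here are arbitrary:

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C, WhiteKernel

kernel = (C(8.98576054e+05, (1e-12, 1e12)) *
          RBF([5.91326520e+02, 1.32584051e+03], (1e-12, 1e12)) +
          WhiteKernel(noise_level=1e-5))
X_train = np.array([[0., 0.], [1.54919334, -0.77459667], [-1.54919334, 0.],
                    [0., -1.54919334], [0.77459667, 0.77459667],
                    [-0.77459667, 1.54919334]])

K = kernel(X_train)
for alpha in [0.0, 1e-5, 1e-2, 1.0]:
    # Larger jitter on the diagonal pulls the condition number down.
    print(f"alpha={alpha:g}  cond={np.linalg.cond(K + alpha * np.eye(K.shape[0])):.3e}")
```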
But I would welcome a GP expert to weigh in.
Here is another round of reviews. I agree that it's good to make the two `return_std` / `return_cov` options consistent, but I am not familiar enough with GPs to assert that the way of computing `return_cov` is correct.

Can you please execute all the examples in:
https://scikit-learn.org/dev/auto_examples/index.html#gaussian-process-for-machine-learning

and report here whether or not the plots have changed significantly with the fix of your PR, along with your analysis of whether this is expected/correct.
Here is a quick way to find those examples:

```
% git grep "return_std=" examples/gaussian_process
examples/gaussian_process/plot_compare_gpr_krr.py:y_gpr = gpr.predict(X_plot, return_std=False)
examples/gaussian_process/plot_compare_gpr_krr.py:y_gpr, y_std = gpr.predict(X_plot, return_std=True)
examples/gaussian_process/plot_gpr_co2.py:y_pred, y_std = gp.predict(X_, return_std=True)
examples/gaussian_process/plot_gpr_noisy_targets.py:y_pred, sigma = gp.predict(x, return_std=True)
examples/gaussian_process/plot_gpr_noisy_targets.py:y_pred, sigma = gp.predict(x, return_std=True)
examples/gaussian_process/plot_gpr_prior_posterior.py:    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)
examples/gaussian_process/plot_gpr_prior_posterior.py:    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)
```
Ideally it would be great to find a second reviewer with more expertise on GPs than I do. Maybe @jmetzen @plgreenLIRU could give this PR a quick look?
I will have a quick look now. My feeling is that the previous way was working properly when the kernel matrix was well-conditioned, but it is better to have a look at the examples.
Actually, we have an interesting example: https://scikit-learn.org/dev/auto_examples/gaussian_process/plot_gpr_prior_posterior.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-prior-posterior-py. You need to look at the standard deviation of the posterior.

So we force the negative std. dev. to 0, but we have numerically unstable positive std. dev. With the current PR, we don't have this issue: no warning, and all std. dev. are close to 0, with a nicer plot.

All other examples are the same.
Can someone help with the test errors? It seems like just a timeout, which is hard to explain; curious if it is a test server problem. I do not believe I have the authority to restart the tests without making dummy commits.

Will await PR #19964 merge to retest.
Yes, you can ignore the MacOS failure for the moment.
LGTM. Thanks @iwhalvic!
Reference Issues/PRs

Fixes issue #19936.

What does this implement/fix? Explain your changes.

Provides more consistency in `GaussianProcessRegressor` when computing the predictive standard deviation (often taken as a form of predictive uncertainty), specifically consistency between `return_std=True` and `return_cov=True`. For certain kernel hyperparameters, `return_std=True` appears unstable. `return_cov=True` can be used as an alternative, and the end user can quickly recover the standard deviation from the returned predictive covariance; however, if the predicted dataset is large, the covariance matrix may become too large for memory.

The key to the inconsistency appears to be that `return_std=True` explicitly forms the inverse of the training kernel matrix. Typically in GP regression this inverse is never explicitly formed; instead the Cholesky decomposition is stored, and Cholesky solves take the place of the explicit inverse (this is the approach `return_cov=True` uses).

This fix improves consistency and stability by using the Cholesky solve in both cases, while still avoiding the formation of the large predictive covariance matrix when `return_std=True`.

Changes are localized to _gpr.py.
before:
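(The diff itself did not survive extraction. The following is a sketch of the old computation path, reconstructed from the calls quoted earlier in this conversation; surrounding names may differ from the actual source.)

```python
# Old: explicitly form the inverse of the training kernel matrix.
L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
self._K_inv = L_inv.dot(L_inv.T)

# Variance of the predictive distribution from the explicit inverse.
y_var = self.kernel_.diag(X)
y_var -= np.einsum("ij,ij->i", np.dot(K_trans, self._K_inv), K_trans)
```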
after:
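(Likewise a reconstruction of the new path, based on the `cho_solve` call quoted above.)

```python
# New: Cholesky solve instead of an explicit inverse.
v = cho_solve((self.L_, True), K_trans.T)  # v = K^{-1} @ K_trans.T

# Only the diagonal of K_trans @ v is needed for the variance.
y_var = self.kernel_.diag(X)
y_var -= np.einsum("ij,ji->i", K_trans, v)
```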
The matrix `v` has the same dimensions as `K_trans.T`, so size should not be an issue. To my understanding, the use of `np.einsum` avoids forming the (potentially large) covariance matrix `K_trans @ v` just to get its diagonal entries.
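A quick check of that claim, with made-up shapes (not from the PR):

```python
import numpy as np

rng = np.random.RandomState(0)
K_trans = rng.rand(1000, 50)  # (n_test, n_train), hypothetical sizes
v = rng.rand(50, 1000)        # (n_train, n_test)

# Same diagonal, without materializing the (1000, 1000) product K_trans @ v.
diag_fast = np.einsum("ij,ji->i", K_trans, v)
diag_full = np.diag(K_trans @ v)
assert np.allclose(diag_fast, diag_full)
```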
The computation of `self._K_inv` is left in the code, even though it is no longer used in the built-in workflows. This retains compatibility should an end user be leveraging `self._K_inv` in some workflow; I will defer to more experienced developers on whether this is the correct course of action.

Any other comments?
Running the reproducing code in issue #19936 now results in close agreement between the two computation methods.
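A sketch of that check, assuming the fitted `gpr` and `X_test` from the non-regression test earlier in this conversation:

```python
import numpy as np

# Both prediction paths, compared on the same test points.
_, y_std = gpr.predict(X_test, return_std=True)
_, y_cov = gpr.predict(X_test, return_cov=True)

# Guard against tiny negative diagonal entries from rounding before the sqrt.
y_var_from_cov = np.clip(np.diag(y_cov), 0.0, None)

# With the fix, the marginal std and the covariance diagonal agree closely.
np.testing.assert_allclose(y_std, np.sqrt(y_var_from_cov), rtol=1e-5, atol=1e-8)
```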