
FIX Use cho_solve when return_std=True for GaussianProcessRegressor #19939


Merged (8 commits, Apr 27, 2021)

Conversation

iwhalvic
Contributor

Reference Issues/PRs

Fixes Issue #19936

What does this implement/fix? Explain your changes.

Provides more consistency in GaussianProcessRegressor when computing the predictive standard deviation (often taken as a form of predictive uncertainty); specifically, consistency between return_std=True and return_cov=True. For certain kernel hyperparameters, return_std=True appears unstable.

return_cov=True can be used as an alternative, and the end user can quickly recover the standard deviation from the returned predictive covariance; however, if the prediction dataset is large, the covariance matrix may become too large for memory.

The key to the inconsistency appears to be that return_std=True explicitly forms the inverse of the training kernel matrix. In GP regression this inverse is typically never formed explicitly; instead the Cholesky decomposition is stored and Cholesky solves take the place of the explicit inverse (this is the approach return_cov=True uses).

This fix attempts to improve consistency and stability by using a Cholesky solve in both cases, while still avoiding the formation of the large predictive covariance matrix when return_std=True.

Changes are localized to _gpr.py

before:

elif return_std:
    # cache result of K_inv computation
    if self._K_inv is None:
        # compute inverse K_inv of K based on its Cholesky
        # decomposition L and its inverse L_inv
        L_inv = solve_triangular(self.L_.T,
                                 np.eye(self.L_.shape[0]))
        self._K_inv = L_inv.dot(L_inv.T)

    # Compute variance of predictive distribution
    y_var = self.kernel_.diag(X)
    y_var -= np.einsum("ij,ij->i",
                       np.dot(K_trans, self._K_inv), K_trans)

after:

elif return_std:
    # cache result of K_inv computation,
    # this is done for compatibility, K_inv should be avoided
    if self._K_inv is None:
        # compute inverse K_inv of K based on its Cholesky
        # decomposition L and its inverse L_inv
        L_inv = solve_triangular(self.L_.T,
                                 np.eye(self.L_.shape[0]))
        self._K_inv = L_inv.dot(L_inv.T)
    v = cho_solve((self.L_, True), K_trans.T)

    # Compute variance of predictive distribution
    # Use einsum to avoid large matrix
    y_var = self.kernel_.diag(X)
    y_var -= np.einsum("ij,ji->i",
                       K_trans, v)

The matrix v has the same dimensions as K_trans.T, so size should not be an issue.

To my understanding, the use of np.einsum avoids forming the (potentially large) covariance matrix np.dot(K_trans, v) just to extract its diagonal entries.

The computation of self._K_inv is left in the code, even though it is no longer used in the built-in workflows. This retains compatibility in case an end user is leveraging self._K_inv in some workflow. I will defer to more experienced developers on whether this is the correct course of action.

Any other comments?

Running the reproducing code in Issue #19936 now results in close agreement between the two computation methods:

[0.00754465 0.00633869 0.00642104 0.00834402]
[0.00754465 0.0063387  0.00642104 0.00834401]

@glemaitre
Member

Thanks for the PR. I can see that your PR also fixes the inconsistency that we saw here: #19703 (comment)

I am going to finish the work on that PR and then have a look at yours.
Once #19703 is merged, could you change the following test with a new parametrization:

@pytest.mark.parametrize("target", [y, np.ones(X.shape[0], dtype=np.float64)])
def test_predict_cov_vs_std(kernel, target):

In addition, it would be nice to add your minimal example as a non-regression test as well.

@glemaitre glemaitre added the Bug and To backport labels Apr 21, 2021
@glemaitre glemaitre added this to the 0.24.2 milestone Apr 21, 2021
@glemaitre
Member

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

You can use 0.24.2 for adding an entry.

@iwhalvic
Contributor Author

(First contribution to this project, let me know if I have messed up procedures)

Resolved the PR comments. Test cases aimed at testing K_inv existed in test_gpr.py; since K_inv is no longer computed, they have been removed.

Updated test_predict_cov_vs_std and added my minimal example to test_gpr.py; please let me know if this is not the correct place for it.

@glemaitre glemaitre left a comment (Member)

Mainly some style comments, but otherwise everything seems correct.

@glemaitre glemaitre left a comment (Member)

LGTM. I am happy with this PR. I did a quick test regarding speed and could not detect any performance regression. It looks like a win-win.

We need a second review to be able to merge this, but I would like to have it in the upcoming bug-fix release 0.24.2.

@jjerphan jjerphan left a comment (Member)

LGTM, thank you @iwhalvic.

I have included minor suggestions.

Comment on lines +560 to +561
Non-regression test for:
https://github.com/scikit-learn/scikit-learn/issues/19936

Suggested change
Non-regression test for:
https://github.com/scikit-learn/scikit-learn/issues/19936
Non-regression test for issue #19936.

Member


To be honest I prefer to have the URL :)

@jjerphan jjerphan (Member) Apr 22, 2021

Ah yes, I have just seen that both styles exist.

@iwhalvic: you can ignore this comment.

Comment on lines +563 to +575
kernel = (C(8.98576054e+05, (1e-12, 1e12)) *
          RBF([5.91326520e+02, 1.32584051e+03], (1e-12, 1e12)) +
          WhiteKernel(noise_level=1e-5))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0, optimizer=None)
X_train = np.array([[0., 0.], [1.54919334, -0.77459667], [-1.54919334, 0.],
                    [0., -1.54919334], [0.77459667, 0.77459667],
                    [-0.77459667, 1.54919334]])
y_train = np.array([[-2.14882017e-10], [-4.66975823e+00], [4.01823986e+00],
                    [-1.30303674e+00], [-1.35760156e+00],
                    [3.31215668e+00]])
gpr.fit(X_train, y_train)
X_test = np.array([[-1.93649167, -1.93649167], [1.93649167, -1.93649167],
                   [-1.93649167, 1.93649167], [1.93649167, 1.93649167]])
@jjerphan jjerphan (Member) Apr 22, 2021

Are those values the ones for the case you are describing?

Inconsistencies were observed when the kernel matrix cannot be inverted (or its inversion is not numerically stable).

Member

Yes. This kernel matrix is close to singular: #19939 (comment)
I assume that this is fine for a non-regression test.

Member

If the problem is about diagonal elements of gpr.L_ being close to zero, maybe we could make that explicit, e.g. with:

assert np.diag(gpr.L_).min() < 0.01

although I am not 100% sure this is the true cause of the problem.

@ogrisel ogrisel (Member) Apr 22, 2021

Or maybe:

assert np.linalg.eigvalsh(kernel(X_train)).min() < 1e-4

@jjerphan jjerphan (Member) Apr 22, 2021

The condition number of gpr.L_ might also be a good proxy for whether solving this system comes with numerical instability.

In this case, numerical instability is prone to be present since cond(gpr.L_) >> 1:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C, WhiteKernel

kernel = (C(8.98576054e+05, (1e-12, 1e12)) *
          RBF([5.91326520e+02, 1.32584051e+03], (1e-12, 1e12)) +
          WhiteKernel(noise_level=1e-5))

gpr = GaussianProcessRegressor(kernel=kernel, alpha=0, optimizer=None)

X_train = np.array([[0., 0.], [1.54919334, -0.77459667], [-1.54919334, 0.],
                    [0., -1.54919334], [0.77459667, 0.77459667],
                    [-0.77459667, 1.54919334]])

y_train = np.array([[-2.14882017e-10], [-4.66975823e+00], [4.01823986e+00],
                    [-1.30303674e+00], [-1.35760156e+00],
                    [3.31215668e+00]])

gpr.fit(X_train, y_train)

print(np.linalg.cond(gpr.L_))
# 720955.9866810681

Member

Yep, this is exactly what we can observe on the main branch when we try to invert L_.T with a constant target. I printed the condition number of L_: when the test fails for the constant target, it is because L_ is ill-conditioned. When the condition number >> 1, the inversion is numerically unstable even though L_ is not singular.

sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel0] 3.6020373678666835
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel1] 4.000792132354216
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel2] 3.6020373678666835
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel3] 20.254629250748767
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel4] 20.254602147788138
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target0-kernel5] 1.0000016459179157
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel0] 244949.11539454578
FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel1] 4.000792132354216
PASSED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel2] 244948.4153273236
FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel3] 165965.62854978096
FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel4] 165956.90214640685
FAILED
sklearn/gaussian_process/tests/test_gpr.py::test_predict_cov_vs_std[target1-kernel5] 165975.30738621883
FAILED

@iwhalvic iwhalvic (Contributor, Author) Apr 22, 2021

Just adding my comments on what is occurring here:
Covariance matrices must be positive semi-definite, so all eigenvalues >= 0.
Inversion of a matrix requires all eigenvalues != 0.

So, theoretically, we could have a covariance matrix with a 0 eigenvalue, i.e. a non-invertible one.

Minimal example: if I ask you to invert A = [[1, 1], [1, 1]], you will likely not have an answer. By analogy, the previous code using solve_triangular(self.L_.T, np.eye(self.L_.shape[0])) is asking: "Solve Ax = [1, 0]". Once again, no solution for x.

In the fixed code, however, cho_solve((self.L_, True), K_trans.T) is used, and L_ and K_trans are both generated from the same kernel function, so we would not expect to ever be posed with such a problem. Instead we pose a problem such as "Solve Ax = [1, 1]", at which point you can probably provide an answer (if not quite a few).

Kernel functions such as the one in the test case form matrices like A = [[1, 1], [1, 1]] because the length scales of the RBF are quite large, so all points become highly correlated with all other points (almost regardless of their distance in X space). I believe a true linear fit using a GP with an RBF kernel would have a correlation length of inf.

The alpha parameter in GaussianProcessRegressor attempts to avoid this, and with a sufficiently large alpha value I expect it would in this case. But regardless, return_cov=True and return_std=True should be as consistent as possible.

But I would welcome a GP expert to weigh in.

@ogrisel ogrisel left a comment (Member)

Here is another round of reviews. I agree that it's good to make the two return_std / return_cov code paths consistent, but I am not familiar enough with GPs to assert that the way of computing return_cov is correct.

Can you please execute all the examples in:

https://scikit-learn.org/dev/auto_examples/index.html#gaussian-process-for-machine-learning

And report here whether or not the plots change significantly with the fix in your PR, and your analysis of whether this is expected/correct.

Here is a quick way to find those examples:

% git grep "return_std=" examples/gaussian_process 
examples/gaussian_process/plot_compare_gpr_krr.py:y_gpr = gpr.predict(X_plot, return_std=False)
examples/gaussian_process/plot_compare_gpr_krr.py:y_gpr, y_std = gpr.predict(X_plot, return_std=True)
examples/gaussian_process/plot_gpr_co2.py:y_pred, y_std = gp.predict(X_, return_std=True)
examples/gaussian_process/plot_gpr_noisy_targets.py:y_pred, sigma = gp.predict(x, return_std=True)
examples/gaussian_process/plot_gpr_noisy_targets.py:y_pred, sigma = gp.predict(x, return_std=True)
examples/gaussian_process/plot_gpr_prior_posterior.py:    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)
examples/gaussian_process/plot_gpr_prior_posterior.py:    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)

Ideally it would be great to find a second reviewer with more GP expertise than I have. Maybe @jmetzen or @plgreenLIRU could give this PR a quick look?

@glemaitre
Member

glemaitre commented Apr 22, 2021

And report here whether or not the plots change significantly with the fix in your PR, and your analysis of whether this is expected/correct.

I will have a quick look now. My feeling is that the previous way worked properly when L_ could be inverted. Issues arise when that is not the case, such as the two cases with the constant target or the specific kernel provided in the non-regression test.

But it is better to have a look at the examples.

@glemaitre glemaitre self-assigned this Apr 22, 2021
@glemaitre
Member

Actually, we have an interesting example: https://scikit-learn.org/dev/auto_examples/gaussian_process/plot_gpr_prior_posterior.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-prior-posterior-py

You need to look at the standard deviation of the DotProduct kernel (Fig. 8). Computing the kernel is numerically unstable: we get the following warning:

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  _check_optimize_result("lbfgs", opt_res)
/home/circleci/project/sklearn/gaussian_process/_gpr.py:378: UserWarning: Predicted variances smaller than 0. Setting those variances to 0.
  warnings.warn("Predicted variances smaller than 0. "

So we force the negative std. dev. to 0, but we still have numerically unstable positive std. dev.

With the current PR, we don't have this issue: no warning and all std. dev. are close to 0 with a nicer plot:

[Figure_1: regenerated plot with the fix; std. dev. close to 0 and no warning]

@glemaitre
Member

All other examples are the same.

@iwhalvic
Contributor Author

Can someone help with the test errors? It seems to be just a timeout, which is hard to explain. Curious whether it is a test server problem. I do not believe I have the authority to restart the tests without making dummy commits.

@iwhalvic
Contributor Author

Will await the merge of PR #19964 to retest.

@glemaitre
Member

glemaitre commented Apr 23, 2021 via email

@glemaitre glemaitre self-requested a review April 26, 2021 10:29
@glemaitre glemaitre left a comment (Member)

I had a new look at it. LGTM. @ogrisel, do you mind having another look? We will address the documentation and further improvements in #19952.

@jeremiedbb jeremiedbb left a comment (Member)

LGTM. Thanks @iwhalvic !
