
Handling division by zero std for GaussianProcessRegressor #19361


Closed
8 changes: 5 additions & 3 deletions sklearn/gaussian_process/_gpr.py
@@ -8,12 +8,13 @@
from operator import itemgetter

import numpy as np
from scipy.linalg import cholesky, cho_solve, solve_triangular
import scipy.optimize
from scipy.linalg import cholesky, cho_solve, solve_triangular

from .kernels import RBF, ConstantKernel as C
from ..base import BaseEstimator, RegressorMixin, clone
from ..base import MultiOutputMixin
from .kernels import RBF, ConstantKernel as C
from ..preprocessing._data import _handle_zeros_in_scale
from ..utils import check_random_state
from ..utils.optimize import _check_optimize_result
from ..utils.validation import _deprecate_positional_args
@@ -200,7 +201,8 @@ def fit(self, X, y):
self._y_train_std = np.std(y, axis=0)

# Remove mean and make unit variance
y = (y - self._y_train_mean) / self._y_train_std
y = (y - self._y_train_mean) / \
jaburke166 (Feb 15, 2021)

Hi there!

I'm very new to the whole "contributing" side of GitHub but I've been affected by the std_dev==0 issue that has seemingly been discussed since as early as November of last year! I'm glad this is being rectified now.

The reason I'm commenting is that I copied this fix into the current version of _gpr.py (0.24.0), but the issue persists. My thought is that self._y_train_std needs to be stored as _handle_zeros_in_scale(np.std(y, axis=0), copy=False) (or something similar), rather than the zero std only being handled in the denominator at scaling time.

I write this because on line 201 self._y_train_std will still be 0 and will be carried into other methods (like predict() and sample_y()). For example, further downstream in predict() the covariance matrix on line 357 is multiplied by this near-zero number: y_cov = y_cov * self._y_train_std**2. This will subsequently cause issues in sample_y() as well.

I could be wrong here as I'm no expert, but I thought I'd share my thoughts nonetheless!
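To make the distinction concrete, here is a minimal sketch, outside the estimator and with hypothetical local variable names, of the two placements being contrasted (the only assumption is the _handle_zeros_in_scale helper already imported in the diff above):

```python
import numpy as np
from sklearn.preprocessing._data import _handle_zeros_in_scale

y = np.ones(6)  # constant targets, so np.std(y) == 0

# Approach taken in the diff above: keep the raw (possibly zero) std on the
# estimator and only guard the division at scaling time.
y_train_std_raw = np.std(y, axis=0)
y_scaled = (y - np.mean(y, axis=0)) / _handle_zeros_in_scale(y_train_std_raw, copy=False)

# Alternative suggested here: store the guarded value itself, so every later
# use of the attribute (e.g. un-scaling y_cov in predict()) sees 1.0, not 0.0.
y_train_std_safe = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)
y_scaled_alt = (y - np.mean(y, axis=0)) / y_train_std_safe

print(y_train_std_raw, y_train_std_safe)  # 0.0 vs 1.0
```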

sobkevich (Author)

@jaburke166
Thank you for your comment. I've just added a case for y_with_zero_std to test_y_normalization. It passes with the current approach but fails if self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False).
Could you please give more details on your concern about passing a zero std to predict()?

jaburke166 (Feb 16, 2021)

Hi @sobkevich,

With the changes you've made to _gpr.py, it seems that self._y_train_std is not overwritten by _handle_zeros_in_scale() when it is extremely small or 0. Wouldn't this mean that further downstream in predict(self, X, return_cov=True), when the normalisation is inverted (lines 349-350 and 356-357), the outputs y_mean and y_cov will not be consistent with the targets having effectively been scaled as if self._y_train_std were 1?

Given that self._y_train_std is almost 0, squaring it and multiplying it into y_cov on line 357 will surely run into numerical issues, which could result in y_cov losing its positive semi-definite property. Then, when sampling from the GP in sample_y() on line 416 (or 419 for multidimensional y) using rng.multivariate_normal(), we would be sampling from a non-positive-semi-definite covariance matrix, which would give back rubbish samples (I think, but I am no GP expert at all).
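To illustrate the concern with a toy example (this is not code from the PR; the covariance matrix and variable names are made up), scaling a covariance by a (near-)zero training std collapses it, and sampling then degenerates onto the mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# A well-behaved covariance matrix standing in for the GP's normalised-space y_cov.
A = rng.standard_normal((5, 5))
cov_norm = A @ A.T + 1e-10 * np.eye(5)

# Inverting the normalisation with a zero training std collapses the
# covariance: y_cov = cov_norm * std**2 is zero everywhere.
y_train_std = 0.0
y_cov = cov_norm * y_train_std ** 2

# Every draw then sits exactly on the mean; with a tiny but non-zero std,
# round-off in the GP computation can additionally leave eigenvalues slightly
# negative and trigger multivariate_normal's non-PSD warning.
samples = rng.multivariate_normal(np.zeros(5), y_cov, size=3)
print(samples)  # all zeros: no spread around y_mean
```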


Hi @sobkevich,

I'm wondering if you've had any time to have a look at this?

Member

@jaburke166 I believe that the fix you suggest would be to define:

self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)

instead of detecting constant y at scaling time, but I would like to be sure.

I am worried that this would also be wrong in some other way, but I am not sure.

Could you suggest the code snippet for a new test case to add to the test suite that would highlight the problem you are raising above?
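One possible shape for such a test, sketched here only as an illustration (the constant-target setup and the specific assertions are assumptions, not the final test; a variant of this idea lands below as test_handling_zeros_in_std):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def test_constant_y_downstream_of_normalization():
    # Constant targets: np.std(y) == 0, the case under discussion.
    X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
    y = np.ones(X.shape[0])

    gpr = GaussianProcessRegressor(normalize_y=True).fit(X, y)

    # The mean prediction should reproduce the constant target, and the
    # covariance should stay finite (essentially zero) rather than being
    # corrupted by a zero std in the inverse transform.
    y_pred, y_cov = gpr.predict(X, return_cov=True)
    assert np.allclose(y_pred, y)
    assert np.all(np.isfinite(y_cov))

    # Sampling downstream of predict() should also behave.
    samples = gpr.sample_y(X, n_samples=3, random_state=0)
    assert np.all(np.isfinite(samples))
```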

Member

@jblackburne this solution is being implemented in #18388 but it lacks good tests so feel free to suggest one there.

sobkevich (Author)

@ogrisel
@jaburke166, yes, you're right that with my approach all of y_cov would end up filled with zeros, so I tried the fix you suggested: self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False). With this approach y_cov is still all zeros for some kernels (a zero y_cov is technically positive semi-definite since all eigenvalues are >= 0, and in that case every sample collapses onto y_mean), but for other kernels it is not zero-like, so it is a better solution.
But I've also noticed that test_predict_cov_vs_std(kernel) fails when it receives y_with_zero_std, both with and without normalization. I've considered adding some noise, np.random.normal(0, 1e-8, y.shape), to y instead of using _handle_zeros_in_scale; it seems to be a working solution, but I'm also not a GP expert.
And tomorrow I'll check if I missed anything in tests.
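For reference, a rough sketch of the jitter idea mentioned above, outside the estimator and with hypothetical variable names; whether this is preferable to _handle_zeros_in_scale is exactly the open question:

```python
import numpy as np

rng = np.random.RandomState(0)

y = np.ones(6)                        # constant targets, std is exactly 0
y_jittered = y + rng.normal(0, 1e-8, y.shape)

# The jitter makes the std tiny but non-zero, so (y - mean) / std no longer
# divides by zero ...
y_std = np.std(y_jittered, axis=0)
y_norm = (y_jittered - np.mean(y_jittered, axis=0)) / y_std

# ... but after normalisation the targets are unit-variance noise rather than
# an exactly constant signal, which is worth keeping in mind when interpreting
# tests such as test_predict_cov_vs_std.
print(y_std, np.std(y_norm))
```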

_handle_zeros_in_scale(self._y_train_std, copy=False)

else:
self._y_train_mean = np.zeros(1)
20 changes: 17 additions & 3 deletions sklearn/gaussian_process/tests/test_gpr.py
@@ -5,6 +5,8 @@
# License: BSD 3 clause

import sys
from itertools import product

import numpy as np
import warnings

@@ -18,6 +20,7 @@
from sklearn.gaussian_process.kernels import DotProduct, ExpSineSquared
from sklearn.gaussian_process.tests._mini_sequence_kernel import MiniSeqKernel
from sklearn.exceptions import ConvergenceWarning
from sklearn.preprocessing._data import _handle_zeros_in_scale

from sklearn.utils._testing \
import (assert_array_less,
@@ -33,6 +36,7 @@ def f(x):
X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
X2 = np.atleast_2d([2., 4., 5.5, 6.5, 7.5]).T
y = f(X).ravel()
y_with_zero_std = np.ones(X.shape[0])

fixed_kernel = RBF(length_scale=1.0, length_scale_bounds="fixed")
kernels = [RBF(length_scale=1.0), fixed_kernel,
@@ -234,8 +238,9 @@ def test_random_starts():
last_lml = lml


@pytest.mark.parametrize('kernel', kernels)
def test_y_normalization(kernel):
@pytest.mark.parametrize('kernel,y',
list(product(kernels, [y, y_with_zero_std])))
def test_y_normalization(kernel, y):
"""
Test normalization of the target values in GP

@@ -248,7 +253,7 @@ def test_y_normalization(kernel):

y_mean = np.mean(y)
y_std = np.std(y)
y_norm = (y - y_mean) / y_std
y_norm = (y - y_mean) / _handle_zeros_in_scale(y_std, copy=False)

# Fit non-normalizing GP on normalized y
gpr = GaussianProcessRegressor(kernel=kernel)
@@ -543,3 +548,12 @@ def test_bound_check_fixed_hyperparameter():
periodicity_bounds="fixed") # seasonal component
kernel = k1 + k2
GaussianProcessRegressor(kernel=kernel).fit(X, y)


def test_handling_zeros_in_std():
# Test that zero std is handled properly
gpr = GaussianProcessRegressor(normalize_y=True).fit(X, y_with_zero_std)
y_pred, y_cov = gpr.predict(X, return_cov=True)

assert_almost_equal(y_pred, y_with_zero_std)
assert_almost_equal(np.diag(y_cov), 0.)