GaussianProcessRegressor cannot correctly predict std in multi-target scene #17394

@bitosky

Description

I used GaussianProcessRegressor like this:

from sklearn.gaussian_process.kernels import Matern
from sklearn.gaussian_process import GaussianProcessRegressor
import numpy as np

ls = [
    ([2, 3, 4, 5], [1, 1, 3]),
    ([5, 3, 1, 3], [1, 4, 3]),
    ([2, 7, 2, 5], [8, 2, 3]),
    ([-2, 3, 4, 5], [1, 0, 4]),
    ([2, 8, 7, 5], [-1, 2, 3]),
]

X = np.array([ele[0] for ele in ls])
y = np.array([ele[1] for ele in ls])

gp = GaussianProcessRegressor(
    kernel=Matern(nu=2.5),
    alpha=1e-6,
    normalize_y=True,
    n_restarts_optimizer=5,
    random_state=np.random.RandomState(None),
)

gp.fit(X, y)
# x = np.array([200, 300, 600, 500.]).reshape(1, -1)
x = np.array([
    [2, 3, 6, 5.],
    [2, 3, 3, 5.],
    [5, 3, 1, 3.00001],
    [5, 3, 1, 3]
])

v, d = gp.predict(x, return_std=True)

print(v, d)

_, c = gp.predict(x, return_cov=True)

print(c)

But I got this error:

ValueError: operands could not be broadcast together with shapes (4,) (3,) 

I read through the code of GaussianProcessRegressor and traced the error to the function predict(self, X, return_std=False, return_cov=False):

  • In the code below, y_var.shape == (4,) (one value per query point) but self._y_train_std.shape == (3,) (one value per target):
# undo normalisation
y_var = y_var * self._y_train_std ** 2

I think y_var = y_var * self._y_train_std ** 2 works only when there is a single target.
In the multi-target case, it should be changed like this:

# undo normalisation
# y_var = y_var * self._y_train_std ** 2
y_var = y_var.reshape((-1, 1))
y_var = np.einsum("ij,j->ij", y_var, self._y_train_std ** 2)
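
To illustrate the shape problem outside of scikit-learn, here is a minimal NumPy sketch. The arrays are made-up stand-ins for y_var and self._y_train_std (the values are not taken from a real fit), and np.outer is used here as an assumed equivalent of the reshape-plus-einsum proposal above, not as the library's actual fix:

```python
import numpy as np

# Hypothetical stand-ins for the arrays inside predict(); values are invented.
y_var = np.array([0.5, 0.2, 1e-6, 0.0])   # one variance per query point, shape (4,)
y_train_std = np.array([3.2, 1.4, 0.4])   # one std per target, shape (3,)

# The original line: shapes (4,) and (3,) cannot be broadcast together.
try:
    y_var * y_train_std ** 2
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Scale every query-point variance by every target's squared std.
# Result has shape (n_queries, n_targets) = (4, 3).
fixed = np.outer(y_var, y_train_std ** 2)

# The same result via explicit broadcasting of a column vector.
same = y_var[:, np.newaxis] * y_train_std ** 2

print(broadcast_failed, fixed.shape, np.allclose(fixed, same))
```

Each row of the result corresponds to one query point and each column to one target, which matches the shape of the predicted mean v in the example above.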

PS:
I think we should also add a small EPS to self._y_train_std to avoid division by zero, which happens when a target column is constant.

# Normalize target value
if self.normalize_y:
    self._y_train_mean = np.mean(y, axis=0)
    # self._y_train_std = np.std(y, axis=0)
    self._y_train_std = np.std(y, axis=0) + self.EPS # avoid zero division error

    # Remove mean and make unit variance
    y = (y - self._y_train_mean) / self._y_train_std
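
A small NumPy sketch of why the guard matters, assuming a target column that happens to be constant. The data and the EPS value are invented for illustration, and EPS here is a hypothetical module-level constant standing in for the self.EPS suggested above. Note that NumPy does not raise an exception in this case; it emits a warning and produces NaN:

```python
import numpy as np

EPS = 1e-10  # hypothetical small constant, mirroring the suggested self.EPS

# Made-up targets; the third column is constant, so its std is exactly 0.
y = np.array([
    [1.0, 1.0, 3.0],
    [1.0, 4.0, 3.0],
    [8.0, 2.0, 3.0],
])

mean = np.mean(y, axis=0)
std = np.std(y, axis=0)

# Without the guard: 0 / 0 in the constant column yields NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = (y - mean) / std

# With the guard: the constant column normalizes to zeros instead.
guarded = (y - mean) / (std + EPS)

print(np.isnan(unguarded[:, 2]).all(), np.isfinite(guarded).all())
```

The NaN values would otherwise propagate silently into the kernel fit, so the failure may show up far from the normalization step.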
