[MRG] Fixes: Predicted standard deviation values of Gaussian Processes are only within [0, 1] #15782
Changes from all commits:
34cef3a, 9b154a3, edb6b67, 4d154a3, 275eaf1, 04024f5, 717209d, b571754
@@ -1,6 +1,7 @@
 """Testing for Gaussian process regression """

 # Author: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
+# Modified by: Pete Green <p.l.green@liverpool.ac.uk>
 # License: BSD 3 clause

 import sys

@@ -19,7 +20,8 @@
 from sklearn.utils._testing \
     import (assert_array_less,
             assert_almost_equal, assert_raise_message,
-            assert_array_almost_equal, assert_array_equal)
+            assert_array_almost_equal, assert_array_equal,
+            assert_allclose)


 def f(x):
@@ -232,33 +234,103 @@ def test_random_starts():

 @pytest.mark.parametrize('kernel', kernels)
 def test_y_normalization(kernel):
-    # Test normalization of the target values in GP
+    """
+    Test normalization of the target values in GP

-    # Fitting non-normalizing GP on normalized y and fitting normalizing GP
-    # on unnormalized y should yield identical results
-    y_mean = y.mean(0)
-    y_norm = y - y_mean
+    Fitting non-normalizing GP on normalized y and fitting normalizing GP
+    on unnormalized y should yield identical results. Note that, here,
+    'normalized y' refers to y that has been made zero mean and unit
+    variance.
+
+    """
+
+    y_mean = np.mean(y)
+    y_std = np.std(y)
+    y_norm = (y - y_mean) / y_std

     # Fit non-normalizing GP on normalized y
     gpr = GaussianProcessRegressor(kernel=kernel)
     gpr.fit(X, y_norm)

     # Fit normalizing GP on unnormalized y
     gpr_norm = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
     gpr_norm.fit(X, y)

     # Compare predicted mean, std-devs and covariances
     y_pred, y_pred_std = gpr.predict(X2, return_std=True)
-    y_pred = y_mean + y_pred
+    y_pred = y_pred * y_std + y_mean
+    y_pred_std = y_pred_std * y_std
     y_pred_norm, y_pred_std_norm = gpr_norm.predict(X2, return_std=True)

     assert_almost_equal(y_pred, y_pred_norm)
     assert_almost_equal(y_pred_std, y_pred_std_norm)

     _, y_cov = gpr.predict(X2, return_cov=True)
+    y_cov = y_cov * y_std**2
     _, y_cov_norm = gpr_norm.predict(X2, return_cov=True)

     assert_almost_equal(y_cov, y_cov_norm)
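The denormalization steps in the test above (rescale the predicted mean by `y_std` and shift by `y_mean`, scale the std by `y_std`, and scale the covariance by `y_std**2`) follow from how an affine transform propagates through mean, standard deviation, and covariance. A minimal numpy sketch with made-up data (not the test's actual `X` and `y`):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=3.0, size=100)  # illustrative targets

# Normalize to zero mean and unit variance, as the test does.
y_mean = np.mean(y)
y_std = np.std(y)
y_norm = (y - y_mean) / y_std

# Undoing the transform recovers the original scale exactly:
# the mean both scales and shifts back.
y_back = y_norm * y_std + y_mean
assert np.allclose(y_back, y)

# Std and variance only scale: Var(a*z + b) = a**2 * Var(z),
# which is why the covariance is multiplied by y_std**2.
assert np.isclose(np.var(y_norm) * y_std**2, np.var(y))
```

The same identities justify comparing the manually denormalized predictions against a `GaussianProcessRegressor` fitted with `normalize_y=True`.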
+def test_large_variance_y():
+    """
+    Here we test that, when normalize_y=True, our GP can produce a
+    sensible fit to training data whose variance is significantly
+    larger than unity. This test was made in response to issue #15612.
+
Review comment: rm blank line
Reply: Done.
+    GP predictions are verified against predictions that were made
+    using GPy which, here, is treated as the 'gold standard'. Note that we
+    only investigate the RBF kernel here, as that is what was used in the
+    GPy implementation.
+
Review comment: rm blank line
Reply: Done. Also corrected a small typo in the test description.
+    The following code can be used to recreate the GPy data:
+
+    --------------------------------------------------------------------------
+    import GPy
+
+    kernel_gpy = GPy.kern.RBF(input_dim=1, lengthscale=1.)
+    gpy = GPy.models.GPRegression(X, np.vstack(y_large), kernel_gpy)
+    gpy.optimize()
+    y_pred_gpy, y_var_gpy = gpy.predict(X2)
+    y_pred_std_gpy = np.sqrt(y_var_gpy)
+    --------------------------------------------------------------------------
+    """
+
+    # Here we utilise a larger variance version of the training data
+    y_large = 10 * y
+
+    # Standard GP with normalize_y=True
+    RBF_params = {'length_scale': 1.0}
+    kernel = RBF(**RBF_params)
+    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
+    gpr.fit(X, y_large)
+    y_pred, y_pred_std = gpr.predict(X2, return_std=True)
+
+    # 'Gold standard' mean predictions from GPy
+    y_pred_gpy = np.array([15.16918303,
+                           -27.98707845,
+                           -39.31636019,
+                           14.52605515,
+                           69.18503589])
+
+    # 'Gold standard' std predictions from GPy
+    y_pred_std_gpy = np.array([7.78860962,
+                               3.83179178,
+                               0.63149951,
+                               0.52745188,
+                               0.86170042])
+
+    # Based on numerical experiments, it's reasonable to expect our
+    # GP's mean predictions to get within 7% of those made by GPy.
+    assert_allclose(y_pred, y_pred_gpy, rtol=0.07, atol=0)
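For reference, numpy's `assert_allclose` checks `|actual - desired| <= atol + rtol * |desired|` elementwise, so with `atol=0` the check above is a pure relative-tolerance comparison. A small illustrative sketch with made-up numbers (not the GPy values from the test):

```python
import numpy as np
from numpy.testing import assert_allclose

desired = np.array([100.0, -50.0])
actual = np.array([106.0, -47.0])  # each element within 7% of desired

# With atol=0, this passes iff |actual - desired| <= 0.07 * |desired|
# holds elementwise: |106-100| = 6 <= 7 and |-47+50| = 3 <= 3.5.
assert_allclose(actual, desired, rtol=0.07, atol=0)

# A value more than 7% off raises an AssertionError.
try:
    assert_allclose(np.array([108.0]), np.array([100.0]), rtol=0.07, atol=0)
except AssertionError:
    pass  # expected: 8% relative error exceeds rtol=0.07
```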
Review comment: This assertion fails at master. Is it correct that we should expect the mean prediction to change with this PR?

Reply: By changing the amplitude of y, we also change the optimum hyperparameters, so yes, this would also alter the mean predictions. A hand-wavy explanation: for the higher-amplitude cases, two points that are next to each other on the x-axis will now be further apart on the y-axis, so can be viewed as 'less correlated'. Running the current code on data generated from y = x * sin(x) gives us lengthscale 1.587, but running it on data generated from y = 10 * (x * sin(x)) gives us lengthscale 0.93, for example. Very hand-wavy, but the optimum hyperparameters are different between the two cases.

It's worth noting that this change brings the predictions made by the current code and GPy much closer, but they won't be identical, as GPy also uses a vertical lengthscale parameter in the radial basis function kernel. Initially I thought that the normalisation suggested here would make the optimum vertical lengthscale equal to 1 (so the difference wouldn't matter), but the maths I did earlier showed that this isn't the case; it is very close to 1, though. I think the normalisation improves the fidelity of the GP a lot. After that, the introduction of a vertical lengthscale parameter would improve it even more, but not by as much. If you like, I'd be happy to look at introducing a vertical lengthscale parameter later, as a separate PR?
+
+    # Based on numerical experiments, it's reasonable to expect our
+    # GP's std predictions to get within 15% of those made by GPy.
+    assert_allclose(y_pred_std, y_pred_std_gpy, rtol=0.15, atol=0)
+
+
 def test_y_multioutput():
     # Test that GPR can deal with multi-dimensional target values
     y_2d = np.vstack((y, y * 2)).T
Review comment: Why is this comment about the theoretical concerns of using this parameter removed?

Reply: Now this is a tricky one! Partly because discussions around the likelihood principle can go on forever...

Really my concern here is a practical one. The comment seems to tell the user that they should not normalise the data but, as we now know, without normalization the model can only produce predictions whose standard deviation lies between 0 and 1. So normalisation is essential in this case, given the model's limitations. I think it's important that we don't put users off the normalisation as, for this implementation, it's really crucial.
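The [0, 1] limitation discussed here can be seen directly from the GP posterior variance formula: with a unit-amplitude RBF kernel, k(x, x) = 1 for every x, and the posterior variance k(x*, x*) - k*^T K^{-1} k* can never exceed the prior variance, regardless of the scale of the training targets. A pure-numpy sketch under those assumptions (made-up training locations, noise-free GP; this is not scikit-learn's implementation):

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    # RBF kernel with unit amplitude: k(x, x) = 1 for any x.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(1)
X_train = rng.uniform(0, 5, size=8)   # hypothetical training inputs
X_test = np.linspace(0, 5, 50)

jitter = 1e-6  # small diagonal term for numerical stability
K = rbf(X_train, X_train) + jitter * np.eye(len(X_train))
K_star = rbf(X_test, X_train)

# Quadratic form k*^T K^{-1} k* for each test point, via a solve
# rather than an explicit inverse.
alpha = np.linalg.solve(K, K_star.T)        # K^{-1} k*, shape (8, 50)
quad = np.sum(K_star.T * alpha, axis=0)     # one value per test point

# Posterior variance: k(x*, x*) - k*^T K^{-1} k* = 1 - quad.
# Clipping guards against tiny negative/overshoot from round-off.
post_var = np.clip(1.0 - quad, 0.0, 1.0)
post_std = np.sqrt(post_var)

# The posterior std is bounded by the prior std of 1 everywhere,
# no matter how large the variance of y is.
assert post_std.max() <= 1.0
```

This is why, without y-normalization (or a learned amplitude hyperparameter), the predicted standard deviations cannot track targets whose scale is much larger than unity.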