Description
I love the new `GaussianProcessRegressor` class, thanks so much. One thing that I think would make it a little easier for me to use and explain to others is an alternative to the `alpha` option in the constructor for specifying the "weight" of individual samples. Here is a minimal example of the current approach:
```python
import numpy as np, sklearn.gaussian_process, sklearn.gaussian_process.kernels

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])
kernel = sklearn.gaussian_process.kernels.Matern(length_scale=1.0, nu=1.5)
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel, alpha=se**2, optimizer=None)  # set data variance with alpha parameter _here_
gp.fit(X, y)  # then set data values _here_
```
The second row of data has 5x more variation than the first, and the GPR handles this beautifully (see notebook), but I find it aesthetically unappealing to set `alpha` in the constructor when the data that needs to match it is not set until the `fit` method is called.
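As a concrete check of what the per-sample `alpha` does here, the posterior should be less certain at the noisier training point; a minimal sketch, using the same data as above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])  # per-sample noise standard errors

# alpha adds se**2 to the diagonal of the kernel matrix during fitting
gp = GaussianProcessRegressor(kernel=Matern(length_scale=1.0, nu=1.5),
                              alpha=se**2, optimizer=None)
gp.fit(X, y)

# Predictive std at the training inputs: larger at x=1, where the data
# point carried 5x the noise
_, std = gp.predict(X, return_std=True)
print(std)
```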
I would prefer an approach where the `alpha` value is set in the `fit` method, such as

```python
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel, optimizer=None)
gp.fit(X, y, alpha=se**2)
```
or, since it seems like `sample_weight` is used for this in `LinearRegression` and other places, it might be more consistent to use that instead of `alpha`:

```python
gp.fit(X, y, sample_weight=se**-2)
```
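For what it's worth, the fit-time interface can already be approximated with a small subclass; this is only a sketch of the proposed API (the name `FitAlphaGPR` is made up here), not how scikit-learn would necessarily implement it:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

class FitAlphaGPR(GaussianProcessRegressor):
    """Hypothetical variant that accepts per-sample noise variance at fit time."""
    def fit(self, X, y, alpha=None):
        if alpha is not None:
            self.alpha = alpha  # overwrite the constructor value before fitting
        return super().fit(X, y)

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])

# Passing alpha at fit time...
gp = FitAlphaGPR(kernel=Matern(length_scale=1.0, nu=1.5), optimizer=None)
gp.fit(X, y, alpha=se**2)

# ...gives the same model as passing it to the constructor
ref = GaussianProcessRegressor(kernel=Matern(length_scale=1.0, nu=1.5),
                               alpha=se**2, optimizer=None).fit(X, y)
print(np.allclose(gp.predict([[0.5]]), ref.predict([[0.5]])))
```

(Mutating `self.alpha` inside `fit` cuts against scikit-learn's estimator conventions, which is part of why this would be worth discussing as a proper API change rather than a workaround.)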
I can potentially put together a pull request if this is a change that you are interested in. Thanks again!