
A possible alternative to alpha at fit-time for GaussianProcessRegressor #7975

@aflaxman

I love the new GaussianProcessRegressor class, thanks so much. One thing that would make it a little easier for me to use, and to explain to others, is an alternative to the alpha parameter in the constructor for specifying the "weight" of individual samples. Here is a minimal example of the current approach:

import numpy as np, sklearn.gaussian_process, sklearn.gaussian_process.kernels

X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])  # standard error of each observation

kernel = sklearn.gaussian_process.kernels.Matern(length_scale=1.0, nu=1.5)
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel, alpha=se**2, optimizer=None) # set data variance with alpha parameter _here_
gp.fit(X, y)  # then set data values _here_

The second observation has a standard error 5x larger than the first (25x the variance), and the GPR handles this beautifully (see notebook). However, I find it aesthetically unappealing to set alpha in the constructor when the data it has to match is not provided until fit is called.
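The reason passing se**2 as alpha gives each sample its own noise level is that alpha is added to the diagonal of the training kernel matrix, so the posterior mean is k(x*, X) (K + diag(alpha))^-1 y. The sketch below recomputes that mean by hand and compares it with gp.predict. It assumes normalize_y=False (the default) and a fixed kernel (optimizer=None, as in the example above); the test grid X_test is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

X = np.array([[0.0], [1.0]])
y = np.array([0.5, 0.5])
se = np.array([0.1, 0.5])  # per-observation standard errors

kernel = Matern(length_scale=1.0, nu=1.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=se**2, optimizer=None).fit(X, y)

# Illustrative prediction points (assumption: any grid works here).
X_test = np.linspace(-1.0, 2.0, 7).reshape(-1, 1)

# alpha is added to the diagonal of K(X, X): sample i gets noise variance se[i]**2.
K = kernel(X) + np.diag(se**2)
k_star = kernel(X_test, X)  # shape (7, 2): covariance between test and training points
mean_manual = k_star @ np.linalg.solve(K, y)

mean_sklearn = gp.predict(X_test)
print(np.allclose(mean_manual, mean_sklearn))
```

A larger alpha entry lets the fitted mean move further from that observation, which is why the noisier second point pulls on the curve less than the first.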

I would prefer an approach where alpha is passed to the fit method, for example:

gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=kernel, optimizer=None)
gp.fit(X, y, alpha=se**2)

Alternatively, since sample_weight seems to play this role in LinearRegression and elsewhere, it might be more consistent to use it instead of alpha, with the weights given as inverse variances:

gp.fit(X, y, sample_weight=se**-2)
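Until fit accepts something like this, a small wrapper can approximate either proposed interface with the current API. This is a sketch, not scikit-learn API: fit_with_noise is a hypothetical helper. It clones the unfitted estimator, converts sample_weight to alpha via alpha = 1 / sample_weight (so se**-2 becomes se**2), sets alpha on the copy with set_params, and fits the copy.

```python
import numpy as np
from sklearn.base import clone
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def fit_with_noise(estimator, X, y, alpha=None, sample_weight=None):
    """Hypothetical helper: fit a copy of `estimator` with per-sample noise given at fit time.

    Pass at most one of `alpha` (noise variances) or `sample_weight` (inverse variances).
    """
    if alpha is not None and sample_weight is not None:
        raise ValueError("pass either alpha or sample_weight, not both")
    est = clone(estimator)  # leave the caller's unfitted estimator untouched
    if sample_weight is not None:
        # weight = 1 / variance, so variance = 1 / weight
        alpha = 1.0 / np.asarray(sample_weight, dtype=float)
    if alpha is not None:
        est.set_params(alpha=alpha)
    return est.fit(X, y)


X = [[0], [1]]
y = [.5, .5]
se = np.array([.1, .5])

kernel = Matern(length_scale=1.0, nu=1.5)
base = GaussianProcessRegressor(kernel=kernel, optimizer=None)

gp_a = fit_with_noise(base, X, y, alpha=se**2)            # variance-style, like alpha
gp_w = fit_with_noise(base, X, y, sample_weight=se**-2)   # weight-style, like sample_weight
```

Cloning means the constructor parameters of base are never modified, so the same unfitted estimator can be reused with different noise levels for different datasets.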

I can potentially put together a pull request if this is a change that you are interested in. Thanks again!
