Cross validation error of a Gaussian process with noisy target #26328

iliakh · 2023-05-05T02:56:32Z

Describe the bug

Hi,

I'm trying to use RandomizedSearchCV with a Gaussian process (GaussianProcessRegressor) and a vector of alpha values.
It seems that during cross-validation alpha is not split into train/test sets along with the data, because I get the following error:
ValueError: alpha must be a scalar or an array with same number of entries as y. (100 != 80)

Is there any workaround, or am I missing something?

Thanks!

Steps/Code to Reproduce

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import RandomizedSearchCV

X = np.random.rand(100, 2)
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
alpha = 0.1 * np.ones_like(y)

kernel = RBF(length_scale=1.)

param_dist = {
    'kernel__length_scale': (1e-3, 1e3),
}


gp = GaussianProcessRegressor(kernel=kernel, alpha=alpha)
rs = RandomizedSearchCV(gp, param_distributions=param_dist, cv=5, n_iter=50)

rs.fit(X, y)

print("Best hyperparameters: ", rs.best_params_)
print("Best score: ", rs.best_score_)

Expected Results

No error is thrown, and the fit uses both the target values and their per-sample uncertainty.

Actual Results

ValueError: alpha must be a scalar or an array with same number of entries as y. (100 != 80)

Versions

System:
    python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:12:31) [Clang 14.0.6 ]
   machine: macOS-13.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.2.2
          pip: 23.1.2
   setuptools: 67.7.2
        numpy: 1.23.5
        scipy: 1.10.0
       Cython: None
       pandas: 1.5.2
   matplotlib: 3.6.2
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

AndersonYin · 2023-05-16T16:18:48Z

Yeah, I checked the code: estimator parameters, such as alpha here, are not split into k folds. This particular case is easy to handle, but what I wonder is whether there is a general way to handle every estimator whose arguments need to be split.
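One possible stopgap, which nobody in this thread proposed, is to carry the per-sample noise as an extra column of X so that the CV splitter slices it together with the data. The sketch below assumes a hypothetical wrapper class, `NoiseColumnGPR`, that strips the last column and passes it to GaussianProcessRegressor as alpha; the data and noise values are made up for illustration.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import RandomizedSearchCV


class NoiseColumnGPR(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: the last column of X is the per-sample noise variance."""

    def __init__(self, kernel=None):
        self.kernel = kernel

    def fit(self, X, y):
        X = np.asarray(X)
        # The noise column was split by the CV splitter along with the
        # features, so its length always matches y for the current fold.
        self.gp_ = GaussianProcessRegressor(kernel=self.kernel, alpha=X[:, -1])
        self.gp_.fit(X[:, :-1], y)
        return self

    def predict(self, X):
        # Drop the noise column before predicting; it is only used in fit.
        return self.gp_.predict(np.asarray(X)[:, :-1])


rng = np.random.RandomState(0)
X = rng.rand(100, 2)
noise = rng.uniform(0.05, 0.2, size=100)  # made-up per-sample noise variances
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + rng.normal(0.0, np.sqrt(noise))

X_aug = np.column_stack([X, noise])  # features plus noise column

rs = RandomizedSearchCV(
    NoiseColumnGPR(kernel=RBF(length_scale=1.0)),
    param_distributions={"kernel__length_scale": loguniform(1e-2, 1e2)},
    cv=5,
    n_iter=5,
    random_state=0,
)
rs.fit(X_aug, y)
print("Best hyperparameters: ", rs.best_params_)
```

The trade-off is that every caller has to remember that the last column of X is not a feature, so this is a workaround rather than a clean API.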

adrinjalali · 2023-06-01T13:10:45Z

That's because while doing a CV search, the data is split into train/test, but your alpha is still of length 100.
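The numbers in the error message can be reproduced directly. This minimal sketch uses KFold with 5 splits, which is what cv=5 resolves to for a regressor, and shows that the training fold has 80 samples while alpha still has 100 entries.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(100, 2)
alpha = 0.1 * np.ones(100)

# Take the first of the five train/test splits.
train_idx, test_idx = next(KFold(n_splits=5).split(X))

# X and y are indexed with train_idx inside the search; alpha is a
# constructor argument, so it is passed through unchanged.
print(len(train_idx), len(test_idx), alpha.shape[0])  # 80 20 100
```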

You can instead pass a single scalar, since you're adding a constant there anyway. Once SLEP006 #24027 is merged, we could think of a better API for constructor args that are sample-aligned (they're metadata in a way).

For now using a scalar works.
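A minimal sketch of the scalar workaround, applied to the reproduction above. Two details are assumptions and not part of the original report: the search space uses scipy.stats.loguniform in place of the two-element tuple, and n_iter is reduced to keep the run short. Note that GaussianProcessRegressor also optimizes kernel hyperparameters during fit by default, so the sampled length scale acts as a starting point.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = np.sin(X[:, 0]) + np.cos(X[:, 1])

# A scalar alpha is independent of the number of samples, so it stays
# valid for every train fold (80 samples) and for the final refit (100).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1)

param_dist = {"kernel__length_scale": loguniform(1e-3, 1e3)}
rs = RandomizedSearchCV(
    gp, param_distributions=param_dist, cv=5, n_iter=10, random_state=0
)
rs.fit(X, y)

print("Best hyperparameters: ", rs.best_params_)
print("Best score: ", rs.best_score_)
```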

iliakh added the Bug and Needs Triage labels on May 5, 2023

adrinjalali closed this as completed Jun 1, 2023
