Cross validation error of a Gaussian process with noisy target #26328

Closed
iliakh opened this issue May 5, 2023 · 2 comments
Labels
Bug, Needs Triage (issue requires triage)

Comments

iliakh commented May 5, 2023

Describe the bug

Hi,

I'm trying to use RandomizedSearchCV with a GaussianProcessRegressor whose alpha is a vector (one noise value per sample).
It seems that during cross-validation alpha is not split into train/test sets along with the data, because I get the following error:
ValueError: alpha must be a scalar or an array with same number of entries as y. (100 != 80)

Is there any workaround, or am I missing something?

Thanks!

Steps/Code to Reproduce

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import RandomizedSearchCV

X = np.random.rand(100, 2)
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
alpha = 0.1 * np.ones_like(y)

kernel = RBF(length_scale=1.)

param_dist = {
    'kernel__length_scale': (1e-3, 1e3),
}


gp = GaussianProcessRegressor(kernel=kernel, alpha=alpha)
rs = RandomizedSearchCV(gp, param_distributions=param_dist, cv=5, n_iter=50)

rs.fit(X, y)

print("Best hyperparameters: ", rs.best_params_)
print("Best score: ", rs.best_score_)

Expected Results

No error is thrown, and the fit uses both the target values and their per-sample uncertainty.

Actual Results

ValueError: alpha must be a scalar or an array with same number of entries as y. (100 != 80)

Versions

System:
    python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:12:31) [Clang 14.0.6 ]
   machine: macOS-13.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.2.2
          pip: 23.1.2
   setuptools: 67.7.2
        numpy: 1.23.5
        scipy: 1.10.0
       Cython: None
       pandas: 1.5.2
   matplotlib: 3.6.2
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True
iliakh added the Bug and Needs Triage labels on May 5, 2023
@AndersonYin

Yeah, I checked the code: estimator parameters, such as alpha here, are not split into k folds. It is easy to handle this particular case, but what I wonder is whether there is a general way to handle every estimator whose arguments need to be split.
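
For this particular case, one possible sketch is a manual cross-validation loop that indexes alpha with the same training indices as X and y. This is not part of scikit-learn's search API. The helper name cv_score, the candidate length scales, and the choice to keep the kernel fixed during fitting are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
alpha = 0.1 * np.ones_like(y)  # one noise value per sample


def cv_score(length_scale, X, y, alpha, n_splits=5):
    """Hypothetical helper: mean R^2 over folds, slicing alpha with X and y."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        # Assumption: keep the length scale fixed so each candidate is
        # evaluated as given instead of being re-optimized during fit.
        kernel = RBF(length_scale=length_scale, length_scale_bounds="fixed")
        gp = GaussianProcessRegressor(kernel=kernel, alpha=alpha[train_idx])
        gp.fit(X[train_idx], y[train_idx])
        scores.append(gp.score(X[test_idx], y[test_idx]))
    return float(np.mean(scores))


candidates = [0.1, 1.0, 10.0]  # illustrative values only
scores = {ls: cv_score(ls, X, y, alpha) for ls in candidates}
best = max(scores, key=scores.get)
print("Best length scale:", best)
```

The loop works because alpha[train_idx] always has the same length as y[train_idx], which is the condition the ValueError checks.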

@adrinjalali
Member

That's because while doing a CV search, the data is split into train/test, but your alpha is still of length 100.
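
The numbers in the error message can be checked with a small sketch, assuming the search uses a plain 5-fold KFold for this regressor (which is what cv=5 resolves to for non-classifiers): each training fold has 80 samples, while the alpha array passed to the constructor keeps all 100 entries.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((100, 2))       # placeholder data with the same shape as in the report
alpha = 0.1 * np.ones(100)   # constructor argument, never touched by the splitter

# (train size, test size) for each of the 5 folds
fold_sizes = [(len(tr), len(te)) for tr, te in KFold(n_splits=5).split(X)]
print(fold_sizes)
print(alpha.shape[0])
```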

You can instead pass a single scalar since you're adding a constant there anyway. Once SLEP006 #24027 is merged we could think of a better API for constructor args which are sample aligned (they're metadata in a way).

For now using a scalar works.
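
A minimal sketch of the scalar workaround, based on the reproduction script above. Two details are assumptions rather than part of the original report: the length scale is sampled from scipy.stats.loguniform instead of the two-element tuple, and n_iter is reduced to 10 to keep the run short.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])

# Scalar alpha: the same noise level applies to every training fold,
# so its size no longer has to match the number of samples.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1)

# Assumption: sample the initial length scale log-uniformly over the
# range given in the original script.
param_dist = {"kernel__length_scale": loguniform(1e-3, 1e3)}

rs = RandomizedSearchCV(
    gp, param_distributions=param_dist, cv=5, n_iter=10, random_state=0
)
rs.fit(X, y)

print("Best hyperparameters: ", rs.best_params_)
print("Best score: ", rs.best_score_)
```

This only covers a constant noise level. Noise that differs from sample to sample still needs alpha to be sliced per fold by hand.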
