
Conversation

afonari
Contributor

@afonari afonari commented Mar 17, 2021

This is a merged PR combining two PRs: #18388 and #19361.
It fixes #18318.

@ogrisel
Member

ogrisel commented Mar 17, 2021

Also, please fix the linting problem reported by the CI, and expand the test to cover multi-target data: a Y matrix where one column is constant (e.g. 2.0) and the other is random normal data:

n_samples = X.shape[0]
rng = np.random.RandomState(0)
Y = np.concatenate([
    rng.normal(size=(n_samples, 1)),  # non-constant target
    np.full(shape=(n_samples, 1), fill_value=2)  # constant target
], axis=1)

@cmarmo cmarmo added this to the 0.24.2 milestone Mar 17, 2021
Member

@ogrisel ogrisel left a comment


Thanks for improving the tests. LGTM. Just a few more suggestions below.

I am no GPR specialist, so I would appreciate it if others (e.g. @jaburke166, @boricles, @sobkevich, @jmetzen, @plgreenLIRU) could have a look.

@afonari
Contributor Author

afonari commented Mar 19, 2021

Added a commented test as discussed.

Member

@ogrisel ogrisel left a comment


LGTM.

@afonari
Contributor Author

afonari commented Mar 24, 2021

Can this be merged?

Contributor

@cmarmo cmarmo left a comment


Last detail, then LGTM.

@sobkevich

> Thanks for improving the tests. LGTM. Just a few more suggestions below.
>
> I am no GPR specialist so I would appreciate it if others (e.g. @jaburke166 @boricles, @sobkevich, @jmetzen, @plgreenLIRU) could have a look.

Hi,
with the current approach I ran into problems with the test test_predict_cov_vs_std(kernel) when a constant y is added to it.
Running this test with a constant y gives:

(test_predict_cov_vs_std[kernel1-y1])
kernel = RBF(length_scale=1), y = array([1., 1., 1., 1., 1., 1.])

    @pytest.mark.parametrize('kernel,y',
                             list(product(kernels, [y, y_with_zero_std])))
    def test_predict_cov_vs_std(kernel, y):
        if sys.maxsize <= 2 ** 32 and sys.version_info[:2] == (3, 6):
            pytest.xfail("This test may fail on 32bit Py3.6")
    
        # Test that predicted std.-dev. is consistent with cov's diagonal.
        gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
        y_mean, y_cov = gpr.predict(X2, return_cov=True)
        y_mean, y_std = gpr.predict(X2, return_std=True)
>       assert_almost_equal(np.sqrt(np.diag(y_cov)), y_std)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       Mismatch: 100%
E       Max absolute difference: 0.00182264
E       Max relative difference: inf
E        x: array([6.5100511e-06, 4.4185900e-06, 4.1690509e-06, 4.8057573e-06,
E              5.8756981e-06])
E        y: array([1.5440852e-03, 1.8270599e-03, 0.0000000e+00, 1.0005933e-05,
E              1.4597007e-03])

But I don't really know whether it is important to preserve this equality for the case where y is constant.

@afonari
Contributor Author

afonari commented Mar 30, 2021

> with the current approach I had problems with test test_predict_cov_vs_std(kernel) if constant y is added to it.

Note that for the fixed kernel RBF(length_scale=1.0, length_scale_bounds="fixed") it passes. I also tried setting length_scale_bounds="fixed" on the other kernels from the kernels list, and those pass too. So maybe it is expected to fail for kernels with a tunable length scale.
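A standalone sketch of that observation (using small synthetic X/X2 arrays of my own, not the module-level arrays from sklearn's test file): with the length scale fixed, no hyperparameter optimization runs during fit, and the sqrt(diag(cov)) vs. std consistency holds even for a constant y.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(size=(6, 2))   # assumed training inputs
X2 = rng.uniform(size=(5, 2))  # assumed query inputs
y = np.ones(6)                 # constant target, as in the failing case

# "fixed" bounds mean the kernel has no free hyperparameters, so fit()
# skips the optimizer entirely instead of tuning on degenerate constant y.
kernel = RBF(length_scale=1.0, length_scale_bounds="fixed")
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
_, y_cov = gpr.predict(X2, return_cov=True)
_, y_std = gpr.predict(X2, return_std=True)

# Predicted std.-dev. should match the covariance diagonal.
np.testing.assert_almost_equal(np.sqrt(np.diag(y_cov)), y_std)
```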

@sobkevich

> with the current approach I had problems with test test_predict_cov_vs_std(kernel) if constant y is added to it.
>
> Note that for the fixed kernel RBF(length_scale=1.0, length_scale_bounds="fixed") it passes. I also tried setting length_scale_bounds="fixed" on other kernels from the kernels list and those pass too. So maybe it is expected to fail for kernels with a tunable length scale?

Maybe :)

@afonari
Contributor Author

afonari commented Apr 6, 2021

Anything to be done here?

@afonari
Contributor Author

afonari commented Apr 19, 2021

Ping.

@glemaitre glemaitre self-requested a review April 21, 2021 15:08
Member

@glemaitre glemaitre left a comment


I merged main into the branch. LGTM. I just moved the multi-target test code into the related PR.

@glemaitre glemaitre merged commit b84afe5 into scikit-learn:main Apr 21, 2021
@afonari
Contributor Author

afonari commented Apr 21, 2021

Thanks a lot everyone!

@glemaitre
Member

The bug regarding the covariance and standard deviation is solved here: #19939

@glemaitre glemaitre added the To backport label (PR merged in master that needs a backport to a release branch defined based on the milestone) Apr 21, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
…n#19703)

Co-authored-by: Sasha Fonari <fonari@schrodinger.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre added a commit that referenced this pull request Apr 28, 2021
Co-authored-by: Sasha Fonari <fonari@schrodinger.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Labels
Bug, module:gaussian_process, Regression, To backport (PR merged in master that needs a backport to a release branch defined based on the milestone)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in GP standard deviation where y_train.std() == 0
5 participants