Skip to content

BUG Fix covariance and stdev shape in GPR with normalize_y #22199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 14, 2022

Conversation

Tenavi
Copy link
Contributor

@Tenavi Tenavi commented Jan 12, 2022

Reference Issues/PRs

Resolves #22174
Resolves #22175

What does this implement/fix? Explain your changes.

  • Makes _y_train_std the appropriate shape in case normalize_y=False. Then numpy.outer(y_var, self._y_train_std**2) and numpy.outer(y_cov, self._y_train_std**2) always make n_targets predictions. These still get squeezed as per convention. This fixes the shapes of arrays returned by predict when return_std=True or return_cov=True.
  • Applies a squeeze to y_mean inpredict if the shape is (n_samples, 1) to make this (n_samples,), following the documentation.
  • In sample_y, when n_targets > 1 then the loop in multivariate normal predictions also loops over the last axis (targets) of the predicted covariance array. This fixes the problem with incorrect shapes returned by sample_y.

Any other comments?

An old test had to be modified to reflect the correct shapes output by predict.

@@ -682,3 +683,43 @@ def test_y_std_with_multitarget_normalized():
assert y_pred.shape == (n_samples, n_targets)
assert y_std.shape == (n_samples, n_targets)
assert y_cov.shape == (n_samples, n_samples, n_targets)


def test_y_std_cov_with_multitarget():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test, it would be better to use pytest.mark.parametrize on the previous test to avoid code redundancy.

In addition, I see that I already did a PR but forgot about it: #21996

It was covering one of the problems. Could you check that the solution is equivalent? I also see that we made an additional test to check the value of the std. dev. and covariance. This should be added in yours as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Yes your solution in #21996 is equivalent. In addition, I added line 399-400 which make sure that y_mean is also the right shape corresponding to the docstring for gpr.predict.

I merged your fixes in with this branch and updated the test use pytest.mark.parametrize as suggested. Is the test for the values of std. dev. and covariance that you mentioned the one included in #21996?

@glemaitre glemaitre changed the title [MRG] gaussian process shape fixes BUG fix the shape of the covariance and std. dev. in GPR depending on normalize_y Jan 27, 2022
"""Check the proper normalization of `y_std` and `y_cov` in multi-target scene.
@pytest.mark.parametrize("normalize_y", [True, False])
@pytest.mark.parametrize("n_targets", [0, 1, 10])
def test_multitarget_shape(normalize_y, n_targets):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test is only testing multi-target I think that we don't need to parameterize it with n_targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes I think the name on the test should be changed. I included n_targets = 0 and 1 because it was having problems with that, too.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged the comment. The PR LGTM.
@thomasjpfan since you reviewed my original PR, you might want to have a look such that we can merge this fix. I thought that we added it in 1.0.2 indeed.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits in the test, otherwise LGTM

@thomasjpfan thomasjpfan changed the title BUG fix the shape of the covariance and std. dev. in GPR depending on normalize_y BUG Fix covariance and stdev shape in GPR with normalize_y Feb 14, 2022
@thomasjpfan thomasjpfan merged commit 3786daf into scikit-learn:main Feb 14, 2022
@thomasjpfan
Copy link
Member

Thanks for working on this PR @Tenavi !

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
…arn#22199)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Nakamura-Zimmerer, Tenavi (ARC-AF) <tenavi.nakamura-zimmerer@nasa.gov>
joshua-cogliati-inl added a commit to joshua-cogliati-inl/raven that referenced this pull request Dec 6, 2024
wangcj05 pushed a commit to idaholab/raven that referenced this pull request Jan 14, 2025
* Updating to scipy 1.12

* Scikitlearn 1.0 is incompatible with scipy 1.12, so updating to 1.1

* Sometimes only a 1d array is returned.

* Removing testing kulsinski metric.

* simps is deprecated so switching to simpson

* Handle change from scikit-learn/scikit-learn#22199

* Improving test to classify better.

* Regold due to scikit change
scikit-learn/scikit-learn#22604

* Regold PolyExponential files (had rel err of 1e-03 or less)

* Make timestep uniform for scipy update.

* Regolding because of changes in scipy 1.12

* Increase limits to improve convergence.

* Unpinning xarray and updating numpy

* Updating various libraries.

* Fix working with newer tensorflow.

* Values need to be switched to tuples for hstack in numpy 1.26

* Updating to new ray version.

* The deque size can be bigger in python 3.11

* Report difference in row lengths, instead of crashing OrderedCSVDiffer.

Also report gold file name.

* Remove Fourier__signal_f__period10.0__phase

This was either +pi or -pi semirandomly, so nolonger testing it.

* Regolding changes to ROM/TimeSeries/DMD/BOPDMD because of library changes.

* Support xarray 2024.7 and newer.

Pre 2024.7 automatically squeeze()ed groupby results, so now need
to explicitly call squeeze().

* Fixing long line.

* Increasing zero threshold because of change in libraries.

* Remove version from setuptools since ray updated.

* Optimizing persistence in BayesianMatyas.

* Switch OVO to use estimator that is not constantly zero.

* Use keepdims instead of try catch block.

* Updating to default using python 3.11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-target GPR sample_y fails when normalize_y=True Multi-target GPR predicts only 1 std when normalize_y=False
3 participants