
Handling division by zero std for GaussianProcessRegressor #19361


Closed

Conversation


@sobkevich sobkevich commented Feb 5, 2021

Reference Issues/PRs

Related to #994, which was posted to scikit-optimize but actually refers to scikit-learn.

What does this implement/fix?

This PR fixes the behavior for cases where std is zero (or contains zeros).
Currently, division by a zero std (during target value normalization) produces an array of NaNs, which leads to ValueError: array must not contain infs or NaNs.
In the new version, division by zero is handled the same way as in StandardScaler.
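
For context, a minimal sketch of the failure mode with constant targets (illustration only, not this PR's code):

import numpy as np

y = np.full(5, 3.0)                    # constant targets -> std == 0
y_mean, y_std = y.mean(), y.std()
y_norm = (y - y_mean) / y_std          # division by zero produces an array of nan
print(y_norm)                          # [nan nan nan nan nan]

# StandardScaler-style handling: a zero scale is treated as 1, so the
# already-centered values pass through unchanged.
safe_std = y_std if y_std != 0 else 1.0
print((y - y_mean) / safe_std)         # [0. 0. 0. 0. 0.]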

@@ -519,3 +519,20 @@ def _constrained_optimization(self, obj_func, initial_theta, bounds):

def _more_tags(self):
return {'requires_fit': False}


def _handle_zeros_in_std(std, copy=True):
Contributor


Why not rather import the existing one?

from sklearn.preprocessing._data import _handle_zeros_in_scale
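
Roughly, that private helper replaces zero entries of a scale array with 1; a quick illustration (assuming the import path and signature above, which may differ across scikit-learn versions):

import numpy as np
from sklearn.preprocessing._data import _handle_zeros_in_scale

std = np.array([2.0, 0.0, 0.5])
print(_handle_zeros_in_scale(std, copy=True))   # expected: [2.  1.  0.5]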

Author

@sobkevich sobkevich Feb 6, 2021


@kernc
I thought it would be "unconventional" to import a function with an underscore prefix. I'll fix this.
Update: done.

Contributor


Ideally, the function would be rephrased as return np.where(std, std, 1, copy=copy) and phased out.

Author


Done.
It seems that np.where() doesn't have a copy argument; please let me know if I'm missing something.

Contributor


It doesn't. I said ideally. 😅

You are aware I'm not a maintainer here? With the amount of change now, this might have fewer chances of being accepted. Change is expensive. Not to mention that np.where() always produces a copy, which might be less efficient. Best get some maintainer feedback first.
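
A small sketch of both points (illustration only): np.where() has no copy parameter and always allocates a new array.

import numpy as np

std = np.array([2.0, 0.0, 0.5])
safe = np.where(std != 0, std, 1.0)   # zero entries replaced by 1
print(safe)                           # [2.  1.  0.5]
print(safe is std)                    # False: np.where returned a new array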

Author


You are aware I'm not a maintainer here?

Sorry, I've just noticed that 🤦‍♀️

You are right; in some cases the additional memory use caused by np.where() may matter, so it seems logical not to touch the parts that already work correctly and to go back to using _handle_zeros_in_scale with copy=False in GaussianProcessRegressor.

@kernc
Contributor

kernc commented Feb 6, 2021

Also make sure the linters pass. It's what's failing the CI now.

@cmarmo
Contributor

cmarmo commented Feb 7, 2021

Hi @sobkevich, thanks for your pull request. Note that this pull request could fix #18318, for which another pull request (#18388) was already opened.

@sobkevich
Author

@cmarmo thank you for letting me know that there is already a PR fixing this behavior. Could you please tell me whether I should close this pull request now or after PR #18388 is merged?

@cmarmo
Contributor

cmarmo commented Feb 13, 2021

Could you please tell me whether I should close this pull request now or after PR #18388 is merged?

To be honest... I don't know... I find your solution quite 'à la scikit-learn'... :) ... let's wait and see whether there is any answer in the previous PR. Thanks for your patience.

@@ -200,7 +201,8 @@ def fit(self, X, y):
self._y_train_std = np.std(y, axis=0)

# Remove mean and make unit variance
y = (y - self._y_train_mean) / self._y_train_std
y = (y - self._y_train_mean) / \

@jaburke166 jaburke166 Feb 15, 2021


Hi there!

I'm very new to the whole "contributing" side of GitHub but I've been affected by the std_dev==0 issue that has seemingly been discussed since as early as November of last year! I'm glad this is being rectified now.

The reason I'm commenting is that I copied this fix into the current version of _gpr.py (version 0.24.0), but the issue persists. My thought is that self._y_train_std needs to be stored as _handle_zeros_in_scale(np.std(y, axis=0), copy=False) (or something similar), instead of the scaling issue being dealt with only in the denominator.

I write this because on line 201 self._y_train_std will still be 0 and will be carried to other functions (like predict() and sample_y()). For example, further downstream in predict() the covariance matrix on line 357 will be multiplied by this small number: y_cov = y_cov * self._y_train_std**2. This will subsequently cause issues in sample_y() as well.

I could be wrong here as I'm no expert, but I thought I'd comment my thoughts nonetheless!
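
To make the concern concrete, a sketch with made-up numbers (not the library code): if the stored std stays 0, inverting the normalization in predict() zeroes out the covariance.

import numpy as np

y_train_std = 0.0                      # constant targets whose std was never replaced by 1
y_cov = np.array([[1.0, 0.2],          # covariance computed in the normalized space
                  [0.2, 1.0]])
y_cov = y_cov * y_train_std ** 2       # same un-scaling as on line 357
print(y_cov)                           # all zeros, which then degrades sample_y()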

Author


@jaburke166
Thank you for your comment. I've just added a case for y_with_zero_std to test_y_normalization. It works with the current approach but fails if self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False).
Could you please provide more details on your concern about passing zeros as std to predict()?


@jaburke166 jaburke166 Feb 16, 2021


Hi @sobkevich,

With the changes you've made to _gpr.py, it seems that self._y_train_std is not overwritten by _handle_zeros_in_scale() in the case where self._y_train_std is extremely small or 0. Wouldn't this mean that further downstream in predict(self, X, return_cov=True), when the normalisation is inverted (lines 349-350 and 356-357), the output of y_mean and y_cov will not be aligned with the fact that we've set self._y_train_std=1?

Given the assumption that self._y_train_std is almost 0, squaring it and multiplying it by y_cov on line 357 will surely run into numerical issues. These could result in y_cov losing its positive semi-definite property, so when sampling from the GP in sample_y() on line 416 (or 419 for multidimensional y) using rng.multivariate_normal(), we would be trying to sample from a non-positive semi-definite covariance matrix, which would throw back rubbish samples (I think, but I am no GP expert at all).
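
A related degenerate case, sketched with toy values (not the library code): with an exactly zero covariance, multivariate_normal returns the mean for every draw, so sample_y() would produce no variability at all.

import numpy as np

rng = np.random.RandomState(0)
mean = np.array([1.0, 2.0])
cov = np.zeros((2, 2))                 # what y_cov * 0**2 amounts to
samples = rng.multivariate_normal(mean, cov, size=3)
print(samples)                         # every row equals the mean exactly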


Hi @sobkevich,

I'm wondering if you've had any time to have a look at this?

Member


@jaburke166 I believe that the fix you suggest would be to define:

self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)

instead of detecting constant y at scaling time but I would like to be sure.

I am worried that this would also be wrong in some other way, but I am not sure.

Could you suggest the code snippet for a new test case to add to the test suite that would highlight the problem you are raising above?
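
For example, a hypothetical test sketch along those lines (the test name, kernel choice, and assertions are mine, not the test that was eventually merged):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def test_constant_target_normalize_y():
    # Constant targets have zero std; with normalize_y=True this used to
    # produce NaNs during normalization.
    X = np.linspace(0, 1, 10).reshape(-1, 1)
    y = np.full(10, 3.0)
    gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gpr.fit(X, y)
    y_mean, y_std = gpr.predict(X, return_std=True)
    assert not np.isnan(y_mean).any()
    assert not np.isnan(y_std).any()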

Member


@jblackburne this solution is being implemented in #18388 but it lacks good tests so feel free to suggest one there.

Author


@ogrisel
@jaburke166, yes, you're right that with my approach all y_cov values are going to be filled with zeros, so I tried the fix you suggested: self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False). With that approach y_cov would still be filled with zeros for some kernels (if y_cov has only zero elements it is still positive semi-definite because all eigenvalues are >= 0; in that case all samples would sit at the y_mean point), but for other kernels it is not zero-like, so it is a better solution.
But I've also noticed that the test test_predict_cov_vs_std(kernel) fails when it receives y_with_zero_std, both with and without normalization. I've considered adding some noise np.random.normal(0, 1e-8, y.shape) to y instead of using _handle_zeros_in_scale; it seems to be a working solution, but I'm also not a GP expert.
Tomorrow I'll check whether I missed anything in the tests.
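
A small sketch of the jitter idea mentioned above (toy values; the 1e-8 scale comes from the comment):

import numpy as np

rng = np.random.RandomState(0)
y = np.full(20, 5.0)                         # constant targets
y_jittered = y + rng.normal(0, 1e-8, y.shape)
print(y.std(), y_jittered.std())             # 0.0 vs. a tiny positive number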

@cmarmo
Contributor

cmarmo commented Apr 21, 2021

Solved in #19703.

@cmarmo cmarmo closed this Apr 21, 2021
Labels
module:gaussian_process, Superseded (PR has been replaced by a newer PR)