Handling division by zero std for GaussianProcessRegressor #19361
Conversation
sklearn/gaussian_process/_gpr.py (Outdated)
@@ -519,3 +519,20 @@ def _constrained_optimization(self, obj_func, initial_theta, bounds):

    def _more_tags(self):
        return {'requires_fit': False}


def _handle_zeros_in_std(std, copy=True):
Why not rather import the existing one?
from sklearn.preprocessing._data import _handle_zeros_in_scale
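For context, a simplified sketch of what that existing helper does (an approximation, not the exact scikit-learn source): it replaces zero scale entries with 1 so the subsequent division becomes a no-op instead of producing NaN/inf.

```python
import numpy as np

def _handle_zeros_in_scale(scale, copy=True):
    """Replace zero entries in `scale` with 1 so dividing by it is safe
    (rough sketch of the helper in sklearn.preprocessing._data)."""
    if np.isscalar(scale):
        return 1.0 if scale == 0.0 else scale
    if copy:
        scale = scale.copy()  # avoid mutating the caller's array
    scale[scale == 0.0] = 1.0
    return scale
```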
@kernc I thought it would be "unconventional" to import a function with an underscore prefix. I'll fix this.
Update: done.
Ideally, the function would be rephrased as return np.where(std, std, 1, copy=copy) and phased out.
Done.
It seems that np.where() doesn't have a copy argument; please let me know if I'm missing something.
It doesn't. I said ideally. 😅
You are aware I'm not a maintainer here? With the amount of change now, this might have fewer chances to be accepted. Change is expensive. Not to mention that np.where() always produces a copy, which might be less efficient. Best get some maintainer feedback first.
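To illustrate the trade-off under discussion (a standalone sketch, not code from the PR): np.where builds a new array every time, while in-place masking (what copy=False amounts to) avoids the extra allocation.

```python
import numpy as np

std = np.array([0.0, 2.0, 0.5])

# np.where has no `copy` parameter and always allocates a new array:
safe_std = np.where(std != 0.0, std, 1.0)   # array([1. , 2. , 0.5])

# The in-place route keeps the original buffer and just patches the zeros:
std[std == 0.0] = 1.0
```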
"You are aware I'm not a maintainer here?"
Sorry, I've just noticed that 🤦‍♀️
You are right, in some cases the additional resource consumption caused by np.where() may turn out to matter, so it seems logical not to touch the parts that already work correctly and to return to using _handle_zeros_in_scale with copy=False in GaussianProcessRegressor.
Also make sure the linters pass. It's what's failing the CI now.
Hi @sobkevich, thanks for your pull request. Note that this pull request could fix #18318, for which another pull request (#18388) was already opened.
To be honest... I don't know... I find your solution quite 'à la scikit-learn'... :) ... let's wait and see if there is any answer in the previous PR. Thanks for your patience.
@@ -200,7 +201,8 @@ def fit(self, X, y):
        self._y_train_std = np.std(y, axis=0)

        # Remove mean and make unit variance
-       y = (y - self._y_train_mean) / self._y_train_std
+       y = (y - self._y_train_mean) / \
Hi there!
I'm very new to the whole "contributing" side of GitHub, but I've been affected by the std_dev == 0 issue that has seemingly been discussed since as early as November of last year! I'm glad this is being rectified now.
The reason I'm commenting is that I copied this fix into the current version of _gpr.py (version 0.24.0) but the issue seems to persist. My thought is that self._y_train_std needs to be stored as _handle_zeros_in_scale(np.std(y, axis=0), copy=False) (or something similar), instead of the scaling issue only being dealt with in the denominator.
I write this because on line 201 self._y_train_std will still be 0 and will be carried into other functions (like predict() and sample_y()). For example, further downstream in predict() the covariance matrix on line 357 will be multiplied by this small number: y_cov = y_cov * self._y_train_std**2. This will subsequently cause issues in sample_y() as well.
I could be wrong here as I'm no expert, but I thought I'd comment my thoughts nonetheless!
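A toy illustration of the downstream effect being described (made-up numbers; only the quoted rescaling step mirrors _gpr.py):

```python
import numpy as np

# Covariance predicted in the normalised target space (toy values):
y_cov = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

# If the stored training std is still 0 (constant targets), undoing the
# normalisation the way predict() does wipes out all predictive uncertainty:
y_train_std = 0.0
y_cov = y_cov * y_train_std ** 2   # -> all-zero covariance matrix
```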
@jaburke166 Thank you for your comment, I've just added a y_with_zero_std case to test_y_normalization. It works with the current approach but fails if self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False).
Could you please provide more details about your concern about passing zeros as std to predict()?
Hi @sobkevich,
With the changes you've made to _gpr.py, it seems that self._y_train_std is not overwritten by _handle_zeros_in_scale() in the case where self._y_train_std is extremely small or 0. Wouldn't this mean that further downstream in predict(self, X, return_cov=True), when the normalisation is inverted (lines 349-350 and 356-357), the output of y_mean and y_cov will not be aligned with the fact that we've effectively normalised with a std of 1?
Given that self._y_train_std is almost 0, squaring it and multiplying it by y_cov on line 357 will surely run into numerical issues, which could result in y_cov losing its positive semi-definite property. Then, when sampling from the GP in sample_y() on line 416 (or 419 for multidimensional y) using rng.multivariate_normal(), we would be trying to sample from a non-positive-semi-definite covariance matrix, which would throw back rubbish samples (I think, but I am no GP expert at all).
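An exaggerated, self-contained illustration of that last failure mode (the matrix below is deliberately indefinite and does not come from _gpr.py):

```python
import numpy as np

rng = np.random.RandomState(0)

# A symmetric matrix with eigenvalues 3 and -1, i.e. not a valid covariance:
bad_cov = np.array([[1.0, 2.0],
                    [2.0, 1.0]])

# RandomState.multivariate_normal checks positive semi-definiteness
# (check_valid='warn' by default) and emits a RuntimeWarning before
# returning samples that should not be trusted.
samples = rng.multivariate_normal(mean=np.zeros(2), cov=bad_cov, size=3)
```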
Hi @sobkevich,
I'm wondering if you've had any time to have a look at this?
@jaburke166 I believe that the fix you suggest would be to define:
self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)
instead of detecting constant y at scaling time, but I would like to be sure.
I am worried that this would also be wrong in some other way, but I am not sure.
Could you suggest a code snippet for a new test case to add to the test suite that would highlight the problem you are raising above?
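In case it helps, a rough sketch of the kind of test being asked for (the test name, tolerance, and kernel choice are my own guesses, not part of the existing suite):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def test_constant_target_with_normalization():
    # Constant targets give np.std(y) == 0, the case discussed in this PR.
    X = np.linspace(0, 5, 10).reshape(-1, 1)
    y = np.full(10, 3.0)

    gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gpr.fit(X, y)

    y_mean, y_cov = gpr.predict(X, return_cov=True)
    assert np.all(np.isfinite(y_mean))
    assert np.all(np.isfinite(y_cov))
    # The posterior mean should reproduce the constant target at the
    # training points once the zero-std case is handled.
    np.testing.assert_allclose(y_mean, 3.0, atol=1e-6)
```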
@jblackburne this solution is being implemented in #18388 but it lacks good tests so feel free to suggest one there.
@ogrisel @jaburke166, yes, you're right that with my approach all y_cov values end up filled with zeros, so I tried the fix you suggested: self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False). With that approach, y_cov is still filled with zeros for some kernels (if y_cov has only zero elements it is still positive semi-definite, since all eigenvalues are >= 0; in that case all samples would sit at the y_mean point), but for other kernels it is not zero-like, so it is a better solution.
But I've also noticed that the test test_predict_cov_vs_std(kernel) fails when it receives y_with_zero_std, both with and without normalization. I've considered adding some noise, np.random.normal(0, 1e-8, y.shape), to y instead of using _handle_zeros_in_scale; it seems to be a working solution, but I'm also not a GP expert.
And tomorrow I'll check if I missed anything in the tests.
Solved in #19703.
Reference Issues/PRs
Related to #994, which was posted in scikit-optimize but actually refers to scikit-learn.
What does this implement/fix?
This PR fixes behavior for cases where std is zero (or contains zeros).
With the current behavior, division by a zero std (during target-value normalization) produces an array of NaNs, which leads to ValueError: array must not contain infs or NaNs.
In the new version, division by zero is handled in the same way as in StandardScaler.
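For reference, a minimal reproduction of the failure being fixed (the exact error message may vary between versions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.ones(10)  # constant target -> np.std(y) == 0

# Before this fix, normalize_y=True divides the targets by the zero std,
# fills y with NaNs, and fit() fails with
# "ValueError: array must not contain infs or NaNs".
gpr = GaussianProcessRegressor(normalize_y=True).fit(X, y)
```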