# Handling division by zero std for GaussianProcessRegressor #19361
Status: Closed. sobkevich wants to merge 10 commits into `scikit-learn:main` from `sobkevich:gpr_division_by_zero_handling`.
## Commits
All 10 commits are by sobkevich:

- `0ce10e4` handling division by zero std for GaussianProcessRegressor
- `e4ae1c3` test behavior for division by zero std
- `7df01af` usage of the existing func _handle_zeros_in_scale
- `8af2dca` rephrase _handle_zeros_in_scale using np.where()
- `12e9bbf` replace _handle_zeros_in_scale with np.where()
- `68829d2` fix linting, return to _handle_zeros_in_scale
- `adde6db` fix linting in docstring
- `6f715f2` add y with zero std to test_y_normalization, update normalization fun…
- `7250179` get empty line back
- `dae1b0c` fix line length
## Conversation
**jaburke166** commented:
Hi there!

I'm very new to the whole "contributing" side of GitHub, but I've been affected by the `std_dev == 0` issue that has seemingly been discussed since as early as November of last year! I'm glad this is being rectified now.

The reason I'm commenting is that I copied this fix into the current version of `_gpr.py` (version 0.24.0), but the issue seems to persist. My thought is that `self._y_train_std` needs to be stored as `_handle_zeros_in_scale(np.std(y, axis=0), copy=False)` (or something similar), instead of the scaling issue being dealt with only in the denominator.

I write this because on line 201 `self._y_train_std` will still be 0 and will be carried into other functions (like `predict()` and `sample_y()`). For example, further downstream in `predict()`, the covariance matrix on line 357 will be multiplied by this small number: `y_cov = y_cov * self._y_train_std**2`. This will subsequently cause issues in `sample_y()` as well.

I could be wrong here as I'm no expert, but I thought I'd comment my thoughts nonetheless!
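
For concreteness, here is a minimal sketch that reproduces the scenario described above, assuming scikit-learn's public `GaussianProcessRegressor` API (the exact pre-fix behavior depends on the installed version):

```python
# Hypothetical reproduction sketch of the zero-std issue (not part of the PR).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.full(5, 3.0)  # constant targets, so np.std(y) == 0

gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gpr.fit(X, y)  # with normalize_y=True, y is divided by its std here

# Before the fix, the zero std could propagate and yield NaNs; after it,
# the predicted mean should simply recover the constant 3.0.
y_mean, y_std = gpr.predict(X, return_std=True)
print(y_mean, y_std)
```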
---

**sobkevich** replied:
@jaburke166 Thank you for your comment. I've just added a case for `y_with_zero_std` to `test_y_normalization`. It works with the current approach but fails if `self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)`.

Could you please provide more details about your concern about passing zeros as std to `predict()`?
---

**jaburke166** replied:
Hi @sobkevich,

With the changes you've made to `_gpr.py`, it seems that `self._y_train_std` is not overwritten by `_handle_zeros_in_scale()` in the case where `self._y_train_std` is extremely small or 0. Wouldn't this mean that further downstream in `predict(self, X, return_cov=True)`, when the normalisation is inverted (lines 349-350 and 356-357), the output of `y_mean` and `y_cov` will not be aligned with the fact that we've set `self._y_train_std = 1`?

Given the assumption that `self._y_train_std` is almost 0, squaring it and multiplying it by `y_cov` on line 357 will surely run into numerical issues, which could result in `y_cov` losing its positive semi-definite property. Then, when sampling from the GP in `sample_y()` on line 416 (or 419 for multidimensional `y`) using `rng.multivariate_normal()`, we would be trying to sample from a non-positive-semi-definite covariance matrix, which would throw back rubbish samples (I think, but I am no GP expert at all).
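
To illustrate the scale of the numbers involved, here is a small numerical sketch of this concern (hypothetical values, not taken from the PR):

```python
# Illustrative sketch: scaling a kernel-derived covariance by a near-zero
# squared std shrinks all eigenvalues toward zero, where round-off dominates.
import numpy as np

K = np.array([[1.0, 0.999999],
              [0.999999, 1.0]])  # a nearly singular kernel matrix
y_train_std = 1e-12              # the "almost 0" training std discussed above

y_cov = K * y_train_std**2        # mirrors y_cov = y_cov * self._y_train_std**2
print(np.linalg.eigvalsh(y_cov))  # eigenvalues on the order of 1e-30 and 1e-24

# At this scale, round-off can flip the smallest eigenvalue negative, in which
# case np.random.multivariate_normal warns that the covariance matrix is not
# positive semi-definite and the draws become unreliable.
rng = np.random.RandomState(0)
samples = rng.multivariate_normal(np.zeros(2), y_cov, size=3)
print(samples)
```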
---

**jaburke166** followed up:
Hi @sobkevich,
I'm wondering if you've had any time to have a look at this?
---

**ogrisel** commented:
@jaburke166 I believe that the fix you suggest would be to define `self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)` instead of detecting constant y at scaling time, but I would like to be sure.

I am worried that this would also be wrong in some other way, but I am not sure.

Could you suggest a code snippet for a new test case to add to the test suite that would highlight the problem you are raising above?
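
A possible sketch of such a test case, assuming the public estimator API (hypothetical names and assertions, not the test that was eventually merged):

```python
# Hypothetical test sketch: constant targets should not break prediction
# or sampling when normalize_y=True.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def test_constant_y_prediction_and_sampling():
    X = np.linspace(0.0, 1.0, 7).reshape(-1, 1)
    y = np.full(7, 2.0)  # constant targets: np.std(y) == 0

    gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gpr.fit(X, y)

    y_mean, y_cov = gpr.predict(X, return_cov=True)
    assert np.all(np.isfinite(y_mean))
    assert np.all(np.isfinite(y_cov))
    # The posterior mean at the training points should recover the constant.
    assert np.allclose(y_mean, 2.0)

    # Sampling should not raise and should return finite draws.
    samples = gpr.sample_y(X, n_samples=3, random_state=0)
    assert np.all(np.isfinite(samples))
```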
---

**ogrisel** added:
@jblackburne this solution is being implemented in #18388, but it lacks good tests, so feel free to suggest one there.
---

**sobkevich** replied:
@ogrisel @jaburke166, yes, you're right that with my approach all `y_cov` values are going to be filled with zeros, so I tried the fix you suggested: `self._y_train_std = _handle_zeros_in_scale(np.std(y, axis=0), copy=False)`. With this approach `y_cov` would also be filled with zeros for some kernels (if `y_cov` has only zero elements it is still positive semi-definite, because all eigenvalues are >= 0; in that case all samples would collapse to the `y_mean` point), but for other kernels it is not zero-like. So it is a better solution.

But I've also noticed that the test `test_predict_cov_vs_std(kernel)` fails when it receives `y_with_zero_std`, both with and without normalization. I've considered adding some noise `np.random.normal(0, 1e-8, y.shape)` to `y` instead of using `_handle_zeros_in_scale`; it seems to be a working solution, but I'm also not a GP expert.

And tomorrow I'll check if I missed anything in the tests.
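
For reference, a simplified sketch of the `_handle_zeros_in_scale` idea discussed throughout this thread (the real helper lives in `sklearn.preprocessing._data` and handles more edge cases):

```python
# Simplified sketch of the _handle_zeros_in_scale idea: replace zero scales
# with 1 so that dividing by them becomes a no-op instead of producing NaNs.
import numpy as np

def handle_zeros_in_scale(scale):
    scale = np.asarray(scale, dtype=float)
    if scale.ndim == 0:
        return 1.0 if scale == 0.0 else float(scale)
    return np.where(scale == 0.0, 1.0, scale)

y = np.full(5, 3.0)
y_train_std = handle_zeros_in_scale(np.std(y, axis=0))  # 1.0 instead of 0.0
y_normalized = (y - y.mean(axis=0)) / y_train_std       # all zeros, no NaNs
```

Unlike the jitter workaround (`np.random.normal(0, 1e-8, y.shape)`), this keeps the training targets exact and only changes the scale used for normalization.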