-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Added tolerance to _handle_zeros_in_scale #17805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added floating point tolerance to _handle_zeros_in_scale to address issue scikit-learn#17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue. I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.
Updating format from linting results. Added floating point tolerance to _handle_zeros_in_scale to address issue scikit-learn#17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue. I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.
I'm not sure why my circleci lint test is failing. It looks like it's have an issue checking out the code. |
Yes, a known and recent issue in master.
|
Ok, let me know if I need to make any adjustments on my end for the pull request related to the circleci test or otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @rpstanley90 !
@@ -74,7 +74,7 @@ def _handle_zeros_in_scale(scale, copy=True): | |||
if copy: | |||
# New array to avoid side-effects | |||
scale = scale.copy() | |||
scale[scale == 0.0] = 1.0 | |||
scale[np.isclose(scale, 0.0)] = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a non-regression test that would fail at master but pass in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thomasjpfan do you have a link or anything with an example showing me what you're looking for? This is my first pull request so it's all new to me. I have a script in issue #17794 highlighting the issue. Would modifying this to demonstrate the correction be sufficient, or is there something else I need to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a test similar to #17794 to make sure this fixes the issue will work.
The original issue has been fixed in #19788. This is no longer needed. |
Added floating point tolerance to _handle_zeros_in_scale to address issue #17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue.
I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Any other comments?