Skip to content

Added tolerance to _handle_zeros_in_scale #17805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

rastna12
Copy link

@rastna12 rastna12 commented Jul 1, 2020

Added floating point tolerance to _handle_zeros_in_scale to address issue #17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue.

I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Added floating point tolerance to _handle_zeros_in_scale to address issue scikit-learn#17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue.  

I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.
Updating format from linting results.

Added floating point tolerance to _handle_zeros_in_scale to address issue scikit-learn#17794 created on 6/30/2020. I'm using numpy's isclose() function with default absolute and relative tolerance values. The default values handled my test cases fine up until floats around 1e+20 when the variable 'scale' grew to non-zero values even for constant-valued vectors. There may be floating point sensitivities in that function as well but that's outside the scope of this issue.

I also could not test the first if-statement in _handle_zeros_in_scale which checks for scalars close to zero through StandardScaler(). Scalar values passed in are stopped by check_array(). It may be prudent to adjust this statement as well, but without a way to properly check it and deeper knowledge of the package at the moment, I didn't want to mess with it.
@rastna12 rastna12 closed this Jul 1, 2020
@rastna12 rastna12 reopened this Jul 1, 2020
@rastna12
Copy link
Author

rastna12 commented Jul 1, 2020

I'm not sure why my circleci lint test is failing. It looks like it's have an issue checking out the code.

@jnothman
Copy link
Member

jnothman commented Jul 2, 2020 via email

@rastna12
Copy link
Author

rastna12 commented Jul 2, 2020

Ok, let me know if I need to make any adjustments on my end for the pull request related to the circleci test or otherwise.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @rpstanley90 !

@@ -74,7 +74,7 @@ def _handle_zeros_in_scale(scale, copy=True):
if copy:
# New array to avoid side-effects
scale = scale.copy()
scale[scale == 0.0] = 1.0
scale[np.isclose(scale, 0.0)] = 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a non-regression test that would fail at master but pass in this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thomasjpfan do you have a link or anything with an example showing me what you're looking for? This is my first pull request so it's all new to me. I have a script in issue #17794 highlighting the issue. Would modifying this to demonstrate the correction be sufficient, or is there something else I need to do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a test similar to #17794 to make sure this fixes the issue will work.

Base automatically changed from master to main January 22, 2021 10:52
@jeremiedbb
Copy link
Member

The original issue has been fixed in #19788. This is no longer needed.

@jeremiedbb jeremiedbb closed this Mar 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants