Added tolerance to _handle_zeros_in_scale #17805

rastna12 · 2020-07-01T16:17:22Z

Added a floating-point tolerance to _handle_zeros_in_scale to address issue #17794, created on 6/30/2020. I'm using NumPy's isclose() function with its default absolute and relative tolerance values. The defaults handled my test cases fine until the input values reached roughly 1e+20, at which point the variable 'scale' grew to clearly non-zero values even for constant-valued vectors. There may be floating-point sensitivities in the computation of 'scale' as well, but that's outside the scope of this issue.

I also could not test the first if-statement in _handle_zeros_in_scale, which checks for scalars close to zero, through StandardScaler(): scalar values passed in are stopped by check_array(). It may be prudent to adjust that statement as well, but without a way to check it properly, and without deeper knowledge of the package at the moment, I didn't want to change it.
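For readers following the thread, here is a minimal sketch of what the proposed change does and one plausible reason it stops helping around 1e+20. The helper names replace_exact_zeros and replace_near_zeros are hypothetical stand-ins for the one-line change, not the library function. The sketch assumes NumPy's documented defaults for isclose() (rtol=1e-05, atol=1e-08); because the reference value is 0.0, the relative term vanishes and only the absolute tolerance applies.

```python
import numpy as np


# Hypothetical stand-ins for the one-line change; not scikit-learn code.
def replace_exact_zeros(scale):
    scale = np.array(scale, dtype=float)  # np.array makes a copy
    scale[scale == 0.0] = 1.0
    return scale


def replace_near_zeros(scale):
    scale = np.array(scale, dtype=float)
    # isclose(a, b) tests |a - b| <= atol + rtol * |b|.
    # With b == 0.0 this reduces to |a| <= atol (1e-8 by default).
    scale[np.isclose(scale, 0.0)] = 1.0
    return scale


scale = np.array([0.0, 1e-12, 1e-3, 2.5])
exact = replace_exact_zeros(scale)    # only the exact zero is replaced
tolerant = replace_near_zeros(scale)  # 0.0 and 1e-12 are both replaced

# Spacing between adjacent float64 values near 1e20. Rounding residue in a
# scale computed from data of this magnitude can be of this order, which is
# far above the absolute tolerance.
gap_at_1e20 = np.spacing(1e20)
residual_is_caught = bool(np.isclose(gap_at_1e20, 0.0))

print(exact, tolerant, gap_at_1e20, residual_is_caught)
```

Note that an absolute tolerance also replaces any genuinely small but meaningful scale below 1e-8, so the cutoff is a trade-off rather than a free improvement.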

Updating format from linting results.

rastna12 · 2020-07-01T17:11:39Z

I'm not sure why my CircleCI lint test is failing. It looks like it's having an issue checking out the code.

jnothman · 2020-07-02T07:42:03Z

Yes, a known and recent issue in master.

rastna12 · 2020-07-02T15:11:05Z

OK, let me know if I need to make any adjustments on my end for the pull request, whether related to the CircleCI test or otherwise.

thomasjpfan

Thank you for the PR, @rpstanley90!

thomasjpfan · 2020-07-05T17:44:51Z

sklearn/preprocessing/_data.py

@@ -74,7 +74,7 @@ def _handle_zeros_in_scale(scale, copy=True):
        if copy:
            # New array to avoid side-effects
            scale = scale.copy()
-        scale[scale == 0.0] = 1.0
+        scale[np.isclose(scale, 0.0)] = 1.0


Please add a non-regression test that would fail at master but pass in this PR.

rastna12 · 2020-07-13

Hi @thomasjpfan, do you have a link or an example showing what you're looking for? This is my first pull request, so it's all new to me. I have a script in issue #17794 that highlights the problem. Would modifying it to demonstrate the correction be sufficient, or is there something else I need to do?

thomasjpfan · 2020-07-16

Adding a test similar to the script in #17794, to make sure this fixes the issue, will work.
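As an illustration of the kind of non-regression test being requested, here is a minimal pytest-style sketch. It is not the test from the project. The function handle_zeros_patched is a hypothetical stand-in reconstructed from the diff and the description above (including the scalar branch), so the example runs without scikit-learn installed; a real test would exercise the library code instead.

```python
import numpy as np


def handle_zeros_patched(scale, copy=True):
    """Hypothetical stand-in mirroring the patched helper in the diff."""
    if np.isscalar(scale):
        # Scalar branch, left with an exact comparison as in the PR.
        if scale == 0.0:
            scale = 1.0
        return scale
    elif isinstance(scale, np.ndarray):
        if copy:
            # New array to avoid side-effects
            scale = scale.copy()
        scale[np.isclose(scale, 0.0)] = 1.0
        return scale


def test_near_zero_scale_is_replaced():
    scale = np.array([0.0, 1e-16, 3.0])
    out = handle_zeros_patched(scale)
    # With the exact comparison (scale == 0.0) the 1e-16 entry would be
    # left unchanged, so this assertion would fail before the patch.
    np.testing.assert_array_equal(out, np.array([1.0, 1.0, 3.0]))
    # copy=True must leave the caller's array unchanged.
    assert scale[1] == 1e-16


test_near_zero_scale_is_replaced()
```

Calling the helper directly, as done here, also sidesteps the check_array() restriction mentioned in the description, since no estimator is involved.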

jeremiedbb · 2022-03-12T00:05:20Z

The original issue has been fixed in #19788. This is no longer needed.

github-actions bot added the module:preprocessing label Jul 1, 2020

rastna12 closed this Jul 1, 2020

rastna12 reopened this Jul 1, 2020

thomasjpfan reviewed Jul 5, 2020

Base automatically changed from master to main January 22, 2021 10:52

jeremiedbb closed this Mar 12, 2022
