Feature/26308 yeo johnson2 #3

smarie · 2025-02-18T15:23:52Z

Reference Issues/PRs

This PR is finalizing scikit-learn#27818 to close scikit-learn#26308 and also fix the warning issue reported in scikit-learn#23319 (comment)

What does this implement/fix? Explain your changes.

Use scipy.stats.yeojohnson instead of our own implementation as @lorentzenchr suggested.

Any other comments?

@lesteve

… review @lesteve

github-actions · 2025-02-18T15:25:08Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`black`

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/_data.py	2025-02-18 15:24:07.940936+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/_data.py	2025-02-18 15:24:28.634307+00:00
@@ -3524,14 +3524,14 @@
 
             return -loglike
 
         # the computation of lambda is influenced by NaNs so we need to
         # get rid of them
-        #x = x[~np.isnan(x)]
+        # x = x[~np.isnan(x)]
         # choosing bracket -2, 2 like for boxcox
-        #return optimize.brent(_neg_log_likelihood, brack=(-2, 2))
-        _yeojohnson_lambda( _neg_log_likelihood, x)
+        # return optimize.brent(_neg_log_likelihood, brack=(-2, 2))
+        _yeojohnson_lambda(_neg_log_likelihood, x)
 
     def _check_input(self, X, in_fit, check_positive=False, check_shape=False):
         """Validate the input before fit and transform.
 
         Parameters
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/_data.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_data.py	2025-02-18 15:24:07.941936+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_data.py	2025-02-18 15:24:30.217980+00:00
@@ -2618,10 +2618,11 @@
         if standardize:
             assert_allclose(Xt_, np.zeros_like(X))
         else:
             assert_allclose(Xt_, X)
 
+
 def test_yeojohnson_for_different_scipy_version():
-     """Check the result are the same for different scipy version"""
-     # https://github.com/scikit-learn/scikit-learn/pull/27818
-     pt = PowerTransformer(method="yeo-johnson").fit(X_1col)
-     assert_almost_equal(pt.lambdas_[0], 0.99999353)
+    """Check the result are the same for different scipy version"""
+    # https://github.com/scikit-learn/scikit-learn/pull/27818
+    pt = PowerTransformer(method="yeo-johnson").fit(X_1col)
+    assert_almost_equal(pt.lambdas_[0], 0.99999353)
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_data.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/fixes.py	2025-02-18 15:24:07.954936+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/fixes.py	2025-02-18 15:24:31.532372+00:00
@@ -84,12 +84,12 @@
 
 def _object_dtype_isnan(X):
     return X != X
 
 
-#TODO:  remove this when minimum version of scipy >= 1.9.0
-def _yeojohnson_lambda( _neg_log_likelihood, x):
+# TODO:  remove this when minimum version of scipy >= 1.9.0
+def _yeojohnson_lambda(_neg_log_likelihood, x):
 
     x = x[~np.isnan(x)]
     scipy_version = parse_version(scipy.__version__)
     min_scipy_version = "1.9.0"
     if scipy_version < parse_version(min_scipy_version):
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/fixes.py

Oh no! 💥 💔 💥
3 files would be reformatted, 918 files would be left unchanged.

`ruff`

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


sklearn/preprocessing/_data.py:9:19: F401 [*] `scipy.optimize` imported but unused
   |
 8 | import numpy as np
 9 | from scipy import optimize, sparse, stats
   |                   ^^^^^^^^ F401
10 | from scipy.special import boxcox, inv_boxcox
   |
   = help: Remove unused import: `scipy.optimize`

sklearn/utils/fixes.py:10:1: I001 [*] Import block is un-sorted or un-formatted
   |
 8 |   # SPDX-License-Identifier: BSD-3-Clause
 9 |   
10 | / import platform
11 | | import struct
12 | | import sys
13 | | 
14 | | import numpy as np
15 | | import scipy
16 | | from scipy import optimize
17 | | from scipy import stats
18 | | import scipy.sparse.linalg
19 | | import scipy.stats
20 | | 
21 | | try:
   | |_^ I001
22 |       import pandas as pd
23 |   except ImportError:
   |
   = help: Organize imports

Found 2 errors.
[*] 2 fixable with the `--fix` option.

_{Generated for commit: 3013d88. Link to the linter CI: here}

smarie · 2025-03-21T14:16:34Z

sklearn/utils/fixes.py

@@ -84,6 +86,20 @@ def _object_dtype_isnan(X):
    return X != X


+#TODO:  remove this when minimum version of scipy >= 1.9.0
+def _yeojohnson_lambda( _neg_log_likelihood, x):
+


Please add docstring here

smarie · 2025-03-21T14:18:00Z

sklearn/utils/fixes.py

+#TODO:  remove this when minimum version of scipy >= 1.9.0
+def _yeojohnson_lambda( _neg_log_likelihood, x):
+
+    x = x[~np.isnan(x)]


Should this line move or not ? (check wrt scipy implementation to get the answer)

smarie · 2025-03-21T14:19:22Z

sklearn/preprocessing/_data.py

@@ -3525,9 +3526,10 @@ def _neg_log_likelihood(lmbda):

        # the computation of lambda is influenced by NaNs so we need to
        # get rid of them
-        x = x[~np.isnan(x)]
+        #x = x[~np.isnan(x)]


Rather leave this here and remove it from the other side

Suggested change

#x = x[~np.isnan(x)]

x = x[~np.isnan(x)]

smarie · 2025-03-21T14:19:59Z

sklearn/preprocessing/_data.py

        # choosing bracket -2, 2 like for boxcox
-        return optimize.brent(_neg_log_likelihood, brack=(-2, 2))
+        #return optimize.brent(_neg_log_likelihood, brack=(-2, 2))


Remove this comment and the above comment as they should be in the new function now

smarie · 2025-03-21T14:28:32Z

sklearn/preprocessing/tests/test_data.py

@@ -2619,3 +2619,9 @@ def test_power_transformer_constant_feature(standardize):
            assert_allclose(Xt_, np.zeros_like(X))
        else:
            assert_allclose(Xt_, X)
+


Please also add a test for the warning : scikit-learn#23319 (comment)

Documentation from pytest : https://stackoverflow.com/questions/45671803/how-to-use-pytest-to-assert-no-warning-is-raised

Sylvain MARIE added 2 commits February 18, 2025 16:22

Moved yeo johnson implementation switch to fixes.py according to peer…

590e8d4

… review @lesteve

Added a test for yeo johson implementation of power transformer

3013d88

smarie commented Mar 21, 2025

View reviewed changes

lesteve mentioned this pull request Mar 31, 2025

ENH Use scipy.stats.yeojohnson in PowerTransformer scikit-learn/scikit-learn#27818

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/26308 yeo johnson2 #3

Feature/26308 yeo johnson2 #3

Uh oh!

smarie commented Feb 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Feb 18, 2025

Uh oh!

smarie Mar 21, 2025

Uh oh!

smarie Mar 21, 2025

Uh oh!

smarie Mar 21, 2025 •

edited

Loading

Uh oh!

smarie Mar 21, 2025

Uh oh!

smarie Mar 21, 2025 •

edited

Loading

Uh oh!

Uh oh!

Feature/26308 yeo johnson2 #3

Are you sure you want to change the base?

Feature/26308 yeo johnson2 #3

Uh oh!

Conversation

smarie commented Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

github-actions bot commented Feb 18, 2025

❌ Linting issues

black

ruff

Uh oh!

smarie Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

smarie Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

smarie Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smarie Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

smarie Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

smarie commented Feb 18, 2025 •

edited

Loading

`black`

`ruff`

smarie Mar 21, 2025 •

edited

Loading

smarie Mar 21, 2025 •

edited

Loading