Description
Describe the bug
The Yeo-Johnson transformation is not surjective for negative lambdas. As a result, the inverse transformation returns np.nan when it is given values outside the range of the forward transform. This failure is silent, so it took me quite a while of debugging to understand this behavior.
The problematic lines are
scikit-learn/sklearn/preprocessing/_data.py, line 3390 in 8721245:
x_inv[~pos] = 1 - np.power(-(2 - lmbda) * x[~pos] + 1, 1 / (2 - lmbda))
and
scikit-learn/sklearn/preprocessing/_data.py, line 3386 in 8721245:
x_inv[pos] = np.power(x[pos] * lmbda + 1, 1 / lmbda) - 1
in which we might compute np.power(something_negative, not_integral_value), which of course returns np.nan as per https://numpy.org/doc/stable/reference/generated/numpy.power.html
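To make the missing range concrete, here is a minimal sketch that derives the bounds of the forward transform from the standard Yeo-Johnson formulas. The helper name `yeo_johnson_image_bounds` is hypothetical and not part of scikit-learn; it only illustrates which inputs to the inverse fall outside the image.

```python
import math


def yeo_johnson_image_bounds(lmbda):
    """Return (lower, upper) bounds of the Yeo-Johnson forward transform.

    Hypothetical helper for illustration only; not a scikit-learn API.
    Both bounds are exclusive when they are finite.
    """
    # For x >= 0 the transform is ((x + 1)**lmbda - 1) / lmbda, which
    # approaches -1 / lmbda as x grows when lmbda < 0.
    upper = -1.0 / lmbda if lmbda < 0 else math.inf
    # For x < 0 the transform is -((1 - x)**(2 - lmbda) - 1) / (2 - lmbda),
    # which approaches 1 / (2 - lmbda) as x -> -inf when lmbda > 2.
    lower = 1.0 / (2.0 - lmbda) if lmbda > 2 else -math.inf
    return lower, upper


# Lambda reported in this issue: the image is bounded above near 10.39.
print(yeo_johnson_image_bounds(-0.0962322261004418))
```

Any value passed to the inverse that lies at or beyond a finite bound makes the base of `np.power` non-positive, which is where the nan comes from.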
Steps/Code to Reproduce
To reproduce for positive values (there is a similar problem for negative values):
import numpy as np
import sklearn.preprocessing
trans = sklearn.preprocessing.PowerTransformer(method='yeo-johnson')
x = np.array([1,1,1e10]).reshape(-1, 1) # extreme skew
trans.fit(x)
lmbda = trans.lambdas_[0]
print(lmbda)
assert lmbda < 0 # == -0.096 negative value
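# Any value `psi` for which lmbda * psi + 1 <= 0 results in nan, since for negative
# lambdas the forward transformation is bounded above by -1 / lmbda (about 10.39 here).
# With the default standardize=True, `psi` is un-standardized before the inverse power
# transform is applied, so psi = 10 ends up beyond that bound.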
psi = np.array([10]).reshape(-1, 1)
x = trans.inverse_transform(psi).item()
print(x)
assert np.isnan(x)
Expected Results
The code should either:
- validate its inputs and raise an exception
- validate its inputs and raise a warning
- fail silently, but have this behavior documented
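As a rough sketch of the first option, the check could look like the scalar function below. It mirrors the two inverse branches quoted in this issue but uses plain Python instead of NumPy, and the name `checked_yeo_johnson_inverse` is hypothetical; this is an illustration of the validation, not the scikit-learn implementation.

```python
import math


def checked_yeo_johnson_inverse(psi, lmbda):
    """Scalar Yeo-Johnson inverse that raises instead of returning nan.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    if psi >= 0:
        if lmbda == 0:
            return math.exp(psi) - 1
        base = psi * lmbda + 1
        exponent = 1 / lmbda
    else:
        if lmbda == 2:
            return 1 - math.exp(-psi)
        base = -(2 - lmbda) * psi + 1
        exponent = 1 / (2 - lmbda)

    # A non-positive base means psi lies outside the image of the forward
    # transform, so no real preimage exists.
    if base <= 0:
        raise ValueError(
            f"psi={psi} is outside the range of the Yeo-Johnson transform "
            f"for lambda={lmbda}"
        )

    value = base ** exponent
    return value - 1 if psi >= 0 else 1 - value


print(checked_yeo_johnson_inverse(1.0, -0.5))  # 3.0

try:
    checked_yeo_johnson_inverse(44.0, -0.0962322261004418)
except ValueError as exc:
    print("rejected:", exc)
```

Replacing the `raise` with `warnings.warn` would give the second option; a vectorized version would apply the same test to the array passed to `np.power`.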
Actual Results
It just prints
-0.0962322261004418
nan
Versions
System:
python: 3.11.3 (main, Jan 18 2024, 19:07:12) [Clang 18.0.0 (https://github.com/llvm/llvm-project 75501f53624de92aafce2f1da698
executable: /home/pyodide/this.program
machine: Emscripten-3.1.46-wasm32-32bit
Python dependencies:
sklearn: 1.3.1
pip: None
setuptools: None
numpy: 1.26.1
scipy: 1.11.2
Cython: None
pandas: None
matplotlib: None
joblib: 1.3.2
threadpoolctl: 3.2.0
Built with OpenMP: False