Yeo-Johnson inverse_transform fails silently on extreme skew data #28946

@el-hult

Describe the bug

The Yeo-Johnson transformation is not surjective for negative lambdas. As a result, the inverse transformation returns np.nan for any value that lies outside the range of the forward transform. The failure is silent, so it took me quite a while of debugging to understand this behavior.
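To illustrate the missing surjectivity, here is a minimal sketch of the forward transform for non-negative inputs, written from the textbook formula rather than copied from scikit-learn. The helper name `yeo_johnson_positive` is made up for this example, and `lmbda` is the rounded value from the reproduction further down. For a negative lambda the output should stay below -1/lambda no matter how large the input gets.

```python
import numpy as np


def yeo_johnson_positive(x, lmbda):
    """Forward Yeo-Johnson for x >= 0 and lmbda != 0 (textbook formula)."""
    return (np.power(x + 1.0, lmbda) - 1.0) / lmbda


lmbda = -0.096  # rounded lambda from this report (assumption for the sketch)
upper_bound = -1.0 / lmbda  # supremum of the transform when lmbda < 0

x = np.array([0.0, 1.0, 1e10, 1e50, 1e100])
psi = yeo_johnson_positive(x, lmbda)

print(psi)          # increasing values that approach, but stay below, upper_bound
print(upper_bound)
```

Anything at or above `upper_bound` therefore has no preimage, which is where the inverse transform runs into trouble.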

The problematic lines are

x_inv[~pos] = 1 - np.power(-(2 - lmbda) * x[~pos] + 1, 1 / (2 - lmbda))

and

x_inv[pos] = np.power(x[pos] * lmbda + 1, 1 / lmbda) - 1

in which we might compute np.power(something_negative, not_integral_value), which of course returns np.nan as per https://numpy.org/doc/stable/reference/generated/numpy.power.html
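The np.power behavior can be checked in isolation. The sketch below evaluates the positive-branch expression for one value inside the range and one outside it. The two `psi` values are hypothetical and are meant as un-standardized inputs, and `np.errstate` is used only to keep the demo output free of the floating-point warning.

```python
import numpy as np

lmbda = -0.096  # rounded lambda from this report (assumption for the sketch)

psi_inside = 5.0    # hypothetical value below -1/lmbda (about 10.4)
psi_outside = 44.0  # hypothetical value above -1/lmbda

with np.errstate(invalid="ignore"):
    # Same expression as the positive branch quoted above.
    x_inside = np.power(psi_inside * lmbda + 1, 1 / lmbda) - 1
    x_outside = np.power(psi_outside * lmbda + 1, 1 / lmbda) - 1

print(x_inside)   # finite: the base 0.52 is positive
print(x_outside)  # nan: negative base raised to a non-integer power
```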

Steps/Code to Reproduce

To reproduce for positive values (there is a similar problem for negative values):

import numpy as np
import sklearn.preprocessing
trans = sklearn.preprocessing.PowerTransformer(method='yeo-johnson')
x = np.array([1,1,1e10]).reshape(-1, 1) # extreme skew
trans.fit(x)
lmbda = trans.lambdas_[0] 
print(lmbda)
assert lmbda < 0 # == -0.096 negative value

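# Any value `psi` for which lambda*psi + 1 < 0 results in nan, because the forward transformation
# is not surjective for negative lambdas and such a value lies outside its range. With the default
# standardize=True, the input is un-standardized first, so the condition applies to the un-standardized value.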
psi = np.array([10]).reshape(-1, 1)
x = trans.inverse_transform(psi).item()
print(x)
assert np.isnan(x)

Expected Results

The code should either:

  1. validate its inputs and raise an exception
  2. validate its inputs and raise a warning
  3. fail silently, but document that behavior
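As a rough sketch of what option 1 could look like, the check below rejects values that have no preimage before the inverse is computed. The function name `check_yeo_johnson_inverse_domain` is hypothetical and not part of scikit-learn, and the check is meant for un-standardized values, i.e. after any scaling has been undone.

```python
import numpy as np


def check_yeo_johnson_inverse_domain(psi, lmbda):
    """Raise if any value lies outside the range of the forward transform.

    Hypothetical helper, not a scikit-learn API. `psi` must be un-standardized.
    """
    psi = np.asarray(psi, dtype=float)
    pos = psi >= 0
    # Bases that the two inverse branches raise to a fractional power.
    base = np.where(pos, lmbda * psi + 1.0, -(2.0 - lmbda) * psi + 1.0)
    bad = base <= 0
    if np.any(bad):
        raise ValueError(
            f"{int(bad.sum())} value(s) are outside the range of the "
            f"Yeo-Johnson transform for lambda={lmbda}; the inverse is undefined."
        )
    return psi


check_yeo_johnson_inverse_domain([0.5, 5.0], -0.096)  # passes silently

try:
    check_yeo_johnson_inverse_domain([44.0], -0.096)
except ValueError as exc:
    print(exc)
```

Option 2 would be the same check with `warnings.warn` in place of the `raise`.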

Actual Results

It just prints

-0.0962322261004418
nan

Versions

System:
    python: 3.11.3 (main, Jan 18 2024, 19:07:12) [Clang 18.0.0 (https://github.com/llvm/llvm-project 75501f53624de92aafce2f1da698
executable: /home/pyodide/this.program
   machine: Emscripten-3.1.46-wasm32-32bit

Python dependencies:
      sklearn: 1.3.1
          pip: None
   setuptools: None
        numpy: 1.26.1
        scipy: 1.11.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: False
