
SciPy "BracketError" raised in some cases when using PowerTransformer on columns that contain a single repeated value #27499

Closed
@Gloggglogg

Description


Describe the bug

I first encountered this error while transforming a metabolomics dataset with PowerTransformer. Before that, I had imputed the dataset with SimpleImputer(strategy="median"). In this case that means every missing value became 1.0, because the dataset was produced so that every feature has a median of 1.0. After some troubleshooting I found that certain inputs consistently raise this SciPy "BracketError" (the traceback below shows it comes from scipy.optimize). It is most likely to happen when a feature contains a single repeated value. The error can disappear after changing the number of rows or the values: you can construct datasets that raise the error every time, and a small change to those datasets makes it go away.
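To illustrate how median imputation can turn a sparse feature into a constant one, here is a minimal NumPy sketch that roughly mimics SimpleImputer(strategy="median") column by column. The helper name impute_median and the toy matrix are hypothetical, for illustration only; this is not scikit-learn's implementation.

```python
import numpy as np


def impute_median(X):
    """Hypothetical helper: fill NaNs in each column with that column's median.

    Roughly mimics SimpleImputer(strategy="median") for columns that have at
    least one non-missing value; it is not scikit-learn's implementation.
    """
    X = np.array(X, dtype=np.float64)  # work on a float copy
    medians = np.nanmedian(X, axis=0)  # per-column median, ignoring NaNs
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = medians[cols]  # each NaN gets its own column's median
    return X


nan = np.nan
# Column 0 has a single observed value, column 1 has three.
X = np.array([
    [1.0, 0.5],
    [nan, 1.0],
    [nan, 2.0],
    [nan, nan],
])
X_imputed = impute_median(X)
print(X_imputed)
# Column 0 is now constant (every entry is 1.0); column 1 still varies.
print(np.unique(X_imputed[:, 0]), np.unique(X_imputed[:, 1]))
```

A feature with only one or a few observed values ends up almost or entirely constant after imputation, which matches the kind of column that triggers the error.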

Here is some code that produces the error:

import numpy as np
from sklearn.preprocessing import PowerTransformer
data = np.array([0.9] * 400)
transformed_data = PowerTransformer().fit_transform(data.reshape(-1, 1))

If you vary the repeated value and the array length, you will find that some inputs raise the error and others do not.

E.g. an array of [1.1] * 400 does not raise the error, but [1.0] * 400 does.
E.g. data = [1] * 9 (and * 8, * 7, * 6, * 5, ...) raises the error, while data = [1] * 10 does not.
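One plausible reason the outcome depends on both the value and the row count (my assumption, not something the traceback establishes): in float64 arithmetic, the mean of n copies of the same value is not guaranteed to equal that value exactly. So the computed variance of a "constant" column may come out as exactly 0.0 or as a tiny positive rounding artifact, depending on value and length. The Yeo-Johnson log-likelihood contains the log of the transformed column's variance, and the transform of a constant column is still constant, so the same rounding effect can reach the objective the optimizer sees. The sketch below only prints raw variances; constant_column_variance is a hypothetical helper and PowerTransformer is not called.

```python
import numpy as np


def constant_column_variance(value, n_rows):
    """Variance (ddof=0, NumPy's default) of a float64 column repeating one value.

    Mathematically this is always 0; any positive result is rounding noise
    from computing the mean in floating point.
    """
    return np.full(n_rows, value, dtype=np.float64).var()


results = {}
for value in (0.1, 0.9, 1.0, 1.1):
    for n_rows in (3, 9, 10, 400):
        var = constant_column_variance(value, n_rows)
        results[(value, n_rows)] = var
        print(f"value={value}, n_rows={n_rows}, var={var!r}, exactly zero: {var == 0.0}")
```

Whole numbers such as 1.0 sum exactly, so their variance is always exactly 0.0; values like 0.1 or 1.1 can leave a tiny nonzero residue.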

I had the impression that I also triggered this error with columns containing a few more than one unique value (2, 3, and possibly even 4 unique values, with the rest being 1.0), but I was not able to reproduce that while writing this report, so I may be mistaken. I even wrote a function that ran thousands of random iterations on this type of data to try to reproduce it, but came up empty-handed.

The error does not say what is happening or why. My dataset has over 6000 rows and 900 features, and the error did not indicate which part of the data caused it. On an educated, lucky guess, I suspected it was related to using SimpleImputer(strategy="median") on features with very different amounts of missing data, including some features with only a few non-missing values. To test this hypothesis, I removed every feature that had 3 or fewer non-missing values before imputation. That indeed got rid of the error and led me to investigate further.
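The manual filtering step described above can be sketched with NumPy as follows. drop_sparse_features and its min_non_missing parameter are hypothetical names; the threshold of 4 corresponds to dropping features with 3 or fewer observed values, and the filter must run before imputation, while the NaNs are still present.

```python
import numpy as np


def drop_sparse_features(X, min_non_missing=4):
    """Hypothetical helper: keep only columns with >= min_non_missing non-NaN values.

    Returns the filtered matrix and the boolean mask of kept columns, so the
    dropped columns can be reported by index.
    """
    X = np.asarray(X, dtype=np.float64)
    non_missing_counts = np.count_nonzero(~np.isnan(X), axis=0)
    kept_mask = non_missing_counts >= min_non_missing
    return X[:, kept_mask], kept_mask


nan = np.nan
X = np.array([
    [1.2, nan, 0.8],
    [nan, nan, 1.1],
    [0.7, 2.0, 0.9],
    [1.4, nan, 1.3],
    [0.9, nan, 1.0],
])
X_kept, kept_mask = drop_sparse_features(X)
print("dropped column indices:", np.flatnonzero(~kept_mask))  # column 1 has one observed value
print(X_kept.shape)
```

Returning the mask (rather than only the filtered matrix) keeps track of which original features were removed, which is the information the error message itself does not provide.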

If this also occurs with other versions, I suggest adding a sentence about it to the PowerTransformer documentation, e.g. "Warning: features in which all values are the same may raise a SciPy BracketError", or something along those lines. (As noted above, I could not confirm that this also happens with features that have more than one unique value.)
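Until something like that exists, a user-side pre-check can at least name the offending features. This is a minimal sketch under the assumption (based only on the reproductions above) that a single unique value is the trigger; find_constant_columns and check_no_constant_columns are hypothetical helpers, not part of scikit-learn.

```python
import numpy as np


def find_constant_columns(X):
    """Return indices of columns whose non-NaN values are all identical.

    Columns with no non-NaN values at all are reported too.
    """
    X = np.asarray(X, dtype=np.float64)
    constant = []
    for j in range(X.shape[1]):
        observed = X[:, j][~np.isnan(X[:, j])]
        if observed.size == 0 or np.all(observed == observed[0]):
            constant.append(j)
    return constant


def check_no_constant_columns(X):
    """Hypothetical pre-check: fail early, naming the offending columns."""
    constant = find_constant_columns(X)
    if constant:
        raise ValueError(
            f"Columns {constant} contain a single unique value; fitting "
            "PowerTransformer on them may raise scipy.optimize BracketError."
        )


# Column 0 repeats 0.9 (like the reproduction), column 1 varies.
X = np.column_stack([np.full(400, 0.9), np.linspace(0.5, 2.0, 400)])
print(find_constant_columns(X))
try:
    check_no_constant_columns(X)
except ValueError as exc:
    print(exc)
```

Running this check before PowerTransformer().fit_transform(X) turns an anonymous BracketError into a message that lists column indices, which would have saved the guesswork described above on a 900-feature dataset.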

As a side note, using this type of data frequently produces a couple of numpy warnings:

Lib\site-packages\numpy\core\_methods.py:176: RuntimeWarning: overflow encountered in multiply
    x = um.multiply(x, x, out=x)

and

Lib\site-packages\numpy\core\_methods.py:187: RuntimeWarning: overflow encountered in reduce
  ret = umr_sum(x, axis, dtype, out, keepdims=keepdims, where=where)
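One plausible source of these overflow warnings (an assumption, not confirmed by the report): while searching for a bracket, the optimizer may evaluate the likelihood at large values of λ, and at λ = 1000 the Yeo-Johnson transform of 0.9 is already about 1e275. That is still finite, but NumPy's variance squares the deviations from the mean (the um.multiply(x, x, out=x) line quoted above), and at this magnitude even a one-ulp deviation is on the order of 1e259, whose square overflows float64. The sketch implements the Yeo-Johnson formula directly; yeo_johnson is a hypothetical helper, not scikit-learn's private transform, and λ = 1000 is an arbitrary illustrative value.

```python
import numpy as np


def yeo_johnson(x, lmbda):
    """Hypothetical helper: Yeo-Johnson transform of array x for a scalar lambda."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative inputs: ((x + 1)**lambda - 1) / lambda, or log1p(x) when lambda == 0.
    if lmbda != 0:
        out[pos] = (np.power(x[pos] + 1, lmbda) - 1) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    # Negative inputs: -((1 - x)**(2 - lambda) - 1) / (2 - lambda), or -log1p(-x) when lambda == 2.
    if lmbda != 2:
        out[~pos] = -(np.power(1 - x[~pos], 2 - lmbda) - 1) / (2 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


column = np.full(400, 0.9)
big = yeo_johnson(column, 1000.0)
print(big[0])  # finite, roughly 1e275
with np.errstate(over="warn"):
    print(np.square(big[:1]))  # inf, with an overflow RuntimeWarning
```

At λ = 1 the transform is the identity, which is a quick sanity check of the formula.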

Steps/Code to Reproduce

import numpy as np
from sklearn.preprocessing import PowerTransformer
data = np.array([0.9] * 400)
transformed_data = PowerTransformer().fit_transform(data.reshape(-1, 1))

Expected Results

No error.

Actual Results

BracketError                              Traceback (most recent call last)
Cell In[63], line 5
      3 data = [0.9] * 400
      4 df = pd.DataFrame(data)
----> 5 df['feature_transformed'] = PowerTransformer().fit_transform(df)
      6 print(df)

File ~\anaconda3\envs\***\Lib\site-packages\sklearn\utils\_set_output.py:140, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    138 @wraps(f)
    139 def wrapped(self, X, *args, **kwargs):
--> 140     data_to_wrap = f(self, X, *args, **kwargs)
    141     if isinstance(data_to_wrap, tuple):
    142         # only wrap the first output for cross decomposition
    143         return (
    144             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    145             *data_to_wrap[1:],
    146         )

File ~\anaconda3\envs\***\Lib\site-packages\sklearn\preprocessing\_data.py:3103, in PowerTransformer.fit_transform(self, X, y)
   3086 """Fit `PowerTransformer` to `X`, then transform `X`.
   3087 
   3088 Parameters
   (...)
   3100     Transformed data.
   3101 """
   3102 self._validate_params()
-> 3103 return self._fit(X, y, force_transform=True)

File ~\anaconda3\envs\***\Lib\site-packages\sklearn\preprocessing\_data.py:3116, in PowerTransformer._fit(self, X, y, force_transform)
   3111 optim_function = {
   3112     "box-cox": self._box_cox_optimize,
   3113     "yeo-johnson": self._yeo_johnson_optimize,
   3114 }[self.method]
   3115 with np.errstate(invalid="ignore"):  # hide NaN warnings
-> 3116     self.lambdas_ = np.array([optim_function(col) for col in X.T])
   3118 if self.standardize or force_transform:
   3119     transform_function = {
   3120         "box-cox": boxcox,
   3121         "yeo-johnson": self._yeo_johnson_transform,
   3122     }[self.method]

File ~\anaconda3\envs\***\Lib\site-packages\sklearn\preprocessing\_data.py:3116, in <listcomp>(.0)
   3111 optim_function = {
   3112     "box-cox": self._box_cox_optimize,
   3113     "yeo-johnson": self._yeo_johnson_optimize,
   3114 }[self.method]
   3115 with np.errstate(invalid="ignore"):  # hide NaN warnings
-> 3116     self.lambdas_ = np.array([optim_function(col) for col in X.T])
   3118 if self.standardize or force_transform:
   3119     transform_function = {
   3120         "box-cox": boxcox,
   3121         "yeo-johnson": self._yeo_johnson_transform,
   3122     }[self.method]

File ~\anaconda3\envs\***\Lib\site-packages\sklearn\preprocessing\_data.py:3307, in PowerTransformer._yeo_johnson_optimize(self, x)
   3305 x = x[~np.isnan(x)]
   3306 # choosing bracket -2, 2 like for boxcox
-> 3307 return optimize.brent(_neg_log_likelihood, brack=(-2, 2))

File ~\anaconda3\envs\***\Lib\site-packages\scipy\optimize\_optimize.py:2641, in brent(func, args, brack, tol, full_output, maxiter)
   2569 """
   2570 Given a function of one variable and a possible bracket, return
   2571 a local minimizer of the function isolated to a fractional precision
   (...)
   2637 
   2638 """
   2639 options = {'xtol': tol,
   2640            'maxiter': maxiter}
-> 2641 res = _minimize_scalar_brent(func, brack, args, **options)
   2642 if full_output:
   2643     return res['x'], res['fun'], res['nit'], res['nfev']

File ~\anaconda3\envs\***\Lib\site-packages\scipy\optimize\_optimize.py:2678, in _minimize_scalar_brent(func, brack, args, xtol, maxiter, disp, **unknown_options)
   2675 brent = Brent(func=func, args=args, tol=tol,
   2676               full_output=True, maxiter=maxiter, disp=disp)
   2677 brent.set_bracket(brack)
-> 2678 brent.optimize()
   2679 x, fval, nit, nfev = brent.get_result(full_output=True)
   2681 success = nit < maxiter and not (np.isnan(x) or np.isnan(fval))

File ~\anaconda3\envs\***\Lib\site-packages\scipy\optimize\_optimize.py:2448, in Brent.optimize(self)
   2445 def optimize(self):
   2446     # set up for optimization
   2447     func = self.func
-> 2448     xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
   2449     _mintol = self._mintol
   2450     _cg = self._cg

File ~\anaconda3\envs\***\Lib\site-packages\scipy\optimize\_optimize.py:2417, in Brent.get_bracket_info(self)
   2415     xa, xb, xc, fa, fb, fc, funcalls = bracket(func, args=args)
   2416 elif len(brack) == 2:
-> 2417     xa, xb, xc, fa, fb, fc, funcalls = bracket(func, xa=brack[0],
   2418                                                xb=brack[1], args=args)
   2419 elif len(brack) == 3:
   2420     xa, xb, xc = brack

File ~\anaconda3\envs\***\Lib\site-packages\scipy\optimize\_optimize.py:3047, in bracket(func, xa, xb, args, grow_limit, maxiter)
   3045     e = BracketError(msg)
   3046     e.data = (xa, xb, xc, fa, fb, fc, funcalls)
-> 3047     raise e
   3049 return xa, xb, xc, fa, fb, fc, funcalls

BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points.
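For context on the final error: the traceback shows scikit-learn calling optimize.brent(_neg_log_likelihood, brack=(-2, 2)), which passes the two starting points to scipy.optimize.bracket. That routine looks for three points xa, xb, xc with xb strictly between the other two and f(xb) below both f(xa) and f(xc), and raises BracketError when it gives up. The pure-Python check below restates that condition (it is not SciPy's code; is_valid_bracket is a hypothetical name) and shows why an objective that is flat, or +inf/-inf everywhere, as it plausibly is for a constant column, can never satisfy it.

```python
import math


def is_valid_bracket(f, xa, xb, xc):
    """Check the bracketing condition for Brent-style minimization.

    xb must lie strictly between xa and xc, and f(xb) must be strictly
    smaller than both f(xa) and f(xc).
    """
    between = xa < xb < xc or xc < xb < xa
    fa, fb, fc = f(xa), f(xb), f(xc)
    return between and fb < fa and fb < fc


def parabola(lmbda):
    return (lmbda - 0.5) ** 2  # has a clear minimum at 0.5


def flat(lmbda):
    return 0.0  # no point is lower than its neighbours


def minus_inf(lmbda):
    return -math.inf  # -inf < -inf is False, so no bracket either


for name, f in [("parabola", parabola), ("flat", flat), ("minus_inf", minus_inf)]:
    print(name, is_valid_bracket(f, -2.0, 0.0, 2.0))
```

Because the comparisons are strict, no amount of expanding the search interval helps when every evaluation returns the same value, which is consistent with the "terminated without finding a valid bracket" message.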

Versions

System:
    python: 3.11.4 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 13:47:18) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\***\anaconda3\envs\***\python.exe
   machine: Windows-10-10.0.22621-SP0

Python dependencies:
      sklearn: 1.2.2
          pip: 23.2.1
   setuptools: 68.0.0
        numpy: 1.25.2
        scipy: 1.11.1
       Cython: None
       pandas: 2.0.3
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: C:\Users\***\anaconda3\envs\***\Library\bin\mkl_rt.2.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 8
threading_layer: intel

       filepath: C:\Users\***\anaconda3\envs\***\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 16

       filepath: C:\Users\***\anaconda3\envs\***\Library\bin\libiomp5md.dll
         prefix: libiomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 16
