KBinsDiscretizer: quantile strategy fails due to unsorted bin_edges #13194

@SandroCasagrande

Description

KBinsDiscretizer with strategy='quantile' raises an exception in certain situations. It happens when several percentiles returned by numpy should be identical but differ slightly due to numerical instability, which makes bin_edges non-monotonic. That is fatal for np.digitize. This is probably related to numpy/numpy#10373.

Steps/Code to Reproduce

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
X = np.array([0.05, 0.05, 0.95]).reshape(-1, 1)
KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile').fit_transform(X)

The example is a bit contrived (3 values and 10 bins), but isolates the problem well enough.
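
To check whether the edges really come out unsorted on a given numpy version, the percentiles can be inspected directly. This is a minimal sketch that assumes the edges are computed with np.percentile over n_bins + 1 evenly spaced quantiles (that step is not shown in this report). The variable names are illustrative only, and the result may differ between numpy versions.

```python
import numpy as np

# Same data as in the reproduction, flattened to 1-D.
X = np.array([0.05, 0.05, 0.95])
n_bins = 10

# Assumption: edges are the percentiles at n_bins + 1 evenly spaced quantiles.
quantiles = np.linspace(0, 100, n_bins + 1)
edges = np.percentile(X, quantiles)

# A negative step between consecutive edges means the edges are not sorted.
steps = np.diff(edges)
print(repr(edges))
print("negative steps at indices:", np.flatnonzero(steps < 0))
```

If the second line lists any index, the corresponding pair of edges is out of order, which is the condition np.digitize rejects.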

Expected Results

No error is thrown. Robust handling of close percentiles.
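
One possible way to get such robust handling is to force the edges to be non-decreasing before they reach np.digitize. The sketch below is only an illustration, not the scikit-learn implementation: enforce_monotonic is a hypothetical helper, and the edges are hand-built with np.nextafter so that the failure is reproduced regardless of the numpy version.

```python
import numpy as np

def enforce_monotonic(edges):
    """Hypothetical helper: return a copy in which every edge is at least
    as large as all edges before it (running maximum)."""
    return np.maximum.accumulate(np.asarray(edges, dtype=float))

# Hand-built edges that mimic the reported failure: the second entry is
# one floating-point step above its neighbours, so the array is unsorted.
tiny_up = np.nextafter(0.05, 1.0)
edges = np.array([0.05, tiny_up, 0.05, 0.5, 0.95])

# np.digitize rejects the raw edges.
try:
    np.digitize([0.05, 0.95], edges[1:])
    raw_failed = False
except ValueError as exc:
    raw_failed = True
    print("raw edges:", exc)

# After the running maximum the edges are non-decreasing and digitize works.
fixed = enforce_monotonic(edges)
codes = np.digitize([0.05, 0.95], fixed[1:])
print(repr(fixed), codes)
```

The running maximum can leave duplicate edges behind, so some bins may end up empty. Whether those should be merged is a separate design decision.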

Actual Results

ValueError                                Traceback (most recent call last)
<ipython-input-2-304908356f18> in <module>()
      2 from sklearn.preprocessing import KBinsDiscretizer
      3 X = np.array([0.05, 0.05, 0.95]).reshape(-1, 1)
----> 4 KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile').fit_transform(X)
      5 """
      6 Xdi = transformer.inverse_transform(Xd)

/home/sandro/code/scikit-learn/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    461         if y is None:
    462             # fit method of arity 1 (unsupervised transformation)
--> 463             return self.fit(X, **fit_params).transform(X)
    464         else:
    465             # fit method of arity 2 (supervised transformation)

/home/sandro/code/scikit-learn/sklearn/preprocessing/_discretization.py in transform(self, X)
    259             atol = 1.e-8
    260             eps = atol + rtol * np.abs(Xt[:, jj])
--> 261             Xt[:, jj] = np.digitize(Xt[:, jj] + eps, bin_edges[jj][1:])
    262         np.clip(Xt, 0, self.n_bins_ - 1, out=Xt)
    263 

ValueError: bins must be monotonically increasing or decreasing

Versions

System:
   machine: Linux-4.15.0-45-generic-x86_64-with-Ubuntu-16.04-xenial
executable: /home/sandro/.virtualenvs/scikit-learn/bin/python
    python: 3.5.2 (default, Nov 23 2017, 16:37:01)  [GCC 5.4.0 20160609]

BLAS:
    macros: 
cblas_libs: cblas
  lib_dirs: 

Python deps:
       pip: 10.0.1
setuptools: 39.1.0
   sklearn: 0.21.dev0
    Cython: 0.28.5
     scipy: 1.1.0
    pandas: 0.23.4
     numpy: 1.15.2
