Closed
Description
KBinsDiscretizer with strategy='quantile' fails with an exception in certain situations. This happens when several percentiles returned by numpy should be identical but differ slightly because of numerical instability. The resulting bin_edges are then non-monotonic, which np.digitize rejects with an error. This is probably related to numpy/numpy#10373.
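One way to inspect the percentiles directly, without going through the estimator, is sketched below. It assumes the bin edges are the 11 evenly spaced percentiles of the column, as the traceback suggests. Whether a negative step actually appears depends on the installed numpy version, so no output is shown.

```python
import numpy as np

# Same data as in the report; 11 percentiles give the edges of 10 quantile bins.
x = np.array([0.05, 0.05, 0.95])
quantiles = np.linspace(0, 100, 11)
edges = np.percentile(x, quantiles)

# Differences between consecutive edges. Mathematically none should be
# negative; a tiny negative step indicates the rounding problem described above.
steps = np.diff(edges)
is_monotonic = bool(np.all(steps >= 0))

print(edges)
print("smallest step:", steps.min())
print("monotonic:", is_monotonic)
```

If "monotonic" prints False, passing these edges to np.digitize fails with the same ValueError as in this report.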
Steps/Code to Reproduce
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
X = np.array([0.05, 0.05, 0.95]).reshape(-1, 1)
KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile').fit_transform(X)
The example is a bit contrived (3 values and 10 bins), but isolates the problem well enough.
Expected Results
No error is thrown. Robust handling of close percentiles.
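As a rough sketch of what "robust handling" could look like, the edges can be passed through a running maximum so that tiny downward steps disappear. The helper name monotonic_quantile_edges is hypothetical and this is not necessarily how scikit-learn should or does address the problem; the digitize-and-clip step only mirrors what the traceback shows in transform.

```python
import numpy as np


def monotonic_quantile_edges(column, n_bins):
    """Hypothetical helper: quantile bin edges forced to be non-decreasing."""
    quantiles = np.linspace(0, 100, n_bins + 1)
    edges = np.percentile(column, quantiles)
    # A running maximum replaces any edge that dips below its predecessor
    # (because of rounding) with the predecessor's value.
    return np.maximum.accumulate(edges)


n_bins = 10
x = np.array([0.05, 0.05, 0.95])
edges = monotonic_quantile_edges(x, n_bins)

# Bin the values against all edges except the first, then clip to the
# valid range of bin indices.
codes = np.clip(np.digitize(x, edges[1:]), 0, n_bins - 1)
print(edges)
print(codes)
```

This only guarantees that np.digitize accepts the edges. Repeated edges remain, so some bins stay empty; merging or dropping them would be a separate design decision.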
Actual Results
ValueError                                Traceback (most recent call last)
<ipython-input-2-304908356f18> in <module>()
      2 from sklearn.preprocessing import KBinsDiscretizer
      3 X = np.array([0.05, 0.05, 0.95]).reshape(-1, 1)
----> 4 KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile').fit_transform(X)
      5 """
      6 Xdi = transformer.inverse_transform(Xd)

/home/sandro/code/scikit-learn/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    461         if y is None:
    462             # fit method of arity 1 (unsupervised transformation)
--> 463             return self.fit(X, **fit_params).transform(X)
    464         else:
    465             # fit method of arity 2 (supervised transformation)

/home/sandro/code/scikit-learn/sklearn/preprocessing/_discretization.py in transform(self, X)
    259             atol = 1.e-8
    260             eps = atol + rtol * np.abs(Xt[:, jj])
--> 261             Xt[:, jj] = np.digitize(Xt[:, jj] + eps, bin_edges[jj][1:])
    262         np.clip(Xt, 0, self.n_bins_ - 1, out=Xt)
    263

ValueError: bins must be monotonically increasing or decreasing
Versions
System:
    machine: Linux-4.15.0-45-generic-x86_64-with-Ubuntu-16.04-xenial
    executable: /home/sandro/.virtualenvs/scikit-learn/bin/python
    python: 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609]

BLAS:
    macros:
    cblas_libs: cblas
    lib_dirs:

Python deps:
    pip: 10.0.1
    setuptools: 39.1.0
    sklearn: 0.21.dev0
    Cython: 0.28.5
    scipy: 1.1.0
    pandas: 0.23.4
    numpy: 1.15.2