Closed
Labels: Easy (well-defined and straightforward way to resolve), help wanted
Description
On certain inputs, VarianceThreshold().fit_transform() fails to remove a column that has only one unique value.
Steps/Code to Reproduce
import numpy as np
from sklearn.feature_selection import VarianceThreshold

works_correctly = np.array([[-0.13725701,  7.        ],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293],
                            [-0.13725701, -0.09853293]])

broken = np.array([[-0.13725701,  7.        ],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293],
                   [-0.13725701, -0.09853293]])

selector = VarianceThreshold()
print(selector.fit_transform(works_correctly))

selector = VarianceThreshold()
print(selector.fit_transform(broken))
print(set(broken[:, 0]))
Expected Results
VarianceThreshold should remove the constant first column and produce output like:
[[ 7. ]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]]
Actual Results
[[ 7. ]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]
[-0.09853293]]
[[-0.13725701 7. ]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]
[-0.13725701 -0.09853293]]
{-0.13725701}
This issue arose when I was using VarianceThreshold on a real dataset (of which this is a subset). It appears to work correctly in other situations (for instance, I can't reproduce this behaviour if I replace the first column with 1s).
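One possible explanation, which this report does not confirm, is floating-point rounding: if the computed mean of the constant column is not exactly equal to the repeated value, the computed variance can come out as a tiny positive number instead of exactly zero. The sketch below uses only NumPy to print the per-column variance of the `broken` array next to a rounding-free check (column maximum minus column minimum). The helper name `drop_constant_columns` is made up for this illustration and is not part of scikit-learn.

```python
import numpy as np

# Same data as the `broken` array above: one row with 7.0, nine identical rows.
broken = np.array([[-0.13725701, 7.0]] + [[-0.13725701, -0.09853293]] * 9)

# Per-column variance as computed by NumPy. The exact value for the first
# column depends on floating-point rounding, so it is only printed here.
col_var = broken.var(axis=0)

# Max minus min is exactly 0.0 for a column whose entries are all identical,
# because no averaging (and therefore no rounding) is involved.
col_range = broken.max(axis=0) - broken.min(axis=0)

print("variance per column:", col_var)
print("range per column:   ", col_range)


def drop_constant_columns(X):
    """Hypothetical helper: keep only columns whose max differs from their min."""
    X = np.asarray(X, dtype=float)
    keep = (X.max(axis=0) - X.min(axis=0)) > 0
    return X[:, keep]


reduced = drop_constant_columns(broken)
print(reduced.shape)
```

If the printed variance of the first column is nonzero while its range is zero, that would point to rounding in the variance computation rather than to the data itself.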
Versions
System
    python: 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:30:03) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
    machine: Darwin-18.2.0-x86_64-i386-64bit
    executable: /anaconda3/envs/tensorflow/bin/python3
BLAS
    macros: NO_ATLAS_INFO=3, HAVE_CBLAS=None
    lib_dirs:
    cblas_libs: cblas
Python deps
    setuptools: 40.2.0
    numpy: 1.15.4
    sklearn: 0.20.0
    Cython: None
    scipy: 1.1.0
    pandas: 0.24.0
    pip: 19.0.1