TypeError : Wrong type for parameter n_values in OneHotEncoder #12881

Closed
@vivekk0903

Steps/Code to Reproduce

import numpy as np
from sklearn.preprocessing import OneHotEncoder

numerical_features = np.random.randint(10, size=(5,4))
categorical = np.array([2, 2, 3, 2, 3]).reshape(-1,1)

X = np.hstack((numerical_features, categorical))

onehotencoder = OneHotEncoder(categorical_features=[4], 
                              handle_unknown='ignore')

X_encoded = onehotencoder.fit_transform(X)

Expected Results

No error should be thrown. OneHotEncoder should fall back to its legacy behaviour and encode only the supplied columns.

Actual Results

/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:390: DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
  "use the ColumnTransformer instead.", DeprecationWarning)
Traceback (most recent call last):

  File "<ipython-input-15-c174bb78e628>", line 1, in <module>
    runfile('/home/vivek/untitless.py', wdir='/home/vivek')

  File "/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/vivek/untitless.py", line 24, in <module>
    X_encoded = onehotencoder.fit_transform(X)

  File "/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 514, in fit_transform
    self._categorical_features, copy=True)

  File "/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/sklearn/preprocessing/base.py", line 71, in _transform_selected
    X_sel = transform(X[:, ind[sel]])

  File "/home/vivek/anaconda3/envs/my_env/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 456, in _legacy_fit_transform
    % type(X))

TypeError: Wrong type for parameter `n_values`. Expected 'auto', int or array of ints, got <class 'numpy.ndarray'>

Description

The actual default of the n_values parameter in OneHotEncoder differs from the default assumed by the documentation and by some internal code. This leads to errors under specific conditions.
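One way to check a constructor default directly, rather than relying on the docs, is to read it from the signature. This is only a sketch: `declared_default` is a hypothetical helper name, not part of scikit-learn, and on releases where `n_values` has been removed it simply reports the parameter as absent.

```python
import inspect

from sklearn.preprocessing import OneHotEncoder


def declared_default(cls, name):
    """Return the default declared for constructor parameter `name`.

    Hypothetical helper for illustration. Returns the string
    '<absent>' if the constructor has no such parameter and
    '<required>' if the parameter has no default.
    """
    params = inspect.signature(cls).parameters
    if name not in params:
        return "<absent>"
    default = params[name].default
    if default is inspect.Parameter.empty:
        return "<required>"
    return default


# The printed value depends on the installed scikit-learn version.
print("n_values default:", repr(declared_default(OneHotEncoder, "n_values")))
```

Comparing the printed value with the documented default shows whether the two agree for the installed version.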

  • The documentation here states that the default value is 'auto'.

  • The code here for _handle_deprecations assumes that the default value is 'auto'.

  • But the actual __init__ method has n_values=None as the default.

  • If I remove the handle_unknown='ignore' or add n_values='auto' in the code, the code runs successfully, but the following warnings are shown:

/home/vivek/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:368: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.
If you want the future behaviour and silence this warning, you can specify "categories='auto'".
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
  warnings.warn(msg, FutureWarning)
/home/vivek/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:390: DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
  "use the ColumnTransformer instead.", DeprecationWarning)
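For reference, the replacement that the DeprecationWarning points to can be sketched as follows. This is a possible workaround, not a fix for the bug reported here. It assumes that passing `categories='auto'` (as the FutureWarning suggests) together with `ColumnTransformer` is acceptable for the use case. The transformer name `"onehot"`, the fixed random seed, and `sparse_threshold=0` (to force a dense result) are choices made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Same data layout as the reproduction script; the seed is added
# here only to make the sketch repeatable.
rng = np.random.RandomState(0)
numerical_features = rng.randint(10, size=(5, 4))
categorical = np.array([2, 2, 3, 2, 3]).reshape(-1, 1)
X = np.hstack((numerical_features, categorical))

# Encode only column 4 and pass the other columns through unchanged.
# categories='auto' derives the categories from the unique values.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(categories="auto", handle_unknown="ignore"), [4])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = ct.fit_transform(X)
print(X_encoded.shape)
```

Note that the column order differs from the legacy `categorical_features` path only in how it is specified: here the one-hot columns come first, followed by the passed-through numerical columns.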

Versions

System:
python: 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51) [GCC 7.2.0]
executable: /home/vivek/anaconda3/envs/my_env/bin/python
machine: Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid

BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/vivek/anaconda3/envs/my_env/lib
cblas_libs: mkl_rt, pthread

Python deps:
pip: 18.1
setuptools: 40.2.0
sklearn: 0.20.1
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.29
pandas: 0.23.4
