OneHotEncoder - categories parameter not working as deprecated n_values parameter #13791

@aakash-sahu

Description

When I use the 'n_values' parameter, the code works as expected and I get the deprecation warning. However, when I change the code to what the deprecation warning suggests, I get a ValueError.
This works:

import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(n_values=8, sparse=False)
o = ohe.fit_transform(np.array([[3, 5, 1]]))
o = o.reshape(1, 3, 8)
print(o)

[[[0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0. 0. 0. 0.]]]
C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py:331: DeprecationWarning: Passing 'n_values' is deprecated in version 0.20 and will be removed in 0.22. You can use the 'categories' keyword instead. 'n_values=n' corresponds to 'categories=[range(n)]'.
  warnings.warn(msg, DeprecationWarning)

When I replace n_values=8 with categories=[range(8)] and run the code below, I get the error:

import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categories=[range(8)], sparse=False)
o = ohe.fit_transform(np.array([[3, 5, 1]]))
o = o.reshape(1, 3, 8)
print(o)


``ValueError Traceback (most recent call last)
in
2 import pandas as pd
3 ohe = OneHotEncoder(categories = [range(8)],sparse = False)
----> 4 o = ohe.fit_transform(np.array([[3, 5, 1]]))
5 # pd.get_dummies(np.array([3, 5, 1]))
6 o.reshape(1,3,8)

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in fit_transform(self, X, y)
516 self._categorical_features, copy=True)
517 else:
--> 518 return self.fit(X).transform(X)
519
520 def _legacy_transform(self, X):

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in fit(self, X, y)
427 return self
428 else:
--> 429 self._fit(X, handle_unknown=self.handle_unknown)
430 return self
431

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
70 "supported for numerical categories")
71 if len(self._categories) != n_features:
---> 72 raise ValueError("Shape mismatch: if n_values is an array,"
73 " it has to be of shape (n_features,).")
74

ValueError: Shape mismatch: if n_values is an array, it has to be of shape (n_features,).``

Is this a usage error or a bug?
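
For reference, the check in the traceback compares len(self._categories) with n_features, and the input here has three columns while categories=[range(8)] has only one entry. A minimal sketch, assuming that 'categories' expects one list per column, is below. It has not been confirmed as the intended replacement for n_values=8. It leaves out 'sparse' because that keyword has been renamed in later scikit-learn releases, and converts the result to a dense array explicitly instead.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[3, 5, 1]])   # one sample, three features (columns)
n_features = X.shape[1]

# Assumption: 'categories' needs one entry per column, so the single
# range(8) from the deprecation message is repeated n_features times.
categories = [list(range(8)) for _ in range(n_features)]

ohe = OneHotEncoder(categories=categories)
encoded = ohe.fit_transform(X)

# The default output is a SciPy sparse matrix; convert it to a dense array.
dense = encoded.toarray() if hasattr(encoded, "toarray") else np.asarray(encoded)

o = dense.reshape(1, n_features, 8)
print(o)
```

With this change the encoder fits without the shape-mismatch error, and each of the three rows of o has a single 1 at the index of the corresponding input value.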

System:
python: 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 19:01:41) [MSC v.1900 64 bit (AMD64)]
executable: C:\Anaconda3\python.exe
machine: Windows-10-10.0.16299-SP0

BLAS:
macros:
lib_dirs:
cblas_libs: cblas

Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.3
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.29.2
pandas: 0.23.4
