I want to use ColumnTransformer with make_column_selector.
When every transformer provides get_feature_names, the ColumnTransformer's own get_feature_names method works after fit.
Unfortunately, when make_column_selector returns an empty list of columns, the corresponding transformer stays unfitted (discussed in #12071), and calling get_feature_names() subsequently raises an error. In my toy example, I used OneHotEncoder from https://github.com/scikit-learn-contrib/categorical-encoding.
The current behavior is not a bug, but it is undesirable if we want resilient pipelines. As a solution, I propose transformers_in and transformers_out attributes for ColumnTransformer, similar to the planned features_in and features_out. get_feature_names would then be applied only to the transformers in transformers_out.
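To make the proposal concrete, here is a minimal pure-Python sketch of the idea. It does not use scikit-learn: `ToyEncoder` and `ToyColumnTransformer` are hypothetical stand-ins written only for illustration, and the `transformers_in` / `transformers_out` attributes are the ones suggested above, not an existing API. The `name__feature` naming is assumed to mirror how ColumnTransformer prefixes feature names.

```python
# Hypothetical sketch: none of these classes are scikit-learn APIs.


class ToyEncoder:
    """Stand-in for an encoder that only knows its feature names once fitted."""

    def __init__(self):
        self.columns_ = None  # populated by fit()

    def fit(self, rows, columns):
        self.columns_ = list(columns)
        return self

    def get_feature_names(self):
        if self.columns_ is None:
            # Mimics the failure mode reported in this issue.
            raise ValueError("Must transform data first.")
        return [f"{col}_encoded" for col in self.columns_]


class ToyColumnTransformer:
    """Stand-in that records which transformers were actually fitted."""

    def __init__(self, transformers):
        # Each entry is (name, transformer, selector); selector maps the
        # available column names to the subset this transformer should use.
        self.transformers_in = list(transformers)
        self.transformers_out = []

    def fit(self, rows, available_columns):
        self.transformers_out = []
        for name, trans, selector in self.transformers_in:
            columns = selector(available_columns)
            if not columns:
                # Empty selection: leave the transformer unfitted and
                # keep it out of transformers_out.
                continue
            trans.fit(rows, columns)
            self.transformers_out.append((name, trans, columns))
        return self

    def get_feature_names(self):
        # Only fitted transformers are asked for their feature names.
        return [
            f"{name}__{feature}"
            for name, trans, _ in self.transformers_out
            for feature in trans.get_feature_names()
        ]


def select_numeric(columns):
    return [c for c in columns if c in ("age", "income")]


def select_text(columns):
    # Matches nothing in the example below, so "txt" is never fitted.
    return [c for c in columns if c.startswith("text_")]


ct = ToyColumnTransformer(
    [
        ("num", ToyEncoder(), select_numeric),
        ("txt", ToyEncoder(), select_text),
    ]
)
ct.fit(rows=[], available_columns=["age", "income"])
names = ct.get_feature_names()
```

In this sketch the "txt" transformer receives an empty column list, stays unfitted, and is simply skipped when feature names are collected, so `get_feature_names()` no longer raises.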
runfile('/home/mglowacki/Desktop/issue__.py', wdir='/home/mglowacki/Desktop')
[ColumnTransformer] ... (1 of 1) Processing One-hot_enc, total= 0.0s
Traceback (most recent call last):
  File "<ipython-input-3-4225168516fb>", line 1, in <module>
    runfile('/home/mglowacki/Desktop/issue__.py', wdir='/home/mglowacki/Desktop')
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/mglowacki/Desktop/issue__.py", line 31, in <module>
    ct.get_feature_names()
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 365, in get_feature_names
    trans.get_feature_names()])
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/category_encoders/one_hot.py", line 406, in get_feature_names
    'Must transform data first. Affected feature names are not known before.')
ValueError: Must transform data first. Affected feature names are not known before.
jnothman changed the title from "ColumnTransformer - undesired behavior when Transformer provided with empty list" to "ColumnTransformer - undesired get_feature_names behavior when Transformer provided with empty list" on Dec 21, 2019