Closed as not planned
Description
ColumnTransformer requires specifying the columns to which each transformer is applied. However, when the input dataset is missing one of the specified columns, an error is raised. I would like to be able to use ColumnTransformer even when some of the specified columns are not present in the data. Is this possible?
Steps/Code to Reproduce
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.pipeline import Pipeline
import sklearn.datasets
import pandas as pd

X, y = sklearn.datasets.fetch_openml('iris', 1, return_X_y=True)
X = pd.DataFrame(X, columns=["one", "two", "three", "four"])

# "five" is deliberately not a column of X
numeric_features = ["one", "two", "three", "four", "five"]
numeric_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value=0)),
    ]
)
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
    ],
    remainder="drop",
    sparse_threshold=0,
)
preprocessor.fit_transform(X)
Expected Results
A transformed DataFrame or array, built from the specified columns that are actually present in the data.
Actual Results
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 474, in fit_transform
    self._validate_remainder(X)
  File "/usr/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 315, in _validate_remainder
    cols.extend(_get_column_indices(X, columns))
  File "/usr/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 703, in _get_column_indices
    return [all_columns.index(col) for col in columns]
  File "/usr/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py", line 703, in <listcomp>
    return [all_columns.index(col) for col in columns]
ValueError: 'five' is not in list
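The traceback shows the column names being looked up one by one with `all_columns.index(col)`, so any name missing from the data raises. One possible workaround, sketched here rather than offered as an official recommendation, is to pass a callable as the column specification. ColumnTransformer calls it with the input data at fit time, so the callable can return only the wanted columns that exist. The helper `present_columns` below is hypothetical and not part of scikit-learn, and the small DataFrame stands in for the iris data so the example runs offline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer


def present_columns(wanted):
    """Build a column selector that keeps only the wanted names found in X.

    Hypothetical helper for illustration; not a scikit-learn function.
    Assumes X is a pandas DataFrame.
    """
    def select(X):
        return [col for col in wanted if col in X.columns]
    return select


# Small stand-in dataset with one missing value in column "one".
X = pd.DataFrame({"one": [1.0, None], "two": [3.0, 4.0]})
numeric_features = ["one", "two", "five"]  # "five" is absent from X

preprocessor = ColumnTransformer(
    transformers=[
        (
            "num",
            SimpleImputer(strategy="constant", fill_value=0),
            present_columns(numeric_features),  # resolved when fit is called
        ),
    ],
    remainder="drop",
    sparse_threshold=0,
)

# Only "one" and "two" are selected; the missing value is filled with 0.
result = preprocessor.fit_transform(X)
print(result)
```

The selection is resolved when the transformer is fitted, so this only helps with columns missing at fit time. Data passed to `transform` later would presumably still need every column that was selected during fitting.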
Versions
System:
    python: 3.8.0 (default, Oct 23 2019, 18:51:26) [GCC 9.2.0]
    executable: /bin/python
    machine: Linux-4.9.130-xxxx-std-ipv6-64-x86_64-with-glibc2.2.5

Python deps:
    pip: 19.2.3
    setuptools: 41.6.0
    sklearn: 0.21.3
    numpy: 1.17.4
    scipy: 1.3.1
    Cython: 0.29.14
    pandas: 0.25.3