Versions
sklearn 0.23.2
Description
Instantiating ColumnTransformer with the remainder argument set to "passthrough" produces a TypeError under certain circumstances. I narrowed down one such circumstance.
The error occurs when exactly n - 1 columns are transformed (where n is the total number of columns) and the one column that is passed through (i.e., not transformed) has a dtype that cannot be promoted to the dtype of the other columns. The root cause is that sklearn tries to combine the arrays with numpy.hstack, which fails.
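The failure can be reproduced with NumPy alone. The sketch below is a minimal stand-in, not sklearn's own code: it assumes the transformer output is a float64 array and the passed-through column arrives as a datetime64 array, and shows that numpy.hstack cannot find a common dtype for them.

```python
import numpy as np

# Hypothetical stand-ins for the two blocks ColumnTransformer stacks:
# the scaled float columns and the single passed-through datetime column.
scaled = np.zeros((3, 2), dtype=np.float64)
passthrough = np.array(["2020-01-01"] * 3, dtype="datetime64[ns]").reshape(-1, 1)

try:
    np.hstack([scaled, passthrough])
    failed = False
except TypeError as exc:
    # Older NumPy raises "TypeError: invalid type promotion"; NumPy >= 1.25
    # raises DTypePromotionError, which is a subclass of TypeError.
    failed = True
    print(type(exc).__name__, exc)
```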
Code to Reproduce
from sklearn import preprocessing, compose
import numpy as np
import pandas as pd
import datetime
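prng = np.random.default_rng()
# Three float columns plus one datetime column
d = pd.DataFrame(prng.random((20, 3)), columns=["aaa", "bbb", "ccc"])
d["time"] = datetime.datetime.now()
# Scale every column except "time", so "time" is the only passthrough column
columns = ["aaa", "bbb", "ccc"]
t = compose.ColumnTransformer(
    [("stnd", preprocessing.StandardScaler(), columns)],
    remainder="passthrough",
)
t.fit_transform(d)  # TypeError: invalid type promotion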
The above code produces "TypeError: invalid type promotion". The "time" column has a datetime dtype, while the others are float. As described above, when only the "time" column is passed through, hstack fails as it tries to concatenate the arrays. If you change columns to columns = ["aaa", "bbb"], it works as expected. Changing the remainder argument to remainder="drop" also works.
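A plausible explanation for why columns = ["aaa", "bbb"] works, assuming the passed-through columns reach numpy.hstack as a DataFrame slice that NumPy converts to an array: a slice holding only "time" becomes a datetime64 array, whereas a slice holding "ccc" and "time" becomes an object array, and float64 and object can be stacked. The sketch below checks this with pandas and NumPy directly; it does not call ColumnTransformer.

```python
import datetime

import numpy as np
import pandas as pd

# Small frame mirroring the report: one float column and one datetime column
d = pd.DataFrame({"ccc": [0.1, 0.2, 0.3]})
d["time"] = datetime.datetime(2020, 1, 1)

only_time = d[["time"]].to_numpy()    # remainder = ["time"] -> datetime64 array
mixed = d[["ccc", "time"]].to_numpy()  # remainder = ["ccc", "time"] -> object array
scaled = np.zeros((3, 2))             # stand-in for the StandardScaler output

print(only_time.dtype, mixed.dtype)

# float64 + object promotes to object, so this concatenation succeeds
stacked = np.hstack([scaled, mixed])
print(stacked.dtype, stacked.shape)
```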