ColumnTransformer gives "TypeError: invalid type promotion" #20090
Comments
I get a different traceback with the latest library version:

```python
from sklearn import preprocessing, compose
import numpy as np
import pandas as pd
import datetime

prng = np.random.default_rng()
d = pd.DataFrame(prng.random((20, 3)), columns=["aaa", "bbb", "ccc"])
d["time"] = datetime.datetime.now()

columns = ["aaa", "bbb", "ccc"]
t = compose.ColumnTransformer(
    [("stnd", preprocessing.StandardScaler(), columns)],
    remainder="passthrough"
)
t.fit_transform(d)
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-321b39c9e11e> in <module>
13 remainder="passthrough"
14 )
---> 15 t.fit_transform(d)

~/Documents/packages/scikit-learn/sklearn/compose/_column_transformer.py in fit_transform(self, X, y)
571 self._record_output_indices(Xs)
572
--> 573 return self._hstack(list(Xs))
574
575 def transform(self, X):

~/Documents/packages/scikit-learn/sklearn/compose/_column_transformer.py in _hstack(self, Xs)
657 else:
658 Xs = [f.toarray() if sparse.issparse(f) else f for f in Xs]
--> 659 return np.hstack(Xs)
660
661 def _sk_visual_block_(self):

<__array_function__ internals> in hstack(*args, **kwargs)

~/miniconda3/envs/dev/lib/python3.9/site-packages/numpy/core/shape_base.py in hstack(tup)
344 return _nx.concatenate(arrs, 0)
345 else:
--> 346 return _nx.concatenate(arrs, 1)
347
348

<__array_function__ internals> in concatenate(*args, **kwargs)

TypeError: The DTypes <class 'numpy.dtype[datetime64]'> and <class 'numpy.dtype[float64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
```
```python
import sklearn
sklearn.show_versions()
```

```
System:
python: 3.9.1 (default, Dec 11 2020, 14:32:07) [GCC 7.3.0]
executable: /home/glemaitre/miniconda3/envs/dev/bin/python
machine: Linux-5.8.0-50-generic-x86_64-with-glibc2.32
Python dependencies:
pip: 20.3.3
setuptools: 52.0.0.post20210125
sklearn: 1.0.dev0
numpy: 1.20.0
scipy: 1.6.0
Cython: 0.29.21
pandas: 1.2.1
matplotlib: 3.3.4
joblib: 1.0.0
threadpoolctl: 2.1.0
Built with OpenMP: True
```

I get a different traceback, but it comes down to a NumPy concatenation that fails because of the datetime column. We should check why it works when we pass only a subset of the features.
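This is a pandas-only sketch of one plausible answer to the subset question. It assumes the passed-through columns reach np.hstack as the DataFrame's own NumPy conversion, which has not been verified against the ColumnTransformer code. A lone datetime column converts to a datetime64 array. A float column mixed with a datetime column converts to an object array, and NumPy can stack that next to floats. The zeros array is only a placeholder for the scaler output.

```python
import datetime

import numpy as np
import pandas as pd

# Same frame as in the reproduction above
prng = np.random.default_rng()
d = pd.DataFrame(prng.random((20, 3)), columns=["aaa", "bbb", "ccc"])
d["time"] = datetime.datetime.now()

# columns = ["aaa", "bbb", "ccc"]: only "time" is left over,
# and a single datetime column converts to a datetime64 array
only_time = d[["time"]].to_numpy()

# columns = ["aaa", "bbb"]: "ccc" and "time" are left over together;
# float64 and datetime64 have no common dtype, so pandas falls back to object
ccc_and_time = d[["ccc", "time"]].to_numpy()

print(only_time.dtype)     # a datetime64 dtype (unit depends on the pandas version)
print(ccc_and_time.dtype)  # object

# Zeros are a placeholder for the two float64 columns the scaler would output;
# float64 + object promotes to object, so this concatenation succeeds
placeholder_scaled = np.zeros((len(d), 2))
stacked = np.hstack([placeholder_scaled, ccc_and_time])
print(stacked.shape)  # (20, 4)
```

If this assumption holds, the "subset" case only works by accident. The whole output silently becomes an object array, so it does not preserve the float dtype.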
I had a quick look at this.
take
Same problem here. The function does not work as expected, even if I add another dummy column to the data frame so that my "Date" column is not the only one that gets passed through. It transforms both columns.
With
Versions
sklearn 0.23.2
Description
Instantiating ColumnTransformer with the remainder argument set to "passthrough" produces a TypeError under certain circumstances. I narrowed down one such circumstance. The error occurs when exactly n - 1 columns are transformed (where n is the total number of columns) and the one column that gets passed through (i.e., not transformed) has a dtype that cannot be converted to that of the other columns. The root cause is that sklearn tries to combine the arrays with numpy.hstack, which fails.
Code to Reproduce
The above code produces "TypeError: invalid type promotion". The "time" column has a datetime dtype and the others have a float dtype. As described above, when only the "time" column is passed through, hstack fails as it tries to concatenate the arrays. If you change columns to columns = ["aaa", "bbb"], it works as expected. Changing the remainder argument to remainder="drop" also works.
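The dtype condition in the description can be checked with NumPy alone. This sketch stacks a float64 block against a single passed-through column of each of three dtypes. Zeros stand in for the scaled columns, and the three dtypes are illustrative choices, not taken from the report. An int64 column promotes to float64 and an object column absorbs anything. Float64 and datetime64 have no common dtype, so np.hstack raises TypeError. The exact message depends on the NumPy version, as the two error messages in this thread show.

```python
import numpy as np

n_rows = 20
# Placeholder for the float64 block produced by the scaler
scaled = np.zeros((n_rows, 3))

# One passed-through column per dtype (illustrative values)
remainders = {
    "int64": np.arange(n_rows, dtype=np.int64).reshape(-1, 1),
    "object": np.full((n_rows, 1), "label", dtype=object),
    "datetime64": np.full((n_rows, 1), np.datetime64("2021-05-18T12:00:00")),
}

results = {}
for name, block in remainders.items():
    try:
        # The same call that ColumnTransformer._hstack makes in the traceback above
        results[name] = np.hstack([scaled, block]).dtype
    except TypeError:
        # float64 and datetime64 share no common dtype
        results[name] = "TypeError"

for name, outcome in results.items():
    print(f"{name:>10}: {outcome}")
```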