Thanks for the report. I think it would be great to include a fix for this in 1.0.1, but I'll let @jeremiedbb and @glemaitre decide since they volunteered to be release managers for 1.0.1.
Describe the bug
In Scikit-Learn 1.0.0, if you fit a OneHotEncoder with a Pandas DataFrame, it records the feature names in feature_names_in_. Great! But if you fit it again with a NumPy array, the old feature names are not removed, even when the number of features in the NumPy array does not match the number of feature names in the old feature_names_in_.
This is probably true as well for other estimators, but I haven't tested it.
I feel like this could be a source of confusion and bugs. I believe feature_names_in_ should always be deleted when fit() is called (and possibly replaced with a new one if a DataFrame is passed to fit()). At the very least, it should be deleted if the number of features is different.
Steps/Code to Reproduce
Expected Results
I expect feature_names_in_ to be removed after the second call to fit() with a NumPy array. Especially since the number of features has changed.
Actual Results
Versions
System:
python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
executable: /usr/bin/python3
machine: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python dependencies:
pip: 21.1.3
setuptools: 57.4.0
sklearn: 1.0
numpy: 1.19.5
scipy: 1.4.1
Cython: 0.29.24
pandas: 1.1.5
matplotlib: 3.2.2
joblib: 1.0.1
threadpoolctl: 3.0.0
Built with OpenMP: True