Closed
Description
Describe the bug
check_array(X, dtype='numeric') returns an array of strings, with every value converted to a string, if X contains any strings.
This was discussed a while ago in #10229, but the fix implemented there does not address what looks to me like counter-intuitive and possibly buggy behavior.
Steps/Code to Reproduce
import numpy as np
from sklearn.utils import check_array

# Mixed int/str input: one string entry in an otherwise integer array
arr = np.array([[1, 's'],
                [1, 1]])
check_array(arr, dtype='numeric')
Expected Results
I would expect an error indicating that data could not be converted to a float.
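To illustrate the kind of error I have in mind, here is a minimal sketch using only NumPy. It is not what check_array does internally; the name conversion_error is just a placeholder for this example. An explicit conversion of the same array to float fails on the non-numeric entry:

```python
import numpy as np

arr = np.array([[1, 's'],
                [1, 1]])

# An explicit float conversion cannot parse the entry 's' and raises.
try:
    arr.astype(np.float64)
    conversion_error = None
except ValueError as exc:
    conversion_error = exc

print(type(conversion_error).__name__, conversion_error)
```

Something along these lines, raised from check_array itself, would make the failure visible instead of silently returning strings.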
Actual Results
FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers
if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in
scikit-learn, for example by using your_array = your_array.astype(np.float64).
return f(**kwargs)
array([['1', 's'],
       ['1', '1']], dtype='<U21')
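As a side check, and as far as I can tell, the string dtype shown above is already present on the input before check_array is called, because NumPy promotes mixed int/str input to a unicode dtype when the array is built. A small sketch that inspects the array without scikit-learn:

```python
import numpy as np

arr = np.array([[1, 's'],
                [1, 1]])

# Mixed int/str input is promoted to a unicode string dtype at construction,
# so the integers are already stored as strings at this point.
print(arr.dtype.kind)
print(arr.tolist())
```

So the returned array appears to be the input passed through unchanged, rather than the result of a numeric conversion.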
Versions
System:
python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:18:42) [Clang 10.0.1 ]
executable: /Users/mdj292/anaconda/envs/xfactor/bin/python
machine: macOS-10.15.7-x86_64-i386-64bit
Python dependencies:
pip: 20.2.3
setuptools: 49.6.0.post20200814
sklearn: 0.23.2
numpy: 1.19.1
scipy: 1.5.2
Cython: None
pandas: 1.1.2
matplotlib: None
joblib: 0.16.0
threadpoolctl: 2.1.0
Built with OpenMP: True