
check_array(X, dtype='numeric') converts all data to strings if X contains any strings #18660


Closed
zeromh opened this issue Oct 20, 2020 · 1 comment · Fixed by #18496


zeromh commented Oct 20, 2020

Describe the bug

check_array(X, dtype='numeric') converts all data to strings if X contains any strings.

This was discussed a while ago in #10229, but the fix implemented there does not address what looks to me like counter-intuitive and possibly buggy behavior.

Steps/Code to Reproduce

import numpy as np
from sklearn.utils import check_array

# Mixed int/str input: NumPy stores every element as a string
arr = np.array([[1, 's'],
                [1, 1]])
check_array(arr, dtype='numeric')

Expected Results

I would expect an error indicating that data could not be converted to a float.

Actual Results

FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers
 if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in 
scikit-learn, for example by using your_array = your_array.astype(np.float64).
  return f(**kwargs)

array([['1', 's'],
       ['1', '1']], dtype='<U21')
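
For context, the string dtype in the output above is already present before check_array is called: NumPy picks a common dtype when it builds an array from mixed integers and strings, and that common dtype is a Unicode string type. The sketch below uses only NumPy and a hypothetical helper, to_float_or_raise (not part of scikit-learn), to illustrate the explicit conversion recommended in the warning and the kind of error one might expect instead of a silent pass-through.

```python
import numpy as np

# Mixed int/str input: NumPy promotes every element to a Unicode string dtype.
arr = np.array([[1, "s"],
                [1, 1]])
print(arr.dtype.kind)  # 'U' (Unicode string)


def to_float_or_raise(a):
    """Hypothetical helper (not a scikit-learn API): convert to float64,
    raising ValueError if any element is not a valid number."""
    return np.asarray(a).astype(np.float64)


# A non-numeric string makes the explicit conversion fail loudly.
try:
    to_float_or_raise(arr)
    error = None
except ValueError as exc:
    error = exc
    print(f"conversion failed: {exc}")

# Strings that do represent numbers convert cleanly.
converted = to_float_or_raise(np.array([["1", "2"], ["3", "4"]]))
print(converted.dtype)  # float64
```

Converting explicitly before calling into scikit-learn surfaces the bad value 's' at the conversion step rather than returning an array of strings.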

Versions

System:
python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:18:42) [Clang 10.0.1 ]
executable: /Users/mdj292/anaconda/envs/xfactor/bin/python
machine: macOS-10.15.7-x86_64-i386-64bit

Python dependencies:
pip: 20.2.3
setuptools: 49.6.0.post20200814
sklearn: 0.23.2
numpy: 1.19.1
scipy: 1.5.2
Cython: None
pandas: 1.1.2
matplotlib: None
joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True

@glemaitre
Member

The following PR will solve the issue and define the expected behaviour: #18496
