TypeError: "iteration over a 0-d array" when trying to preprocessing.scale a pandas.Series #12622


Closed
Andi144 opened this issue Nov 20, 2018 · 4 comments · Fixed by #12625

Comments


Andi144 commented Nov 20, 2018

Description

When trying to call preprocessing.scale on a pandas.Series instance, an error is thrown with scikit-learn version 0.20.0; version 0.19.1 works just fine. The documentation states that the input to preprocessing.scale can be "array-like", and a pandas.Series should fulfill this requirement since it is a "one-dimensional ndarray".

Steps/Code to Reproduce

import pandas as pd
from sklearn import preprocessing

s = pd.Series([1.0, 2.0, 3.0])
preprocessing.scale(s)

Expected Results

This should be the output (as it is in version 0.19.1):

[-1.22474487,  0.        ,  1.22474487]

A workaround is replacing preprocessing.scale(s) with preprocessing.scale([i for i in s]), which also yields this output.
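
For reference, passing the underlying NumPy array also sidesteps the failure, since a plain ndarray has no dtypes attribute and the failing branch in check_array is never entered. A minimal sketch (the .values accessor is standard pandas; this variant of the workaround is an illustration, not part of the original report):

import pandas as pd
from sklearn import preprocessing

s = pd.Series([1.0, 2.0, 3.0])
# A plain ndarray has .dtype but no .dtypes, so check_array's
# dtype-comparison branch (the one that raises) is skipped.
preprocessing.scale(s.values)  # [-1.22474487,  0.        ,  1.22474487]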

Actual Results

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-ef1d298414c3> in <module>
      3 
      4 s = pd.Series([1.0, 2.0, 3.0])
----> 5 preprocessing.scale(s)

~\anaconda3\envs\tensorflow\lib\site-packages\sklearn\preprocessing\data.py in scale(X, axis, with_mean, with_std, copy)
    143     X = check_array(X, accept_sparse='csc', copy=copy, ensure_2d=False,
    144                     warn_on_dtype=True, estimator='the scale function',
--> 145                     dtype=FLOAT_DTYPES, force_all_finite='allow-nan')
    146     if sparse.issparse(X):
    147         if with_mean:

~\anaconda3\envs\tensorflow\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    594 
    595     if (warn_on_dtype and dtypes_orig is not None and
--> 596             {array.dtype} != set(dtypes_orig)):
    597         # if there was at the beginning some other types than the final one
    598         # (for instance in a DataFrame that can contain several dtypes) then

TypeError: iteration over a 0-d array

Versions

System
------
    python: 3.6.7 |Anaconda, Inc.| (default, Oct 28 2018, 19:44:12) [MSC v.1915 64 bit (AMD64)]
executable: C:\Users\...\anaconda3\envs\tensorflow\python.exe
   machine: Windows-10-10.0.17134-SP0

Python deps
-----------
       pip: 18.1
setuptools: 40.6.2
   sklearn: 0.20.0
     numpy: 1.15.4
     scipy: 1.1.0
    Cython: None
    pandas: 0.23.4
qinhanmin2014 (Member) commented Nov 20, 2018

Minimal script to reproduce:

import pandas as pd
from sklearn.utils.validation import check_array
check_array(pd.Series([1, 2, 3]), ensure_2d=False, warn_on_dtype=True)

Related PR #10949

amueller (Member) commented

Thanks for reporting. Yeah, it would be good to get this into 0.20.1, hrm...

qinhanmin2014 (Member) commented

Heading to bed; it seems an easy solution would be to change

if hasattr(array, "dtypes") and hasattr(array, "__array__"):
    dtypes_orig = np.array(array.dtypes)

to something like

if (hasattr(array, "dtypes") and hasattr(array, "__array__") and
        hasattr(array.dtypes, "__array__")):
    dtypes_orig = np.array(array.dtypes)
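
For context on why the extra hasattr check helps (a minimal sketch of the underlying behavior, not from the original thread): on a DataFrame, .dtypes is a pandas Series, which has __array__, but on a Series it is a single numpy dtype scalar, so np.array(...) wraps it in a 0-d object array that set() cannot iterate over.

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2.0]})
s = pd.Series([1.0, 2.0, 3.0])

np.array(df.dtypes)      # 1-d array of column dtypes; set() iterates fine
np.array(s.dtypes)       # 0-d array wrapping a single dtype
set(np.array(s.dtypes))  # TypeError: iteration over a 0-d array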

amueller (Member) commented

Wow, I think this must have been our quickest turnaround for a user-reported bug so far ;)
