fix check_array dtype check for pandas series #12625

amueller · 2018-11-20T16:10:31Z

Reference Issues/PRs

Fixes #12622

What does this implement/fix? Explain your changes.

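For a pandas Series, `array.dtypes` is a single dtype, so `np.array(array.dtypes)` is a zero-dimensional array, and `set` can't be called on it.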
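A minimal sketch of the failure using NumPy only. It assumes `np.dtype("int64")` as a stand-in for what a Series exposes as `dtypes`; the variable names are illustrative and not taken from `check_array`.

```python
import numpy as np

# Stand-in for Series.dtypes: a single dtype object, not a collection
single_dtype = np.dtype("int64")

# Wrapping a single dtype gives a 0-d object array, which is not iterable
zero_dim = np.array(single_dtype)
try:
    set(zero_dim)
    set_failed = False
except TypeError:
    set_failed = True

# ndmin=1 forces at least one dimension, so set() can iterate over it
one_dim = np.array(single_dtype, ndmin=1)
unique_dtypes = set(one_dim)

print(zero_dim.ndim, set_failed, one_dim.shape, unique_dtypes)
```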

Does this need a what's new entry? Probably?

jorisvandenbossche · 2018-11-20T19:46:02Z

sklearn/utils/validation.py

@@ -478,7 +478,7 @@ def check_array(array, accept_sparse=False, accept_large_sparse=True,
    # DataFrame), and store them. If not, store None.
    dtypes_orig = None
    if hasattr(array, "dtypes") and hasattr(array, "__array__"):
-        dtypes_orig = np.array(array.dtypes)
+        dtypes_orig = np.array(array.dtypes, ndmin=1)


I am not sure this is the "correct" fix. It fixes the problem, but in essence, a Series should not take this path, as it can never have multiple dtypes in it. So I would rather ensure that a Series does not pass this `if` check.

@qinhanmin2014 had one in #12622 (comment)

I am trying to think of a robust "duck type check" for Series. Personally, I would actually start doing actual isinstance checks (we could e.g. have a util function that combines that with trying to import pandas), but that's maybe a broader issue.
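A rough sketch of what such a util function could look like. The name `is_pandas_series` is hypothetical and not an existing scikit-learn helper; the only assumption is that pandas may or may not be installed.

```python
def is_pandas_series(obj):
    """Return True only if pandas is importable and obj is a pandas Series."""
    try:
        import pandas as pd
    except ImportError:
        # Without pandas installed, obj cannot be a pandas Series
        return False
    return isinstance(obj, pd.Series)


print(is_pandas_series([1, 2, 3]))
```

The import happens inside the function so that pandas stays an optional dependency.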

We could also add `and array.dtypes.ndim` to the condition so that `dtypes_orig` stays None?

Yes, that's a good idea (for sure nicer than the hasattr(array.dtypes, "__array__"))
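To illustrate the `.ndim` idea, here is a sketch with hypothetical stand-in classes (`SeriesLike`, `FrameLike`) in place of pandas objects, and a hypothetical helper `original_dtypes` that isolates the check. It relies on the dtype object exposing `ndim`, which NumPy dtypes do (0 for a plain dtype).

```python
import numpy as np


class SeriesLike:
    """Hypothetical stand-in for a Series: ``dtypes`` is a single dtype."""

    def __init__(self, values):
        self._values = np.asarray(values)
        self.dtypes = self._values.dtype

    def __array__(self, dtype=None, copy=None):
        return self._values if dtype is None else self._values.astype(dtype)


class FrameLike(SeriesLike):
    """Hypothetical stand-in for a DataFrame: ``dtypes`` holds one dtype per column."""

    def __init__(self, values):
        super().__init__(values)
        n_columns = self._values.shape[1]
        self.dtypes = np.array([self._values.dtype] * n_columns, dtype=object)


def original_dtypes(array):
    # Same check as in the diff, plus the proposed ``.ndim`` condition:
    # a single dtype has ndim 0 (falsy), so dtypes_orig stays None.
    dtypes_orig = None
    if (hasattr(array, "dtypes") and hasattr(array, "__array__")
            and array.dtypes.ndim):
        dtypes_orig = np.array(array.dtypes)
    return dtypes_orig


series_result = original_dtypes(SeriesLike([1, 2, 3]))
frame_result = original_dtypes(FrameLike([[1, 2], [3, 4]]))
print(series_result, frame_result.shape)
```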

rth

LGTM, a "what's new" is probably needed?

rth · 2018-11-20T21:17:20Z

sklearn/utils/tests/test_validation.py

+def test_check_array_series():
+    # regression test that check_array works on pandas Series
+    pd = importorskip("pandas")
+    check_array(pd.Series([1, 2, 3]), ensure_2d=False, warn_on_dtype=True)


maybe assert that the output is equal to array([1, 2, 3])?

ogrisel

LGTM as well besides the CI failure on style :) Too many reviews.

amueller · 2018-11-20T21:37:42Z

hopefully good now ;)

jnothman · 2018-11-20T22:21:22Z

doc/whats_new/v0.20.rst

@@ -176,6 +176,10 @@ Changelog
  precision issues in :class:`preprocessing.StandardScaler` and
  :class:`decomposition.IncrementalPCA` when using float32 datasets.
  :issue:`12338` by :user:`bauks <bauks>`.
+
+- |Fix| Calling :func:`utils.check_array` on pandas `Series`, which


(I think you can use `pandas.Series` for an intersphinx link)

I was wondering after I saw the edit. Would be cool ;)
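For context, an intersphinx link requires the target project to be registered in the Sphinx configuration. A minimal sketch of such a `conf.py` fragment, assuming the standard `sphinx.ext.intersphinx` extension and the pandas stable documentation URL (the project's actual configuration may differ):

```python
# Sphinx conf.py fragment (sketch, not the project's actual configuration)
extensions = ["sphinx.ext.intersphinx"]

# Maps a project name to (base URL of its docs, inventory location);
# None means the default objects.inv under the base URL.
intersphinx_mapping = {
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
}
```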

amueller · 2018-11-20T22:24:04Z

thanks!

fix check_array dtype check for pandas series

0d85e3e

amueller added this to the 0.20.1 milestone Nov 20, 2018

amueller mentioned this pull request Nov 20, 2018

[MRG+2] warn_on_dtype for DataFrames #10949

Merged

add comment explaining fix

2e35165

eamanu approved these changes Nov 20, 2018

jorisvandenbossche reviewed Nov 20, 2018

don't define dtypes for series

75ed85b

jorisvandenbossche approved these changes Nov 20, 2018

rth approved these changes Nov 20, 2018

amueller added 2 commits November 20, 2018 16:20

check output of check_array on series

87cd3fa

add whatsnew for pandas series fix, fix link to my website

7bfaf37

ogrisel approved these changes Nov 20, 2018

fix flake8

6734c93

Update v0.20.rst

b2e0849

jnothman approved these changes Nov 20, 2018

jnothman merged commit 104f684 into scikit-learn:master Nov 20, 2018

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 20, 2018

FIX check_array dtype check for pandas series (scikit-learn#12625)

4aac96d

rth mentioned this pull request Nov 30, 2018

utils.validation.check_array throws bad TypeError pandas series is passed in #12699

Closed

jorisvandenbossche mentioned this pull request Dec 1, 2018

BUG: fix check_array on pandas Series with custom dtype (eg categorical) #12706

Merged

jnothman pushed a commit that referenced this pull request Dec 3, 2018

BUG: fix check_array on pandas Series with custom dtype (eg categoric…

4f5dd77

…al) (#12706) Closes #12699. Related to #12625

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 14, 2018

BUG: fix check_array on pandas Series with custom dtype (eg categoric…

8daa852

…al) (scikit-learn#12706) Closes scikit-learn#12699. Related to scikit-learn#12625

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 17, 2018

BUG: fix check_array on pandas Series with custom dtype (eg categoric…

765f3e4

…al) (scikit-learn#12706) Closes scikit-learn#12699. Related to scikit-learn#12625

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

FIX check_array dtype check for pandas series (scikit-learn#12625)

d03931a

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

BUG: fix check_array on pandas Series with custom dtype (eg categoric…

0de50b4

…al) (scikit-learn#12706) Closes scikit-learn#12699. Related to scikit-learn#12625

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX check_array dtype check for pandas series (scikit-learn#1…

b17111d

…2625)" This reverts commit d03931a.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX check_array dtype check for pandas series (scikit-learn#1…

5d473a8

…2625)" This reverts commit d03931a.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

FIX check_array dtype check for pandas series (scikit-learn#12625)

eb35f2a

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

BUG: fix check_array on pandas Series with custom dtype (eg categoric…

c48c7d3

…al) (scikit-learn#12706) Closes scikit-learn#12699. Related to scikit-learn#12625
