fix check_array dtype check for pandas series #12625

Merged: 7 commits, Nov 20, 2018

amueller
Member

Reference Issues/PRs

Fixes #12622

What does this implement/fix? Explain your changes.

`set` can't be called on a zero-dimensional array, which is what `np.array(array.dtypes)` returns for a pandas Series.
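
A minimal sketch of the failure, using NumPy only. It assumes, as the diff in this PR suggests, that `check_array` later builds a `set` from `dtypes_orig`; a plain `np.dtype` stands in for the single dtype that a Series exposes through `.dtypes`.

```python
import numpy as np

# Stand-in for `Series.dtypes`, which is one dtype rather than a collection.
series_dtypes = np.dtype("int64")

zero_dim = np.array(series_dtypes)  # zero-dimensional object array
try:
    set(zero_dim)
    set_failed = False
except TypeError:  # NumPy refuses to iterate over a 0-d array
    set_failed = True

# With ndmin=1 the same value becomes a one-element array that can be iterated.
one_dim = np.array(series_dtypes, ndmin=1)
dtype_set = set(one_dim)
```

With `ndmin=1`, the dtypes of a Series and of a DataFrame can be handled by the same code path.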

does this need a whatsnew? probably?

amueller added this to the 0.20.1 milestone on Nov 20, 2018
@@ -478,7 +478,7 @@ def check_array(array, accept_sparse=False, accept_large_sparse=True,
     # DataFrame), and store them. If not, store None.
     dtypes_orig = None
     if hasattr(array, "dtypes") and hasattr(array, "__array__"):
-        dtypes_orig = np.array(array.dtypes)
+        dtypes_orig = np.array(array.dtypes, ndmin=1)
Member

I am not sure this is the "correct" fix. It fixes the problem, but in essence a Series should not take this path, as it can never have multiple dtypes in it. So I would rather ensure that a Series does not pass this `if` check.

Member

@qinhanmin2014 had one in #12622 (comment)

I am trying to think of a robust "duck type check" for Series. Personally, I would start doing actual `isinstance` checks (we could e.g. have a util function that combines that with trying to import pandas), but that's maybe a broader issue.
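
A rough sketch of the kind of util function suggested here. The name `is_pandas_series` is hypothetical and not part of scikit-learn; the idea is only to combine an optional pandas import with a real `isinstance` check.

```python
def is_pandas_series(obj):
    """Return True only if pandas is importable and obj is a pandas.Series."""
    try:
        import pandas as pd  # optional dependency, so import lazily
    except ImportError:
        # Without pandas installed, obj cannot be a pandas Series.
        return False
    return isinstance(obj, pd.Series)
```

A caller could then skip the dtype bookkeeping whenever `is_pandas_series(array)` is true, at the cost of an import attempt on each call.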

Member Author

We could also add `and array.dtypes.ndim` to the condition, so that `dtypes_orig` stays `None` for a Series?

Member

Yes, that's a good idea (for sure nicer than the hasattr(array.dtypes, "__array__"))
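
The `ndim` idea can be sketched as follows. `original_dtypes`, `FakeSeries`, and `FakeFrame` are hypothetical names used only for illustration; the two classes are NumPy-only stand-ins for a pandas Series (whose `.dtypes` is a single dtype with `ndim == 0`) and a DataFrame (whose `.dtypes` is one-dimensional).

```python
import numpy as np


def original_dtypes(array):
    """Return the per-column dtypes, or None if there is only one dtype."""
    if (hasattr(array, "dtypes") and hasattr(array, "__array__")
            and array.dtypes.ndim):  # ndim is 0 for a single np.dtype
        return np.array(array.dtypes)
    return None


class FakeSeries:
    """Stand-in for a pandas Series: `.dtypes` is a single np.dtype."""
    dtypes = np.dtype("int64")

    def __array__(self, dtype=None, copy=None):
        return np.array([1, 2, 3], dtype=dtype)


class FakeFrame:
    """Stand-in for a DataFrame: `.dtypes` is a 1-d collection of dtypes."""
    dtypes = np.array([np.dtype("int64"), np.dtype("float64")], dtype=object)

    def __array__(self, dtype=None, copy=None):
        return np.zeros((3, 2), dtype=dtype)


series_result = original_dtypes(FakeSeries())  # None: guard is falsy
frame_result = original_dtypes(FakeFrame())    # array holding two dtypes
```

Compared with `ndmin=1`, this keeps `dtypes_orig` as `None` for a Series, so the multi-dtype branch is never entered for it.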

Member

rth left a comment

LGTM, a "what's new" is probably needed?

def test_check_array_series():
    # regression test that check_array works on pandas Series
    pd = importorskip("pandas")
    check_array(pd.Series([1, 2, 3]), ensure_2d=False, warn_on_dtype=True)

Member

maybe assert that the output is equal to array([1, 2, 3])?
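
One way the suggested assertion might look, as a sketch rather than the test that was merged. The helper name `check_series_roundtrip` is made up, it returns `None` when pandas or scikit-learn is missing, and `warn_on_dtype` is left out because later scikit-learn releases removed that parameter.

```python
import numpy as np


def check_series_roundtrip():
    """Check that check_array returns the Series values unchanged."""
    try:
        import pandas as pd
        from sklearn.utils import check_array
    except ImportError:
        return None  # optional dependencies are not installed

    res = check_array(pd.Series([1, 2, 3]), ensure_2d=False)
    # Compare against the expected 1-d array instead of only checking
    # that the call does not raise.
    np.testing.assert_array_equal(res, np.array([1, 2, 3]))
    return res
```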

Member

ogrisel left a comment

LGTM as well besides the CI failure on style :) Too many reviews.

amueller
Member Author

hopefully good now ;)

@@ -176,6 +176,10 @@ Changelog
   precision issues in :class:`preprocessing.StandardScaler` and
   :class:`decomposition.IncrementalPCA` when using float32 datasets.
   :issue:`12338` by :user:`bauks <bauks>`.
 
+- |Fix| Calling :func:`utils.check_array` on pandas `Series`, which

Member

(I think you can use `pandas.Series` for an intersphinx link)

Member Author

I was wondering after I saw the edit. Would be cool ;)
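
For context, an intersphinx link needs the target project registered in the Sphinx configuration. The fragment below is a sketch of what such a `conf.py` entry could look like; the URL is an assumption and this is not necessarily how scikit-learn's own `conf.py` is written.

```python
# conf.py fragment (sketch); the mapping URL below is an assumption.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # name: (base URL of the target docs, inventory file or None for default)
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
}
```

With a mapping like this, a role such as :class:`pandas.Series` in the changelog would resolve to the pandas documentation.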

jnothman merged commit 104f684 into scikit-learn:master on Nov 20, 2018
amueller
Member Author

thanks!

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 20, 2018
jnothman pushed a commit that referenced this pull request Dec 3, 2018
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 14, 2018
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 17, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Successfully merging this pull request may close these issues.

TypeError: "iteration over a 0-d array" when trying to preprocessing.scale a pandas.Series