ColumnTransformer breaks where X is a list #12096

Closed
jnothman opened this issue Sep 17, 2018 · 5 comments · Fixed by #12104
Labels
Bug Easy Well-defined and straightforward way to resolve help wanted
Milestone

Comments

@jnothman
Member

>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.compose import ColumnTransformer
>>> ColumnTransformer([('foobar', StandardScaler(), [0, 1, 2])]).fit([[1, 2, 3]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joel/repos/scikit-learn/sklearn/compose/_column_transformer.py", line 398, in fit
    self.fit_transform(X, y=y)
  File "/Users/joel/repos/scikit-learn/sklearn/compose/_column_transformer.py", line 422, in fit_transform
    self._validate_remainder(X)
  File "/Users/joel/repos/scikit-learn/sklearn/compose/_column_transformer.py", line 275, in _validate_remainder
    n_columns = X.shape[1]
AttributeError: 'list' object has no attribute 'shape'

The passed list should be interpreted as an array for the sake of extracting columns. Instead an error is raised.
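In the meantime, converting the list to an array before fitting should avoid the error. The following is a minimal sketch of that workaround, assuming NumPy is available; the two-row input is illustrative and not taken from the report above:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Illustrative data (assumption); the report above used the single row [[1, 2, 3]].
X_list = [[1, 2, 3], [4, 5, 6]]

# Converting up front hands ColumnTransformer an object that has a .shape attribute.
X = np.asarray(X_list)

ct = ColumnTransformer([('foobar', StandardScaler(), [0, 1, 2])])
Xt = ct.fit_transform(X)
print(Xt.shape)
```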

@jnothman jnothman added Bug Easy Well-defined and straightforward way to resolve help wanted labels Sep 17, 2018
@jnothman jnothman added this to the 0.20 milestone Sep 17, 2018
@jorisvandenbossche
Member

Yes, it would be good to convert the list to an array, as is done for other estimators. (However, I would not consider it a blocker for 0.20.)

@jnothman
Member Author

jnothman commented Sep 19, 2018 via email

@jnothman
Member Author

jnothman commented Sep 19, 2018 via email

@ghost

ghost commented Sep 23, 2018

Has the issue been solved?

@jorisvandenbossche
Member

@Aditya1994 there is an open PR: #12104

amueller pushed a commit that referenced this issue Sep 25, 2018
#### Reference Issues/PRs
Fixes #12096.

#### What does this implement/fix? Explain your changes.
Converts the input list for ColumnTransformer to a numpy array.

Added a check inside `transform` and `fit_transform`: if the input `X` is a list, it is converted to a NumPy array.

#### Any other comments?
Should this conversion be documented in the docstrings for ColumnTransformer's `fit`, `transform` and `fit_transform`?

amueller pushed a commit that referenced this issue Sep 25, 2018