ENH add feature_names_in_ in FeatureUnion #25220
- added `self._check_feature_names(...)` to the `.fit(...)` method in `FeatureUnion` to allow access to the `.feature_names_in_` attribute if `X` has feature names, e.g. a `pandas.DataFrame`
- updated `FeatureUnion` docstring to reflect the addition of the `.feature_names_in_` attribute
modified: sklearn/tests/test_pipeline.py
- added `test_feature_union_feature_names_in_()` to test that `FeatureUnion` has a `.feature_names_in_` attribute if fitted with a `pandas.DataFrame` and not if fitted with a `numpy` array
- changelog updated with description of work
- made changelog description more precise
- typo -- removed period (.) before `columns`
Thanks for the PR!
- removed `self._check_feature_names(...)` from the `.fit(...)` method in `FeatureUnion`
- added `feature_names_in_()` property to `FeatureUnion` to use the first transformer's `feature_names_in_` attribute if present
modified: sklearn/tests/test_pipeline.py
- updated docstring for `test_feature_union_feature_names_in_()` to be more precise
- added additional assertions to check that the `feature_names_in_` attribute is available to `FeatureUnion` if it's instantiated with a transformer that has already been fit
- updated changelog description to include `pandas.DataFrame`
- corrected user signature to match github account
- added pandas import to `test_feature_union_feature_names_in_` so ImportError in azure-pipelines will pass
…it176131/scikit-learn into feature_union_feature_names_in_
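The revised approach in the commits above (a `feature_names_in_` property that defers to the first transformer) can be sketched in plain Python. This is only an illustration, not the code merged into `sklearn/pipeline.py`: `ToyTransformer` and `ToyUnion` are hypothetical stand-ins, and the `feature_names` argument to `fit` is an assumption used here in place of passing a `pandas.DataFrame`.

```python
class ToyTransformer:
    """Hypothetical stand-in for a transformer (not a scikit-learn class)."""

    def fit(self, X, feature_names=None):
        # Record names only when the input carries them; otherwise make
        # sure no stale names survive from an earlier fit.
        if feature_names is not None:
            self.feature_names_in_ = list(feature_names)
        elif hasattr(self, "feature_names_in_"):
            del self.feature_names_in_
        return self


class ToyUnion:
    """Hypothetical stand-in for a union of transformers."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, feature_names=None):
        # Every transformer is fitted on the same X.
        for _, transformer in self.transformer_list:
            transformer.fit(X, feature_names)
        return self

    @property
    def feature_names_in_(self):
        # All transformers saw the same input, so the first one's names
        # are representative. If it has none, the AttributeError
        # propagates and hasattr(union, "feature_names_in_") is False.
        return self.transformer_list[0][1].feature_names_in_
```

Because the property reads the attribute lazily, a union built around an already fitted transformer exposes the names without being fitted itself, which is the case the additional test assertions cover.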
Thank you for the update!
newline/whitespace between change log updates. Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
added period at end of docstring Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
- removed train-test-split per code suggestion -- using `X` directly
@thomasjpfan -- I've made the changes you suggested. Ready for another review! 🙂
LGTM
LGTM. Thanks @it176131
Reference Issues/PRs
ENHANCEMENT #24754
What does this implement/fix? Explain your changes.
The `FeatureUnion` class did not previously have the `.feature_names_in_` attribute if fitted with a `pandas.DataFrame`. This allows access to the attribute.
Any other comments?
modified: sklearn/pipeline.py
- added `self._check_feature_names(...)` to the `.fit(...)` method in `FeatureUnion` to allow access to the `.feature_names_in_` attribute if `X` has feature names, e.g. a `pandas.DataFrame`
- updated `FeatureUnion` docstring to reflect the addition of the `.feature_names_in_` attribute
modified: sklearn/tests/test_pipeline.py
- added `test_feature_union_feature_names_in_()` to test that `FeatureUnion` has a `.feature_names_in_` attribute if fitted with a `pandas.DataFrame` and not if fitted with a `numpy` array