Skip to content

ENH add feature_names_in_ in FeatureUnion #25220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Jan 14, 2023

Conversation

it176131
Copy link
Contributor

@it176131 it176131 commented Dec 21, 2022

Reference Issues/PRs

ENHANCEMENT #24754

What does this implement/fix? Explain your changes.

The FeatureUnion class did not previously have the .feature_names_in_ attribute if fitted with a pandas.DataFrame. This allows access to the attribute.

Any other comments?

  • modified: sklearn/pipeline.py

    • added self._check_feature_names(...) to the .fit(...) method in FeatureUnion to allow access to the .feature_names_in_ attribute if X has features names, e.g. a pandas.DataFrame
    • updated FeatureUnion docstring to reflect the addition of .feature_names_in_ attribute
  • modified: sklearn/tests/test_pipeline.py

    • added test_feature_union_feature_names_in_() to test that FeatureUnion has a .feature_names_in_ attribute if fitted with a pandas.DataFrame and not if fitted with a numpy array

	- added self._check_feature_names(...) to the .fit(...) method in FeatureUnion to allow access to the `.feature_names_in_` attribute if `X` has features names, e.g. a pandas.DataFrame
	- updated FeatureUnion docstring to reflect the addition of .feature_names_in_ attribute

modified:   sklearn/tests/test_pipeline.py
	- added test_feature_union_feature_names_in_() to test that FeatureUnion has a `.feature_names_in_` attribute if fitted with a pandas.DataFrame and not if fitted with a numpy array
	- changelog updated with description of work
@it176131 it176131 changed the title FIX FeatureUnion feature_names_in_ attribute FEATURE FeatureUnion feature_names_in_ attribute Dec 22, 2022
	- made changelog description more precise
	- typo -- removed period (.) before `columns`
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

	- removed `self._check_feature_names(...) from `.fit(...)` method in `FeatureUnion`
	- added `feature_names_in_()` property to `FeatureUnion` to use first transformer's `feature_names_in_` attribute if present

modified:   sklearn/tests/test_pipeline.py
	- updated docstring for `test_feature_union_feature_names_in_()` to be more precise
	- added additional assertions to check if the `feature_names_in_` attribute is available to `FeatureUnion` if it's instantiated with a transformer that has already been fit
	- updated changelog description to include `pandas.DataFrame`
	- corrected user signature to match github account
@glemaitre glemaitre changed the title FEATURE FeatureUnion feature_names_in_ attribute ENH add feature_names_in_ in FeatureUnion Dec 23, 2022
@it176131 it176131 requested review from thomasjpfan and Kshitij68 and removed request for thomasjpfan and Kshitij68 December 29, 2022 16:27
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update!

it176131 and others added 4 commits January 3, 2023 08:56
newline/whitespace between change log updates.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
added period at end of docstring

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
	- removed train-test-split per code suggestion -- using `X` directly
@it176131
Copy link
Contributor Author

it176131 commented Jan 3, 2023

@thomasjpfan -- I've made the changes you suggested. Ready for another review! 🙂

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jan 3, 2023
@it176131 it176131 changed the title ENH add feature_names_in_ in FeatureUnion [MRG] ENH add feature_names_in_ in FeatureUnion Jan 10, 2023
@betatim betatim added Quick Review For PRs that are quick to review and removed Waiting for Second Reviewer First reviewer is done, need a second one! labels Jan 13, 2023
@glemaitre glemaitre self-requested a review January 14, 2023 16:34
@glemaitre glemaitre changed the title [MRG] ENH add feature_names_in_ in FeatureUnion ENH add feature_names_in_ in FeatureUnion Jan 14, 2023
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @it176131

@glemaitre glemaitre merged commit bf03a63 into scikit-learn:main Jan 14, 2023
@it176131 it176131 deleted the feature_union_feature_names_in_ branch February 1, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:pipeline Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants