Skip to content

ENH add metadata routing to ColumnTransformer #27005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Aug 21, 2023

Conversation

adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Aug 3, 2023

Towards #22893

This adds metadata routing to ColumnTransformer.

It also adds some docstrings to private methods and does a a bit of refactoring / clean up.

Fixes #19465
Fixes #24490

@github-actions
Copy link

github-actions bot commented Aug 3, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 01daf0c. Link to the linter CI: here

@adrinjalali adrinjalali marked this pull request as ready for review August 7, 2023 15:58
@adrinjalali
Copy link
Member Author

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @adrinjalali . Here are a few comments considering files other than the tests.

@glemaitre glemaitre self-requested a review August 11, 2023 20:37
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Only some nitpicks.

@@ -591,7 +613,9 @@ def _update_fitted_transformers(self, transformers):
transformers_ = []
self._name_to_fitted_passthrough = {}

for name, old, column, _ in self._iter():
for name, old, column, _ in self._iter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function was complex to understand, do you mind putting a docstring (at least a summary).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another branch where I'm doing more docstrings and some refactoring, adding the docstring there. Wanted to keep this PR somewhat small.

_transform_one.

fitted : bool
Used to get an iterable of transformers. If True, use the fitted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could almost infer this parameter from the func callable: if requesting _transform then you expect it to be fitted=True, if _fit in the name then fitted=False. I don't know if we having the keyword makes more explicit in some ways?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, remember to leave this comment in the upcoming refactoring PR please 😁

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small suggestion, otherwise LGTM. Thanks @adrinjalali

Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
fitted=False, column_as_strings=False, replace_strings=True
):
if step is None:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be safer to have a test for this uncovered line.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart of the missing line, LGTM.

@adrinjalali
Copy link
Member Author

Apart of the missing line, LGTM.

@glemaitre doesn't seem like we're checking for a step being None anywhere else, so I removed the line.

@glemaitre
Copy link
Member

Fine with me if the tests are passing. Activating the auto-merge here.

@glemaitre glemaitre enabled auto-merge (squash) August 21, 2023 12:00
@glemaitre glemaitre merged commit 96b5814 into scikit-learn:main Aug 21, 2023
TamaraAtanasoska pushed a commit to TamaraAtanasoska/scikit-learn that referenced this pull request Aug 21, 2023
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
@adrinjalali adrinjalali deleted the slep6/columntransformer branch August 24, 2023 11:16
akaashpatelmns pushed a commit to akaashp2000/scikit-learn that referenced this pull request Aug 25, 2023
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
avm19 pushed a commit to avm19/scikit-learn that referenced this pull request Sep 13, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants