Skip to content

FIX Remove validation from __init__ and set_params for ColumnTransformer #22537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 9, 2022

Conversation

iofall
Copy link
Contributor

@iofall iofall commented Feb 18, 2022

Addresses #21406

What does this implement/fix? Explain your changes.

  1. Although no explicit validation was defined in the __init__() method or the set_params() method, due to the nature of ColumnTransformer and its internal implementation, for some smoke test values in test_common.py, we did not pass the test.
  2. To solve this we used try-except to handle these input based errors.

Any other comments?

  1. We pass all the tests for test_common.py and test_column_transformer.py
  2. We already use the try-except handler in metaestimators.py, however since set_params() internally calls get_params() which is defined by iterating over self.transformers in ColumnTransformer, we need to handle the exception in cases where the self.transformers attribute does not have an iterator.

Questions:

  1. I wonder in this case should we try to handle validation the way we have done above or would it be useful to keep the class as it already is. That way we are more explicit and users get an immediate error when they try to set the params in an improper format (Although this would mean going against our convention).

Other Contributors:

CC: @arisayosh

Co-authored-by: iofall <50991099+iofall@users.noreply.github.com>
Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com>
Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Fundamentally, I would prefer to do non-data validating in of __init__ and set_params, but that is a whole API discussion.

As for this PR, I think catching the error is good solution for avoiding the warning.

Comment on lines 59 to 65
try:
item_names, _ = zip(*items)
for name in list(params.keys()):
if "__" not in name and name in item_names:
self._replace_estimator(attr, name, params.pop(name))
except TypeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need this try/except because of the explicit check for list above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is for a special case - Say someone defines their ColumnTransformer as est = ColumnTransformer(transformers=[1]) due to line 60, we get an error that TypeError: not an iterator when we try to zip it.

In the smoke values we have a similar test for an empty list, this try-except just takes care of a list which does not have a proper name.

If you add to the smoke values test one more value as [1], we get 7 failures. This try-except helps remove those failures which are raised for ColumnTransformer and ensemble modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Can you add the [1] to the smoke test?

Also a comment to expand on why we are catching the TypeError:

Suggested change
try:
item_names, _ = zip(*items)
for name in list(params.keys()):
if "__" not in name and name in item_names:
self._replace_estimator(attr, name, params.pop(name))
except TypeError:
pass
# `zip` raises a TypeError when `items` does not contains
# elements of length 2
with suppress(TypeError):
item_names, _ = zip(*items)
for name in list(params.keys()):
if "__" not in name and name in item_names:
self._replace_estimator(attr, name, params.pop(name))

iofall and others added 2 commits February 19, 2022 03:34
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com>
Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com>
Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
@iofall iofall requested a review from thomasjpfan March 2, 2022 17:24
@thomasjpfan thomasjpfan changed the title Remove validation from __init__ and set_params for ColumnTransformer FIX Remove validation from __init__ and set_params for ColumnTransformer Mar 3, 2022
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit ff1c6f3 into scikit-learn:main Mar 9, 2022
@glemaitre
Copy link
Member

Thanks @iofall

@avm19
Copy link
Contributor

avm19 commented Mar 31, 2022

A quick question related to PR #22574, where I implement an estimator, which is in many ways similar to ColumnTransformer. @thomasjpfan, from what you wrote:

Thanks for the PR! Fundamentally, I would prefer to do non-data validating in of __init__ and set_params, but that is a whole API discussion.

As for this PR, I think catching the error is good solution for avoiding the warning.

I understand that this PR's solution is not ideal in principle, but it is accepted as optimal for now and the foreseeable future. Is this interpretation correct? If so, I will this PR's changes as example. Thanks.

@thomasjpfan
Copy link
Member

Yes it's the acceptable solution for foreseeable future.

avm19 pushed a commit to avm19/scikit-learn that referenced this pull request Apr 1, 2022
avm19 pushed a commit to avm19/scikit-learn that referenced this pull request Apr 1, 2022
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
…mer (scikit-learn#22537)

Co-authored-by: iofall <50991099+iofall@users.noreply.github.com>
Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants