-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Remove validation from __init__ and set_params for ColumnTransformer #22537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Remove validation from __init__ and set_params for ColumnTransformer #22537
Conversation
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com> Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com> Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Fundamentally, I would prefer to do non-data validating in of __init__
and set_params
, but that is a whole API discussion.
As for this PR, I think catching the error is good solution for avoiding the warning.
sklearn/utils/metaestimators.py
Outdated
try: | ||
item_names, _ = zip(*items) | ||
for name in list(params.keys()): | ||
if "__" not in name and name in item_names: | ||
self._replace_estimator(attr, name, params.pop(name)) | ||
except TypeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we need this try/except
because of the explicit check for list
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is for a special case - Say someone defines their ColumnTransformer
as est = ColumnTransformer(transformers=[1]
) due to line 60, we get an error that TypeError: not an iterator
when we try to zip
it.
In the smoke values we have a similar test for an empty list, this try-except
just takes care of a list which does not have a proper name
.
If you add to the smoke values test one more value as [1]
, we get 7 failures. This try-except
helps remove those failures which are raised for ColumnTransformer
and ensemble
modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Can you add the [1]
to the smoke test?
Also a comment to expand on why we are catching the TypeError
:
try: | |
item_names, _ = zip(*items) | |
for name in list(params.keys()): | |
if "__" not in name and name in item_names: | |
self._replace_estimator(attr, name, params.pop(name)) | |
except TypeError: | |
pass | |
# `zip` raises a TypeError when `items` does not contains | |
# elements of length 2 | |
with suppress(TypeError): | |
item_names, _ = zip(*items) | |
for name in list(params.keys()): | |
if "__" not in name and name in item_names: | |
self._replace_estimator(attr, name, params.pop(name)) |
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com> Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Co-authored-by: iofall <50991099+iofall@users.noreply.github.com> Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @iofall |
A quick question related to PR #22574, where I implement an estimator, which is in many ways similar to ColumnTransformer. @thomasjpfan, from what you wrote:
I understand that this PR's solution is not ideal in principle, but it is accepted as optimal for now and the foreseeable future. Is this interpretation correct? If so, I will this PR's changes as example. Thanks. |
Yes it's the acceptable solution for foreseeable future. |
…mer (scikit-learn#22537) Co-authored-by: iofall <50991099+iofall@users.noreply.github.com> Co-authored-by: arisayosh <15692997+arisayosh@users.noreply.github.com>
Addresses #21406
What does this implement/fix? Explain your changes.
__init__()
method or theset_params()
method, due to the nature ofColumnTransformer
and its internal implementation, for some smoke test values intest_common.py
, we did not pass the test.try-except
to handle these input based errors.Any other comments?
test_common.py
andtest_column_transformer.py
try-except
handler inmetaestimators.py
, however sinceset_params()
internally callsget_params()
which is defined by iterating overself.transformers
inColumnTransformer
, we need to handle the exception in cases where theself.transformers
attribute does not have an iterator.Questions:
Other Contributors:
CC: @arisayosh