ENH Improve set_output compatibility in ColumnTransformer #24699
I am fine with the proposed behaviour. I think that this is something that third-party libraries could find useful.
X_df = pd.DataFrame({"feat1": [1, 2, 3], "feat2": [3, 4, 5]})

X_wrapped = _wrap_in_pandas_container(X_df, columns=get_columns)
assert_array_equal(X_wrapped.columns, X_df.columns)
Since the documentation states that a callable that raises an error is equivalent to None, I think we should also test the case where the callable raises and the input is something other than a DataFrame, to check that the resulting columns are range(X.shape[1]).
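The semantics discussed in this comment can be illustrated with a minimal, pandas-only sketch. It assumes the documented behaviour quoted above (a callable `columns` that raises is treated like `columns=None`); wrap_in_pandas_sketch and this get_columns are hypothetical names written for illustration, not scikit-learn's private _wrap_in_pandas_container or the test's own helper.

```python
import numpy as np
import pandas as pd


def wrap_in_pandas_sketch(data_to_wrap, *, columns):
    """Hypothetical sketch of wrapping output in a DataFrame.

    `columns` may be None, a list of names, or a callable returning names.
    """
    if callable(columns):
        try:
            columns = columns()
        except Exception:
            # A callable that raises is treated exactly like columns=None.
            columns = None

    if isinstance(data_to_wrap, pd.DataFrame):
        # Keep the DataFrame's own column names unless new ones were given.
        if columns is not None:
            data_to_wrap.columns = columns
        return data_to_wrap

    # For other inputs, columns=None lets pandas fall back to a RangeIndex,
    # i.e. the columns are range(X.shape[1]).
    return pd.DataFrame(data_to_wrap, columns=columns)


def get_columns():
    # Hypothetical callable that fails, e.g. because no feature names exist.
    raise ValueError("No feature names defined")


X_df = pd.DataFrame({"feat1": [1, 2, 3], "feat2": [3, 4, 5]})
print(list(wrap_in_pandas_sketch(X_df, columns=get_columns).columns))
# ['feat1', 'feat2']

X_np = np.asarray([[1, 3], [2, 4], [3, 5]])
print(list(wrap_in_pandas_sketch(X_np, columns=get_columns).columns))
# [0, 1]
```

Both cases the reviewer asks about are covered: a DataFrame keeps its own column names, and a NumPy array falls back to range(X.shape[1]).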
Added a test in 0fc62e6 (#24699) and adjusted it slightly in 2fb935f (#24699).
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
LGTM
Reference Issues/PRs
Follow up to #23734
What does this implement/fix? Explain your changes.
On main, if an inner transformer does not define get_feature_names_out, ColumnTransformer errors even when all the transformers return a DataFrame. This is because ColumnTransformer.get_feature_names_out is called to adjust the column names so that they follow verbose_feature_names_out.

This PR makes ColumnTransformer more lenient toward transformers that return DataFrames but do not define get_feature_names_out. Feature names out are prefixed following verbose_feature_names_out. The prefixing logic is shared with get_feature_names_out and refactored into a _add_prefix_for_feature_names_out method.

Any other comments?
I think it is common for third-party transformers to expect only DataFrames as input and to always return DataFrames, regardless of how set_output is configured.
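The mechanism described in this PR can be approximated with plain pandas. In this illustrative sketch, the output names come from each transformer's returned DataFrame columns rather than from get_feature_names_out, and are then prefixed according to verbose_feature_names_out. add_prefix_sketch and hstack_dataframes_sketch are hypothetical helpers, not scikit-learn's _add_prefix_for_feature_names_out or ColumnTransformer internals. The "transformer__feature" prefix format and the error on duplicate names when prefixing is disabled follow ColumnTransformer's documented verbose_feature_names_out behaviour, but the error message here is made up.

```python
from collections import Counter

import pandas as pd


def add_prefix_sketch(names_per_transformer, verbose_feature_names_out=True):
    """Hypothetical stand-in for the shared prefixing step.

    names_per_transformer: list of (transformer_name, feature_names) pairs.
    """
    if verbose_feature_names_out:
        # Prefix every feature with the name of the transformer that produced it.
        return [
            f"{name}__{feat}"
            for name, feature_names in names_per_transformer
            for feat in feature_names
        ]

    # No prefixes: the combined names must be unique on their own.
    names = [feat for _, feature_names in names_per_transformer for feat in feature_names]
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"Output feature names {duplicates} are not unique.")
    return names


def hstack_dataframes_sketch(named_outputs, verbose_feature_names_out=True):
    """Concatenate DataFrame outputs, naming columns from each output's columns.

    No get_feature_names_out is needed: the names come from the DataFrames.
    """
    output = pd.concat([df for _, df in named_outputs], axis=1)
    output.columns = add_prefix_sketch(
        [(name, list(df.columns)) for name, df in named_outputs],
        verbose_feature_names_out,
    )
    return output


X = pd.DataFrame({"feat0": [1, 2], "feat1": [3, 4]})
# Outputs of two hypothetical transformers that always return DataFrames.
outputs = [("trans_0", X[["feat1"]] * 2), ("trans_1", X[["feat0"]] - 1)]
print(list(hstack_dataframes_sketch(outputs).columns))
# ['trans_0__feat1', 'trans_1__feat0']
print(list(hstack_dataframes_sketch(outputs, verbose_feature_names_out=False).columns))
# ['feat1', 'feat0']
```

Keeping the prefixing in one helper, as the PR does with _add_prefix_for_feature_names_out, means the names produced for DataFrame outputs and the names returned by get_feature_names_out follow the same rule.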