-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Ensure that index is correct when global transform_output is pandas #26454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Ensure that index is correct when global transform_output is pandas #26454
Conversation
@@ -627,7 +627,7 @@ def _initial_imputation(self, X, in_fit=False): | |||
strategy=self.initial_strategy, | |||
fill_value=self.fill_value, | |||
keep_empty_features=self.keep_empty_features, | |||
) | |||
).set_output(transform="default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this, cause it seems it could be quite buggy, as in, in our codebase I don't know where else we have such instances where we're not explicitly setting the output type.
I would suggest having a default which is to raise and be like: you haven't explicitly set this value for this estimator, please do so, and have that mode for scikit-learn at least.
This way we'd know we should always set it explicitly to: default, pandas, or the global config.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can get behind a new "raise_if_not_configured" configuration flag. Implementation-wise we'll need a standard place to check that configuration at fit
. That can be in _validate_data
or _fit_context
. _fit_context
is a more natural place but it is not used everywhere yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_fit_context is a more natural place but it is not used everywhere yet.
It will be the case when #26473 is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like this solution, but rather have in the release than not.
Even with the suggestion in #26454 (comment), we'll end up with the same solution as this PR. The suggestion would improve our coverage for catching this type of bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reference Issues/PRs
Fixes #26443
What does this implement/fix? Explain your changes.
This PR updates the common test to ensure that the output dataframe from
transform
is consistent with the input. I updatedIsomap
,IterativeImputer
andPowerTransformer
so they pass the updated common test.