FIX Ensure that index is correct when global transform_output is pandas #26454

thomasjpfan · 2023-05-28T20:33:51Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR updates the common test to ensure that the output dataframe from transform keeps an index consistent with the input. I updated Isomap, IterativeImputer, and PowerTransformer so that they pass the updated common test.

sklearn/utils/estimator_checks.py

adrinjalali · 2023-06-01T13:58:31Z

sklearn/impute/_iterative.py

@@ -627,7 +627,7 @@ def _initial_imputation(self, X, in_fit=False):
                strategy=self.initial_strategy,
                fill_value=self.fill_value,
                keep_empty_features=self.keep_empty_features,
-            )
+            ).set_output(transform="default")


I don't love this, because it seems like it could be quite bug-prone: I don't know where else in our codebase we have instances where we're not explicitly setting the output type.

I would suggest having a default mode that raises with a message like: you haven't explicitly set this value for this estimator, please do so. We could enable that mode for scikit-learn at least.

This way we'd know we should always set it explicitly to: default, pandas, or the global config.

WDYT?

I can get behind a new "raise_if_not_configured" configuration flag. Implementation-wise we'll need a standard place to check that configuration at fit. That can be in _validate_data or _fit_context. _fit_context is a more natural place but it is not used everywhere yet.

> _fit_context is a more natural place but it is not used everywhere yet.

It will be the case when #26473 is merged

adrinjalali

I really don't like this solution, but I'd rather have it in the release than not.

thomasjpfan · 2023-06-14T10:45:26Z

Even with the suggestion in #26454 (comment), we'll end up with the same solution as this PR. The suggestion would improve our coverage for catching this type of bug.

jeremiedbb

LGTM

thomasjpfan added 2 commits May 28, 2023 16:31

FIX Ensure that index is correct when global transform_output is pandas

b8092e7

DOC Adds PR number

a3b65c8

thomasjpfan mentioned this pull request May 28, 2023

PowerTransformer returns inconsistent index when transform output is set globally #26443

Closed

thomasjpfan added this to the 1.3 milestone May 28, 2023

adrinjalali reviewed Jun 1, 2023

Merge remote-tracking branch 'upstream/main' into fix_power_transformer

59fc3b5

adrinjalali approved these changes Jun 14, 2023

Merge remote-tracking branch 'upstream/main' into fix_power_transformer

ea08009

jeremiedbb approved these changes Jun 14, 2023

jeremiedbb merged commit 96878ba into scikit-learn:main Jun 14, 2023

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

FIX Ensure that index is correct when global transform_output is pandas (scikit-learn#26454)

4d2dafa
