Add inverse-transform to _set_output #27891

sky-2002 · 2023-12-02T18:40:49Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

The problem is mentioned in the issue linked above. Here is how the current solution works.

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().set_output(transform="pandas").fit(X)
Xt = scaler.transform(X)

print(scaler.inverse_transform(Xt))

The above code will print a DataFrame.

github-actions · 2023-12-02T18:42:04Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: bd99f2f. Link to the linter CI: here}

Nish-Bhana · 2023-12-02T19:04:07Z

@sky-2002 By the sounds of it this works but is it robust enough since it wouldn't allow the inverse_transform method to have it's own output configuration but rather just share those of the transform method. Curious to get @glemaitre thoughts.

Perhaps a more robust solution would involve modifying the _SetOutputMixin class further with steps something like:

Add "inverse_transform" to the auto_wrap_output_keys tuple in the __init_subclass__ method of the _SetOutputMixin class.
- This step is to include "inverse_transform" as one of the methods that we want to automatically wrap when a subclass is created. The __init_subclass__ method is a special method in Python that is called when a subclass of the class is created. The auto_wrap_output_keys tuple contains the names of the methods that we want to automatically wrap.
Add a mapping for "inverse_transform" in the method_to_key dictionary in the __init_subclass__ method of the _SetOutputMixin class.
- This step is to map the "inverse_transform" method to a key that will be used to store the output configuration for this method. The method_to_key dictionary maps method names to keys.
Create a new method _wrap_inverse_transform_output similar to _wrap_method_output but for the inverse_transform method.
- This step is to create a method that will wrap the inverse_transform method of the subclass. The wrapping is done to modify the output of the inverse_transform method based on the output configuration.
In the __init_subclass__ method of the _SetOutputMixin class, add a condition to check if "inverse_transform" is in cls.__dict__ and if so, wrap it with _wrap_inverse_transform_output.
- This step is to wrap the inverse_transform method of the subclass with _wrap_inverse_transform_output if it exists. The cls.__dict__ is a dictionary that contains the attributes and methods of the class.
Add a new parameter inverse_transform to the set_output method of the _SetOutputMixin class to allow setting the output for inverse_transform.
- This step is to modify the set_output method to allow setting the output configuration for the inverse_transform method.
In the set_output method, add a condition to set the output configuration for inverse_transform if it's not None.
- This step is to set the output configuration for the inverse_transform method if it's provided.
Add a condition in the _safe_set_output function to call set_output for inverse_transform if the estimator has an inverse_transform method.
- This step is to call set_output for inverse_transform in the _safe_set_output function if the estimator has an inverse_transform method. The _safe_set_output function is a helper function that safely calls set_output with error checking.

In the end getting something like:

      class _SetOutputMixin:
          """Mixin that dynamically wraps methods to return container based on config.

          Currently `_SetOutputMixin` wraps `transform`, `fit_transform`, and `inverse_transform`
          and configures their output based on `set_output` of the global configuration.

          `set_output` is only defined if `get_feature_names_out` is defined and
          `auto_wrap_output_keys` is the default value.
          """

          def __init_subclass__(cls, auto_wrap_output_keys=("transform", "inverse_transform"), **kwargs):
              super().__init_subclass__(**kwargs)

              # Dynamically wraps `transform`, `fit_transform`, and `inverse_transform` and configure their output based on `set_output`.
              if not (
                  isinstance(auto_wrap_output_keys, tuple) or auto_wrap_output_keys is None
              ):
                  raise ValueError("auto_wrap_output_keys must be None or a tuple of keys.")

              if auto_wrap_output_keys is None:
                  cls._sklearn_auto_wrap_output_keys = set()
                  return

              # Mapping from method to key in configurations
              method_to_key = {
                  "transform": "transform",
                  "fit_transform": "transform",
                  "inverse_transform": "inverse_transform"
              }
              cls._sklearn_auto_wrap_output_keys = set()

              for method, key in method_to_key.items():
                  if not hasattr(cls, method) or key not in auto_wrap_output_keys:
                      continue
                  cls._sklearn_auto_wrap_output_keys.add(key)

                  # Only wrap methods defined by cls itself
                  if method not in cls.__dict__:
                      continue
                  wrapped_method = _wrap_method_output(getattr(cls, method), key)
                  setattr(cls, method, wrapped_method)

          @available_if(_auto_wrap_is_configured)
          def set_output(self, *, transform=None, inverse_transform=None):
              """Set output container.

              See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
              for an example on how to use the API.

              Parameters
              ----------
              transform : {"default", "pandas"}, default=None
                  Configure output of `transform` and `fit_transform`.

                  - `"default"`: Default output format of a transformer
                  - `"pandas"`: DataFrame output
                  - `"polars"`: Polars output
                  - `None`: Transform configuration is unchanged

                  .. versionadded:: 1.4
                      `"polars"` option was added.

              inverse_transform : {"default", "pandas"}, default=None
                  Configure output of `inverse_transform`.

                  - `"default"`: Default output format of `inverse_transform`
                  - `"pandas"`: DataFrame output
                  - `"polars"`: Polars output
                  - `None`: `inverse_transform` configuration is unchanged

                  .. versionadded:: 1.4
                      `"polars"` option was added.

              Returns
              -------
              self : estimator instance
                  Estimator instance.
              """
              if transform is None and inverse_transform is None:
                  return self

              if not hasattr(self, "_sklearn_output_config"):
                  self._sklearn_output_config = {}

              if transform is not None:
                  self._sklearn_output_config["transform"] = transform

              if inverse_transform is not None:
                  self._sklearn_output_config["inverse_transform"] = inverse_transform

              return self

This would also seem to mimic more of what has been done for the transform method rather than just "pointing to it" in the mapping

sky-2002 · 2023-12-02T19:25:44Z

@Nish-Bhana Thanks for pointing out and the suggested changes. These seem valid to me. Though I wanted to know why we would need a separate _wrap_inverse_transform when we already have _wrap_method_output, isn't it common for all methods?

Nish-Bhana · 2023-12-02T19:48:43Z

@sky-2002 Yeah good point! Was thinking encase perhaps some more specific functionality would be required just for the inverse_transform itself but taking another glance it seems like that most likely wouldn't be the case (plus doesn't seem like the best design choice necessarily).

Think you could ignore those steps.

glemaitre · 2023-12-04T08:54:38Z

Though I wanted to know why we would need a separate _wrap_inverse_transform when we already have _wrap_method_output, isn't it common for all methods?

At first I would not separate. But I need to have the entire PR to have an opinion. We also need a common test to check the behaviour of the inverse_transform functionality.

adrinjalali · 2023-12-04T11:23:10Z

Regarding the public API, I think we better be explicit and have:

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().set_output(
	transform="pandas", inverse_transform="pandas"
).fit(X)
Xt = scaler.transform(X)

print(scaler.inverse_transform(Xt))

WDYT @glemaitre

glemaitre · 2023-12-04T11:48:21Z

Yes, I completely agree with the public API. I assume at some point we will want to have other methods.

lorenzozanisi · 2024-03-22T12:39:00Z

Hi, any updates on this? Having inverse_transform return a dataframe would be massively helpful

Nish-Bhana · 2024-03-26T10:34:56Z

Hi, any updates on this? Having inverse_transform return a dataframe would be massively helpful

I think @sky-2002 was looking into this feature, seems like some checks in the pipeline are still failing.

SuccessMoses · 2024-11-26T15:23:36Z

Since the first PR was closed. I am working on a new PR

Add inverse-transform to _set_output

bd99f2f

github-actions bot added the module:utils label Dec 2, 2023

sky-2002 closed this by deleting the head repository May 27, 2024

Alex-Cremers mentioned this pull request Jul 13, 2024

feat: RepeatingBasisFunction.inverse_transform koaning/scikit-lego#687

Open

9 tasks

SuccessMoses mentioned this pull request Nov 29, 2024

FEAT add inverse_transform parameter to _SetOutputMixin.set_output #30376

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add inverse-transform to _set_output #27891

Add inverse-transform to _set_output #27891

Uh oh!

sky-2002 commented Dec 2, 2023

Uh oh!

github-actions bot commented Dec 2, 2023

Uh oh!

Nish-Bhana commented Dec 2, 2023 •

edited

Loading

Uh oh!

sky-2002 commented Dec 2, 2023

Uh oh!

Nish-Bhana commented Dec 2, 2023 •

edited

Loading

Uh oh!

glemaitre commented Dec 4, 2023

Uh oh!

adrinjalali commented Dec 4, 2023

Uh oh!

glemaitre commented Dec 4, 2023

Uh oh!

lorenzozanisi commented Mar 22, 2024

Uh oh!

Nish-Bhana commented Mar 26, 2024

Uh oh!

SuccessMoses commented Nov 26, 2024

Uh oh!

Uh oh!

Uh oh!

Add inverse-transform to _set_output #27891

Add inverse-transform to _set_output #27891

Uh oh!

Conversation

sky-2002 commented Dec 2, 2023

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Uh oh!

github-actions bot commented Dec 2, 2023

✔️ Linting Passed

Uh oh!

Nish-Bhana commented Dec 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sky-2002 commented Dec 2, 2023

Uh oh!

Nish-Bhana commented Dec 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glemaitre commented Dec 4, 2023

Uh oh!

adrinjalali commented Dec 4, 2023

Uh oh!

glemaitre commented Dec 4, 2023

Uh oh!

lorenzozanisi commented Mar 22, 2024

Uh oh!

Nish-Bhana commented Mar 26, 2024

Uh oh!

SuccessMoses commented Nov 26, 2024

Uh oh!

Uh oh!

Nish-Bhana commented Dec 2, 2023 •

edited

Loading

Nish-Bhana commented Dec 2, 2023 •

edited

Loading