Skip to content

Add inverse-transform to _set_output #27891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add inverse-transform to _set_output #27891

wants to merge 1 commit into from

Conversation

sky-2002
Copy link

@sky-2002 sky-2002 commented Dec 2, 2023

Reference Issues/PRs

Fixes #27843

What does this implement/fix? Explain your changes.

The problem is mentioned in the issue linked above. Here is how the current solution works.

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().set_output(transform="pandas").fit(X)
Xt = scaler.transform(X)

print(scaler.inverse_transform(Xt))

The above code will print a DataFrame.

Copy link

github-actions bot commented Dec 2, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bd99f2f. Link to the linter CI: here

@Nish-Bhana
Copy link

Nish-Bhana commented Dec 2, 2023

@sky-2002 By the sounds of it this works but is it robust enough since it wouldn't allow the inverse_transform method to have it's own output configuration but rather just share those of the transform method. Curious to get @glemaitre thoughts.

Perhaps a more robust solution would involve modifying the _SetOutputMixin class further with steps something like:

  1. Add "inverse_transform" to the auto_wrap_output_keys tuple in the __init_subclass__ method of the _SetOutputMixin class.
    • This step is to include "inverse_transform" as one of the methods that we want to automatically wrap when a subclass is created. The __init_subclass__ method is a special method in Python that is called when a subclass of the class is created. The auto_wrap_output_keys tuple contains the names of the methods that we want to automatically wrap.
  2. Add a mapping for "inverse_transform" in the method_to_key dictionary in the __init_subclass__ method of the _SetOutputMixin class.
    • This step is to map the "inverse_transform" method to a key that will be used to store the output configuration for this method. The method_to_key dictionary maps method names to keys.
  3. Create a new method _wrap_inverse_transform_output similar to _wrap_method_output but for the inverse_transform method.
    • This step is to create a method that will wrap the inverse_transform method of the subclass. The wrapping is done to modify the output of the inverse_transform method based on the output configuration.
  4. In the __init_subclass__ method of the _SetOutputMixin class, add a condition to check if "inverse_transform" is in cls.__dict__ and if so, wrap it with _wrap_inverse_transform_output.
    • This step is to wrap the inverse_transform method of the subclass with _wrap_inverse_transform_output if it exists. The cls.__dict__ is a dictionary that contains the attributes and methods of the class.
  5. Add a new parameter inverse_transform to the set_output method of the _SetOutputMixin class to allow setting the output for inverse_transform.
    • This step is to modify the set_output method to allow setting the output configuration for the inverse_transform method.
  6. In the set_output method, add a condition to set the output configuration for inverse_transform if it's not None.
    • This step is to set the output configuration for the inverse_transform method if it's provided.
  7. Add a condition in the _safe_set_output function to call set_output for inverse_transform if the estimator has an inverse_transform method.
    • This step is to call set_output for inverse_transform in the _safe_set_output function if the estimator has an inverse_transform method. The _safe_set_output function is a helper function that safely calls set_output with error checking.

In the end getting something like:

      class _SetOutputMixin:
          """Mixin that dynamically wraps methods to return container based on config.

          Currently `_SetOutputMixin` wraps `transform`, `fit_transform`, and `inverse_transform`
          and configures their output based on `set_output` of the global configuration.

          `set_output` is only defined if `get_feature_names_out` is defined and
          `auto_wrap_output_keys` is the default value.
          """

          def __init_subclass__(cls, auto_wrap_output_keys=("transform", "inverse_transform"), **kwargs):
              super().__init_subclass__(**kwargs)

              # Dynamically wraps `transform`, `fit_transform`, and `inverse_transform` and configure their output based on `set_output`.
              if not (
                  isinstance(auto_wrap_output_keys, tuple) or auto_wrap_output_keys is None
              ):
                  raise ValueError("auto_wrap_output_keys must be None or a tuple of keys.")

              if auto_wrap_output_keys is None:
                  cls._sklearn_auto_wrap_output_keys = set()
                  return

              # Mapping from method to key in configurations
              method_to_key = {
                  "transform": "transform",
                  "fit_transform": "transform",
                  "inverse_transform": "inverse_transform"
              }
              cls._sklearn_auto_wrap_output_keys = set()

              for method, key in method_to_key.items():
                  if not hasattr(cls, method) or key not in auto_wrap_output_keys:
                      continue
                  cls._sklearn_auto_wrap_output_keys.add(key)

                  # Only wrap methods defined by cls itself
                  if method not in cls.__dict__:
                      continue
                  wrapped_method = _wrap_method_output(getattr(cls, method), key)
                  setattr(cls, method, wrapped_method)

          @available_if(_auto_wrap_is_configured)
          def set_output(self, *, transform=None, inverse_transform=None):
              """Set output container.

              See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
              for an example on how to use the API.

              Parameters
              ----------
              transform : {"default", "pandas"}, default=None
                  Configure output of `transform` and `fit_transform`.

                  - `"default"`: Default output format of a transformer
                  - `"pandas"`: DataFrame output
                  - `"polars"`: Polars output
                  - `None`: Transform configuration is unchanged

                  .. versionadded:: 1.4
                      `"polars"` option was added.

              inverse_transform : {"default", "pandas"}, default=None
                  Configure output of `inverse_transform`.

                  - `"default"`: Default output format of `inverse_transform`
                  - `"pandas"`: DataFrame output
                  - `"polars"`: Polars output
                  - `None`: `inverse_transform` configuration is unchanged

                  .. versionadded:: 1.4
                      `"polars"` option was added.

              Returns
              -------
              self : estimator instance
                  Estimator instance.
              """
              if transform is None and inverse_transform is None:
                  return self

              if not hasattr(self, "_sklearn_output_config"):
                  self._sklearn_output_config = {}

              if transform is not None:
                  self._sklearn_output_config["transform"] = transform

              if inverse_transform is not None:
                  self._sklearn_output_config["inverse_transform"] = inverse_transform

              return self

This would also seem to mimic more of what has been done for the transform method rather than just "pointing to it" in the mapping

@sky-2002
Copy link
Author

sky-2002 commented Dec 2, 2023

@Nish-Bhana Thanks for pointing out and the suggested changes. These seem valid to me. Though I wanted to know why we would need a separate _wrap_inverse_transform when we already have _wrap_method_output, isn't it common for all methods?

@Nish-Bhana
Copy link

Nish-Bhana commented Dec 2, 2023

@sky-2002 Yeah good point! Was thinking encase perhaps some more specific functionality would be required just for the inverse_transform itself but taking another glance it seems like that most likely wouldn't be the case (plus doesn't seem like the best design choice necessarily).

Think you could ignore those steps.

@glemaitre
Copy link
Member

Though I wanted to know why we would need a separate _wrap_inverse_transform when we already have _wrap_method_output, isn't it common for all methods?

At first I would not separate. But I need to have the entire PR to have an opinion. We also need a common test to check the behaviour of the inverse_transform functionality.

@adrinjalali
Copy link
Member

Regarding the public API, I think we better be explicit and have:

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().set_output(
	transform="pandas", inverse_transform="pandas"
).fit(X)
Xt = scaler.transform(X)

print(scaler.inverse_transform(Xt))

WDYT @glemaitre

@glemaitre
Copy link
Member

Yes, I completely agree with the public API. I assume at some point we will want to have other methods.

@lorenzozanisi
Copy link

Hi, any updates on this? Having inverse_transform return a dataframe would be massively helpful

@Nish-Bhana
Copy link

Hi, any updates on this? Having inverse_transform return a dataframe would be massively helpful

I think @sky-2002 was looking into this feature, seems like some checks in the pipeline are still failing.

@sky-2002 sky-2002 closed this by deleting the head repository May 27, 2024
@SuccessMoses
Copy link
Contributor

Since the first PR was closed. I am working on a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

set_output doesn't work for inverse_transform method
6 participants