Add inverse-transform to _set_output #27891
@sky-2002 By the sounds of it this works, but is it robust enough? It wouldn't allow the … Perhaps a more robust solution would involve modifying the …
In the end getting something like:

class _SetOutputMixin:
    """Mixin that dynamically wraps methods to return container based on config.

    Currently `_SetOutputMixin` wraps `transform`, `fit_transform`, and `inverse_transform`
    and configures their output based on `set_output` of the global configuration.

    `set_output` is only defined if `get_feature_names_out` is defined and
    `auto_wrap_output_keys` is the default value.
    """

    # Assumes `_wrap_method_output`, `_auto_wrap_is_configured` and `available_if`
    # are in scope, as they are in sklearn/utils/_set_output.py.

    def __init_subclass__(cls, auto_wrap_output_keys=("transform", "inverse_transform"), **kwargs):
        super().__init_subclass__(**kwargs)

        # Dynamically wrap `transform`, `fit_transform`, and `inverse_transform`
        # and configure their output based on `set_output`.
        if not (
            isinstance(auto_wrap_output_keys, tuple) or auto_wrap_output_keys is None
        ):
            raise ValueError("auto_wrap_output_keys must be None or a tuple of keys.")

        if auto_wrap_output_keys is None:
            cls._sklearn_auto_wrap_output_keys = set()
            return

        # Mapping from method to key in configurations
        method_to_key = {
            "transform": "transform",
            "fit_transform": "transform",
            "inverse_transform": "inverse_transform",
        }

        cls._sklearn_auto_wrap_output_keys = set()

        for method, key in method_to_key.items():
            if not hasattr(cls, method) or key not in auto_wrap_output_keys:
                continue
            cls._sklearn_auto_wrap_output_keys.add(key)

            # Only wrap methods defined by cls itself
            if method not in cls.__dict__:
                continue
            wrapped_method = _wrap_method_output(getattr(cls, method), key)
            setattr(cls, method, wrapped_method)

    @available_if(_auto_wrap_is_configured)
    def set_output(self, *, transform=None, inverse_transform=None):
        """Set output container.

        See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
        for an example on how to use the API.

        Parameters
        ----------
        transform : {"default", "pandas", "polars"}, default=None
            Configure output of `transform` and `fit_transform`.

            - `"default"`: Default output format of a transformer
            - `"pandas"`: DataFrame output
            - `"polars"`: Polars output
            - `None`: Transform configuration is unchanged

            .. versionadded:: 1.4
                `"polars"` option was added.

        inverse_transform : {"default", "pandas", "polars"}, default=None
            Configure output of `inverse_transform`.

            - `"default"`: Default output format of `inverse_transform`
            - `"pandas"`: DataFrame output
            - `"polars"`: Polars output
            - `None`: `inverse_transform` configuration is unchanged

            .. versionadded:: 1.4
                `"polars"` option was added.

        Returns
        -------
        self : estimator instance
            Estimator instance.
        """
        if transform is None and inverse_transform is None:
            return self

        if not hasattr(self, "_sklearn_output_config"):
            self._sklearn_output_config = {}

        if transform is not None:
            self._sklearn_output_config["transform"] = transform
        if inverse_transform is not None:
            self._sklearn_output_config["inverse_transform"] = inverse_transform
        return self

This would also seem to mimic more of what has been done for the …
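For readers who want to see the auto-wrapping idea run end to end, here is a minimal, self-contained sketch that assumes nothing from scikit-learn. Everything in it is hypothetical: `ToySetOutputMixin`, `_wrap_output`, the `output_names` hook, the `Doubler` transformer, and the `"dict"` output format (a stand-in for `"pandas"` that turns rows into `{column_name: [values]}`). It mirrors the proposal's structure: `__init_subclass__` wraps only the methods a subclass defines itself, and separate `transform` / `inverse_transform` keys let each method be configured independently.

```python
import functools


def _wrap_output(method, key):
    """Wrap `method` so its result follows the instance's output config for `key`."""

    @functools.wraps(method)
    def wrapped(self, X, *args, **kwargs):
        result = method(self, X, *args, **kwargs)
        # Look up the per-instance setting for this key; unset means "default".
        config = getattr(self, "_output_config", {}).get(key, "default")
        if config == "dict":
            # Hypothetical stand-in for "pandas": rows -> {column_name: [values]}.
            names = self.output_names(key)
            return {name: [row[i] for row in result] for i, name in enumerate(names)}
        return result

    return wrapped


class ToySetOutputMixin:
    # Same method-to-key mapping as the proposal: fit_transform shares "transform".
    _method_to_key = {
        "transform": "transform",
        "fit_transform": "transform",
        "inverse_transform": "inverse_transform",
    }

    def __init_subclass__(cls, auto_wrap_output_keys=("transform", "inverse_transform"), **kwargs):
        super().__init_subclass__(**kwargs)
        cls._auto_wrap_output_keys = set()
        for method, key in cls._method_to_key.items():
            if not hasattr(cls, method) or key not in auto_wrap_output_keys:
                continue
            cls._auto_wrap_output_keys.add(key)
            # Only wrap methods the subclass defines itself, so inherited
            # (already wrapped) methods are not wrapped twice.
            if method in cls.__dict__:
                setattr(cls, method, _wrap_output(cls.__dict__[method], key))

    def set_output(self, *, transform=None, inverse_transform=None):
        # None leaves the corresponding setting unchanged.
        if not hasattr(self, "_output_config"):
            self._output_config = {}
        if transform is not None:
            self._output_config["transform"] = transform
        if inverse_transform is not None:
            self._output_config["inverse_transform"] = inverse_transform
        return self


class Doubler(ToySetOutputMixin):
    """Toy transformer: transform doubles each value, inverse_transform halves it."""

    def fit(self, X):
        self.feature_names_in_ = [f"x{i}" for i in range(len(X[0]))]
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]

    def inverse_transform(self, X):
        return [[v / 2 for v in row] for row in X]

    def output_names(self, key):
        # inverse_transform returns data in the input space, so reuse input names.
        if key == "inverse_transform":
            return self.feature_names_in_
        return [f"doubled_{name}" for name in self.feature_names_in_]


est = Doubler().fit([[1, 2], [3, 4]]).set_output(inverse_transform="dict")
print(est.transform([[1, 2]]))          # [[2, 4]]  (transform left at default)
print(est.inverse_transform([[2, 4]]))  # {'x0': [1.0], 'x1': [2.0]}
```

Setting `inverse_transform` leaves `transform` untouched and vice versa. Labelling the inverse output with the input feature names is a choice of this toy, made because `inverse_transform` maps data back to the input space.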
@Nish-Bhana Thanks for pointing this out and for the suggested changes. These seem valid to me, though I wanted to know why we would need a separate …
@sky-2002 Yeah, good point! I was thinking of the case where some more specific functionality would be required just for the … I think you can ignore those steps.
At first, I would not separate them, but I need to see the entire PR before forming an opinion. We also need a common test to check the behaviour of the …
Regarding the public API, I think we had better be explicit and have:

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
# Proposed API: `inverse_transform=` is the new `set_output` keyword discussed here.
scaler = StandardScaler().set_output(
    transform="pandas", inverse_transform="pandas"
).fit(X)
Xt = scaler.transform(X)
print(scaler.inverse_transform(Xt))

WDYT @glemaitre?
Yes, I completely agree with the public API. I assume at some point we will want to have other methods. |
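With the explicit API, each method has its own setting, so a method the user never configured needs a fallback. The sketch below shows one way that lookup could work, loosely modelled on how scikit-learn's global `set_config(transform_output=...)` sits underneath per-estimator `set_output`. The `inverse_transform_output` global key and the helper names (`set_global_output`, `get_output_config`, `_output_config`) are hypothetical. Keying everything by method name means supporting further methods later only requires one more `"<method>_output"` entry.

```python
from types import SimpleNamespace

# Hypothetical global defaults. scikit-learn's real global config has a
# "transform_output" entry; "inverse_transform_output" is assumed here.
_GLOBAL_OUTPUT_CONFIG = {
    "transform_output": "default",
    "inverse_transform_output": "default",
}


def set_global_output(**kwargs):
    """Update global defaults, e.g. set_global_output(transform_output="pandas")."""
    unknown = set(kwargs) - set(_GLOBAL_OUTPUT_CONFIG)
    if unknown:
        raise KeyError(f"unknown output config keys: {sorted(unknown)}")
    _GLOBAL_OUTPUT_CONFIG.update(kwargs)


def get_output_config(method, estimator=None):
    """Return the output setting for `method`.

    A per-estimator setting (stored in `estimator._output_config`) wins;
    otherwise fall back to the global default for that method.
    """
    est_config = getattr(estimator, "_output_config", {})
    if method in est_config:
        return est_config[method]
    return _GLOBAL_OUTPUT_CONFIG[f"{method}_output"]


# An estimator-like object that configured only `transform`.
est = SimpleNamespace(_output_config={"transform": "pandas"})
print(get_output_config("transform", est))          # pandas
print(get_output_config("inverse_transform", est))  # default
```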
Hi, any updates on this? Having …
I think @sky-2002 was looking into this feature; it seems some checks in the CI pipeline are still failing.
Since the first PR was closed, I am working on a new one.
Reference Issues/PRs
Fixes #27843
What does this implement/fix? Explain your changes.
The problem is described in the issue linked above. Here is how the current solution works.
The above code will print a DataFrame.