DOC Improve docstring around set_output #24672

thomasjpfan · 2022-10-15T15:23:38Z

Reference Issues/PRs

Related to scikit-learn/enhancement_proposals#78

What does this implement/fix? Explain your changes.

This PR improves the user docs and developer docs around set_output.

glemaitre · 2022-10-17T09:12:36Z

sklearn/pipeline.py

@@ -158,6 +158,10 @@ def set_output(self, transform=None):
        transform : {"default", "pandas"}, default=None
            Configure output of `transform` and `fit_transform`.

+            - `"default"`: Output of an un-configured transformer
+            - `"pandas"`: DataFrames output
+            - `"None"`: No-op


Uhm. It should be the keyword None and not the string "None", shouldn't it?

betatim · 2022-10-17T11:29:30Z

sklearn/compose/_column_transformer.py

@@ -265,6 +265,10 @@ def set_output(self, transform=None):
        transform : {"default", "pandas"}, default=None
            Configure output of `transform` and `fit_transform`.

+            - `"default"`: Output of an un-configured transformer
+            - `"pandas"`: DataFrames output
+            - `"None"`: No-op


After reading the "no-op" explanation, I was left wondering how "no-op" was different from "default". Looking at the code, I think the answer is "they are the same". If that is correct, can we say "None: Default output (format) of a transformer"?

I think I like "Default output format of a transformer" better than "Output of an un-configured transformer", but I am not sure whether they are equivalent. I could imagine that there is a subtle difference between "unconfigured" and "default".

I think no-op is supposed to mean that transform=None does nothing, but transform="default" sets the output to be pandas. As in: `est.set_output(transform="pandas").set_output(transform=None)` vs `est.set_output(transform="pandas").set_output(transform="default")`: the first one returns a pandas DataFrame.
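The difference between the two chained calls can be modeled without scikit-learn. The sketch below follows the intended meaning clarified in this thread (`"default"` restores the default container and does not select pandas). `OutputConfigSketch` is a hypothetical stand-in, not a scikit-learn class: it keeps the setting in a dict, treats `None` as "leave unchanged", and lets `"default"` overwrite a previous `"pandas"` setting.

```python
class OutputConfigSketch:
    """Hypothetical stand-in for a transformer's set_output bookkeeping.

    Not scikit-learn code: it only models the semantics discussed in this thread.
    """

    def __init__(self):
        # An un-configured object has no explicit setting.
        self._output_config = {}

    def set_output(self, *, transform=None):
        # None is a no-op: keep whatever was configured before.
        if transform is None:
            return self
        self._output_config["transform"] = transform
        return self

    def output_format(self):
        # Fall back to "default" when nothing was configured.
        return self._output_config.get("transform", "default")


# "pandas", then None: the pandas setting survives.
a = OutputConfigSketch().set_output(transform="pandas").set_output(transform=None)
# "pandas", then "default": the setting is reset to the default format.
b = OutputConfigSketch().set_output(transform="pandas").set_output(transform="default")

print(a.output_format())  # pandas
print(b.output_format())  # default
```

With the real API (scikit-learn 1.2 and later), the equivalent first step is, for example, `StandardScaler().set_output(transform="pandas")`, after which `fit_transform` returns a DataFrame.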

Ouf, that is subtle indeed. Why does None exist as an option if it does nothing (not even reset the state to "unconfigured")?

but transform="default" sets the output to be pandas

is this right? I was expecting "default" to be "not pandas"/numpy because there is an option to set pandas

is this right?

Sorry, typo, but you got the intention :D

Why does None exist as an option if it does nothing

This method might start having other args as well, and then we'd want to be able to set only one thing, while leaving the rest unchanged:

```python
def f(a=None, b=None): ...

# This would not change the value set by the method for `b`
f(a="value")
```

Why does None exist as an option if it does nothing (not even reset the state to "unconfigured")?

In the future, we may want to add predict to set_output:

```python
pipe.set_output(transform="pandas")
pipe.set_output(predict="pandas")
```

The second call with predict="pandas" should not adjust the configuration for transform.
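The same rule extends to several keyword arguments. The sketch below assumes a hypothetical `predict` keyword (not part of `set_output` at the time of this PR) and a hypothetical class `MultiOutputConfigSketch`; it shows that each call only records the options that were passed explicitly, so a later call cannot reset an earlier one.

```python
class MultiOutputConfigSketch:
    """Hypothetical model of set_output with an extra, not-yet-existing `predict` key."""

    def __init__(self):
        self._output_config = {}

    def set_output(self, *, transform=None, predict=None):
        # Only options passed explicitly (non-None) are recorded;
        # options left at None keep their previous value.
        for key, value in {"transform": transform, "predict": predict}.items():
            if value is not None:
                self._output_config[key] = value
        return self

    def get(self, key):
        # Unset options fall back to the default output format.
        return self._output_config.get(key, "default")


pipe = MultiOutputConfigSketch()
pipe.set_output(transform="pandas")
pipe.set_output(predict="pandas")  # must not reset `transform`
print(pipe.get("transform"), pipe.get("predict"))  # pandas pandas
```

If `None` instead meant "reset to default", the second call would silently undo the first one.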

is this right? I was expecting "default" to be "not pandas"/numpy because there is an option to set pandas

I took into account existing third-party estimators that only return DataFrames by default. If "default" meant NumPy output, they would not follow the set_output API. I did not want to make third-party libraries convert their DataFrames into NumPy arrays just to conform to the API.

Even in scikit-learn, there are three possible return containers for transform="default":

- With the recent Array API support: any Array API container, such as cupy.array_api.Array
- NumPy's ndarray
- SciPy's sparse matrix

From this discussion, I updated the docstring to:

- `"default"`: Default output format of a transformer
- `"pandas"`: DataFrame output
- `None`: Transform configuration is unchanged
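For illustration, the final wording fits into a numpydoc-style parameter section roughly as follows. The function `set_output_sketch` and its `config` argument are hypothetical, not scikit-learn's implementation. Its explicit validation is also an assumption added for this sketch; it rejects the string `"None"`, which shows why the docstring must refer to the keyword `None` rather than the string.

```python
def set_output_sketch(config, *, transform=None):
    """Set output container (hypothetical sketch).

    Parameters
    ----------
    config : dict
        Hypothetical mutable store for the output configuration.

    transform : {"default", "pandas"}, default=None
        Configure output of `transform` and `fit_transform`.

        - `"default"`: Default output format of a transformer
        - `"pandas"`: DataFrame output
        - `None`: Transform configuration is unchanged

    Returns
    -------
    config : dict
        The (possibly updated) configuration.
    """
    if transform is None:
        # The keyword None, not the string "None": leave config untouched.
        return config
    if transform not in {"default", "pandas"}:
        # Hypothetical validation; the string "None" ends up here.
        raise ValueError(f"Unknown transform option: {transform!r}")
    config["transform"] = transform
    return config
```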

Cool! Thanks for all the explanations. I like your change Thomas.

Micky774

LGTM

* DOC Improve docstring around set_output
* DOC Improve docs around set_output
* DOC Address comments
* DOC Better grammar
* DOC Improve wording
* DOC Improves docstring in set_config

thomasjpfan added 2 commits October 15, 2022 11:16

DOC Improve docstring around set_output

b939336

DOC Improve docs around set_output

0bf2c88

thomasjpfan added the Quick Review label Oct 15, 2022

github-actions bot added the Documentation label Oct 15, 2022

adrinjalali approved these changes Oct 17, 2022


glemaitre reviewed Oct 17, 2022


betatim reviewed Oct 17, 2022


thomasjpfan added 4 commits October 17, 2022 14:29

DOC Address comments

f5355a3

DOC Better grammar

8a7c466

DOC Improve wording

3c7a208

DOC Improves docstring in set_config

c310217

Micky774 approved these changes Oct 18, 2022


betatim approved these changes Oct 18, 2022


thomasjpfan added this to the 1.2 milestone Oct 18, 2022

Micky774 merged commit d4306ba into scikit-learn:main Oct 18, 2022

StefanieSenger mentioned this pull request Jul 3, 2024

FIX TransformedTargetRegressor warns when set_output expects dataframe #29401

Merged
