FIX Fixes set_output with list input #27044

thomasjpfan · 2023-08-09T16:50:59Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR checks that the input is a DataFrame before reading its index. The original issue happened because list.index exists (it is a method) but is not a valid index.
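A minimal sketch of the failure mode described above, using only built-in Python (no scikit-learn code is reproduced here, and the variable names are illustrative). A plain list has an `index` attribute, but it is the bound method `list.index`, so a `getattr(..., "index", None)` lookup does not fall back to `None`:

```python
# A list exposes `index`, but it is a method that finds an element's position,
# not a set of row labels.
rows = [[0.0, 1.0], [2.0, 3.0]]

# Same style of lookup as the line removed in this PR.
looked_up = getattr(rows, "index", None)

print(looked_up is None)        # False: the attribute exists on a list
print(callable(looked_up))      # True: it is the bound method list.index
print(rows.index([2.0, 3.0]))   # 1: position of the matching element
```

Because the lookup succeeds, the method object is handed on as if it were an index, which is why an explicit type check is needed.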

github-actions · 2023-08-09T16:53:28Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 24c7d17. Link to the linter CI: here

Micky774

LGTM, thanks for the fix!

OmarManzoor

LGTM. Thanks @thomasjpfan

martinroyer · 2024-02-20T17:18:35Z

sklearn/utils/_set_output.py

+    index = original_input.index if _is_pandas_df(original_input) else None
    return _wrap_in_pandas_container(
        data_to_wrap=data_to_wrap,
-        index=getattr(original_input, "index", None),
+        index=index,


Hi @thomasjpfan,

I have had a regression since 1.3.1 and traced it to these lines of code.
My understanding is the following, but I could be mistaken, so please correct me. These lines changed the set_output behaviour such that:

before, if original_input had an index attribute (as is unfortunately the case for lists, but also for pd.Series), the index was wrapped, and bad things ensued for lists;

now, if original_input is not a pd.DataFrame (including the intended target, lists, but unfortunately also pd.Series), the index is not wrapped.

So while the fix is valuable for lists, it is not for the case where the input is a pd.Series.
Would you agree? If so, would a good way to solve this be to change from is_pandas_df to a function such as is_pandas_df_or_series?

Thank you for your attention.
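The before/after behaviour described in this comment can be sketched without pandas. Everything below is an assumption made for illustration: `FrameLike` and `SeriesLike` are hypothetical stand-ins for `pd.DataFrame` and `pd.Series`, and `is_frame` stands in for `_is_pandas_df`. Only the two lookup expressions are taken from the diff.

```python
class FrameLike:
    """Hypothetical stand-in for a 2-D labelled container such as a DataFrame."""

    def __init__(self, index):
        self.index = index


class SeriesLike:
    """Hypothetical stand-in for a 1-D labelled container such as a Series."""

    def __init__(self, index):
        self.index = index


def is_frame(obj):
    # Stand-in for `_is_pandas_df`; here it is just an isinstance check.
    return isinstance(obj, FrameLike)


def index_before(original_input):
    # Lookup removed by the diff: any object with an `index` attribute qualifies.
    return getattr(original_input, "index", None)


def index_after(original_input):
    # Lookup added by the diff: only frame-like inputs contribute an index.
    return original_input.index if is_frame(original_input) else None


def describe(index):
    # Summarise what kind of object the lookup produced.
    if index is None:
        return "none"
    if callable(index):
        return "bound method"
    return "labels"


inputs = {
    "list": [1, 2, 3],
    "series": SeriesLike(["a", "b", "c"]),
    "frame": FrameLike(["a", "b", "c"]),
}
results = {
    name: (describe(index_before(obj)), describe(index_after(obj)))
    for name, obj in inputs.items()
}
for name, (before, after) in results.items():
    print(f"{name}: before={before}, after={after}")
# list: before=bound method, after=none
# series: before=labels, after=none
# frame: before=labels, after=labels
```

Under these assumptions, the list case is fixed, the frame case is unchanged, and the series case loses its labels, which is the regression raised here.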

thomasjpfan · 2024-02-26

The set_output API was only designed with 2-D containers in mind. This means it does not consider pd.Series.

If you have a use case for set_output with a 1-D container, could you open an issue about it?

martinroyer · 2024-02-27

Oh, I did not know that. OK, thank you for your feedback!

#1087)

* modified: test_transactionencoder.py
  - Added two new tests, `test_get_feature_names_out` and `test_set_output`. Passing these tests is a step towards the output of `TransactionEncoder` being formatted as a pandas.DataFrame by default.
* modified: transactionencoder.py
  - Added `get_feature_names_out` method to `TransactionEncoder` to expose the `set_output` method.
* modified: tests/test_transactionencoder.py
  - Updated test to include more checks. It is now back in a failing state.
* modified: tests/test_transactionencoder.py
  - Updated test_set_output docstring to be more explicit.
  - Added numpy assertion to check that the transformed output columns match the original columns_ attribute for test_set_output.
  - Added numpy assertion to check that the get_feature_names_out output matches the original columns_ attribute for test_get_feature_names_out.
* modified: transactionencoder.py
  - Added logic similar to that in `sklearn.base.ClassNamePrefixFeaturesOutMixin` and `sklearn.base.OneToOneFeatureMixin` for the get_feature_names_out method.
* modified: docs/sources/user_guide/preprocessing/TransactionEncoder.ipynb
  - Updated the user guide to show both the get_feature_names_out method and the set_output method.
* modified: docs/sources/CHANGELOG.md
  - Updated changelog to reflect new features.
* modified: docs/sources/CHANGELOG.md
  - Updated issue number.
* modified: docs/sources/CHANGELOG.md
  - Updated issue number (again) to reflect the PR link instead of the issue link.
* modified: mlxtend/preprocessing/transactionencoder.py
  - Ran isort over imports to fix failing check in PR.
* modified: requirements.txt
  - Increased scikit-learn version to minimum required for set_output to work.
* modified: environment.yml
  - Bumped scikit-learn version up to 1.2.2 to match requirements.txt.
* modified: .github/workflows/python-package-conda.yml
  - Bumped scikit-learn version up to 1.2.2 to match environment.yml and requirements.txt.
* modified: mlxtend/preprocessing/tests/test_transactionencoder.py
  - Updated `test_inverse_transform` to passing state by removing conversion to numpy array.
* modified: .github/workflows/python-package-conda.yml
  - Updated scikit-learn version to 1.3.1 to integrate fix from scikit-learn/scikit-learn#27044
  modified: environment.yml
  - Updated scikit-learn version to 1.3.1 to integrate fix from scikit-learn/scikit-learn#27044
  modified: requirements.txt
  - Updated scikit-learn version to 1.3.1 to integrate fix from scikit-learn/scikit-learn#27044
* Update mlxtend/preprocessing/transactionencoder.py
* Update mlxtend/preprocessing/transactionencoder.py
* Update mlxtend/preprocessing/transactionencoder.py

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>

FIX Fixes set_output with list input

051398f

thomasjpfan added this to the 1.3.1 milestone Aug 9, 2023

github-actions bot added the module:utils label Aug 9, 2023

DOC Adds PR number

24c7d17

Micky774 mentioned this pull request Aug 9, 2023

StandardScaler fit_transform() does not work with list as input data when output is configured to 'pandas' #27037

Closed

Micky774 approved these changes Aug 9, 2023

Micky774 added the Quick Review and Waiting for Second Reviewer labels Aug 9, 2023

OmarManzoor approved these changes Aug 10, 2023

OmarManzoor removed the Waiting for Second Reviewer label Aug 10, 2023

OmarManzoor merged commit e4efd8b into scikit-learn:main Aug 10, 2023

TamaraAtanasoska pushed a commit to TamaraAtanasoska/scikit-learn that referenced this pull request Aug 21, 2023

FIX Fixes set_output with list input (scikit-learn#27044)

263be84

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023

FIX Fixes set_output with list input (scikit-learn#27044)

1cf55b5

jeremiedbb pushed a commit that referenced this pull request Sep 20, 2023

FIX Fixes set_output with list input (#27044)

7f8dd75

aazuspan mentioned this pull request Sep 20, 2023

list.index accidentally stored as dataframe_index_in_ lemma-osu/sknnr#55

Closed

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

FIX Fixes set_output with list input (scikit-learn#27044)

d30a726

martinroyer reviewed Feb 20, 2024

it176131 mentioned this pull request Mar 29, 2024

Integrate scikit-learn's set_output method into TransactionEncoder rasbt/mlxtend#1087

Merged

5 tasks

it176131 mentioned this pull request Mar 31, 2024

Update index handling in PandasAdapter #28731

Closed

martinroyer Feb 20, 2024

thomasjpfan Feb 26, 2024

martinroyer Feb 27, 2024
