Skip to content

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Closes #27037

What does this implement/fix? Explain your changes.

This PR correctly checks that the input is a dataframe before getting index. The original issue happened because list.index exists but is not a valid index.

@thomasjpfan thomasjpfan added this to the 1.3.1 milestone Aug 9, 2023
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 24c7d17. Link to the linter CI: here

Copy link
Contributor

@Micky774 Micky774 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

@Micky774 Micky774 added Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one! labels Aug 9, 2023
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @thomasjpfan

@OmarManzoor OmarManzoor removed the Waiting for Second Reviewer First reviewer is done, need a second one! label Aug 10, 2023
@OmarManzoor OmarManzoor merged commit e4efd8b into scikit-learn:main Aug 10, 2023
TamaraAtanasoska pushed a commit to TamaraAtanasoska/scikit-learn that referenced this pull request Aug 21, 2023
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Comment on lines +129 to +132
index = original_input.index if _is_pandas_df(original_input) else None
return _wrap_in_pandas_container(
data_to_wrap=data_to_wrap,
index=getattr(original_input, "index", None),
index=index,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thomasjpfan,

I have had a regression since 1.3.1 and located that in those lines of code.
My understanding is the following but I could be mistaken so please correct me: those lines have changed the set_output behaviour such that

  • before, if original_input had an index attribute (as is unfortunately the case for lists, but also for pd.Series), the index gets wrapped, and bad things ensued for lists,
  • now, if original_input is not a pd.DataFrame (including intented targets lists but unfortunately also pd.Series), the index does not get wrapped.

So while the fix is valuable for lists, it is not for the case where input are pd.Series.
Would you agree? and if so is a good way to solve this changing from is_pandas_df to a such function is_pandas_df_or_series?

Thank you for your attention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set_output API was only designed with 2-D containers in mind. This means it does not consider pd.Series.

If you have an use case for set_output with a 1-D container, may you open an issue about it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I did not know that, OK thank you for your feedback!

it176131 added a commit to it176131/mlxtend that referenced this pull request Mar 29, 2024
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044

modified:   environment.yml
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044

modified:   requirements.txt
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044
rasbt added a commit to rasbt/mlxtend that referenced this pull request Mar 30, 2024
#1087)

* modified:   test_transactionencoder.py
	- Added two new tests, `test_get_feature_names_out` and `test_set_output`. Passing these tests is a step towards the output of `TransactionEncoder` being formatted as a pandas.DataFramed by default.

* modified:   transactionencoder.py
	- Added `get_feature_names_out` method to `TransactionEncoder` to expose the `set_output` method.

* modified:   tests/test_transactionencoder.py
	- Updated test to include more checks. It is now back in a failing state.

* modified:   tests/test_transactionencoder.py
	- Updated test_set_output docstring to be more explicit.
	- Added numpy assertion to check that the transformed output columns match the original columns_ attribute for test_set_output.
	- Added numpy assertion to check that the get_feature_names_out output match the original columns_ attribute for test_get_feature_names_out.

* modified:   transactionencoder.py
	- Added logic similar to that in `sklearn.base.ClassNamePrefixFeaturesOutMixin` and `sklearn.base.OneToOneFeatureMixin` for the get_feature_names_out method.

* modified:   docs/sources/user_guide/preprocessing/TransactionEncoder.ipynb
	- Updated the user guide to show both the get_feature_names_out method and the set_output method.

* modified:   docs/sources/CHANGELOG.md
	- Updated changelog to reflect new features.

* modified:   docs/sources/CHANGELOG.md
	- Updated issue number.

* modified:   docs/sources/CHANGELOG.md
	- Updated issue number (again) to reflect the PR link instead of the issue link.

* modified:   mlxtend/preprocessing/transactionencoder.py
	- Ran isort over imports to fix failing check in PR.

* modified:   requirements.txt
	- Increased scikit-learn version to minimum required for set_output to work.

* modified:   environment.yml
	- Bumped scikit-learn version up to 1.2.2 to match requirements.txt.

* modified:   .github/workflows/python-package-conda.yml
	- Bumped scikit-learn version up to 1.2.2 to match environment.yml and requirements.txt.

* modified:   mlxtend/preprocessing/tests/test_transactionencoder.py
	- Updated `test_inverse_transform` to passing state by removing conversion to numpy array.

* modified:   .github/workflows/python-package-conda.yml
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044

modified:   environment.yml
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044

modified:   requirements.txt
	- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044

* Update mlxtend/preprocessing/transactionencoder.py

* Update mlxtend/preprocessing/transactionencoder.py

* Update mlxtend/preprocessing/transactionencoder.py

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:utils Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StandardScaler fit_transform() does not work with list as input data when output is configured to 'pandas'
4 participants