-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
FIX Fixes set_output with list input #27044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Fixes set_output with list input #27044
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @thomasjpfan
index = original_input.index if _is_pandas_df(original_input) else None | ||
return _wrap_in_pandas_container( | ||
data_to_wrap=data_to_wrap, | ||
index=getattr(original_input, "index", None), | ||
index=index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thomasjpfan,
I have had a regression since 1.3.1 and located that in those lines of code.
My understanding is the following but I could be mistaken so please correct me: those lines have changed the set_output behaviour such that
- before, if original_input had an index attribute (as is unfortunately the case for lists, but also for pd.Series), the index gets wrapped, and bad things ensued for lists,
- now, if original_input is not a pd.DataFrame (including intented targets lists but unfortunately also pd.Series), the index does not get wrapped.
So while the fix is valuable for lists, it is not for the case where input are pd.Series.
Would you agree? and if so is a good way to solve this changing from is_pandas_df
to a such function is_pandas_df_or_series
?
Thank you for your attention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The set_output
API was only designed with 2-D containers in mind. This means it does not consider pd.Series
.
If you have an use case for set_output
with a 1-D container, may you open an issue about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I did not know that, OK thank you for your feedback!
- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: environment.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: requirements.txt - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044
#1087) * modified: test_transactionencoder.py - Added two new tests, `test_get_feature_names_out` and `test_set_output`. Passing these tests is a step towards the output of `TransactionEncoder` being formatted as a pandas.DataFramed by default. * modified: transactionencoder.py - Added `get_feature_names_out` method to `TransactionEncoder` to expose the `set_output` method. * modified: tests/test_transactionencoder.py - Updated test to include more checks. It is now back in a failing state. * modified: tests/test_transactionencoder.py - Updated test_set_output docstring to be more explicit. - Added numpy assertion to check that the transformed output columns match the original columns_ attribute for test_set_output. - Added numpy assertion to check that the get_feature_names_out output match the original columns_ attribute for test_get_feature_names_out. * modified: transactionencoder.py - Added logic similar to that in `sklearn.base.ClassNamePrefixFeaturesOutMixin` and `sklearn.base.OneToOneFeatureMixin` for the get_feature_names_out method. * modified: docs/sources/user_guide/preprocessing/TransactionEncoder.ipynb - Updated the user guide to show both the get_feature_names_out method and the set_output method. * modified: docs/sources/CHANGELOG.md - Updated changelog to reflect new features. * modified: docs/sources/CHANGELOG.md - Updated issue number. * modified: docs/sources/CHANGELOG.md - Updated issue number (again) to reflect the PR link instead of the issue link. * modified: mlxtend/preprocessing/transactionencoder.py - Ran isort over imports to fix failing check in PR. * modified: requirements.txt - Increased scikit-learn version to minimum required for set_output to work. * modified: environment.yml - Bumped scikit-learn version up to 1.2.2 to match requirements.txt. * modified: .github/workflows/python-package-conda.yml - Bumped scikit-learn version up to 1.2.2 to match environment.yml and requirements.txt. * modified: mlxtend/preprocessing/tests/test_transactionencoder.py - Updated `test_inverse_transform` to passing state by removing conversion to numpy array. * modified: .github/workflows/python-package-conda.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: environment.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: requirements.txt - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 * Update mlxtend/preprocessing/transactionencoder.py * Update mlxtend/preprocessing/transactionencoder.py * Update mlxtend/preprocessing/transactionencoder.py --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
Reference Issues/PRs
Closes #27037
What does this implement/fix? Explain your changes.
This PR correctly checks that the input is a dataframe before getting
index
. The original issue happened becauselist.index
exists but is not a valid index.