Skip to content

Make ColumnTransfomer automatically convert sparse inputs to CSR format #30543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

AYUSH27112021
Copy link

@AYUSH27112021 AYUSH27112021 commented Dec 27, 2024

Fixes #30275

What does this implement/fix

Enhanced _check_X to verify for sparse matrix subscriptability via __getitem__. Non-subscriptable sparse formats now raise a warning and are converted to CSR format for compatibility.

Any other comments?

The _check_X function in sklearn/compose/_column_transformer.py, line 1324:

Current behavior:

Only checks for sparse.issparse(X) which returns True for dia_array(or any other sparse matrix type from scipy.sparse).

This causes a TypeError: 'dia_array' object is not subscriptable when non-subscriptable sparse formats are used and fit_transform is called.

Fix:

This update ensures that non-subscriptable sparse formats are handled by raising a warning and converting them to CSR format, preventing errors during transformation.

Enhanced _check_X to check for sparse matrix subscriptability via __getitem__. Non-subscriptable sparse formats (e.g., COO, DIA) now raise a warning and are converted to CSR format for compatibility.
Enhanced _check_X to check for sparse matrix subscriptability via __getitem__. Non-subscriptable sparse formats (e.g., COO, DIA) now raise a warning and are converted to CSR format for compatibility.
Copy link

github-actions bot commented Dec 27, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 1733f9b. Link to the linter CI: here

@ogrisel ogrisel changed the title Fixes #30275 Make ColumnTransfomer automatically convert sparse inputs to CSR format Dec 30, 2024
AYUSH27112021 and others added 2 commits December 30, 2024 22:34
Removed the User warning added in this PR. Sparse inputs are automatically converted to the CSR format as this is currently the most efficient approach.
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a non-regression test based on the code snippet of the original issue under sklearn/compose/tests/test_column_transformer.py to check that it's now possible to use the column transformer on dia or COO formatted sparse matrix inputs without getting an error.

Please also add a changelog entry to document this fix under doc/whats_new/upcoming_changes/sklearn.compose.

AYUSH27112021 and others added 2 commits January 1, 2025 06:36
1-  removed Docstring from `_check_X`.
2- added a non-regression unit test for verification.
3- added the changelog entry.
@AYUSH27112021 AYUSH27112021 requested a review from ogrisel January 1, 2025 05:28
@AYUSH27112021
Copy link
Author

AYUSH27112021 commented Jan 7, 2025

any updates on this PR @ogrisel

@AYUSH27112021
Copy link
Author

Hi @ogrisel,
If this issue has already been resolved or is no longer needed, I’d like to go ahead and close this PR. If any changes are still required, please let me know. Otherwise, I’ll proceed with closing it.

Thanks!

@AYUSH27112021
Copy link
Author

Hi @thomasjpfan ,
I've followed your suggestions and made the necessary changes.
Could you please review them and let me know if any further adjustments are needed?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update! My preferences:

  • I am +0.6 on doing the automatic conversion.
  • I am +1 on raising a better error, telling the user that we only support a certain subset of sparse data.

I'll approve, and see what other think.

@AYUSH27112021
Copy link
Author

hi @thomasjpfan i guess @ogrisel is not active can you assign someone else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ColumnTransformer does not validate sparse formats for X
3 participants