-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Make ColumnTransfomer
automatically convert sparse inputs to CSR format
#30543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Enhanced _check_X to check for sparse matrix subscriptability via __getitem__. Non-subscriptable sparse formats (e.g., COO, DIA) now raise a warning and are converted to CSR format for compatibility.
Enhanced _check_X to check for sparse matrix subscriptability via __getitem__. Non-subscriptable sparse formats (e.g., COO, DIA) now raise a warning and are converted to CSR format for compatibility.
ColumnTransfomer
automatically convert sparse inputs to CSR format
Removed the User warning added in this PR. Sparse inputs are automatically converted to the CSR format as this is currently the most efficient approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a non-regression test based on the code snippet of the original issue under sklearn/compose/tests/test_column_transformer.py
to check that it's now possible to use the column transformer on dia or COO formatted sparse matrix inputs without getting an error.
Please also add a changelog entry to document this fix under doc/whats_new/upcoming_changes/sklearn.compose
.
1- removed Docstring from `_check_X`. 2- added a non-regression unit test for verification. 3- added the changelog entry.
any updates on this PR @ogrisel |
Hi @ogrisel, Thanks! |
Hi @thomasjpfan , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update! My preferences:
- I am +0.6 on doing the automatic conversion.
- I am +1 on raising a better error, telling the user that we only support a certain subset of sparse data.
I'll approve, and see what other think.
hi @thomasjpfan i guess @ogrisel is not active can you assign someone else |
Fixes #30275
What does this implement/fix
Enhanced
_check_X
to verify for sparse matrix subscriptability via__getitem__
. Non-subscriptable sparse formats now raise a warning and are converted to CSR format for compatibility.Any other comments?
The
_check_X
function insklearn/compose/_column_transformer.py
, line 1324:Current behavior:
Only checks for
sparse.issparse(X)
which returns True for dia_array(or any other sparse matrix type from scipy.sparse).This causes a
TypeError: 'dia_array' object is not subscriptable
when non-subscriptable sparse formats are used andfit_transform
is called.Fix:
This update ensures that non-subscriptable sparse formats are handled by raising a warning and converting them to CSR format, preventing errors during transformation.