Skip to content

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jan 2, 2024

Close #28011.

Apparently scipy.sparse.csr_array was able to be constructed from a 1d array. Not anymore so I used another container that can work with 1d array.

The change is likely from scipy/scipy#18530, I need some more time to digest it.

@lesteve lesteve changed the title Fix scipy dev CI Fix scipy dev Jan 2, 2024
@lesteve lesteve changed the title CI Fix scipy dev CI Fix scipy-dev build Jan 2, 2024
Copy link

github-actions bot commented Jan 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b9b78ea. Link to the linter CI: here

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@@ -370,7 +375,7 @@ def test_readonly_kwargs():
TypeError,
"Sparse data was passed for w, but dense data is required",
)
for csr_container in CSR_CONTAINERS
for csr_container in COO_CONTAINERS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the test is checking for an error, I prefer updating the container to use 2d data for w.

(I did not know 1d SciPy CSR sparse matrices were a thing).

@lesteve
Copy link
Member Author

lesteve commented Jan 3, 2024

The change is likely from scipy/scipy#18530, I need some more time to digest it.

I am going to ping @jjerphan in case you have any insights on this, since you were involved in the COO 1d scipy PR review scipy/scipy#18530.

Here is my summary as far as scikit-learn is concerned:

  • in the scikit-learn tests we are sometimes creating sparse containers (most of the time csc or csr I think) from 1d array-like. I am not sure if this is sloppy coding from us or whether it makes sense in some cases ...
  • csc_matrix([1, 2, 3]) and csc_array([1, 2, 3]) both work fine in the latest scipy release (this creates a sparse container with a single row) but since ENH: sparse: Generalize coo_array to support 1d shapes scipy/scipy#18530 was merged, csc_array([1, 2, 3]) raises an exception ValueError: Cannot convert a 1d sparse array to csc format. Since some of our tests are parametrized on both sparse arrays and sparse matrices they started failing with sparse arrays. I am not sure whether the side-effect of csc_array behaving differently than csc_matrix is an unintended side-effect of the scipy COO 1d PR. It is probably doable to fix our tests to avoid this pattern.

@ogrisel
Copy link
Member

ogrisel commented Jan 3, 2024

I think we should be explicit and create sparse arrays/matrix from 2d datastructures only, unless we precisely want to test the behavior of 1d datastructures.

There aren't that many tests with this failure:

ERROR cluster/tests/test_hierarchical.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR cluster/tests/test_hierarchical.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR cluster/tests/test_hierarchical.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR cluster/tests/test_hierarchical.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR metrics/tests/test_dist_metrics.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR metrics/tests/test_dist_metrics.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR metrics/tests/test_dist_metrics.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR metrics/tests/test_dist_metrics.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR neighbors/tests/test_neighbors.py - ValueError: Cannot convert a 1d sparse array to csr format
ERROR neighbors/tests/test_neighbors.py - ValueError: Cannot convert a 1d sparse array to csr format

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as fix. Thank you @lesteve.

I do not have enough bandwidth to have a look into it in details.

I wonder whether scipy/scipy#18530 might cause subtle changes or bugs for scikit-learn due to the previously implicit promotion to 2 dimensions.

I think this change of behavior has to be documented by SciPy if it is not reverted.

I let one of you merge if everything looks good to you.

@lesteve lesteve merged commit b5827cb into scikit-learn:main Jan 4, 2024
@lesteve lesteve deleted the fix-scipy-dev branch January 4, 2024 13:28
@lesteve
Copy link
Member Author

lesteve commented Jan 4, 2024

Merging my own PR with two approvals, there you go 😅

I added a summary of the scikit-learn tests that broke on the scipy PR in scipy/scipy#18530 (comment). I think this is best to have the discussion there with the Scipy maintainers to get their feeling about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev ⚠️
4 participants