First steps toward sparray migration pass 2 #31072
Open
+56
−30
This PR converts `sklearn/metrics/cluster/_supervised.py` [Edit: and `sklearn/manifold/_locally_linear.py`] to use sparray. The reason for doing a small two-file PR here is to get feedback on kwargs that select spmatrix vs sparray. [Edit: the second module includes a function that has no current sparse keyword, so it may need a new one. See the next comment in this PR.]

For the function
`contingency_matrix`, this PR extends the `sparse` keyword to allow "boolean or str" input, with valid string inputs being `"sparray"` or `"spmatrix"`. The value `False` continues to signify an ndarray return value, while `True` signifies `spmatrix` for now, with a note that this will change to `sparray` alongside the deprecation of `spmatrix`. Note that in this module all calls to `contingency_matrix` lead to non-sparse output, so they have been switched from `sparse=True` to `sparse="sparray"`, in alignment with Pass 2 of the migration to sparray, which makes all internal (non-user-facing) sparse usage `sparray`. Calls to `contingency_matrix` repo-wide that use sparse have been updated (all are in this module).

There is a doc reference to
`sklearn.sparse.csr_matrix`, which I assume never existed, so I reworded that doc.

This file has a function
`fowlkes_mallows_score`, which has a `sparse` keyword that is never used. PR #28981 (FIX ‘sparse’ kwarg was not used by fowlkes_mallows_score) points out this flaw and attempts to make the keyword work, but the function's algorithm relies on sparse features of the contingency matrix, so the code would not work with dense arrays. The keyword cannot be meaningful here without new and less efficient code. I believe the keyword should simply be removed: it is broken and shouldn't be enabled. This PR removes the `sparse` keyword from `fowlkes_mallows_score`. Let me know if this should be separated from the rest of this PR.

This PR relates to #26418 (RFC supporting scipy.sparse.sparray).
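
To make the proposed `sparse` keyword semantics for `contingency_matrix` easier to discuss, here is a minimal sketch of the dispatch described above. It is not the code in this PR: `to_requested_container` is a hypothetical helper name, the choice of CSR format is an assumption, and it assumes a SciPy version that provides `scipy.sparse.csr_array`.

```python
import numpy as np
import scipy.sparse as sp


def to_requested_container(counts, sparse=False):
    """Hypothetical helper illustrating the proposed `sparse` semantics.

    sparse=False      -> dense ndarray
    sparse=True       -> spmatrix (for now; intended to become sparray later)
    sparse="spmatrix" -> spmatrix
    sparse="sparray"  -> sparray
    """
    if isinstance(sparse, str):
        if sparse == "sparray":
            return sp.csr_array(counts)
        if sparse == "spmatrix":
            return sp.csr_matrix(counts)
        raise ValueError(
            f"sparse must be a bool, 'sparray' or 'spmatrix'; got {sparse!r}"
        )
    if sparse:
        # Current behavior for True: return the legacy matrix container.
        return sp.csr_matrix(counts)
    return np.asarray(counts)


# Toy contingency counts, used only for illustration.
counts = np.array([[2, 0], [0, 3]])
as_dense = to_requested_container(counts, sparse=False)
as_matrix = to_requested_container(counts, sparse=True)
as_array = to_requested_container(counts, sparse="sparray")
```

With this shape, internal callers can pass `sparse="sparray"` explicitly, while user code that passes `sparse=True` keeps receiving an `spmatrix` until the default is switched.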