FIX normalization in semi_supervised label_propagation #31924
Fixes #31872: strange normalization in semi-supervised label propagation
The trouble briefly: the normalization in `semi_supervised` label propagation mixes axes — the sums are taken along one axis (`axis=0`) while the scaling is applied as if along the other (`axis=1`). This does not cause errors so long as the affinity_matrix is symmetric. The dense case arises for kernel `"rbf"`, which provides symmetric matrices, and the sparse case for kernel `"knn"`, which has all rows sum to `k`. But if someone provides their own kernel, the normalization could be incorrect.

This PR adds tests of proper normalization that agrees between sparse and dense.
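To illustrate why the axis matters, here is a minimal sketch (not sklearn's actual code) using a hypothetical nonsymmetric affinity matrix with unequal row sums. Proper row normalization agrees between the dense and sparse representations, while mixing axes gives a different answer:

```python
import numpy as np
from scipy import sparse

# Hypothetical nonsymmetric affinity matrix with unequal row sums,
# where the normalization axis matters.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

# Proper row normalization: divide each row by its own sum (axis=1).
row_norm_dense = A / A.sum(axis=1, keepdims=True)

# The same normalization on a sparse copy agrees with the dense result.
A_sp = sparse.csr_matrix(A)
row_sums = np.asarray(A_sp.sum(axis=1)).ravel()
row_norm_sparse = A_sp.multiply(1.0 / row_sums[:, np.newaxis])
assert np.allclose(row_norm_dense, row_norm_sparse.toarray())

# Mixing axes (scaling row i by the sum of column i) only matches the
# row normalization when A is symmetric; here it does not.
mixed = A / A.sum(axis=0)[:, np.newaxis]
assert not np.allclose(row_norm_dense, mixed)
```

For a symmetric matrix (or one with equal row sums, like `"knn"`'s output) the two computations coincide, which is why the inconsistency goes unnoticed with the built-in kernels.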
It also adjusts the code so it can work with sparse arrays or sparse matrices.
The tests check that normalization agrees between the dense and sparse cases even if the affinity_matrix is not symmetric and does not have equal row sums. The errors corrected here do not arise for users of the built-in kernel options (`"rbf"` or `"knn"`).
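The sparse-container compatibility can be sketched as follows — this is an illustrative helper (the function name `row_normalize` is hypothetical, not part of sklearn) showing one normalization routine producing identical results for dense arrays, sparse matrices, and sparse arrays:

```python
import numpy as np
from scipy import sparse

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

def row_normalize(W):
    # Hypothetical helper: divide each row by its own sum, handling
    # dense arrays, sparse matrices, and sparse arrays uniformly.
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    if sparse.issparse(W):
        return W.multiply(1.0 / row_sums[:, np.newaxis])
    return W / row_sums[:, np.newaxis]

dense = row_normalize(A)
from_matrix = row_normalize(sparse.csr_matrix(A))
from_array = row_normalize(sparse.csr_array(A))

# All three containers yield the same normalized values.
assert np.allclose(dense, from_matrix.toarray())
assert np.allclose(dense, from_array.toarray())
```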
I discovered this while working on making sure sparse arrays and sparse matrices produce the same values (#31177). This fix is split out of that PR because it corrects/changes the current code and adds a test; separating it from the large number of changes in the other PR is prudent and eases review.