
FIX normalization in semi_supervised label_propagation #31924


Open · wants to merge 2 commits into main

Conversation

@dschult (Contributor) commented Aug 11, 2025

Fixes #31872: strange normalization in semi-supervised label propagation

The trouble, briefly:

  • In the dense affinity_matrix case, the current code sums along axis=0 and then divides the rows by these sums, while other normalizations in semi_supervised use axis=1. This causes no error as long as the affinity_matrix is symmetric, and the dense case arises for the "rbf" kernel, which produces symmetric matrices. But if someone supplies their own kernel, the normalization can be incorrect.
  • In the sparse affinity_matrix case, the current code divides every row by the sum of the first row. This causes no error as long as all row sums are equal, and the sparse case arises for the "knn" kernel, where every row sums to k. But again, a user-supplied kernel can make the normalization incorrect.
  • The normalization differs between the dense and sparse cases, which could confuse someone writing their own kernel. (Both behaviors are sketched below.)
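For concreteness, here is a minimal sketch of both behaviors on an asymmetric matrix with unequal row sums. This is a hypothetical illustration in plain numpy/scipy, not the actual _build_graph code:

```python
import numpy as np
from scipy import sparse

# Asymmetric affinity with unequal row sums -- the case a custom
# kernel can produce.
A = np.array([[1.0, 3.0],
              [2.0, 6.0]])

# Dense path today: divide rows by COLUMN sums (axis=0).
dense_today = A / A.sum(axis=0)

# Intended behavior: divide each row by its own ROW sum (axis=1).
row_normalized = A / A.sum(axis=1, keepdims=True)
print(np.allclose(dense_today, row_normalized))   # False: they disagree

# Sparse path today: every row divided by the FIRST row's sum.
S = sparse.csr_matrix(A)
sparse_today = (S / S[0].sum()).toarray()
print(np.allclose(sparse_today, row_normalized))  # False again
```

For the built-in kernels both discrepancies vanish: "rbf" gives a symmetric matrix (column sums equal row sums) and "knn" gives constant row sums, so the first-row shortcut happens to be right.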

This PR adds tests of proper normalization that agrees between sparse and dense.
It also adjusts the code so it works with both sparse arrays and sparse matrices; a sketch of one such row normalization follows.
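One way to make the two paths agree is a single row normalization that handles dense arrays and sparse arrays/matrices alike. The normalize_rows helper below is a sketch for illustration, not the exact diff in this PR:

```python
import numpy as np
from scipy import sparse

def normalize_rows(affinity):
    """Divide each row by its own sum (hypothetical sketch)."""
    if sparse.issparse(affinity):
        # sum(axis=1) returns an np.matrix for sparse matrices and a
        # 1-D ndarray for sparse arrays; asarray + ravel covers both.
        row_sums = np.asarray(affinity.sum(axis=1)).ravel()
        # Broadcast a per-row scale factor; multiply keeps sparsity.
        return affinity.multiply(1.0 / row_sums[:, np.newaxis]).tocsr()
    row_sums = affinity.sum(axis=1, keepdims=True)
    return affinity / row_sums
```

With this shape, "rbf" and "knn" inputs behave exactly as before (symmetric matrices and constant row sums are unaffected), while custom kernels get consistent row-normalized output.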

The tests check that normalization agrees between dense and sparse cases even if the affinity_matrix is not symmetric and does not have equal row sums. The errors corrected here do not arise for users who use the sklearn kernel options.
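Paraphrased as a sketch, this is the kind of check the new tests perform. The test below is hypothetical, written against plain numpy/scipy rather than the private scikit-learn helpers:

```python
import numpy as np
from scipy import sparse

def test_dense_sparse_normalization_agree():
    # Asymmetric matrix with unequal row sums: the case the old
    # code paths disagreed on.
    rng = np.random.RandomState(0)
    A = rng.rand(5, 5)

    dense = A / A.sum(axis=1, keepdims=True)

    S = sparse.csr_matrix(A)
    row_sums = np.asarray(S.sum(axis=1)).ravel()
    sparse_result = S.multiply(1.0 / row_sums[:, np.newaxis]).toarray()

    np.testing.assert_allclose(dense, sparse_result)
```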

I discovered this while working on making sure sparse arrays and sparse matrices produce the same values (#31177). The fix is split out of that PR because it corrects/changes the current code and adds a test; separating it from the large number of changes in the other PR is prudent and eases review.


✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b35d9dd.

@dschult changed the title from "Sparse normalizer" to "MAINT: Fix normalization in semi_supervised label_propagation" on Aug 11, 2025
@dschult changed the title from "MAINT: Fix normalization in semi_supervised label_propagation" to "FIX normalization in semi_supervised label_propagation" on Aug 11, 2025
@adrinjalali (Member)

@snath-xoc @antoinebaker could you have a look here please?

@snath-xoc (Contributor)

I can take a look, thank you for the ping ☺️

Development

Successfully merging this pull request may close these issues.

Strange normalization of semi-supervised label propagation in _build_graph