FIX normalization in semi_supervised label_propagation #31924


Open
wants to merge 3 commits into
base: main

Conversation

dschult
Contributor

@dschult dschult commented Aug 11, 2025

Fixes #31872: strange normalization in semi-supervised label propagation

The trouble, briefly:

  • In the dense affinity_matrix case, the current code sums along axis=0 and then divides the rows by these sums. Other normalizations in semi_supervised use axis=1. This causes no errors so long as the affinity_matrix is symmetric. The dense case arises for the "rbf" kernel, which produces symmetric matrices, but if someone provides their own kernel the normalization can be incorrect.
  • In the sparse affinity_matrix case, the current code divides all rows by the sum of the first row. This causes no errors so long as all row sums are equal. The sparse case arises for the "knn" kernel, where every row sums to k, but again a user-provided kernel can make the normalization incorrect.
  • The normalization differs between the dense and sparse cases, which could confuse someone writing their own kernel (see the sketch just after this list).
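For concreteness, here is a minimal sketch (not the PR's code) of the consistent row-wise, axis=1 normalization that dense and sparse inputs should both receive:

```python
import numpy as np
from scipy import sparse

# An asymmetric matrix with unequal row sums: exactly the case where the
# old dense (axis=0) and sparse (first-row-sum) shortcuts go wrong.
A = np.array([[1.0, 3.0],
              [2.0, 4.0]])

# Dense: divide each row by its own sum (axis=1).
A_dense = A / A.sum(axis=1, keepdims=True)

# Sparse: the same normalization without densifying (CSR layout assumed).
S = sparse.csr_matrix(A)
normalizer = np.ravel(S.sum(axis=1))
S.data /= np.repeat(normalizer, np.diff(S.indptr))

assert np.allclose(S.toarray(), A_dense)  # dense and sparse now agree
```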

This PR adds tests of proper normalization that agrees between the sparse and dense cases.
It also adjusts the code so it can work with sparse arrays or sparse matrices.
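On the sparse-array point: scipy's legacy spmatrix classes and its newer sparse-array containers answer isspmatrix differently, which is why the normalization code has to be written against both. A quick illustration (assuming a scipy version that provides csr_array):

```python
from scipy import sparse

m = sparse.csr_matrix([[1.0, 0.0]])  # legacy sparse matrix
a = sparse.csr_array([[1.0, 0.0]])   # newer sparse array

assert sparse.isspmatrix(m) and not sparse.isspmatrix(a)
assert sparse.issparse(m) and sparse.issparse(a)  # issparse covers both
```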

The tests check that normalization agrees between the dense and sparse cases even if the affinity_matrix is not symmetric and does not have equal row sums. The errors corrected here do not affect users of the built-in sklearn kernel options.
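As a rough sketch of the kind of check the new tests make (the kernel below is hypothetical, and _build_graph is a private helper, so treat this as illustrative rather than the PR's actual test):

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.RandomState(0)
X = rng.rand(6, 2)
y = np.array([0, 1, -1, -1, -1, -1])  # -1 marks unlabeled samples

def lopsided_kernel(X, Y=None):
    # Hypothetical kernel: asymmetric with unequal row sums, so neither
    # of the old shortcuts (symmetry, constant row sums) applies.
    if Y is None:
        Y = X
    return X @ Y.T + np.arange(1, X.shape[0] + 1)[:, np.newaxis]

clf = LabelPropagation(kernel=lopsided_kernel, max_iter=5).fit(X, y)
graph = clf._build_graph()
# With the corrected normalization, every row of the graph sums to 1.
assert np.allclose(np.asarray(graph.sum(axis=1)).ravel(), 1.0)
```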

I discovered this while making sure sparse arrays and sparse matrices produce the same values (#31177). This PR splits the fix out of that one because it corrects/changes the current code and adds a test; separating it from the large number of changes in the other PR is prudent and eases review.


github-actions bot commented Aug 11, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e21c436. Link to the linter CI: here

@dschult dschult changed the title from "Sparse normalizer" to "MAINT: Fix normalization in semi_supervised label_propagation" on Aug 11, 2025
@dschult dschult changed the title from "MAINT: Fix normalization in semi_supervised label_propagation" to "FIX normalization in semi_supervised label_propagation" on Aug 11, 2025
@adrinjalali
Member

@snath-xoc @antoinebaker could you have a look here please?

@snath-xoc
Contributor

I can take a look, thank you for the ping ☺️

@@ -143,6 +144,25 @@ def test_sparse_input_types(
    assert_array_equal(clf.predict([[0.5, 2.5]]), np.array([1]))


@pytest.mark.parametrize("constructor", CONSTRUCTOR_TYPES)
@pytest.mark.parametrize("Estimator, parameters", ESTIMATORS[1:2])
Contributor

@snath-xoc snath-xoc Aug 15, 2025

Is there a reason this is tested only for ESTIMATORS[1:2]? I tried it with the LabelSpreading estimators as well and it fails... something to perhaps investigate further (and for now mark as XFAIL), unless there are insights as to why it is expected to fail?

Contributor Author

The first and third ESTIMATORS use the rbf method, which creates a dense affinity_matrix.
The fourth and following ESTIMATORS use the LabelSpreading class, which constructs a laplacian_matrix instead of an affinity_matrix, so the normalization is different.

It might be cleaner to inline the Estimator and parameters here instead of using fixtures, since we are only testing one case. I agree that it looks strange to use only one of the list of Estimators, but that's all we want to test.

Contributor Author

Upon reflection, I think it is worthwhile to test both dense and sparse cases for LabelPropagation._build_graph. So I've included the suggestion to use ESTIMATORS[:2].

I haven't added any tests for LabelSpreading._build_graph because the rows are not supposed to sum to 1 there. [The result there is not normalized beyond the normalization done while computing the laplacian from the affinity matrix].
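For reference, LabelSpreading's graph is (roughly) the symmetrically normalized affinity D^{-1/2} W D^{-1/2}, whose rows generally do not sum to 1; a toy sketch:

```python
import numpy as np

W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = d_inv_sqrt[:, np.newaxis] * W * d_inv_sqrt[np.newaxis, :]
print(S.sum(axis=1))  # rows do not sum to 1, unlike LabelPropagation
```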

Contributor

Suggested change
- @pytest.mark.parametrize("Estimator, parameters", ESTIMATORS[1:2])
+ @pytest.mark.parametrize("Estimator, parameters", ESTIMATORS[:2])

if sparse.isspmatrix(affinity_matrix):
    normalizer = np.ravel(normalizer)
    # common case: knn method gives row sum k for all rows
    if np.all(normalizer == normalizer[0]):
Contributor

@snath-xoc snath-xoc Aug 15, 2025

Suggested change
- if np.all(normalizer == normalizer[0]):
+ # but sometimes row sums are not the same
+ # and need to account for that
+ if not np.all(normalizer == normalizer[0]):

Contributor

@snath-xoc snath-xoc Aug 15, 2025

I feel like this may be cleaner, but feel free to ignore

Contributor Author

Just want to make sure I understand correctly:
This suggestion leaves affinity_matrix unnormalized when all rows sum to the same value. And you are saying it feels cleaner not to divide the whole matrix by that scalar. So I think you are saying we can get by without normalizing because we normalize during the subsequent iterative process, and we converge to the same value.

Did I get that right? Or did you mean to normalize the rows of the affinity matrix later?
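If that is the reasoning, it can be sketched with toy numbers (assuming, as in the current iteration, that the label distributions are renormalized every step):

```python
import numpy as np

T = np.array([[0.5, 0.5],
              [0.25, 0.75]])  # toy transition matrix
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])    # toy label distributions
c = 7.0  # a constant row-sum factor left undivided

step = T @ Y
step /= step.sum(axis=1, keepdims=True)

step_scaled = (c * T) @ Y
step_scaled /= step_scaled.sum(axis=1, keepdims=True)

assert np.allclose(step, step_scaled)  # the constant factor cancels
```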

normalizer = np.ravel(normalizer)
# common case: knn method gives row sum k for all rows
if np.all(normalizer == normalizer[0]):
    affinity_matrix.data /= normalizer[0]
Contributor

Suggested change
- affinity_matrix.data /= normalizer[0]

# common case: knn method gives row sum k for all rows
if np.all(normalizer == normalizer[0]):
    affinity_matrix.data /= normalizer[0]
else:  # row sums not the same
Contributor

Suggested change
- else: # row sums not the same
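Putting the thread together, one way the sparse branch could read after the suggested simplifications (a sketch under the assumption that each row is always divided by its own sum, CSR layout assumed):

```python
import numpy as np
from scipy import sparse

affinity_matrix = sparse.csr_array([[2.0, 2.0, 0.0],
                                    [0.0, 1.0, 3.0],
                                    [5.0, 0.0, 0.0]])
normalizer = np.ravel(affinity_matrix.sum(axis=1))
# Divide every stored entry by the sum of its row, no scalar shortcut.
affinity_matrix.data /= np.repeat(normalizer, np.diff(affinity_matrix.indptr))

assert np.allclose(affinity_matrix.sum(axis=1), 1.0)
```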

Development

Successfully merging this pull request may close these issues.

Strange normalization of semi-supervised label propagation in _build_graph