Description
The method `_build_graph` on the `LabelPropagation` class in `sklearn/semi_supervised/_label_propagation.py` (line 455) treats normalization differently for sparse and dense kernels. I have questions about both of them.
**(Edited) Summary**
Troubles with the current normalization code:
- In the dense `affinity_matrix` case, the current code sums along `axis=0` and then divides the rows by these sums. Other normalizations in `semi_supervised` use `axis=1` (as this case should). This does not produce incorrect results so long as the `affinity_matrix` is symmetric. The dense case arises for kernel `"rbf"`, which produces symmetric matrices. But if someone provides their own kernel, the normalization could be incorrect.
- In the sparse `affinity_matrix` case, the current code divides all rows by the sum of the first row. This is not standard normalization, but it does not cause errors so long as the row sums are all equal. The sparse case arises for kernel `"knn"`, where every row sums to `k`. But if someone provides their own kernel, the normalization could be incorrect.
- The normalization differs between the dense and sparse cases, which could be confusing to someone writing their own kernel.
The fix involves changing `axis=0` to `axis=1` and correcting the sparse case to divide each row by its sum when the row sums are not all equal.
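A minimal sketch of what the fixed row normalization could look like (the helper name `normalize_rows` is hypothetical, not the sklearn code):

```python
import numpy as np
from scipy import sparse

def normalize_rows(affinity_matrix):
    """Divide each row of the affinity matrix by its own row sum."""
    if sparse.issparse(affinity_matrix):
        # spmatrix.sum(axis=1) returns a 2D np.matrix; flatten it to 1D.
        normalizer = np.ravel(np.asarray(affinity_matrix.sum(axis=1)))
        csr = affinity_matrix.tocsr(copy=True)
        # Repeat each row sum once per stored value in that row so the
        # divisor lines up with the flat data array.
        csr.data = csr.data / np.repeat(normalizer, np.diff(csr.indptr))
        return csr
    normalizer = affinity_matrix.sum(axis=1)
    return affinity_matrix / normalizer[:, np.newaxis]

# Asymmetric dense kernel: rows should sum to 1 after normalization.
dense = np.array([[1.0, 3.0], [2.0, 2.0]])
print(normalize_rows(dense).sum(axis=1))  # [1. 1.]
print(normalize_rows(sparse.csr_matrix(dense)).sum(axis=1))
```

With this, the sparse and dense paths of the same kernel agree, and asymmetric user-provided kernels are handled correctly.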
**Original (somewhat rambling) description**
**Summary**
The method returns a different `affinity_matrix` for sparse and for dense versions of the same kernel matrix. Neither the sparse nor the dense version normalizes the usual way (columns summing to 1). The dense case is correct for symmetric input kernels. The sparse case scales all values by a constant instead of by column sums.

I suspect the results still converge in most non-symmetric cases. That's probably why this hasn't caused any issues. But some `label_propagation` issues like #8008, #11784, or #9722 could possibly be related.
**sparse treatment**
The sparse treatment is not really normalization: it scales the entire matrix by a scalar instead of normalizing the rows or columns. I guess it is harmless to divide the entire matrix by a scalar, but there is enough code effort involved that it seems the intent was not what actually happens. The normalization is done via:

```python
affinity_matrix.data /= np.diag(np.array(normalizer))
```

where `normalizer = affinity_matrix.sum(axis=0)`. So `normalizer` is of type `np.matrix` with shape `(1, n)` for `n` columns. The `np.array` turns that into an `ndarray`, but it leaves the shape 2D. Then `np.diag` takes the main diagonal of that 2D row matrix. Note that `np.diag` maps 1D inputs to 2D (builds a diagonal matrix) and 2D inputs to 1D (extracts the diagonal), so it is easy to get something different from what you expect when the input is 2D instead of 1D. The result here is just the first entry of the row. We go through the trouble of summing across axis 0, but then we divide the whole matrix by the first column's sum.
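The `np.diag` pitfall described above can be reproduced in a few lines (a small demonstration, not the sklearn code):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))
normalizer = A.sum(axis=0)  # np.matrix of shape (1, 2): [[4., 6.]]

# np.array keeps the 2D shape, so np.diag *extracts* the main diagonal
# of a (1, 2) array -- which is just the first entry (the first
# column's sum), not a diagonal matrix of all the sums.
print(np.diag(np.array(normalizer)))  # [4.]

# With a 1D input, np.diag would instead *build* a 2x2 diagonal matrix:
print(np.diag(np.ravel(normalizer)))
```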
I think this is a bug stemming from assuming `normalizer` is a 1D object. Then `np.diag` would construct a diagonal matrix with the sums on the main diagonal, and dividing the whole affinity matrix by that diagonal matrix would divide each column by its sum, which is a typical normalization. A second problem here is that we do not divide `affinity_matrix`; instead we divide `affinity_matrix.data`, so there is no concept of a column in this division.
If I am correct, one fix would be to avoid dividing by a diagonal matrix. Instead, divide each data value by the sum corresponding to that value's column. We could use `np.ravel` to avoid the 2D indexing issues. Something like:

```python
affinity_matrix.data /= np.ravel(normalizer)[affinity_matrix.indices]
```
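To sanity-check that one-liner: in CSR format, `indices` holds the column index of each stored value in `data`, so fancy-indexing the raveled column sums lines a divisor up with every data value. A small check with an asymmetric matrix:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))
normalizer = A.sum(axis=0)  # (1, n) np.matrix of column sums

# Each stored value is divided by the sum of its own column.
A.data /= np.ravel(normalizer)[A.indices]
print(A.toarray().sum(axis=0))  # [1. 1.] -- columns now sum to 1
```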
**dense treatment**
The dense treatment scales each row by the column sums (instead of each column by its column sum). Perhaps this is what it should be, but it is not the standard normalization of either the columns or the rows. The relevant code is:

```python
affinity_matrix /= normalizer[:, np.newaxis]
```
In this case, `normalizer` is 1D (`affinity_matrix` is an `ndarray`). So the index `[:, np.newaxis]` broadcasts the column sums along the rows, e.g. row 3 of the divisor is a repetition of the sum of column 3. The more natural code would be

```python
affinity_matrix /= normalizer
```

which scales each column by its column sum, so the resulting columns add to 1. Note that for a square symmetric `affinity_matrix`, row sums and column sums are the same, so it is possible we are only supporting symmetric kernels. But there are tests without a symmetric `affinity_matrix`. I haven't figured out why the method still seems to "work" even though the scaling is very strange.
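The difference between the two forms only shows up for an asymmetric kernel (a small demonstration with made-up numbers):

```python
import numpy as np

# Asymmetric dense kernel; normalizer holds the column sums (axis=0).
A = np.array([[1.0, 3.0], [2.0, 2.0]])
normalizer = A.sum(axis=0)  # [3., 5.]

# Current code: [:, np.newaxis] turns normalizer into a column vector,
# so row i is divided by the sum of *column* i.
current = A / normalizer[:, np.newaxis]
print(current.sum(axis=0))  # neither rows nor columns sum to 1

# Plain broadcasting divides each column by its own sum instead:
natural = A / normalizer
print(natural.sum(axis=0))  # [1. 1.]
```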
**Speculation**
**dense** I suspect the original code used `ndarray` only and was tested with a symmetric `affinity_matrix`. It looks like the code mixes up rows and columns when indexing to prepare for broadcasting.

**sparse** I suspect this code was an attempt to normalize via matrix division by a diagonal matrix, but it lost track of `np.diag` switching from expanding 1D->2D to reducing 2D->1D when the input is 2D instead of 1D. The result has shape `(1,)`, so it still broadcasts against any size of input, and thus no exception is raised.
**doc_string** The doc_string mentions that this method may not return a normalized matrix, but it justifies that only briefly. It is true that the doc_string is not public (the method is private), but it should be revised if/when this gets fixed.
**Notes** This method needs to be updated when shifting from `spmatrix` to `sparray`, because the sparse sum is then no longer a 2D `np.matrix` but a 1D `np.ndarray`. That's how I discovered this rabbit hole. :)
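The shape change that exposed all of this is easy to see side by side (requires a SciPy recent enough to provide `csr_array`):

```python
import numpy as np
from scipy import sparse

data = np.array([[1.0, 2.0], [3.0, 4.0]])

# Legacy spmatrix: sum returns a 2D np.matrix.
m = sparse.csr_matrix(data).sum(axis=0)
print(type(m).__name__, m.shape)  # matrix (1, 2)

# New sparray: sum returns a 1D ndarray, so np.diag(np.array(...))
# would *build* an (n, n) diagonal matrix instead of extracting one entry.
a = sparse.csr_array(data).sum(axis=0)
print(type(a).__name__, a.shape)  # ndarray (2,)
```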
I can make a PR for this, but I wanted to make sure my perspective is correct first. Do we want the affinity matrix to be stochastic (with columns summing to 1)? Or is there another approach this code uses that I'm not seeing? Thanks...