Spectral clustering with lobpcg solver is unstable #10278

Closed
@jmargeta

Description

Spectral clustering with lobpcg solver appears to be unstable.
An exception is thrown on the Windows CI build, while the same code runs without error on Linux.
On Linux, the solver's output is sensitive to tiny variations in the input image.
See #9062 for more details.

This might be because the problem is not particularly well preconditioned.
However, the behaviour should at least be consistent across all platforms.

Steps/Code to Reproduce

import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction import img_to_graph

# make an image with two circles and construct a graph
x, y = np.indices((40, 40))
center1, center2 = (14, 12), (20, 25)
radius1, radius2 = 8, 7

circle1 = (x - center1[0]) ** 2 + (y - center1[1]) ** 2 < radius1 ** 2
circle2 = (x - center2[0]) ** 2 + (y - center2[1]) ** 2 < radius2 ** 2

circles = circle1 | circle2
mask = circles.copy()
img = circles.astype(float)

graph = img_to_graph(img, mask=mask)
graph.data = np.exp(-graph.data / graph.data.std())

# this fails on windows for seemingly random versions of Python
# on linux the results can be sensitive to tiny image variations 
labels = spectral_clustering(graph, n_clusters=2, eigen_solver='lobpcg', random_state=0)

On Linux, when adding random noise to the image, tiny image variations (img = img + 1 + 0.05 * rand.random(*img.shape) vs img += 1 + 0.05 * rand.random(*img.shape)) can throw off the convergence of the lobpcg solver.
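To give a sense of how small these variations are, here is a minimal sketch comparing the two ways of adding the noise. It assumes that `rand` in the snippet above is a seeded `numpy.random.RandomState` and that the noise is drawn with `random_sample`; the report does not show how `rand` is created, so both are assumptions. The two expressions differ only in the order of the floating-point additions, so any difference between the resulting images should be at the level of rounding error.

```python
import numpy as np

# Assumption: a seeded RandomState stands in for the report's `rand` object.
rng = np.random.RandomState(0)

# Same two-circle image as in the reproduction script above.
x, y = np.indices((40, 40))
circle1 = (x - 14) ** 2 + (y - 12) ** 2 < 8 ** 2
circle2 = (x - 20) ** 2 + (y - 25) ** 2 < 7 ** 2
img = (circle1 | circle2).astype(float)

# One noise draw, reused so that only the order of additions differs.
noise = rng.random_sample(img.shape)

# Out-of-place variant: evaluated as (img + 1) + 0.05 * noise
a = img + 1 + 0.05 * noise

# In-place variant: evaluated as img + (1 + 0.05 * noise)
b = img.copy()
b += 1 + 0.05 * noise

max_abs_diff = np.abs(a - b).max()
print("largest difference between the two images:", max_abs_diff)
```

If the largest difference printed here is on the order of machine epsilon, the lobpcg result is changing in response to perturbations far below anything visible in the image.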

Expected Results

The returned labels of the clusters should be identical to the other two solvers:

labels_arpack = spectral_clustering(graph, n_clusters=2, eigen_solver='arpack', random_state=0)

or

labels_amg = spectral_clustering(graph, n_clusters=2, eigen_solver='amg', random_state=0)

i.e. (image: clustering_success)
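One way to check this expectation is sketched below. Cluster ids are arbitrary, so two solvers can return the same partition with 0 and 1 swapped; the comparison therefore has to ignore the numbering. `same_partition` is a hypothetical helper written for this report, not part of scikit-learn.

```python
def same_partition(labels_a, labels_b):
    """Return True if both labelings group the samples identically,
    regardless of which integer each cluster was given."""
    labels_a, labels_b = list(labels_a), list(labels_b)
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault stores the first pairing seen for each label;
        # a later pairing that disagrees means the partitions differ.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Toy labelings: identical grouping with swapped ids, then a different grouping.
print(same_partition([0, 0, 1, 1], [1, 1, 0, 0]))
print(same_partition([0, 0, 1, 1], [0, 1, 0, 1]))
```

With the arrays from this report, the check would be called as `same_partition(labels_lobpcg, labels_arpack)`.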

Actual Results

An exception is thrown on Windows (no noise applied to the image):
E numpy.linalg.linalg.LinAlgError: the leading minor of order 5 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

Solver sensitivity to small image variations on Linux:

labels_arpack = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

labels_lobpcg = array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,...1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

Or the failed result: (image: clustering_fail)

Versions

The exception is thrown on scikit-learn's CI, Windows build:
PYTHON=C:\Python35, PYTHON_VERSION=3.5.0, PYTHON_ARCH=32

The sensitivity to tiny image variations was found on:
Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.6.3 (default, Oct 6 2017, 08:44:35)
[GCC 5.4.0 20160609]
NumPy 1.13.3
SciPy 1.0.0
Scikit-Learn 0.20.dev0
