Skip to content

[MRG] clusterQR method added to spectral segmentation #12316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 55 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
c8e8ff2
clusterQR method added to spectral segmentation
lobpcg Oct 6, 2018
6f819e7
fix comment line too long
lobpcg Oct 6, 2018
9902767
typo fixed in spectral.py
lobpcg Oct 6, 2018
9fa6457
spectral.py trailing white space removed
lobpcg Oct 6, 2018
44daccb
typos fixed
lobpcg Oct 6, 2018
3ce05d0
Update doc/modules/clustering.rst
lobpcg Oct 6, 2018
5789fb8
formatting/typo
lobpcg Oct 6, 2018
9a0638b
spectral.py typo fixed
lobpcg Oct 7, 2018
b313306
Update sklearn/cluster/spectral.py
lobpcg Oct 11, 2018
2146c81
Merge pull request #1 from scikit-learn/master
lobpcg Oct 12, 2018
1a8a552
Merge pull request #2 from scikit-learn/master
lobpcg Jan 26, 2019
33e34fe
Merge pull request #3 from lobpcg/master
lobpcg Jan 26, 2019
ac8445f
github.com/scikit-learn/scikit-learn/pull/12316#discussion_r251665191
lobpcg Feb 5, 2019
06fba54
https://github.com/scikit-learn/scikit-learn/pull/12316#discussion_r2…
lobpcg Feb 5, 2019
4b3f74b
Merge pull request #5 from scikit-learn/master
lobpcg Feb 19, 2019
b19eb7c
Merge pull request #6 from lobpcg/master
lobpcg Feb 19, 2019
9426b4b
Merge pull request #8 from scikit-learn/master
lobpcg Mar 5, 2019
dfd9ed9
Merge pull request #9 from lobpcg/master
lobpcg Mar 5, 2019
e8b3b87
Update spectral.py
lobpcg Mar 5, 2019
6b00bbb
removed redundant SVD
lobpcg Mar 5, 2019
8cdcda2
testing new tol 1e-5 defaults and the laplacian shift in AMG
lobpcg Mar 5, 2019
3b6844d
changed to eigen_solver='amg'
lobpcg Mar 6, 2019
e1fd3a0
eigen_solver='amg' is not installed in CircleCI so back to 'arpack'
lobpcg Mar 6, 2019
a11d94a
comment updated to include amg and lobpcg, now fixed
lobpcg Mar 6, 2019
5b7afbc
AMG test on NOT fully connected graph
lobpcg Apr 10, 2019
4189f7c
last changes reversed for error debugging
lobpcg Apr 10, 2019
32e4d19
new AMG example, still connected graph
lobpcg Apr 10, 2019
035d395
change n_neighbors=5 to 3 for graph to disconnect
lobpcg Apr 10, 2019
5f79163
reversed changes to get back the original
lobpcg Apr 10, 2019
4d3b7f9
added new AMG test NOT fully connected graph
lobpcg Apr 10, 2019
1822016
flake8 line under-indented fixes
lobpcg Apr 10, 2019
abdb873
Merge pull request #11 from scikit-learn/master
lobpcg Apr 11, 2019
8f9d659
Merge pull request #12 from lobpcg/master
lobpcg Apr 11, 2019
b75bd34
Merge pull request #14 from scikit-learn/master
lobpcg May 25, 2019
ea9ebdb
Merge branch 'clusterQR' into master
lobpcg May 25, 2019
42d1414
Merge pull request #16 from lobpcg/master
lobpcg May 25, 2019
1ceecd8
Merge branch 'master' into clusterQR
lobpcg Jul 23, 2019
be2018d
reverse changes that appear in a separate PR
lobpcg Aug 1, 2019
f962ed9
reverse changes that appear in a separate PR
lobpcg Aug 1, 2019
3d20676
remove conj - complex numbers are not neeed
lobpcg Aug 1, 2019
f0d6f5c
reverse irrelevant change
lobpcg Aug 1, 2019
6dfee88
reverse changes not relevant to this PR
lobpcg Aug 1, 2019
ba4a451
reverse irrelevant changes to this PR
lobpcg Aug 1, 2019
6f9b305
typo fixed
lobpcg Aug 1, 2019
a9e9260
reverse changes irrelevant to this PR
lobpcg Aug 1, 2019
ced9b50
Merge pull request #18 from scikit-learn/master
lobpcg Aug 2, 2019
f2b1adc
Merge branch 'master' into clusterQR
lobpcg Aug 30, 2019
13ea224
Merge pull request #21 from lobpcg/clusterQR
lobpcg Aug 30, 2019
6924fa9
Merge pull request #22 from lobpcg/master
lobpcg Aug 30, 2019
78a788a
trying to fix a conflict
lobpcg Nov 29, 2019
5704634
Merge pull request #23 from scikit-learn/master
lobpcg Nov 29, 2019
72d8d60
Merge pull request #24 from lobpcg/master
lobpcg Nov 29, 2019
9556999
restore the edits
lobpcg Nov 29, 2019
540f937
add imports
lobpcg Nov 29, 2019
9bbd667
trying to fix rst warnings
lobpcg Nov 29, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions doc/modules/clustering.rst
Original file line number Diff line number Diff line change
Expand Up @@ -453,7 +453,6 @@ cluster. This criteria is especially interesting when working on images, where
graph vertices are pixels, and weights of the edges of the similarity graph are
computed using a function of a gradient of the image.


.. |noisy_img| image:: ../auto_examples/cluster/images/sphx_glr_plot_segmentation_toy_001.png
:target: ../auto_examples/cluster/plot_segmentation_toy.html
:scale: 50
Expand Down Expand Up @@ -493,22 +492,22 @@ computed using a function of a gradient of the image.
:target: ../auto_examples/cluster/plot_coin_segmentation.html
:scale: 65

.. |coin_clusrerQR| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_003.png
:target: ../auto_examples/cluster/plot_coin_segmentation.html
:scale: 65

Different label assignment strategies
-------------------------------------

Different label assignment strategies can be used, corresponding to the
``assign_labels`` parameter of :class:`SpectralClustering`.
``"kmeans"`` strategy can match finer details, but can be unstable.

``"kmeans"`` strategy can match finer details, but it can be unstable.
In particular, unless you control the ``random_state``, it may not be
reproducible from run-to-run, as it depends on random initialization.
The alternative ``"discretize"`` strategy is 100% reproducible, but tends
to create parcels of fairly even and geometrical shape.

===================================== =====================================
``assign_labels="kmeans"`` ``assign_labels="discretize"``
===================================== =====================================
|coin_kmeans| |coin_discretize|
===================================== =====================================
Alternative ``"discretize"`` strategy is 100% reproducible, but tends
to create parcels of fairly even and geometrical shape.
The recently added option ``clusterQR`` is 100% also reproducible.

Spectral Clustering Graphs
--------------------------
Expand Down Expand Up @@ -540,6 +539,10 @@ graph, and SpectralClustering is initialized with `affinity='precomputed'`::
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8100>`_
Andrew Y. Ng, Michael I. Jordan, Yair Weiss, 2001

* `"Robust and efficient multi-way spectral clustering"
<https://github.com/asdamle/QR-spectral-clustering>`_
Anil Damle, Victor Minden, Lexing Ying

* `"Preconditioned Spectral Clustering for Stochastic
Block Partition Streaming Graph Challenge"
<https://arxiv.org/abs/1708.07481>`_
Expand Down
36 changes: 23 additions & 13 deletions examples/cluster/plot_coin_segmentation.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,20 @@
This procedure (spectral clustering on an image) is an efficient
approximate solution for finding normalized graph cuts.

There are two options to assign labels:
There are three options to assign labels:

* with 'kmeans' spectral clustering will cluster samples in the embedding space
using a kmeans algorithm
using a kmeans algorithm,
* with 'clusterQR' will cluster samples in the embedding space
using a clusterQR algorithm,
* whereas 'discrete' will iteratively search for the closest partition
space to the embedding space.

"""
print(__doc__)

# Author: Gael Varoquaux <gael.varoquaux@normalesup.org>, Brian Cheung
# Andrew Knyazev added clusterQR
# License: BSD 3 clause

import time
Expand Down Expand Up @@ -62,28 +66,34 @@
eps = 1e-6
graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps

# Apply spectral clustering (this step goes much faster if you have pyamg
# installed)
N_REGIONS = 25
# The actual number of regions in this example is 27: background and 26 coins
N_REGIONS = 26

#############################################################################
# Visualize the resulting regions
# Compute and visualize the resulting regions

# Any eigen_solver: 'arpack', 'lobpcg', 'amg' can be used. AMG is usually best
# It often helps the spectral clustering to compute a few extra eigenvectors
N_REGIONS_PLUS = 3

for assign_labels in ('kmeans', 'discretize'):
for assign_labels in ('kmeans', 'discretize', 'clusterQR'):
t0 = time.time()
labels = spectral_clustering(graph, n_clusters=N_REGIONS,
assign_labels=assign_labels, random_state=42)
labels = spectral_clustering(graph,
n_clusters=(N_REGIONS + N_REGIONS_PLUS),
assign_labels=assign_labels, random_state=42,
eigen_solver='arpack')
t1 = time.time()
labels = labels.reshape(rescaled_coins.shape)

plt.figure(figsize=(5, 5))
plt.imshow(rescaled_coins, cmap=plt.cm.gray)
for l in range(N_REGIONS):
plt.contour(labels == l,
colors=[plt.cm.nipy_spectral(l / float(N_REGIONS))])
plt.imshow(rescaled_coins, cmap=plt.get_cmap('gray'))
plt.xticks(())
plt.yticks(())
title = 'Spectral clustering: %s, %.2fs' % (assign_labels, (t1 - t0))
print(title)
plt.title(title)
for l in range(N_REGIONS):
plt.contour(labels == l,
colors=[plt.cm.nipy_spectral((l+3) / float(N_REGIONS+3))])
plt.pause(0.5)
plt.show()
55 changes: 47 additions & 8 deletions sklearn/cluster/_spectral.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@

import numpy as np

from scipy.linalg import qr, svd

from ..base import BaseEstimator, ClusterMixin
from ..utils import check_random_state, as_float_array
from ..utils.validation import check_array
Expand All @@ -18,6 +20,36 @@
from ._k_means import k_means


def clusterQR(vectors):
"""Search for a partition matrix (clustering) which is
closest to the eigenvector embedding.

Parameters
----------
vectors : array-like, shape: (n_samples, n_clusters)
The embedding space of the samples.

Returns
-------
labels : array of integers, shape: n_samples
The labels of the clusters.

References
----------
https://github.com/asdamle/QR-spectral-clustering
https://arxiv.org/abs/1708.07481
"""

k = vectors.shape[1]
piv = qr(vectors.T, pivoting=True)[2]
piv = piv[0:k]
UtSV = svd(vectors[piv, :].T)
Ut = UtSV[0]
Vt = UtSV[2].T.conj()
vectors = abs(np.dot(vectors, np.dot(Ut, Vt.T)))
return vectors.argmax(axis=1).T


def discretize(vectors, copy=True, max_svd_restarts=30, n_iter_max=20,
random_state=None):
"""Search for a partition matrix (clustering) which is closest to the
Expand Down Expand Up @@ -210,14 +242,18 @@ def spectral_clustering(affinity, n_clusters=8, n_components=None,
Stopping criterion for eigendecomposition of the Laplacian matrix
when using arpack eigen_solver.

assign_labels : {'kmeans', 'discretize'}, default: 'kmeans'
assign_labels : {'kmeans', 'discretize', 'clusterQR'}, default: 'kmeans'
The strategy to use to assign labels in the embedding
space. There are two ways to assign labels after the laplacian
embedding. k-means can be applied and is a popular choice. But it can
space. There are three ways to assign labels after the laplacian
embedding. k-means can be applied and is a popular choice. But it can
also be sensitive to initialization. Discretization is another
approach which is less sensitive to random initialization. See
the 'Multiclass spectral clustering' paper referenced below for
more details on the discretization approach.
more details on the discretization approach. The newest clusterQR
directly extract clusters from eigenvectors in spectral clustering.
In contrast to k-means and discretization, clusterQR has no tuning
parameters, e.g., runs no iterations, yet may outperform k-means and
discretization in terms of both quality and speed.

Returns
-------
Expand Down Expand Up @@ -247,9 +283,10 @@ def spectral_clustering(affinity, n_clusters=8, n_components=None,
This algorithm solves the normalized cut for k=2: it is a
normalized spectral clustering.
"""
if assign_labels not in ('kmeans', 'discretize'):
raise ValueError("The 'assign_labels' parameter should be "
"'kmeans' or 'discretize', but '%s' was given"
if assign_labels not in ('kmeans', 'discretize', 'clusterQR'):
raise ValueError(
"The 'assign_labels' parameter should be "
"'kmeans', 'discretize', or 'clusterQR' but '%s' was given"
% assign_labels)

random_state = check_random_state(random_state)
Expand All @@ -266,6 +303,8 @@ def spectral_clustering(affinity, n_clusters=8, n_components=None,
if assign_labels == 'kmeans':
_, labels, _ = k_means(maps, n_clusters, random_state=random_state,
n_init=n_init)
elif assign_labels == 'clusterQR':
labels = clusterQR(maps)
else:
labels = discretize(maps, random_state=random_state)

Expand Down Expand Up @@ -351,7 +390,7 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
Stopping criterion for eigendecomposition of the Laplacian matrix
when ``eigen_solver='arpack'``.

assign_labels : {'kmeans', 'discretize'}, default: 'kmeans'
assign_labels : {'kmeans', 'discretize', 'clusterQR'}, default: 'kmeans'
The strategy to use to assign labels in the embedding
space. There are two ways to assign labels after the laplacian
embedding. k-means can be applied and is a popular choice. But it can
Expand Down
6 changes: 5 additions & 1 deletion sklearn/cluster/tests/test_spectral.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,11 @@


@pytest.mark.parametrize('eigen_solver', ('arpack', 'lobpcg'))
@pytest.mark.parametrize('assign_labels', ('kmeans', 'discretize'))
@pytest.mark.parametrize(
'assign_labels',
('kmeans',
'discretize',
'clusterQR'))
def test_spectral_clustering(eigen_solver, assign_labels):
S = np.array([[1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0],
Expand Down