Image Segmentation Adaptive Clustering
base of the algorithm. Although the algorithm does not need any predefined
number of clusters, as it generates a dendrogram, we discuss various
stopping criteria. This is important because in some cases one might want
to stop the merging process early, for example to save computational resources.
Contents
1 Introduction
2 Hierarchical agglomerative clustering
2.1 Metric
2.2 Linkage criteria
2.3 Algorithm
4 Conclusion
1 Introduction
Clustering is a machine learning technique for grouping similar objects together. Given a set of
objects, we can use a clustering algorithm to classify each object into a specific group. Objects
that belong to the same group should have similar properties, while objects in different groups
should have highly dissimilar properties.
Clustering is a method of unsupervised learning: the data we want to describe is not
labeled, and we know little about the expected outcome. The algorithm
has only the data itself and should cluster it in the best way possible.
There are many well-known clustering algorithms (K-Means, DBSCAN, Mean-Shift, etc.); in this
paper we focus on hierarchical agglomerative clustering.

2 Hierarchical agglomerative clustering

Agglomerative clustering [2] is the most common type of hierarchical clustering used to
group objects into clusters based on their similarity. It is a bottom-up approach that treats each
object as a singleton cluster, and then successively merges pairs of clusters until all clusters
have been merged into a single cluster that contains all objects.
In order to decide which clusters should be combined, a measure of dissimilarity between
groups of objects is required. This is achieved by using an appropriate metric (a measure of
distance between two objects), and a linkage criterion which specifies the distance between two
clusters.
2.1 Metric
The metric is a function that measures the distance between pairs of objects. Next we present
some commonly used metric functions between two n-dimensional objects a and b:
• Euclidean distance: $\| a - b \|_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.
• Squared Euclidean distance: $\| a - b \|_2^2 = \sum_{i=1}^{n} (a_i - b_i)^2$.
• Manhattan distance: $\| a - b \|_1 = \sum_{i=1}^{n} |a_i - b_i|$.
The choice of an appropriate metric function is important because it will influence the shape
of the clusters. Some elements may be closer to each other under one metric than under another.
For non-numeric data the Hamming or Levenshtein distances can be used.
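
As a quick illustration, the three metrics above can be computed with NumPy as follows; the vectors a and b are arbitrary examples, not data from this paper:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # ||a - b||_2   -> ~3.606
squared_euclidean = np.sum((a - b) ** 2)    # ||a - b||_2^2 -> 13.0
manhattan = np.sum(np.abs(a - b))           # ||a - b||_1   -> 5.0

print(euclidean, squared_euclidean, manhattan)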
2.2 Linkage criteria

The linkage criterion determines the distance between two clusters as a function of the pairwise
distances between their objects. Next we present some commonly used linkage criteria between two
clusters A and B, where d is the chosen metric (a code sketch follows the list):
• Complete-linkage clustering: $D(A, B) = \max\{\, d(a, b) : a \in A,\ b \in B \,\}$.
• Single-linkage clustering: $D(A, B) = \min\{\, d(a, b) : a \in A,\ b \in B \,\}$.
• Average linkage clustering: $D(A, B) = \frac{1}{|A| \cdot |B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$.
• Ward’s method [3]: $D(A, B) = \frac{2 \cdot |A| \cdot |B|}{|A| + |B|} \cdot \| c_A - c_B \|_2^2$, where $c_A$ and $c_B$ are the centroids of clusters A and B.
  – The two clusters whose merge results in the smallest loss of information are combined.

Figure 1: Illustration of the merge strategies: (a) complete-linkage, (b) single-linkage, (c) average linkage, (d) Ward’s method.
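
As a minimal sketch (not code from this paper), the linkage criteria above can be evaluated for two small clusters with NumPy; the point coordinates are arbitrary:

import numpy as np

A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 1.0]])

# Pairwise Euclidean distances d(a, b) for all a in A, b in B.
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

complete = pairwise.max()    # complete-linkage
single = pairwise.min()      # single-linkage
average = pairwise.mean()    # average linkage

# Ward's method: scaled squared distance between the cluster centroids.
c_A, c_B = A.mean(axis=0), B.mean(axis=0)
ward = (2 * len(A) * len(B)) / (len(A) + len(B)) * np.sum((c_A - c_B) ** 2)

print(complete, single, average, ward)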
2.3 Algorithm
The following steps describe how the hierarchical agglomerative clustering algorithm works
(a minimal code sketch follows the list):
1. Initially, each object is placed in its own singleton cluster.
2. At each iteration, we merge two clusters together. The two clusters to be combined are
those that optimize the function defined by the linkage criterion.
3. Step 2 is repeated until only one cluster remains, containing all the objects.
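
Below is a minimal from-scratch sketch of these steps in Python, assuming single linkage and the Euclidean metric; it illustrates the procedure rather than reproducing any particular implementation, and the point coordinates are arbitrary:

import numpy as np

def agglomerative(points):
    # Step 1: every object starts in its own singleton cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Step 3: repeat the merging step until only one cluster is left.
    while len(clusters) > 1:
        best = None
        # Step 2: find the pair of clusters with the smallest linkage distance
        # (single linkage: the minimum pairwise distance between their members).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] \
                   + [clusters[i] + clusters[j]]
    return merges

points = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [9.0, 0.0]])
for left, right, dist in agglomerative(points):
    print(left, "+", right, "at distance", round(dist, 3))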
Example
Figure 2: An example of the hierarchical agglomerative clustering.
Let’s say that we’ve chosen single-linkage clustering as our linkage criterion. This means that
we merge the two clusters that contain the two closest elements. We assume that
in this example elements {b} and {c} are the closest and merge them into a single cluster {b, c}.
We now have the following clusters {a}, {b, c}, {d}, {e} and {f }, and we want to merge them
further. The next closest pair of objects are {d} and {e}, and so we merge them also into a
single cluster {d, e}. We continue this process until all the clusters have been merged into a
single cluster that contains all the elements {a, b, c, d, e, f }.
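
A sketch of this walkthrough using SciPy's scipy.cluster.hierarchy.linkage is shown below; the one-dimensional coordinates are made up so that {b} and {c} are the closest pair, followed by {d} and {e}, and the later merge order depends on these invented values rather than on Figure 2:

import numpy as np
from scipy.cluster.hierarchy import linkage

labels = ["a", "b", "c", "d", "e", "f"]
points = np.array([[0.0], [2.0], [2.1], [4.0], [4.2], [7.0]])

# Each row of Z describes one merge: the two cluster indices being combined,
# the single-linkage distance, and the size of the resulting cluster.
# With these coordinates the first rows merge indices 1 and 2 (b, c),
# then 3 and 4 (d, e).
Z = linkage(points, method="single", metric="euclidean")
print(Z)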
In case of equal minimum distances, a pair of clusters is chosen at random; depending on this
choice, structurally different dendrograms can be generated. As an alternative, all tied pairs
can be merged at the same time, generating a multidendrogram [1].
Hierarchical clustering does not require a prespecified number of clusters. Since the algorithm
builds a tree, we can choose the number of clusters that best fits our input data after the fact.
However, in some cases we may want to stop merging clusters at a given point, for example to
save computational resources.
In these cases, the following stopping criteria can be used to determine the cutting point [2]
(both are sketched in code after the list):
• Number criterion: we stop clustering when we reach the desired number of clusters.
• Distance criterion: we stop clustering when the clusters are too far apart to be merged.
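
Both criteria can be sketched with SciPy's fcluster, which cuts an already-built dendrogram; the synthetic two-blob data and the thresholds below are arbitrary illustrations:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
Z = linkage(points, method="ward")

# Number criterion: stop when the desired number of clusters is reached.
labels_by_number = fcluster(Z, t=2, criterion="maxclust")

# Distance criterion: stop when clusters are farther apart than a threshold.
labels_by_distance = fcluster(Z, t=2.5, criterion="distance")

print(np.unique(labels_by_number), np.unique(labels_by_distance))

Note that this applies the criteria by cutting the full tree after it has been built, which matches the idea of a cutting point described above.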
It is a rather difficult problem to determine the optimal number of clusters for a given input data set,
regardless of the clustering method we choose to use. There are several methods in the literature
for estimating it; two common ones are presented below and sketched in code after the list:
• The elbow method [5]: we plot the average internal per-cluster sum of squares distance
against the number of clusters and look for a visual "elbow" (the point where the slope changes
from steep to shallow), which indicates the optimal number of clusters. The average internal
sum of squares is the sum of squared distances from each object to its cluster centroid,
averaged over the clusters.
• The silhouette method [4]: we calculate the average silhouette score of all the objects for
different values of k. The optimal number of clusters k is the one that maximizes the
average silhouette score.
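
The following sketch applies both methods to hierarchical cuts of a dendrogram; the synthetic data, the tested range of k, and the use of scikit-learn's silhouette_score are illustrative assumptions:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
Z = linkage(points, method="ward")

for k in range(2, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    # Elbow method: average within-cluster sum of squared distances to the centroid.
    wcss = np.mean([np.sum((points[labels == c] - points[labels == c].mean(axis=0)) ** 2)
                    for c in np.unique(labels)])
    # Silhouette method: average silhouette score over all objects.
    sil = silhouette_score(points, labels)
    print(k, round(float(wcss), 3), round(float(sil), 3))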
Figure 3 shows an example of both the elbow and the silhouette methods. Both methods identify
k = 2 as the optimal partition size. In the case of the elbow method we can see that exactly at
k = 2 the slope changes from steep to shallow, while in the case of the silhouette method k = 2
yields the highest average silhouette score.

Figure 3: A visual representation of the elbow method on the left and of the silhouette method
on the right.
4 Conclusion
In this paper we presented hierarchical agglomerative clustering, an unsupervised machine
learning technique for grouping similar objects together, along with two methods for determining
the optimal partition size: the elbow and the silhouette methods. Finally, it can be said that
hierarchical agglomerative clustering is a flexible clustering technique, since it does not require
the number of clusters to be specified in advance.
References
[1] Alberto Fernández and Sergio Gómez. “Solving non-uniqueness in agglomerative hier-
archical clustering using multidendrograms”. In: Journal of Classification 25.1 (2008),
pp. 43–65.
[2] Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. Introduction to Infor-
mation Retrieval. Cambridge University Press, 2008.
[3] Fionn Murtagh and Pierre Legendre. “Ward’s hierarchical agglomerative clustering method:
which algorithms implement Ward’s criterion?” In: Journal of Classification 31.3 (2014),
pp. 274–295.
[4] Tippaya Thinsungnoena et al. “The clustering validity with silhouette and sum of squared
errors”. In: learning 3.7 (2015).
[5] Antoine E Zambelli. “A data-driven approach to estimating the number of clusters in hi-
erarchical clustering”. In: F1000Research 5 (2016).