
Adaptive clustering algorithm for determining the optimal partition size

Eliézer Béczi

November 17, 2020

In this paper we present the hierarchical agglomerative clustering algorithm. We introduce concepts such as the metric and the linkage criterion, which form the basis of the algorithm. Although the algorithm does not need a predefined number of clusters, since it generates a dendrogram, we discuss various stopping criteria. This is important because in some cases one might want to stop at a prespecified number of clusters. For these cases we present two methods from the literature (the elbow and the silhouette methods) that help us determine the optimal partition size for a given input.

Contents

1 Introduction
2 Hierarchical agglomerative clustering
   2.1 Metric
   2.2 Linkage criteria
   2.3 Algorithm
   2.4 Stopping criteria
3 Optimal partition size
4 Conclusion

1 Introduction

Clustering is a machine learning technique for grouping similar objects together. Given a set of objects, we can use a clustering algorithm to classify each object into a specific group. Objects that belong to the same group should have similar properties, while objects in different groups should have highly dissimilar ones.

Clustering is a method of unsupervised learning, where the data we want to describe is not labeled. We know little about the expected outcome: the algorithm only has the data and should perform the clustering in the best way possible.

There are many well-known clustering algorithms (K-Means, DBSCAN, Mean-Shift, etc.), but we will focus only on hierarchical agglomerative clustering.

2 Hierarchical agglomerative clustering

Agglomerative clustering [2] is the most common type of hierarchical clustering used to group objects into clusters based on their similarity. It is a bottom-up approach that treats each object as a singleton cluster and then successively merges pairs of clusters until all of them have been merged into a single cluster that contains every object.

In order to decide which clusters should be combined, a measure of dissimilarity between groups of objects is required. This is achieved by using an appropriate metric (a measure of distance between two objects) and a linkage criterion, which specifies the distance between two clusters.

2.1 Metric

The metric is a function that measures the distance between pairs of objects. Next we present

some commonly used metric functions between two n-dimensional objects a and b:

• Euclidean distance: $\| a - b \|_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.

• Squared Euclidean distance: $\| a - b \|_2^2 = \sum_{i=1}^{n} (a_i - b_i)^2$.

• Manhattan distance: $\| a - b \| = \sum_{i=1}^{n} |a_i - b_i|$.

• Maximum distance: $\| a - b \| = \max_i |a_i - b_i|$.

The choice of an appropriate metric function is important because it will influence the shape
of the clusters. Some elements may be closer to each other under one metric than under another.
For non-numeric data the Hamming or Levenshtein distances can be used.
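
Below is a small sketch of how these metrics can be computed with NumPy (an assumed third-party library; the vectors a and b are illustrative, not taken from the paper):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.5])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # ||a - b||_2
squared_euclidean = np.sum((a - b) ** 2)    # ||a - b||_2^2
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences
maximum = np.max(np.abs(a - b))             # largest absolute difference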

2.2 Linkage criteria

The linkage criterion determines the distance between two clusters as a function of the pairwise distances between objects. Next we present some commonly used linkage criteria between two clusters A and B, where d is the chosen metric:

• Complete-linkage clustering: $D(A, B) = \max \{ d(a, b) : a \in A, b \in B \}$.

  – Distance between farthest elements in clusters.

• Single-linkage clustering: $D(A, B) = \min \{ d(a, b) : a \in A, b \in B \}$.

  – Distance between closest elements in clusters.

• Average linkage clustering: $D(A, B) = \frac{1}{|A| \cdot |B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$.

  – Average of all pairwise distances.

Figure 1: Illustration of the merge strategies: (a) complete-linkage, (b) single-linkage, (c) average linkage, (d) Ward's method.

• Ward's method [3]: $D(A, B) = \frac{2 \cdot |A| \cdot |B|}{|A| + |B|} \cdot \| c_A - c_B \|_2^2$, where $c_A$ and $c_B$ are the centroids of clusters A and B.

  – Minimizes the total within-cluster variance.

  – The two clusters whose merge results in the minimal information loss are combined.
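
As a rough illustration (not part of the original paper), the four criteria can be computed directly for two small clusters with NumPy; the data and variable names are assumptions made for this example:

import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 3.0], [5.0, 3.0]])

# Pairwise Euclidean distances d(a, b) for every a in A and b in B.
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

complete = pairwise.max()   # complete-linkage
single = pairwise.min()     # single-linkage
average = pairwise.mean()   # average linkage

# Ward's method, using the formula above with the cluster centroids.
c_A, c_B = A.mean(axis=0), B.mean(axis=0)
ward = (2 * len(A) * len(B) / (len(A) + len(B))) * np.sum((c_A - c_B) ** 2)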

2.3 Algorithm

The following steps describe how the hierarchical agglomerative clustering algorithm works (a short code sketch follows the list):

1. We treat each object as a single cluster.

2. At each iteration, we merge two clusters together. The two clusters to be combined are
those that optimize the function defined by the linkage criterion.

3. Step 2 is repeated until we only have one cluster that contains all the objects.
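
A minimal sketch of these steps using SciPy's hierarchical clustering routines (an assumed library; the paper prescribes no particular implementation):

import numpy as np
from scipy.cluster.hierarchy import linkage

# Six toy 2-D objects, analogous to the objects {a}, ..., {f} in the example below.
X = np.array([[1.0, 1.0], [1.2, 0.9], [1.1, 1.1],
              [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

# Build the full merge tree. Each row of Z records one iteration of step 2:
# the two clusters merged, the linkage distance, and the size of the new cluster.
Z = linkage(X, method="single", metric="euclidean")
print(Z)

The loop of step 2 is performed internally by linkage; the resulting matrix Z encodes the dendrogram discussed below.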

Example

Figure 2 shows an example of how the hierarchical agglomerative clustering algorithm works. Our input data consists of 6 objects labeled from {a} to {f}. The first step is to determine which two clusters to merge together. This is done according to the linkage criterion.
Figure 2: An example of the hierarchical agglomerative clustering.

Let's say that we have chosen single-linkage clustering as our linkage criterion. This means that we combine the two clusters that contain the two closest elements. We assume that in this example elements {b} and {c} are the closest, so we merge them into a single cluster {b, c}. We now have the clusters {a}, {b, c}, {d}, {e} and {f}, and we want to merge them further. The next closest pair of objects are {d} and {e}, so we merge them as well into a single cluster {d, e}. We continue this process until all the clusters have been merged into a single cluster that contains all the elements {a, b, c, d, e, f}.

In the case of equal minimum distances, a pair of clusters is chosen at random; different choices can therefore generate several structurally different dendrograms. As an alternative, all equal pairs can be merged at the same time, generating a multidendrogram [1].

2.4 Stopping criteria

Hierarchical clustering does not require a prespecified number of clusters. Because the algorithm builds a tree, we can select the number of clusters that best fits our input data. However, in some cases we may want to stop merging clusters at a given point, for example to save computational resources.

In these cases, a number of stopping criteria can be used to determine the cutting point [2]:

• Number criterion: we stop clustering when we reach the desired amount of clusters.

• Distance criterion: we stop clustering when the clusters are too far apart to be merged.
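
Both criteria can be expressed as cuts of the merge tree. A hedged sketch using SciPy's fcluster on the tree Z built in the previous example (the threshold values are illustrative assumptions):

from scipy.cluster.hierarchy import fcluster

labels_number = fcluster(Z, t=3, criterion="maxclust")      # number criterion: stop at 3 clusters
labels_distance = fcluster(Z, t=2.0, criterion="distance")  # distance criterion: no merges above distance 2.0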

3 Optimal partition size

Determining the optimal number of clusters for a given input is a rather difficult problem, regardless of the clustering method we choose to use. There are several methods in the literature that provide a solution to this problem:

• The elbow method [5]: we plot the average within-cluster sum of squares (the average distance between objects inside a cluster) against the number of clusters and look for a visual "elbow", the point where the slope changes from steep to shallow; this point marks the optimal number of clusters.

• The silhouette method [4]: we calculate the average silhouette score of all the objects for

different values of k. The optimal number of clusters k is the one that maximizes the
average silhouette score.
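
A hedged sketch of both methods using scikit-learn (an assumed library; the data and the range of k are placeholders, not values from the paper):

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

X = np.random.rand(100, 2)  # placeholder input data

for k in range(2, 10):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    # Within-cluster sum of squares, plotted against k for the elbow method.
    wcss = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in range(k))
    # Average silhouette score, maximized over k for the silhouette method.
    sil = silhouette_score(X, labels)
    print(k, wcss, sil)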

Figure 3 shows an example for both the elbow and the silhouette methods. Both methods identify k = 2 as the optimal partition size. In the case of the elbow method we can see that exactly at k = 2 the slope changes from steep to shallow. Meanwhile, in the case of the silhouette method we can see that k = 2 yields the highest average silhouette score.

Figure 3: A visual representation of the elbow method on the left and of the silhouette method on the right.

4 Conclusion

In conclusion, we presented an adaptive clustering algorithm, namely hierarchical agglomerative clustering, an unsupervised machine learning technique for grouping similar objects together, along with two methods for determining the optimal partition size: the elbow and the silhouette methods. Hierarchical agglomerative clustering is a useful technique, but there is always room for improvement.

References
[1] Alberto Fernández and Sergio Gómez. "Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms". In: Journal of Classification 25.1 (2008), pp. 43–65.
[2] Christopher D Manning, Hinrich Schütze, and Prabhakar Raghavan. Introduction to Information Retrieval. Cambridge University Press, 2008.
[3] Fionn Murtagh and Pierre Legendre. "Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion?" In: Journal of Classification 31.3 (2014), pp. 274–295.
[4] Tippaya Thinsungnoena et al. "The clustering validity with silhouette and sum of squared errors". In: learning 3.7 (2015).
[5] Antoine E Zambelli. "A data-driven approach to estimating the number of clusters in hierarchical clustering". In: F1000Research 5 (2016).
