Image Segmentation Adaptive Clustering
base of the algorithm. Although the algorithm does not need any predefined
number of clusters, as it generates a dendrogram, we discuss various
stopping criteria. This is important because in some cases one might want
to stop the merging process early, for example to save computational resources.
Contents
1 Introduction
2 Hierarchical agglomerative clustering
2.1 Metric
2.2 Linkage criteria
2.3 Algorithm
4 Conclusion
1 Introduction
Clustering is a machine learning technique for grouping similar objects together. Given a set of
objects, we can use a clustering algorithm to classify each object into a specific group. Objects
that belong to the same group should have similar properties, while objects in different groups
should have highly dissimilar properties.
Clustering is a method of unsupervised learning: the data we want to describe is not
labeled, and we know little about the expected outcome. The algorithm
has only the data itself and should cluster it in the best way possible.
There are many well-known clustering algorithms (K-Means, DBSCAN, Mean-Shift, etc.); in this
paper we focus on hierarchical agglomerative clustering.

2 Hierarchical agglomerative clustering

Agglomerative clustering [2] is the most common type of hierarchical clustering used to
group objects into clusters based on their similarity. It is a bottom-up approach that treats each
object as a singleton cluster, and then successively merges pairs of clusters until all clusters
have been merged into a single cluster that contains all objects.
In order to decide which clusters should be combined, a measure of dissimilarity between
groups of objects is required. This is achieved by using an appropriate metric (a measure of
distance between two objects), and a linkage criterion which specifies the distance between two
clusters.
2.1 Metric
The metric is a function that measures the distance between pairs of objects. Next we present
some commonly used metric functions between two n-dimensional objects a and b:
• Euclidean distance: $\| a - b \|_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.
• Squared Euclidean distance: $\| a - b \|_2^2 = \sum_{i=1}^{n} (a_i - b_i)^2$.
• Manhattan distance: $\| a - b \|_1 = \sum_{i=1}^{n} |a_i - b_i|$.
The choice of an appropriate metric function is important because it will influence the shape
of the clusters. Some elements may be closer to each other under one metric than under another.
For non-numeric data the Hamming or Levenshtein distances can be used.
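
As a quick illustration, the three metrics above can be computed with NumPy as follows; the vectors a and b are arbitrary examples, not data from this paper:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # ||a - b||_2   -> ~3.606
squared_euclidean = np.sum((a - b) ** 2)    # ||a - b||_2^2 -> 13.0
manhattan = np.sum(np.abs(a - b))           # ||a - b||_1   -> 5.0

print(euclidean, squared_euclidean, manhattan)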
2.2 Linkage criteria

The linkage criterion determines the distance between two clusters as a function of the pairwise
distances between their objects. Next we present some commonly used linkage criteria between two
clusters A and B, where d is the chosen metric (a code sketch follows the list):
• Complete-linkage clustering: $D(A, B) = \max\{\, d(a, b) : a \in A,\ b \in B \,\}$.
• Single-linkage clustering: $D(A, B) = \min\{\, d(a, b) : a \in A,\ b \in B \,\}$.
• Average linkage clustering: $D(A, B) = \frac{1}{|A| \cdot |B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$.
• Ward’s method [3]: $D(A, B) = \frac{2 \cdot |A| \cdot |B|}{|A| + |B|} \cdot \| c_A - c_B \|_2^2$, where $c_A$ and $c_B$ are the centroids of clusters A and B.
  – The two clusters whose merge results in the smallest loss of information are combined.

Figure 1: Illustration of the merge strategies: (a) complete-linkage, (b) single-linkage, (c) average linkage, (d) Ward’s method.
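
As a minimal sketch (not code from this paper), the linkage criteria above can be evaluated for two small clusters with NumPy; the point coordinates are arbitrary:

import numpy as np

A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 1.0]])

# Pairwise Euclidean distances d(a, b) for all a in A, b in B.
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

complete = pairwise.max()    # complete-linkage
single = pairwise.min()      # single-linkage
average = pairwise.mean()    # average linkage

# Ward's method: scaled squared distance between the cluster centroids.
c_A, c_B = A.mean(axis=0), B.mean(axis=0)
ward = (2 * len(A) * len(B)) / (len(A) + len(B)) * np.sum((c_A - c_B) ** 2)

print(complete, single, average, ward)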
2.3 Algorithm
The following steps describe how the hierarchical agglomerative clustering algorithm works
(a minimal code sketch follows the list):
1. Initially, each object is placed in its own singleton cluster.
2. At each iteration, we merge two clusters together. The two clusters to be combined are
those that optimize the function defined by the linkage criterion.
3. Step 2 is repeated until only one cluster remains, containing all the objects.
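
Below is a minimal from-scratch sketch of these steps in Python, assuming single linkage and the Euclidean metric; it illustrates the procedure rather than reproducing any particular implementation, and the point coordinates are arbitrary:

import numpy as np

def agglomerative(points):
    # Step 1: every object starts in its own singleton cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Step 3: repeat the merging step until only one cluster is left.
    while len(clusters) > 1:
        best = None
        # Step 2: find the pair of clusters with the smallest linkage distance
        # (single linkage: the minimum pairwise distance between their members).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] \
                   + [clusters[i] + clusters[j]]
    return merges

points = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [9.0, 0.0]])
for left, right, dist in agglomerative(points):
    print(left, "+", right, "at distance", round(dist, 3))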
Example
Figure 2: An example of the hierarchical agglomerative clustering.
Let’s say that we’ve chosen single-linkage clustering as our linkage criterion. This means that
we merge the two clusters that contain the two closest elements. We assume that
in this example elements {b} and {c} are the closest and merge them into a single cluster {b, c}.
We now have the following clusters {a}, {b, c}, {d}, {e} and {f }, and we want to merge them
further. The next closest pair of objects are {d} and {e}, and so we merge them also into a
single cluster {d, e}. We continue this process until all the clusters have been merged into a
single cluster that contains all the elements {a, b, c, d, e, f }.
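
A sketch of this walkthrough using SciPy's scipy.cluster.hierarchy.linkage is shown below; the one-dimensional coordinates are made up so that {b} and {c} are the closest pair, followed by {d} and {e}, and the later merge order depends on these invented values rather than on Figure 2:

import numpy as np
from scipy.cluster.hierarchy import linkage

labels = ["a", "b", "c", "d", "e", "f"]
points = np.array([[0.0], [2.0], [2.1], [4.0], [4.2], [7.0]])

# Each row of Z describes one merge: the two cluster indices being combined,
# the single-linkage distance, and the size of the resulting cluster.
# With these coordinates the first rows merge indices 1 and 2 (b, c),
# then 3 and 4 (d, e).
Z = linkage(points, method="single", metric="euclidean")
print(Z)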
In case of equal minimum distances, a pair of clusters is chosen at random; depending on this
choice, structurally different dendrograms can be generated. As an alternative, all tied pairs
can be merged at the same time, generating a multidendrogram [1].
Hierarchical clustering does not require a prespecified number of clusters. Since the algorithm
builds a tree, we can choose the number of clusters that best fits our input data after the fact.
However, in some cases we may want to stop merging clusters at a given point, for example to
save computational resources.
In these cases, the following stopping criteria can be used to determine the cutting point [2]
(both are sketched in code after the list):
• Number criterion: we stop clustering when we reach the desired number of clusters.
• Distance criterion: we stop clustering when the clusters are too far apart to be merged.
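
Both criteria can be sketched with SciPy's fcluster, which cuts an already-built dendrogram; the synthetic two-blob data and the thresholds below are arbitrary illustrations:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
Z = linkage(points, method="ward")

# Number criterion: stop when the desired number of clusters is reached.
labels_by_number = fcluster(Z, t=2, criterion="maxclust")

# Distance criterion: stop when clusters are farther apart than a threshold.
labels_by_distance = fcluster(Z, t=2.5, criterion="distance")

print(np.unique(labels_by_number), np.unique(labels_by_distance))

Note that this applies the criteria by cutting the full tree after it has been built, which matches the idea of a cutting point described above.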
It is a rather difficult problem to determine the optimal number of clusters for a given input data set,
regardless of the clustering method we choose to use. There are several methods in the literature
for estimating it; two common ones are presented below and sketched in code after the list:
• The elbow method [5]: we plot the average internal per-cluster sum of squares distance
against the number of clusters and look for a visual "elbow" (the point where the slope changes
from steep to shallow), which indicates the optimal number of clusters. The average internal
sum of squares is the sum of squared distances from each object to its cluster centroid,
averaged over the clusters.
• The silhouette method [4]: we calculate the average silhouette score of all the objects for
different values of k. The optimal number of clusters k is the one that maximizes the
average silhouette score.
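
The following sketch applies both methods to hierarchical cuts of a dendrogram; the synthetic data, the tested range of k, and the use of scikit-learn's silhouette_score are illustrative assumptions:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
Z = linkage(points, method="ward")

for k in range(2, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    # Elbow method: average within-cluster sum of squared distances to the centroid.
    wcss = np.mean([np.sum((points[labels == c] - points[labels == c].mean(axis=0)) ** 2)
                    for c in np.unique(labels)])
    # Silhouette method: average silhouette score over all objects.
    sil = silhouette_score(points, labels)
    print(k, round(float(wcss), 3), round(float(sil), 3))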
Figure 3 shows an example of both the elbow and the silhouette methods. Both methods identify
k = 2 as the optimal partition size. In the case of the elbow method we can see that exactly at
k = 2 the slope changes from steep to shallow, while in the case of the silhouette method k = 2
yields the highest average silhouette score.

Figure 3: A visual representation of the elbow method on the left and of the silhouette method
on the right.
4 Conclusion
In this paper we presented hierarchical agglomerative clustering, an unsupervised machine
learning technique for grouping similar objects together, along with two methods for determining
the optimal partition size: the elbow and the silhouette methods. Finally, it can be said that
hierarchical agglomerative clustering is a flexible clustering technique, since it does not require
the number of clusters to be specified in advance.
References
[1] Alberto Fernández and Sergio Gómez. “Solving non-uniqueness in agglomerative hier-
archical clustering using multidendrograms”. In: Journal of Classification 25.1 (2008),
pp. 43–65.
[2] Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. Introduction to Infor-
mation Retrieval. Cambridge University Press, 2008.
[3] Fionn Murtagh and Pierre Legendre. “Ward’s hierarchical agglomerative clustering method:
which algorithms implement Ward’s criterion?” In: Journal of Classification 31.3 (2014),
pp. 274–295.
[4] Tippaya Thinsungnoena et al. “The clustering validity with silhouette and sum of squared
errors”. In: learning 3.7 (2015).
[5] Antoine E Zambelli. “A data-driven approach to estimating the number of clusters in hi-
erarchical clustering”. In: F1000Research 5 (2016).