TOPIC 6 – PART D
CLUSTERING: HIERARCHICAL APPROACH
HIERARCHICAL CLUSTERING
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
• A dendrogram is a tree-like diagram that records the sequences of merges or splits
[Figure: a nested clustering of points 1–6 and the corresponding dendrogram, with merge heights from 0 to 0.2 on the vertical axis and leaves ordered 1, 3, 2, 5, 4, 6]
HIERARCHICAL CLUSTERING
• Uses the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but needs a termination condition
[Figure: AGNES (agglomerative) merges objects a, b, c, d, e step by step — {a,b}, {d,e}, {c,d,e} — until one cluster {a,b,c,d,e} remains at step 4; DIANA (divisive) runs the same steps in reverse, from one all-inclusive cluster down to singletons]
HIERARCHICAL CLUSTERING
Two main types of hierarchical clustering
◦ Agglomerative:
Start with the points as individual clusters
At each step, merge the closest pair of clusters until only one cluster (or k clusters) left
◦ Divisive:
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a single point (or there are k clusters)
Traditional hierarchical algorithms use a similarity or distance matrix
◦ Merge or split one cluster at a time
HIERARCHICAL CLUSTERING
• AGNES (Agglomerative Nesting), introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical packages, e.g., S-PLUS
• Uses the single-link method and the dissimilarity matrix
• Merges nodes that have the least dissimilarity (minimum Euclidean distance)
• Continues in a non-descending fashion
• Eventually all nodes belong to the same cluster
[Figure: three scatter plots on a 10×10 grid showing AGNES merging the points into progressively larger clusters]
AGGLOMERATIVE CLUSTERING ALGORITHM
More popular hierarchical clustering technique
Basic algorithm:
1. Compute the dissimilarity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the dissimilarity matrix
6. Until only a single cluster remains
Key operation is the computation of the dissimilarity of two clusters
◦ Different approaches to defining the distance between clusters
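The six steps above can be sketched in plain Python (a minimal illustration, not the slides' own code: the `linkage` argument selects how cluster distance is defined, and the three-point toy distances are hypothetical):

```python
from itertools import combinations

def agglomerate(points, dist, linkage=min, k=1):
    """Basic agglomerative clustering over a precomputed dissimilarity matrix.

    dist maps frozenset({a, b}) -> dissimilarity between points a and b;
    linkage=min gives single link, linkage=max gives complete link.
    Returns the merge history as (members, distance) pairs."""
    clusters = [frozenset([p]) for p in points]      # step 2: one cluster per point
    merges = []
    while len(clusters) > k:                         # steps 3 and 6
        # Step 4: scan every pair of clusters for the smallest linkage distance.
        d, i, j = min(
            (linkage(dist[frozenset((a, b))]
                     for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        merges.append((set(merged), d))
        # Step 5 is implicit: cluster-to-cluster distances are re-derived
        # from the point-level matrix on the next pass.
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical three-point example: b is near a, c is nearer to b than to a.
toy = {frozenset('ab'): 1.0, frozenset('ac'): 4.0, frozenset('bc'): 2.0}
print(agglomerate('abc', toy))   # single link merges {a,b} at 1.0, then all at 2.0
```

Because the loop recomputes cluster distances from the point-level matrix, the same code serves both MIN and MAX linkage by swapping the `linkage` argument.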
DENDROGRAM
• A dendrogram shows how the clusters are merged
• Decomposes data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster
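For single link specifically, cutting the dendrogram at height t yields the same clusters as taking the connected components of the graph that links every pair of points at distance below t. A minimal sketch with hypothetical 1-D data (not from the slides):

```python
# Toy 1-D points (hypothetical): two tight groups and an outlier.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 2.5]

def cut_clusters(pts, t):
    """Single-link clusters at cut height t, via union-find connected components."""
    parent = list(range(len(pts)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    # Link every pair of points closer than the cut height.
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if abs(pts[i] - pts[j]) < t:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(pts[i])
    return sorted(groups.values())

print(cut_clusters(points, 0.5))   # three clusters: the two groups and the outlier
```

Raising the cut height merges clusters (one cluster at t = 3.0); lowering it splits them (six singletons at t = 0.05).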
STARTING SITUATION
• Start with clusters of individual points and a dissimilarity matrix
[Figure: points p1–p12 as singleton clusters, alongside an empty dissimilarity matrix indexed by p1, p2, p3, p4, p5, …]
INTERMEDIATE SITUATION
• After some merging steps, we have some clusters
[Figure: clusters C1–C5 over points p1–p12, with the dissimilarity matrix now indexed by clusters c1, c2, c3, c4, c5, …]
INTERMEDIATE SITUATION
• We want to merge the two closest clusters (C2 and C5) and update the dissimilarity matrix
[Figure: same clusters as before, with C2 and C5 highlighted for merging]
AFTER MERGING
• The question is "How do we update the dissimilarity matrix?"
[Figure: after merging C2 and C5, the row and column for the new cluster C2 ∪ C5 are marked "?" in the dissimilarity matrix — its distances to C1, C3, and C4 must be recomputed]
HOW TO DEFINE INTER-CLUSTER SIMILARITY
How do we define the similarity between two clusters?
◦ MIN
◦ MAX
◦ Group Average
◦ Distance Between Centroids
◦ Other methods driven by an objective function
[Figure: two clusters of points with a candidate inter-cluster link, alongside the dissimilarity matrix indexed by p1…p5]
MIN AND MAX
MIN or Single Link
• Similarity of two clusters is based on the two most similar (closest) points in the different clusters
• Determined by one pair of points, i.e., by one link in the dissimilarity graph
• Uses the MIN operator (minimum value)
Example:
dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                  = min(0.15, 0.25, 0.28, 0.39)
                  = 0.15

MAX or Complete Linkage
• Similarity of two clusters is based on the two least similar (most distant) points in the different clusters
• Determined by all pairs of points in the two clusters
• Uses the MAX operator (maximum value)
Example:
dist({3,6},{2,5}) = max(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                  = max(0.15, 0.25, 0.28, 0.39)
                  = 0.39
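The two operators can be expressed directly in code (a small sketch; the four point-pair distances are the ones quoted in the example above):

```python
# Point-pair distances between members of clusters {3,6} and {2,5} (from the slides).
pair = {(3, 2): 0.15, (6, 2): 0.25, (3, 5): 0.28, (6, 5): 0.39}

def linkage_distance(c1, c2, dists, op):
    """op=min -> single link (MIN), op=max -> complete linkage (MAX)."""
    return op(dists[(a, b)] for a in c1 for b in c2)

single = linkage_distance([3, 6], [2, 5], pair, min)
complete = linkage_distance([3, 6], [2, 5], pair, max)
print(single, complete)   # 0.15 0.39, as in the worked example
```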
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Similarity of two clusters is based on the two most similar (closest) points in the different clusters
• Determined by one pair of points, i.e., by one link in the dissimilarity graph
[Figure: the example points connected by single links, with merges numbered 1–5 in order]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Point coordinates (numeric attributes):

Point  x coordinate  y coordinate
p1     0.40          0.53
p2     0.22          0.38
p3     0.35          0.32
p4     0.26          0.19
p5     0.08          0.41
p6     0.45          0.30

Euclidean distance:
d(p1, p2) = √((x1 − x2)² + (y1 − y2)²) = √((0.40 − 0.22)² + (0.53 − 0.38)²) ≈ 0.24

Dissimilarity matrix:

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00
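The matrix above can be reproduced from the coordinates (a quick sketch; note that the printed coordinates are themselves rounded to two decimals, so a few recomputed entries differ from the slide matrix in the last digit — e.g. d(p1, p2) comes out as 0.2343, shown as 0.24 on the slide):

```python
from math import dist

# Coordinates of the six points as printed on the slide.
coords = {1: (0.40, 0.53), 2: (0.22, 0.38), 3: (0.35, 0.32),
          4: (0.26, 0.19), 5: (0.08, 0.41), 6: (0.45, 0.30)}

# Euclidean dissimilarity for every pair i < j.
D = {(i, j): dist(coords[i], coords[j])
     for i in coords for j in coords if i < j}

print(round(D[(1, 2)], 4))   # sqrt(0.18**2 + 0.15**2) = 0.2343
print(round(D[(3, 6)], 4))   # the smallest pairwise distance
```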
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 1 – the two shortest distances in the dissimilarity matrix become the first two clusters: cluster (p3, p6) at 0.11 and cluster (p2, p5) at 0.14

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00

Draw the dendrogram: [p3 and p6 join at height 0.11; p2 and p5 join at 0.14; p4 and p1 remain singletons; leaves ordered 3, 6, 2, 5, 4, 1]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 2 – form the next cluster by finding the point with the minimum distance to the other points in the existing clusters, so update the matrix for the two newly formed clusters

[Figure: scatter plot with clusters {2,5} and {3,6} circled]

Update the dissimilarity matrix (entries marked ??? must be recomputed):

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    ???   0.00
P(3,6)    ???   ???     0.00
p4        0.37  ???     ???     0.00
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Single-link distances:

dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5)) = min(0.15, 0.25, 0.28, 0.39) = 0.15
dist({3,6},{1})   = min(dist(3,1), dist(6,1)) = min(0.22, 0.23) = 0.22
dist({3,6},{4})   = min(dist(3,4), dist(6,4)) = min(0.16, 0.22) = 0.16
dist({2,5},{1})   = min(dist(2,1), dist(5,1)) = min(0.24, 0.34) = 0.24
dist({2,5},{4})   = min(dist(2,4), dist(5,4)) = min(0.20, 0.29) = 0.20

Updated matrix:

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.24  0.00
P(3,6)    0.22  0.15    0.00
p4        0.37  0.20    0.16    0.00

The closest clusters are {3,6} and {2,5} at distance 0.15, so the clusters are merged.
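The five min-computations above can be checked mechanically (a small sketch; the matrix values are the ones from the slides):

```python
# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def single_link(c1, c2):
    # Single link: minimum point-to-point distance across the two clusters.
    return min(D[frozenset((a, b))] for a in c1 for b in c2)

row_36 = [single_link({3, 6}, c) for c in ({1}, {2, 5}, {4})]
row_25 = [single_link({2, 5}, c) for c in ({1}, {4})]
print(row_36, row_25)   # updated matrix rows for P(3,6) and P(2,5)
```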
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Draw the dendrogram: [{3,6} and {2,5} now join at height 0.15]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 3 – form the next cluster by finding the point with the minimum distance to the other points in the existing clusters

Single-link distances:

dist({{3,6},{2,5}}, 4) = min(dist{(3,6), 4}, dist{(2,5), 4}) = min(0.16, 0.20) = 0.16
dist({{3,6},{2,5}}, 1) = min(dist{(3,6), 1}, dist{(2,5), 1}) = min(0.22, 0.24) = 0.22

Updated matrix:

                p1    P(2,5),(3,6)  p4
p1              0.00
P(2,5),(3,6)    0.22  0.00
p4              0.37  0.16          0.00

The minimum entry is 0.16, so p4 joins the cluster {(2,5),(3,6)}.

Draw the dendrogram: [p4 joins at height 0.16]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 4 – form the final cluster by finding the distance from the remaining point to the existing cluster

Single-link distance:

dist({{{3,6},{2,5}}, 4}, 1) = min(dist({{3,6},{2,5}}, 1), dist(4, 1)) = min(0.22, 0.37) = 0.22

Updated matrix:

                    p1    P((2,5),(3,6)),4
p1                  0.00
P((2,5),(3,6)),4    0.22  0.00

Draw the dendrogram: [p1 joins at height 0.22]

Final matrix: a single cluster P(((2,5),(3,6)),4),1 with distance 0 to itself. Terminate.
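Putting steps 1–4 together, the whole single-link walkthrough can be replayed in a short loop (a sketch, with the matrix values from the slides):

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def single_link_merges(points):
    """Merge history (members, height) for single-link agglomeration."""
    clusters = [frozenset([p]) for p in points]
    history = []
    while len(clusters) > 1:
        # Closest pair under single link (minimum point-to-point distance).
        d, i, j = min(
            (min(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return history

for members, height in single_link_merges(range(1, 7)):
    print(members, height)
# Merge heights 0.11, 0.14, 0.15, 0.16, 0.22 match the dendrogram.
```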
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Nested clusters and dendrogram:
[Figure: nested single-link clusters over the six points, and the final dendrogram with leaves 3, 6, 2, 5, 4, 1 and merge heights 0.11, 0.14, 0.15, 0.16, 0.22]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Similarity of two clusters is based on the two least similar (most distant) points in the different clusters
• Determined by all pairs of points in the two clusters
• The calculation uses the maximum distance, but the pair of clusters selected to merge is still the one with the minimum such distance
[Figure: the example points under complete linkage, with merges numbered 1–5 in order]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 1 – the two shortest distances in the dissimilarity matrix become the first two clusters: (p3, p6) and (p2, p5)

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00

Draw the dendrogram: [p3 and p6 join at height 0.11; p2 and p5 join at 0.14]

Update the matrix (entries marked XX must be recomputed):

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    XX    0.00
P(3,6)    XX    XX      0.00
p4        0.37  XX      XX      0.00
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Complete-link distances:

dist({3,6},{2,5}) = max(dist(3,2), dist(6,2), dist(3,5), dist(6,5)) = max(0.15, 0.25, 0.28, 0.39) = 0.39
dist({3,6},{1})   = max(dist(3,1), dist(6,1)) = max(0.22, 0.23) = 0.23
dist({3,6},{4})   = max(dist(3,4), dist(6,4)) = max(0.16, 0.22) = 0.22
dist({2,5},{1})   = max(dist(2,1), dist(5,1)) = max(0.24, 0.34) = 0.34
dist({2,5},{4})   = max(dist(2,4), dist(5,4)) = max(0.20, 0.29) = 0.29

Updated matrix:

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.34  0.00
P(3,6)    0.23  0.39    0.00
p4        0.37  0.29    0.22    0.00
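As with single link, the max-computations above can be verified mechanically (a small sketch using the slide's matrix):

```python
# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def complete_link(c1, c2):
    # Complete linkage: maximum point-to-point distance across the two clusters.
    return max(D[frozenset((a, b))] for a in c1 for b in c2)

row_36 = [complete_link({3, 6}, c) for c in ({1}, {2, 5}, {4})]
row_25 = [complete_link({2, 5}, c) for c in ({1}, {4})]
print(row_36, row_25)   # updated matrix rows for P(3,6) and P(2,5)
```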
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 2 – form the next cluster by finding the minimum distance in the updated matrix; the smallest entry is 0.22, between P(3,6) and p4, giving the newly formed cluster {(3,6), 4}

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.34  0.00
P(3,6)    0.23  0.39    0.00
p4        0.37  0.29    0.22    0.00

Draw the dendrogram: [p4 joins {3,6} at height 0.22; leaves ordered 3, 6, 4, 1, 2, 5; vertical axis from 0 to 0.4]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Complete-link distances:

dist({{3,6},4},{2,5}) = max(dist{(3,6), (2,5)}, dist{4, (2,5)}) = max(0.39, 0.29) = 0.39
dist({{3,6},4},{1})   = max(dist{(3,6), 1}, dist{4, 1}) = max(0.23, 0.37) = 0.37
dist({2,5},{1})       = 0.34

Updated matrix:

            p1    P(2,5)  P(3,6),4
p1          0.00
P(2,5)      0.34  0.00
P(3,6),4    0.37  0.39    0.00

• Step 3 – form the next cluster by finding the minimum distance in the updated matrix; the smallest entry is 0.34, between p1 and P(2,5), giving the newly formed cluster {(2,5), 1}

Draw the dendrogram: [p1 joins {2,5} at height 0.34]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 4 – form the final cluster from the two remaining clusters; newly formed cluster {{{2,5},1}, {{3,6},4}}

dist({{3,6},4}, {{2,5},1}) = max(dist({{3,6},4}, {2,5}), dist({{3,6},4}, 1)) = max(0.39, 0.37) = 0.39

Updated matrix:

            P(2,5),1  P(3,6),4
P(2,5),1    0.00
P(3,6),4    0.39      0.00

Draw the dendrogram: [the two clusters join at height 0.39]

Final matrix: a single cluster P({{3,6},4}, {{2,5},1}) with distance 0 to itself. Terminate.
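The complete-link walkthrough can likewise be replayed end to end (a sketch; same slide matrix, with max as the linkage):

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def complete_link_merges(points):
    """Merge history (members, height) for complete-link agglomeration."""
    clusters = [frozenset([p]) for p in points]
    history = []
    while len(clusters) > 1:
        # Closest pair, where cluster distance is the MAX point-to-point distance.
        d, i, j = min(
            (max(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return history

for members, height in complete_link_merges(range(1, 7)):
    print(members, height)
# Merge heights 0.11, 0.14, 0.22, 0.34, 0.39 match the dendrogram.
```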
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Nested clusters and dendrogram:
[Figure: nested complete-link clusters over the six points, and the final dendrogram with leaves 3, 6, 4, 1, 2, 5 and merge heights 0.11, 0.14, 0.22, 0.34, 0.39]
HIERARCHICAL CLUSTERING: COMPARISON
Single Link (MIN) vs Complete Link (MAX)
[Figure: side-by-side nested clusterings and dendrograms for the two linkages on the same six points — single link merges (3,6), (2,5), then {(3,6),(2,5)}, then p4 and p1, with heights up to 0.22; complete link merges (3,6), (2,5), then {(3,6),4} and {(2,5),1}, joining the final pair at 0.39]
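The contrast can be made concrete by stopping each linkage at k = 2 clusters (a sketch over the slide's matrix): single link peels off point 1 last, while complete link produces the more balanced split {1,2,5} vs {3,4,6}.

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def cluster(points, linkage, k):
    """Agglomerate until k clusters remain; linkage is min or max."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        d, i, j = min(
            (linkage(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        clusters = ([c for t, c in enumerate(clusters) if t not in (i, j)]
                    + [clusters[i] | clusters[j]])
    return sorted(sorted(c) for c in clusters)

print(cluster(range(1, 7), min, 2))   # single link:   [[1], [2, 3, 4, 5, 6]]
print(cluster(range(1, 7), max, 2))   # complete link: [[1, 2, 5], [3, 4, 6]]
```

This illustrates the usual trade-off: single link can produce long "chained" clusters, while complete link favors compact, similarly sized ones.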
SUMMARY
• Cluster analysis groups objects based on their similarity/dissimilarity
and has wide applications
• Measure of dissimilarity can be computed for various types of data:
nominal, binary, ordinal, numeric, textual
• Clustering algorithms discussed are categorized into partitioning methods (k-means) and hierarchical methods
References
1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2012.
2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, 2nd Edition, Pearson, 2019.
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin