TOPIC 6 – PART D
CLUSTERING: HIERARCHICAL APPROACH
HIERARCHICAL CLUSTERING
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
• A dendrogram is a tree-like diagram that records the sequences of merges or splits
[Figure: a nested clustering of points 1–6 and the corresponding dendrogram, with merge heights from 0 to 0.2 on the vertical axis and leaves ordered 1, 3, 2, 5, 4, 6]
HIERARCHICAL CLUSTERING
• Uses the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but needs a termination condition
[Figure: AGNES (agglomerative) merges objects a, b, c, d, e step by step — {a,b}, {d,e}, {c,d,e} — until one cluster {a,b,c,d,e} remains at step 4; DIANA (divisive) runs the same steps in reverse, from one all-inclusive cluster down to singletons]
HIERARCHICAL CLUSTERING
Two main types of hierarchical clustering
◦ Agglomerative:
Start with the points as individual clusters
At each step, merge the closest pair of clusters until only one cluster (or k clusters) left
◦ Divisive:
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a single point (or there are k clusters)
Traditional hierarchical algorithms use a similarity or distance matrix
◦ Merge or split one cluster at a time
HIERARCHICAL CLUSTERING
• AGNES (Agglomerative Nesting), introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical packages, e.g., S-PLUS
• Uses the single-link method and the dissimilarity matrix
• Merges nodes that have the least dissimilarity (minimum Euclidean distance)
• Continues in a non-descending fashion
• Eventually all nodes belong to the same cluster
[Figure: three scatter plots on a 10×10 grid showing AGNES merging the points into progressively larger clusters]
AGGLOMERATIVE CLUSTERING ALGORITHM
More popular hierarchical clustering technique
Basic algorithm:
1. Compute the dissimilarity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the dissimilarity matrix
6. Until only a single cluster remains
Key operation is the computation of the dissimilarity of two clusters
◦ Different approaches to defining the distance between clusters
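The six steps above can be sketched in plain Python (a minimal illustration, not the slides' own code: the `linkage` argument selects how cluster distance is defined, and the three-point toy distances are hypothetical):

```python
from itertools import combinations

def agglomerate(points, dist, linkage=min, k=1):
    """Basic agglomerative clustering over a precomputed dissimilarity matrix.

    dist maps frozenset({a, b}) -> dissimilarity between points a and b;
    linkage=min gives single link, linkage=max gives complete link.
    Returns the merge history as (members, distance) pairs."""
    clusters = [frozenset([p]) for p in points]      # step 2: one cluster per point
    merges = []
    while len(clusters) > k:                         # steps 3 and 6
        # Step 4: scan every pair of clusters for the smallest linkage distance.
        d, i, j = min(
            (linkage(dist[frozenset((a, b))]
                     for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        merges.append((set(merged), d))
        # Step 5 is implicit: cluster-to-cluster distances are re-derived
        # from the point-level matrix on the next pass.
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical three-point example: b is near a, c is nearer to b than to a.
toy = {frozenset('ab'): 1.0, frozenset('ac'): 4.0, frozenset('bc'): 2.0}
print(agglomerate('abc', toy))   # single link merges {a,b} at 1.0, then all at 2.0
```

Because the loop recomputes cluster distances from the point-level matrix, the same code serves both MIN and MAX linkage by swapping the `linkage` argument.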
DENDROGRAM
• A dendrogram shows how the clusters are merged
• Decomposes data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster
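For single link specifically, cutting the dendrogram at height t yields the same clusters as taking the connected components of the graph that links every pair of points at distance below t. A minimal sketch with hypothetical 1-D data (not from the slides):

```python
# Toy 1-D points (hypothetical): two tight groups and an outlier.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 2.5]

def cut_clusters(pts, t):
    """Single-link clusters at cut height t, via union-find connected components."""
    parent = list(range(len(pts)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    # Link every pair of points closer than the cut height.
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if abs(pts[i] - pts[j]) < t:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(pts[i])
    return sorted(groups.values())

print(cut_clusters(points, 0.5))   # three clusters: the two groups and the outlier
```

Raising the cut height merges clusters (one cluster at t = 3.0); lowering it splits them (six singletons at t = 0.05).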
STARTING SITUATION
• Start with clusters of individual points and a dissimilarity matrix
[Figure: points p1–p12 as singleton clusters, alongside an empty dissimilarity matrix indexed by p1, p2, p3, p4, p5, …]
INTERMEDIATE SITUATION
• After some merging steps, we have some clusters
[Figure: clusters C1–C5 over points p1–p12, with the dissimilarity matrix now indexed by clusters c1, c2, c3, c4, c5, …]
INTERMEDIATE SITUATION
• We want to merge the two closest clusters (C2 and C5) and update the dissimilarity matrix
[Figure: same clusters as before, with C2 and C5 highlighted for merging]
AFTER MERGING
• The question is "How do we update the dissimilarity matrix?"
[Figure: after merging C2 and C5, the row and column for the new cluster C2 ∪ C5 are marked "?" in the dissimilarity matrix — its distances to C1, C3, and C4 must be recomputed]
HOW TO DEFINE INTER-CLUSTER SIMILARITY
How do we define the similarity between two clusters?
◦ MIN
◦ MAX
◦ Group Average
◦ Distance Between Centroids
◦ Other methods driven by an objective function
[Figure: two clusters of points with a candidate inter-cluster link, alongside the dissimilarity matrix indexed by p1…p5]
MIN AND MAX
MIN or Single Link
• Similarity of two clusters is based on the two most similar (closest) points in the different clusters
• Determined by one pair of points, i.e., by one link in the dissimilarity graph
• Uses the MIN operator (minimum value)
Example:
dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                  = min(0.15, 0.25, 0.28, 0.39)
                  = 0.15

MAX or Complete Linkage
• Similarity of two clusters is based on the two least similar (most distant) points in the different clusters
• Determined by all pairs of points in the two clusters
• Uses the MAX operator (maximum value)
Example:
dist({3,6},{2,5}) = max(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                  = max(0.15, 0.25, 0.28, 0.39)
                  = 0.39
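The two operators can be expressed directly in code (a small sketch; the four point-pair distances are the ones quoted in the example above):

```python
# Point-pair distances between members of clusters {3,6} and {2,5} (from the slides).
pair = {(3, 2): 0.15, (6, 2): 0.25, (3, 5): 0.28, (6, 5): 0.39}

def linkage_distance(c1, c2, dists, op):
    """op=min -> single link (MIN), op=max -> complete linkage (MAX)."""
    return op(dists[(a, b)] for a in c1 for b in c2)

single = linkage_distance([3, 6], [2, 5], pair, min)
complete = linkage_distance([3, 6], [2, 5], pair, max)
print(single, complete)   # 0.15 0.39, as in the worked example
```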
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Similarity of two clusters is based on the two most similar (closest) points in the different clusters
• Determined by one pair of points, i.e., by one link in the dissimilarity graph
[Figure: the example points connected by single links, with merges numbered 1–5 in order]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Point coordinates (numeric attributes):

Point  x coordinate  y coordinate
p1     0.40          0.53
p2     0.22          0.38
p3     0.35          0.32
p4     0.26          0.19
p5     0.08          0.41
p6     0.45          0.30

Euclidean distance:
d(p1, p2) = √((x1 − x2)² + (y1 − y2)²) = √((0.40 − 0.22)² + (0.53 − 0.38)²) ≈ 0.24

Dissimilarity matrix:

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00
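The matrix above can be reproduced from the coordinates (a quick sketch; note that the printed coordinates are themselves rounded to two decimals, so a few recomputed entries differ from the slide matrix in the last digit — e.g. d(p1, p2) comes out as 0.2343, shown as 0.24 on the slide):

```python
from math import dist

# Coordinates of the six points as printed on the slide.
coords = {1: (0.40, 0.53), 2: (0.22, 0.38), 3: (0.35, 0.32),
          4: (0.26, 0.19), 5: (0.08, 0.41), 6: (0.45, 0.30)}

# Euclidean dissimilarity for every pair i < j.
D = {(i, j): dist(coords[i], coords[j])
     for i in coords for j in coords if i < j}

print(round(D[(1, 2)], 4))   # sqrt(0.18**2 + 0.15**2) = 0.2343
print(round(D[(3, 6)], 4))   # the smallest pairwise distance
```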
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 1 – the two shortest distances in the dissimilarity matrix become the first two clusters: cluster (p3, p6) at 0.11 and cluster (p2, p5) at 0.14

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00

Draw the dendrogram: [p3 and p6 join at height 0.11; p2 and p5 join at 0.14; p4 and p1 remain singletons; leaves ordered 3, 6, 2, 5, 4, 1]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 2 – form the next cluster by finding the point with the minimum distance to the other points in the existing clusters, so update the matrix for the two newly formed clusters

[Figure: scatter plot with clusters {2,5} and {3,6} circled]

Update the dissimilarity matrix (entries marked ??? must be recomputed):

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    ???   0.00
P(3,6)    ???   ???     0.00
p4        0.37  ???     ???     0.00
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Single-link distances:

dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5)) = min(0.15, 0.25, 0.28, 0.39) = 0.15
dist({3,6},{1})   = min(dist(3,1), dist(6,1)) = min(0.22, 0.23) = 0.22
dist({3,6},{4})   = min(dist(3,4), dist(6,4)) = min(0.16, 0.22) = 0.16
dist({2,5},{1})   = min(dist(2,1), dist(5,1)) = min(0.24, 0.34) = 0.24
dist({2,5},{4})   = min(dist(2,4), dist(5,4)) = min(0.20, 0.29) = 0.20

Updated matrix:

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.24  0.00
P(3,6)    0.22  0.15    0.00
p4        0.37  0.20    0.16    0.00

The closest clusters are {3,6} and {2,5} at distance 0.15, so the clusters are merged.
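The five min-computations above can be checked mechanically (a small sketch; the matrix values are the ones from the slides):

```python
# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def single_link(c1, c2):
    # Single link: minimum point-to-point distance across the two clusters.
    return min(D[frozenset((a, b))] for a in c1 for b in c2)

row_36 = [single_link({3, 6}, c) for c in ({1}, {2, 5}, {4})]
row_25 = [single_link({2, 5}, c) for c in ({1}, {4})]
print(row_36, row_25)   # updated matrix rows for P(3,6) and P(2,5)
```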
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Draw the dendrogram: [{3,6} and {2,5} now join at height 0.15]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 3 – form the next cluster by finding the point with the minimum distance to the other points in the existing clusters

Single-link distances:

dist({{3,6},{2,5}}, 4) = min(dist{(3,6), 4}, dist{(2,5), 4}) = min(0.16, 0.20) = 0.16
dist({{3,6},{2,5}}, 1) = min(dist{(3,6), 1}, dist{(2,5), 1}) = min(0.22, 0.24) = 0.22

Updated matrix:

                p1    P(2,5),(3,6)  p4
p1              0.00
P(2,5),(3,6)    0.22  0.00
p4              0.37  0.16          0.00

The minimum entry is 0.16, so p4 joins the cluster {(2,5),(3,6)}.

Draw the dendrogram: [p4 joins at height 0.16]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 4 – form the final cluster by finding the distance from the remaining point to the existing cluster

Single-link distance:

dist({{{3,6},{2,5}}, 4}, 1) = min(dist({{3,6},{2,5}}, 1), dist(4, 1)) = min(0.22, 0.37) = 0.22

Updated matrix:

                    p1    P((2,5),(3,6)),4
p1                  0.00
P((2,5),(3,6)),4    0.22  0.00

Draw the dendrogram: [p1 joins at height 0.22]

Final matrix: a single cluster P(((2,5),(3,6)),4),1 with distance 0 to itself. Terminate.
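Putting steps 1–4 together, the whole single-link walkthrough can be replayed in a short loop (a sketch, with the matrix values from the slides):

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def single_link_merges(points):
    """Merge history (members, height) for single-link agglomeration."""
    clusters = [frozenset([p]) for p in points]
    history = []
    while len(clusters) > 1:
        # Closest pair under single link (minimum point-to-point distance).
        d, i, j = min(
            (min(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return history

for members, height in single_link_merges(range(1, 7)):
    print(members, height)
# Merge heights 0.11, 0.14, 0.15, 0.16, 0.22 match the dendrogram.
```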
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Nested clusters and dendrogram:
[Figure: nested single-link clusters over the six points, and the final dendrogram with leaves 3, 6, 2, 5, 4, 1 and merge heights 0.11, 0.14, 0.15, 0.16, 0.22]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Similarity of two clusters is based on the two least similar (most distant) points in the different clusters
• Determined by all pairs of points in the two clusters
• The calculation uses the maximum distance, but the pair of clusters selected to merge is still the one with the minimum such distance
[Figure: the example points under complete linkage, with merges numbered 1–5 in order]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 1 – the two shortest distances in the dissimilarity matrix become the first two clusters: (p3, p6) and (p2, p5)

      p1    p2    p3    p4    p5    p6
p1    0.00
p2    0.24  0.00
p3    0.22  0.15  0.00
p4    0.37  0.20  0.16  0.00
p5    0.34  0.14  0.28  0.29  0.00
p6    0.23  0.25  0.11  0.22  0.39  0.00

Draw the dendrogram: [p3 and p6 join at height 0.11; p2 and p5 join at 0.14]

Update the matrix (entries marked XX must be recomputed):

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    XX    0.00
P(3,6)    XX    XX      0.00
p4        0.37  XX      XX      0.00
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Complete-link distances:

dist({3,6},{2,5}) = max(dist(3,2), dist(6,2), dist(3,5), dist(6,5)) = max(0.15, 0.25, 0.28, 0.39) = 0.39
dist({3,6},{1})   = max(dist(3,1), dist(6,1)) = max(0.22, 0.23) = 0.23
dist({3,6},{4})   = max(dist(3,4), dist(6,4)) = max(0.16, 0.22) = 0.22
dist({2,5},{1})   = max(dist(2,1), dist(5,1)) = max(0.24, 0.34) = 0.34
dist({2,5},{4})   = max(dist(2,4), dist(5,4)) = max(0.20, 0.29) = 0.29

Updated matrix:

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.34  0.00
P(3,6)    0.23  0.39    0.00
p4        0.37  0.29    0.22    0.00
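As with single link, the max-computations above can be verified mechanically (a small sketch using the slide's matrix):

```python
# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def complete_link(c1, c2):
    # Complete linkage: maximum point-to-point distance across the two clusters.
    return max(D[frozenset((a, b))] for a in c1 for b in c2)

row_36 = [complete_link({3, 6}, c) for c in ({1}, {2, 5}, {4})]
row_25 = [complete_link({2, 5}, c) for c in ({1}, {4})]
print(row_36, row_25)   # updated matrix rows for P(3,6) and P(2,5)
```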
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 2 – form the next cluster by finding the minimum distance in the updated matrix; the smallest entry is 0.22, between P(3,6) and p4, giving the newly formed cluster {(3,6), 4}

          p1    P(2,5)  P(3,6)  p4
p1        0.00
P(2,5)    0.34  0.00
P(3,6)    0.23  0.39    0.00
p4        0.37  0.29    0.22    0.00

Draw the dendrogram: [p4 joins {3,6} at height 0.22; leaves ordered 3, 6, 4, 1, 2, 5; vertical axis from 0 to 0.4]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Complete-link distances:

dist({{3,6},4},{2,5}) = max(dist{(3,6), (2,5)}, dist{4, (2,5)}) = max(0.39, 0.29) = 0.39
dist({{3,6},4},{1})   = max(dist{(3,6), 1}, dist{4, 1}) = max(0.23, 0.37) = 0.37
dist({2,5},{1})       = 0.34

Updated matrix:

            p1    P(2,5)  P(3,6),4
p1          0.00
P(2,5)      0.34  0.00
P(3,6),4    0.37  0.39    0.00

• Step 3 – form the next cluster by finding the minimum distance in the updated matrix; the smallest entry is 0.34, between p1 and P(2,5), giving the newly formed cluster {(2,5), 1}

Draw the dendrogram: [p1 joins {2,5} at height 0.34]
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
• Step 4 – form the final cluster from the two remaining clusters; newly formed cluster {{{2,5},1}, {{3,6},4}}

dist({{3,6},4}, {{2,5},1}) = max(dist({{3,6},4}, {2,5}), dist({{3,6},4}, 1)) = max(0.39, 0.37) = 0.39

Updated matrix:

            P(2,5),1  P(3,6),4
P(2,5),1    0.00
P(3,6),4    0.39      0.00

Draw the dendrogram: [the two clusters join at height 0.39]

Final matrix: a single cluster P({{3,6},4}, {{2,5},1}) with distance 0 to itself. Terminate.
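The complete-link walkthrough can likewise be replayed end to end (a sketch; same slide matrix, with max as the linkage):

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def complete_link_merges(points):
    """Merge history (members, height) for complete-link agglomeration."""
    clusters = [frozenset([p]) for p in points]
    history = []
    while len(clusters) > 1:
        # Closest pair, where cluster distance is the MAX point-to-point distance.
        d, i, j = min(
            (max(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = clusters[i] | clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return history

for members, height in complete_link_merges(range(1, 7)):
    print(members, height)
# Merge heights 0.11, 0.14, 0.22, 0.34, 0.39 match the dendrogram.
```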
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Nested clusters and dendrogram:
[Figure: nested complete-link clusters over the six points, and the final dendrogram with leaves 3, 6, 4, 1, 2, 5 and merge heights 0.11, 0.14, 0.22, 0.34, 0.39]
HIERARCHICAL CLUSTERING: COMPARISON
Single Link (MIN) vs Complete Link (MAX)
[Figure: side-by-side nested clusterings and dendrograms for the two linkages on the same six points — single link merges (3,6), (2,5), then {(3,6),(2,5)}, then p4 and p1, with heights up to 0.22; complete link merges (3,6), (2,5), then {(3,6),4} and {(2,5),1}, joining the final pair at 0.39]
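The contrast can be made concrete by stopping each linkage at k = 2 clusters (a sketch over the slide's matrix): single link peels off point 1 last, while complete link produces the more balanced split {1,2,5} vs {3,4,6}.

```python
from itertools import combinations

# Pairwise distances from the slide's dissimilarity matrix.
D = {frozenset(p): d for p, d in {
    (1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
    (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
    (3, 4): 0.16, (3, 5): 0.28, (3, 6): 0.11,
    (4, 5): 0.29, (4, 6): 0.22, (5, 6): 0.39}.items()}

def cluster(points, linkage, k):
    """Agglomerate until k clusters remain; linkage is min or max."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        d, i, j = min(
            (linkage(D[frozenset((a, b))] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        clusters = ([c for t, c in enumerate(clusters) if t not in (i, j)]
                    + [clusters[i] | clusters[j]])
    return sorted(sorted(c) for c in clusters)

print(cluster(range(1, 7), min, 2))   # single link:   [[1], [2, 3, 4, 5, 6]]
print(cluster(range(1, 7), max, 2))   # complete link: [[1, 2, 5], [3, 4, 6]]
```

This illustrates the usual trade-off: single link can produce long "chained" clusters, while complete link favors compact, similarly sized ones.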
SUMMARY
• Cluster analysis groups objects based on their similarity/dissimilarity
and has wide applications
• Measure of dissimilarity can be computed for various types of data:
nominal, binary, ordinal, numeric, textual
• Clustering algorithms discussed are categorized into partitioning methods (k-means) and hierarchical methods
References
1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2012.
2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, 2nd Edition, Pearson, 2019.
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin