
AGGLOMERATIVE HIERARCHICAL CLUSTERING
Data Mining Assignment

SUBMITTED BY:
Khushal Rastogi (2k19/CS/60)
Ques. 6 TO 10.
Digvijay Nayal (2k19/CS/53)
Ques. 1 TO 5.
B.Sc. Hons. Computer Science
INTRODUCTION

Agglomerative Clustering is a type of hierarchical clustering algorithm. It is an unsupervised machine learning technique that divides the population into several clusters such that data points in the same cluster are similar to each other, while data points in different clusters are dissimilar.

• Points in the same cluster are closer to each other.

• Points in different clusters are far apart.

In a sample 2-dimensional dataset of this kind, it is visible that the data forms 3 clusters that are far apart, while points within the same cluster are close to each other.

Steps of Agglomerative Clustering:

1. Initially, each data point is a cluster of its own.

2. Take the two nearest clusters and join them to form one single cluster.
3. Repeat step 2 until the desired number of clusters is obtained.
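The three steps above can be sketched as a minimal pure-Python implementation (an illustrative, unoptimized sketch; the function name and the `linkage` parameter are our own):

```python
import math

def agglomerative(points, k, linkage="single"):
    # Step 1: initially, every data point is a cluster of its own.
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(c1, c2):
        # All pairwise point distances between the two clusters:
        # single linkage takes the minimum, complete linkage the maximum.
        d = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        return min(d) if linkage == "single" else max(d)

    # Steps 2 and 3: repeatedly merge the two nearest clusters
    # until the desired number of clusters remains.
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda pair: cluster_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters
```

For example, `agglomerative([(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)], 2)` groups the first three points together and the last two together (this is the dataset from Question 6).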

How to join two clusters to form one cluster?

To obtain the desired number of clusters, the number of clusters needs to be reduced from the initial n clusters (where n equals the total number of data points). Two clusters are combined by computing the similarity between them.

Several methods are used to calculate the similarity between two clusters:

• (Single Linkage) Distance between the two closest points in the two clusters.

• (Complete Linkage) Distance between the two farthest points in the two clusters.

• (Average Linkage) The average distance between all points in the two
clusters.

• (Centroid Linkage) Distance between centroids of two clusters.
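The four linkage criteria can be written down directly. The helper below (an illustrative sketch; the function names are our own) computes all of them for two clusters of 2-D points:

```python
import math

def centroid(cluster):
    # Component-wise mean of the cluster's points.
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def linkage_distances(c1, c2):
    pairwise = [math.dist(p, q) for p in c1 for q in c2]
    return {
        "single":   min(pairwise),                  # two closest points
        "complete": max(pairwise),                  # two farthest points
        "average":  sum(pairwise) / len(pairwise),  # mean over all pairs
        "centroid": math.dist(centroid(c1), centroid(c2)),
    }
```

For c1 = [(0, 0), (0, 2)] and c2 = [(4, 0), (4, 2)], single and centroid linkage both give 4.0, while complete linkage gives √20 ≈ 4.47.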

A dendrogram is a diagrammatic representation of the hierarchical relationship between the data points. It illustrates the arrangement of the clusters produced by the corresponding analysis and is used to read off the output of hierarchical (agglomerative) clustering.

How to find the optimal number of clusters by observing the dendrogram:

In the dendrogram plot, find the tallest horizontal band that can be drawn across the plot without crossing any vertical dendrogram line. Cutting the dendrogram inside that band gives the optimal number of clusters; in the example considered here it is 3. The band of maximum height is chosen because it represents the largest Euclidean distance separating the resulting clusters.
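Assuming SciPy is available, this dendrogram-based procedure can be sketched with `scipy.cluster.hierarchy` (the six sample points are the ones used in Question 1 below):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

pts = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
                [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

# One row per merge: (cluster i, cluster j, merge distance, new size).
Z = linkage(pts, method="single")

# dendrogram(Z)  # draws the tree when matplotlib is available

# Cutting the tree where the vertical gap is largest fixes the cluster
# count; criterion="maxclust" asks directly for a given number instead.
labels = fcluster(Z, t=3, criterion="maxclust")
```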
Questions Based on Agglomerative Clustering
Question 1: Find the clusters using single link technique. Use Euclidean distance and draw the
dendrogram.

X Y
P1 0.40 0.53
P2 0.22 0.38
P3 0.35 0.32
P4 0.26 0.19
P5 0.08 0.41
P6 0.45 0.30

Answer:
P1 P2 P3 P4 P5 P6

P1 0.00 0.24 0.22 0.37 0.34 0.23

P2 0.24 0.00 0.15 0.20 0.14 0.25

P3 0.22 0.15 0.00 0.15 0.28 0.11

P4 0.37 0.20 0.15 0.00 0.29 0.22

P5 0.34 0.14 0.28 0.29 0.00 0.39

P6 0.23 0.25 0.11 0.22 0.39 0.00

P1 P2 P3, P6 P4 P5

P1 0

P2 0.24 0

P3, P6 0.22 0.15 0

P4 0.37 0.20 0.15 0

P5 0.34 0.14 0.28 0.29 0


P1 P2, P5 P3, P6 P4

P1 0.00

P2, P5 0.24 0.00

P3, P6 0.22 0.15 0.00

P4 0.37 0.20 0.15 0.00

P1 (P2, P5), (P3, P6) P4

P1 0

(P2, P5), (P3, P6) 0.22 0

P4 0.37 0.15 0

P1 (P2, P5), (P3, P6), P4

P1 0

(P2, P5), (P3, P6), P4 0.22 0
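The initial distance matrix in this answer can be reproduced from the coordinates with SciPy's `pdist`/`squareform` (a sketch; note that exact rounding can differ from the table in the last decimal place, e.g. d(P1, P2) ≈ 0.234 rather than 0.24):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
                [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

# pdist returns the condensed upper triangle; squareform expands it
# into the full symmetric 6x6 Euclidean distance matrix.
D = squareform(pdist(pts))
print(np.round(D, 2))
```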


Question 2: Find the clusters using the complete link technique. Use Euclidean distance and draw the dendrogram.

X Y
P1 0.40 0.53
P2 0.22 0.38
P3 0.35 0.32
P4 0.26 0.19
P5 0.08 0.41
P6 0.45 0.30

Answer:
P1 P2 P3 P4 P5 P6

P1 0.00

P2 0.24 0.00

P3 0.22 0.15 0.00

P4 0.37 0.20 0.15 0.00

P5 0.34 0.14 0.28 0.29 0.00

P6 0.23 0.25 0.11 0.22 0.39 0.00

P1 P2 P3, P6 P4 P5

P1 0.00

P2 0.24 0.00

P3, P6 0.23 0.25 0.00

P4 0.37 0.20 0.22 0.00

P5 0.34 0.14 0.39 0.29 0.00


P1 P2, P5 P3, P6 P4

P1 0.00

P2, P5 0.34 0.00

P3, P6 0.23 0.39 0.00

P4 0.37 0.29 0.22 0.00

P1 P2, P5 (P3, P6), P4

P1 0

P2, P5 0.34 0

(P3, P6), P4 0.37 0.39 0
Question 3: Use the distance matrix to perform complete link hierarchical clustering. Show your
results by drawing a dendrogram. The dendrogram should clearly show the order in which points are
merged.

P1 P2 P3 P4 P5

P1 0.00 0.10 0.41 0.55 0.35

P2 0.10 0.00 0.64 0.47 0.98

P3 0.41 0.64 0.00 0.44 0.85

P4 0.55 0.47 0.44 0.00 0.76

P5 0.35 0.98 0.85 0.76 0.00

Answer:
P1, P2 P3 P4 P5

P1, P2 0.00

P3 0.64 0.00

P4 0.55 0.44 0.00

P5 0.98 0.85 0.76 0.00

P1, P2 P3, P4 P5

P1, P2 0.00

P3, P4 0.64 0.00

P5 0.98 0.85 0.00


P1, P2, P3, P4 P5

P1, P2, P3, P4 0.00

P5 0.98 0.00
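The same complete-link result can be checked with SciPy. `linkage` expects the condensed form of the matrix, which `squareform` produces from the full symmetric matrix in the question (a sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Distance matrix from the question (rows/columns P1..P5).
D = np.array([[0.00, 0.10, 0.41, 0.55, 0.35],
              [0.10, 0.00, 0.64, 0.47, 0.98],
              [0.41, 0.64, 0.00, 0.44, 0.85],
              [0.55, 0.47, 0.44, 0.00, 0.76],
              [0.35, 0.98, 0.85, 0.76, 0.00]])

Z = linkage(squareform(D), method="complete")
labels = fcluster(Z, t=2, criterion="maxclust")
```

The first merge is (P1, P2) at distance 0.10, and cutting at two clusters isolates P5, matching the tables above.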

Question 4: Use the distance matrix to perform single link hierarchical clustering. Show your results by drawing a dendrogram. The dendrogram should clearly show the order in which points are merged.
P1 P2 P3 P4 P5

P1 0.00 0.10 0.41 0.55 0.35

P2 0.10 0.00 0.64 0.47 0.98

P3 0.41 0.64 0.00 0.44 0.85

P4 0.55 0.47 0.44 0.00 0.76

P5 0.35 0.98 0.85 0.76 0.00

Answer:
P1 P2 P3 P4 P5

P1 0.00

P2 0.10 0.00

P3 0.41 0.64 0.00

P4 0.55 0.47 0.44 0.00

P5 0.35 0.98 0.85 0.76 0.00

P1, P2 P3 P4 P5

P1, P2 0.00

P3 0.41 0.00

P4 0.47 0.44 0.00

P5 0.35 0.85 0.76 0.00


(P1, P2), P5 P3 P4

(P1, P2), P5 0.00

P3 0.41 0.00

P4 0.47 0.44 0.00

(P1, P2), P5, P3 P4

(P1, P2), P5, P3 0.00

P4 0.44 0.00
Question 5: Find the clusters using the single link technique on the distance matrix below. Draw the dendrogram.
Answer:
P1 P2 P3 P4 P5 P6 P7

P1 0

P2 40.24 0

P3 25.19 15.52 0

P4 39 6.77 14.25 0

P5 45.46 5.03 80.28 8.55 0

P6 5.08 35.54 80.12 34 40.39 0

P7 21.18 19.48 4.01 18.15 24.28 16.11 0

P1 P2 P3,P7 P4 P5 P6

P1 0

P2 40.24 0

P3,P7 21.18 15.52 0

P4 39 6.77 14.25 0

P5 45.46 5.03 24.28 8.55 0

P6 5.08 35.54 16.11 34 40.39 0


P1 P2,P5 P3,P7 P4 P6

P1 0

P2,P5 40.24 0

P3,P7 21.18 15.52 0

P4 39 6.77 14.25 0

P6 5.08 35.54 16.11 34 0

P1,P6 P2,P5 P3,P7 P4

P1,P6 0

P2,P5 35.54 0

P3,P7 16.11 15.52 0

P4 34 6.77 14.25 0

P1,P6 P2,P5,P4 P3,P7

P1,P6 0

P2,P5,P4 34 0

P3,P7 16.11 14.25 0

Optimal no. of clusters = 3: (P3, P7), (P2, P5, P4) and (P1, P6).


Question 6: Compute the Euclidean distance matrix for the points below and perform single linkage hierarchical clustering. Show your results by drawing a dendrogram. The dendrogram should clearly show the order in which points are merged.

X Y
1 4 4
2 8 4
3 15 8
4 24 4
5 24 12

Answer:
1 2 3 4 5

1 0

2 4.0 0

3 11.7 8.1 0

4 20.0 16.0 9.8 0

5 21.5 17.9 9.8 8.0 0

Pair(1,2) with distance = 4.0

(1,2) 3 4 5

(1,2) 0

3 8.1 0

4 16.0 9.8 0

5 17.9 9.8 8.0 0

Pair(4,5) with distance = 8.0

(1,2) 3 (4,5)

(1,2) 0

3 8.1 0

(4,5) 16.0 9.8 0

Pair((1,2),3) with distance = 8.1

(1,2,3) (4,5)

(1,2,3) 0

(4,5) 9.8 0

Optimal No. of Clusters = 2. They are (1,2,3) and (4,5).


Let “k” represent the no. of clusters.
For k=1, we have (1,2,3,4,5).
For k=2, we have (1,2,3) and (4,5).
For k=3, we have (1,2), (3) and (4,5).
For k=4, we have (1,2), (3), (4) and (5).
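SciPy's `cut_tree` can list the cluster labels for several values of k at once, matching the enumeration above (a sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

# Coordinates from the question (points 1..5).
pts = np.array([[4, 4], [8, 4], [15, 8], [24, 4], [24, 12]])
Z = linkage(pts, method="single")

# One column of labels per requested cluster count.
labels = cut_tree(Z, n_clusters=[2, 3, 4])
```

Column 0 gives the k=2 partition (1,2,3) / (4,5); column 1 gives the k=3 partition (1,2), (3), (4,5).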
Question 7: Use the distance matrix to perform single linkage hierarchical clustering. Show your
results by drawing a dendrogram. The dendrogram should clearly show the order in which points are
merged.

Answer:
E A C B D

E 0

A 1 0

C 2 2 0

B 2 5 1 0

D 3 3 6 3 0

Pair(E,A) with distance = 1.

Now, Dist(EA, C) = MIN(EC, AC) = MIN(2, 2) = 2

Dist(EA, B) = MIN(EB, AB) = MIN(2, 5) = 2

Dist(EA, D) = MIN(ED, AD) = MIN(3, 3) = 3

EA C B D

EA 0

C 2 0

B 2 1 0

D 3 6 3 0

Pair(B,C) with distance = 1.

Now, Dist(BC, EA) = MIN(Dist(B, EA), Dist(C, EA)) = MIN(2, 2) = 2

Dist(BC, D) = MIN(BD, CD) = MIN(3, 6) = 3


EA BC D

EA 0

BC 2 0

D 3 3 0

Pair(EA, BC) with distance = 2.

Now, Dist(EABC, D) = MIN(Dist(EA, D), Dist(BC, D)) = MIN(3, 3) = 3

EA,BC D

EA,BC 0

D 3 0

Optimal no. of clusters is either 2: (A, B, C, E) and (D), or 3: (A, E), (B, C) and (D).

Question 8: Use the distance matrix to perform complete linkage hierarchical clustering. Show
your results by drawing a dendrogram. The dendrogram should clearly show the order in which points
are merged.
Answer:
E A C B D

E 0

A 1 0

C 2 2 0

B 2 5 1 0

D 3 3 6 3 0

Pair(E,A) with distance = 1.

Now, Dist(EA, C) = MAX(EC, AC) = MAX(2, 2) = 2

Dist(EA, B) = MAX(EB, AB) = MAX(2, 5) = 5

Dist(EA, D) = MAX(ED, AD) = MAX(3, 3) = 3

EA C B D

EA 0

C 2 0

B 5 1 0

D 3 6 3 0

Pair(B,C) with distance = 1.

Now, Dist(BC, EA) = MAX(Dist(B, EA), Dist(C, EA)) = MAX(5, 2) = 5

Dist(BC, D) = MAX(BD, CD) = MAX(3, 6) = 6

EA BC D

EA 0

BC 5 0

D 3 6 0
Pair(EA, D) with distance = 3.

Now, Dist(EAD, BC) = MAX(Dist(EA, BC), Dist(D, BC)) = MAX(5, 6) = 6

EA,D BC

EA,D 0

BC 6 0

Optimal no. of clusters = 2: (A, D, E) and (B, C).

Question 9: Use the distance matrix to perform complete linkage hierarchical clustering. Show
your results by drawing a dendrogram. The dendrogram should clearly show the order in which points
are merged.

Answer:
A B C D E F

A 0

B 5 0

C 14 9 0

D 11 20 13 0

E 18 15 6 3 0

F 10 16 8 10 11 0

Pair(D, E) with distance = 3.

A B C DE F

A 0

B 5 0

C 14 9 0

DE 18 20 13 0

F 10 16 8 11 0

Pair(A, B) with distance = 5.

AB C DE F

AB 0

C 14 0

DE 20 13 0

F 16 8 11 0

Pair(C, F) with distance = 8.

AB CF DE

AB 0

CF 16 0

DE 20 13 0
Pair(CF, DE) with distance = 13.

AB CF,DE

AB 0

CF,DE 20 0

Optimal no. of clusters = 2: (A, B) and (C, D, E, F).

Question 10: Use the distance matrix to perform single linkage hierarchical clustering. Show your
results by drawing a dendrogram. The dendrogram should clearly show the order in which points are
merged.

Answer:
A B C D E F

A 0

B 5 0

C 14 9 0

D 11 20 13 0

E 18 15 6 3 0

F 10 16 8 10 11 0

Pair(D, E) with distance = 3.

A B C DE F

A 0

B 5 0

C 14 9 0

DE 11 15 6 0

F 10 16 8 10 0

Pair(A, B) with distance = 5.

AB C DE F

AB 0

C 9 0

DE 11 6 0

F 10 8 10 0

Pair(C, DE) with distance = 6.

AB C,DE F

AB 0

C,DE 9 0

F 10 8 0

Pair((C, DE), F) with distance = 8.
AB (C,DE),F

AB 0

(C,DE),F 9 0

Optimal no. of clusters = 3: (A, B), (C, D, E) and (F).
