
Hierarchical Clustering

Definition:
Hierarchical clustering is a method that builds a tree-like structure
(dendrogram) of clusters by either merging smaller clusters into larger ones
(agglomerative) or splitting larger clusters into smaller ones (divisive). It does
not require the number of clusters to be specified in advance.

It’s like building a family tree: you start with individuals and group them into
families, then combine families into bigger groups like tribes or regions.

Advantages (+):
No need to predefine clusters: You don't need to specify the number of clusters beforehand.

Flexible: Works with any distance metric, and can be used with different linkage methods (like single,
complete, or centroid).

Creates a hierarchy: It gives you a tree (dendrogram) to show how clusters are merged, so you can choose
the level of clustering.

Disadvantages (-):
Computationally expensive: It can be slow, especially with large datasets.

Sensitive to noise and outliers: It might incorrectly group noisy or outlying data points.

Doesn't scale well: It’s less efficient with very large datasets compared to other methods.

--> In short, hierarchical (agglomerative) clustering is great for flexibility and hierarchical analysis, but it can be slow and
sensitive to noise.
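A minimal sketch of this workflow, assuming SciPy and matplotlib are available; the sample points, variable names, and the choice of Ward linkage are illustrative rather than taken from these notes:

```python
# Minimal sketch: agglomerative hierarchical clustering with a dendrogram.
# The data points and the Ward linkage choice here are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# A few 2-D points forming two loose groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.8], [4.9, 5.3]])

# Build the full merge tree; no number of clusters is specified up front
Z = linkage(X, method="ward")

# The dendrogram shows the whole hierarchy of merges
dendrogram(Z)
plt.show()

# After inspecting the tree, cut it into a chosen number of flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 1 2 2 2], up to label order
```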

Agglomerative Clustering (Bottom-Up):

1. Start with each data point as its own cluster, and compute the distances between all
points. (Imagine every person starts alone.)

2. Merge the two closest clusters. (You keep pairing the most similar people
into groups.)

3. Update the distance matrix based on the linkage criterion, and repeat from step 2 until all points are merged into a single cluster; a sketch comparing the criteria follows this list:

Single Linkage (Nearest Neighbor): The distance between two groups is the shortest
distance between any two points in the groups.

Finds the closest pair of points in two clusters; good for detecting elongated
shapes but prone to chaining.

Example: In a dataset, single linkage might connect two clusters just
because one point from each cluster is very close.

Complete Linkage (Farthest Neighbor): The distance between two groups is the longest
distance between any two points in the groups.

Finds the farthest pair of points in two clusters; good for compact clusters.

Average Linkage: The distance between two groups is the average of all the distances
between points in the first group and points in the second group.

Balances single and complete linkage, often giving more natural groupings.

Ward’s Linkage: This method tries to merge groups in a way that minimizes the increase in
"variance" (spread of points). It tries to keep the new group as compact as
possible.

Definition: The distance between two clusters is the increase in the total
within-cluster variance that results from merging them.

It minimizes the total within-cluster variance.

Focuses on compactness and minimizing variance.

This method is part of hierarchical clustering but focuses on keeping


clusters compact.(It tries to make the groups as tight and neat as possible.)

It merges clusters that increase the overall cluster variance the least.
(Basically, it looks for the two groups that "fit" together best.)

Formula:
d(C1, C2) = the increase in total within-cluster variance when C1 and C2 are merged.
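One standard way to write this increase, assuming clusters C1 and C2 with sizes n1, n2 and centroids m1, m2 (notation added here for illustration, not from the original notes):

```latex
% Ward distance: the increase in total within-cluster variance when merging C1 and C2
d(C_1, C_2) = \frac{n_1 \, n_2}{n_1 + n_2} \left\lVert m_1 - m_2 \right\rVert^2
```

In words: the squared distance between the two centroids, weighted by the cluster sizes, so merging two large, far-apart clusters is penalized the most.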

Behavior: Produces compact, spherical clusters by minimizing the variance
within clusters. It is often considered the most robust linkage method.

Advantages:

Creates well-separated, tight clusters. (The groups it forms look neat and
logical.)

Limitations:

Not good for elongated or irregularly shaped clusters. (It assumes clusters
are compact and round.)

Example:
If you’re grouping cities based on population and income, Ward’s method will
ensure each group has cities that are closely related:

Cluster 1: Small towns.

Cluster 2: Medium-sized cities.

Cluster 3: Large metropolitan areas. (Each cluster will be compact and easy
to understand.)

Centroid Linkage: measures the distance between the centers (centroids) of two groups. The
centroid is simply the average position of all points in the group.

Uses the average position (centroid) of each cluster; the distance between clusters is the
distance between their means. Simple, but less robust than the other linkage methods.
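To see how the linkage criterion changes the resulting groups, a small sketch like the following runs the methods above on the same made-up points (the data and the choice of three flat clusters are illustrative only):

```python
# Sketch: the same made-up points clustered with each linkage criterion above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [3.0, 3.0], [3.2, 2.9], [2.8, 3.1],
              [6.0, 0.0]])                       # one lone point acting as an outlier

for method in ["single", "complete", "average", "ward", "centroid"]:
    Z = linkage(X, method=method)                # the update rule used in step 3
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(f"{method:>9}: {labels}")
```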

Divisive Clustering (Top-Down):

1. Start with all data points in one cluster. (Start with everyone in one big
group.)

2. Split the biggest cluster into smaller groups. (Divide it into smaller groups
step by step.)

3. Stop when each object forms its own cluster, or when a specified stopping
condition is met.
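SciPy and scikit-learn do not ship a true divisive routine, so the sketch below only approximates the top-down idea by repeatedly bisecting the largest cluster with k-means; it is an illustrative stand-in, not the exact algorithm from these notes:

```python
# Sketch of top-down (divisive) clustering by repeatedly bisecting the largest
# cluster with k-means. Exact divisive algorithms (e.g. DIANA) work differently.
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    labels = np.zeros(len(X), dtype=int)          # start: everyone in one big group
    while labels.max() + 1 < n_clusters:
        biggest = np.bincount(labels).argmax()    # pick the biggest current cluster
        idx = np.where(labels == biggest)[0]
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        labels[idx[halves == 1]] = labels.max() + 1   # split it into two groups
    return labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0],
              [3.1, 2.9], [6.0, 6.0], [6.2, 5.9]])
print(divisive(X, 3))                             # three groups, up to label order
```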

Advantages:
You don’t need to know the number of clusters beforehand. (You can
decide after looking at the hierarchy.)

Shows relationships clearly in a dendrogram. (It’s like a family tree for your
data.)

Limitations:
Computationally expensive for large datasets. (Takes a long time with too
many data points.)

Can’t reassign points to different clusters later. (Once a point joins a group,
it’s stuck there.)

Worked example:

https://www.youtube.com/watch?v=8QCBl-xdeZI&ab_channel=DATAtab

Single Linkage:
Good for non-elliptical shapes (long or stretched clusters).
Sensitive to noise and outliers (can be easily affected by weird points).

Complete Linkage:
Less affected by noise and outliers.
Can break larger clusters into smaller ones.
Works best with globular (round) shapes.

Group Average (Average Linkage):
An intermediate approach between Single and Complete Linkage.
Less sensitive to outliers than Single Linkage.
Doesn’t break big clusters as much as Complete Linkage.
Uses the average distance between points in two groups to decide merging.
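The contrast above is easy to check in code: on elongated, interleaved shapes, single linkage can follow the chains while the other linkages tend to cut across them. A hedged sketch, with the dataset and noise level chosen only for illustration:

```python
# Sketch: single vs. other linkages on elongated "two moons" clusters.
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

for link in ["single", "complete", "average", "ward"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    score = adjusted_rand_score(y, labels)        # 1.0 = perfect match with the moons
    print(f"{link:>8}: agreement with true moons = {score:.2f}")
```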
