Hierarchical Clustering
Definition:
Hierarchical clustering is a method that builds a tree-like structure
(dendrogram) of clusters by either merging smaller clusters into larger ones
(agglomerative) or splitting larger clusters into smaller ones (divisive). It does
not require the number of clusters to be specified in advance.
It’s like building a family tree: you start with individuals and group them into
families, then combine families into bigger groups like tribes or regions.
Advantages (+):
No need to predefine clusters: You don't need to specify the number of clusters beforehand.
Flexible: Works with any distance metric, and can be used with different linkage methods (like single,
complete, or centroid).
Creates a hierarchy: It gives you a tree (dendrogram) to show how clusters are merged, so you can choose
the level of clustering.
Disadvantages (-):
Computationally expensive: It can be slow, especially with large datasets.
Sensitive to noise and outliers: It might incorrectly group noisy or outlying data points.
Doesn't scale well: It’s less efficient with very large datasets compared to other methods.
--> In short, hierarchical (agglomerative) clustering is great for flexibility and hierarchical analysis, but it can be slow and sensitive to noise.
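As a small illustration of the "no need to predefine clusters" point above, scikit-learn's AgglomerativeClustering can cut the hierarchy at a distance threshold instead of a fixed cluster count. This is only a minimal sketch: the toy data and the threshold value of 2.0 are made up for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# hypothetical toy data: two obvious groups plus one far-away point
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [10.0, 0.0]])

# n_clusters is left unset; the tree is cut wherever a merge would
# exceed the chosen distance threshold (value picked for this toy data)
model = AgglomerativeClustering(n_clusters=None, distance_threshold=2.0)
labels = model.fit_predict(X)

print(labels)              # cluster id per point
print(model.n_clusters_)   # number of clusters implied by the threshold
```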
Agglomerative Clustering (Bottom-Up):
1. Start with each data point as its own cluster and compute the distances between all points. (Imagine every person starts alone.)
2. Merge the two closest clusters. (You keep pairing the most similar people into groups.)
3. Repeat step 2, recomputing cluster distances after each merge, until all points belong to a single cluster (or until you stop at a chosen level).
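As a concrete sketch of this bottom-up procedure, SciPy's linkage function performs exactly these steps: it starts from single points, repeatedly merges the two closest clusters, and records every merge. The toy points below are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical toy points: every point starts as its own cluster
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

# each row of Z records one merge: which two clusters were joined,
# at what distance, and how many points the new cluster contains
Z = linkage(X, method="single")  # 'single' = nearest-neighbour linkage

# cut the finished tree into (at most) 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```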
Single Linkage (Nearest Neighbor): The distance between two groups is the shortest distance between any two points in the groups.
Finds the closest pair of points in two clusters; good for detecting elongated
shapes but prone to chaining.
Complete Linkage (Farthest Neighbor): The distance between two groups is the longest distance between any two points in the groups.
Finds the farthest pair of points in two clusters; good for compact clusters.
Average Linkage: The distance between two groups is the average of all the distances between points in the first group and points in the second group.
Balances single and complete linkage, often giving more natural groupings.
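These three rules differ only in how they summarize the set of pairwise distances between two groups. A quick sketch with SciPy (made-up points) makes the difference concrete:

```python
import numpy as np
from scipy.spatial.distance import cdist

# two hypothetical clusters of 2-D points
A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
B = np.array([[4.0, 4.0], [5.0, 5.0]])

D = cdist(A, B)    # all pairwise distances between points of A and points of B

print(D.min())     # single linkage: closest pair
print(D.max())     # complete linkage: farthest pair
print(D.mean())    # average linkage: mean over all pairs
```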
Ward’s Linkage: This method tries to merge groups in a way that minimizes the increase in
"variance" (spread of points). It tries to keep the new group as compact as
possible.
Definition: The distance between two clusters is the increase in the total
within-cluster variance that results from merging them.
It merges clusters that increase the overall cluster variance the least.
(Basically, it looks for the two groups that "fit" together best.)
Formula:
d(C1, C2) = SSE(C1 ∪ C2) − SSE(C1) − SSE(C2) = (|C1|·|C2| / (|C1| + |C2|)) · ‖μ1 − μ2‖²
where SSE is a cluster's sum of squared distances to its own mean and μ1, μ2 are the centroids of C1 and C2. In words: the increase in within-cluster variance caused by merging C1 and C2.
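A short numeric check of this formula, using invented points: computing the increase in SSE directly and via the closed form gives the same value.

```python
import numpy as np

def sse(points):
    """Sum of squared distances of points to their own mean."""
    return ((points - points.mean(axis=0)) ** 2).sum()

# two hypothetical clusters
C1 = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
C2 = np.array([[6.0, 6.0], [8.0, 6.0]])

# Ward's distance: increase in within-cluster variance caused by merging
increase = sse(np.vstack([C1, C2])) - sse(C1) - sse(C2)

# equivalent closed form using cluster sizes and the centroid distance
n1, n2 = len(C1), len(C2)
closed = n1 * n2 / (n1 + n2) * np.sum((C1.mean(axis=0) - C2.mean(axis=0)) ** 2)

print(increase, closed)   # both give the same value
```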
Advantages:
Creates well-separated, tight clusters. (The groups it forms look neat and
logical.)
Limitations:
Not good for elongated or irregularly shaped clusters. (It assumes clusters
are compact and round.)
Example:
If you’re grouping cities based on population and income, Ward’s method will ensure each group contains cities that are closely related on both attributes.
Centroid Linkage: measures the distance between the centers (centroids) of two groups. The
centroid is simply the average position of all points in the group.
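In code, centroid linkage reduces to the distance between the two cluster means (a sketch with made-up points):

```python
import numpy as np

# two hypothetical clusters
A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
B = np.array([[4.0, 4.0], [5.0, 5.0]])

# centroid linkage: distance between the average positions of the two groups
centroid_distance = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
print(centroid_distance)
```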
Divisive Clustering (Top-Down):
1. Start with all data points in one cluster. (Start with everyone in one big group.)
2. Split the biggest cluster into smaller groups. (Divide it into smaller groups step by step.)
3. Repeat the splitting until every point is its own cluster, or until a chosen number of groups is reached.
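Plain divisive hierarchical clustering is rarely shipped as a ready-made library routine; one common approximation of the top-down idea (a sketch, assuming scikit-learn 1.1+, which provides BisectingKMeans) repeatedly splits a cluster in two with k-means until the desired number of groups is reached:

```python
import numpy as np
from sklearn.cluster import BisectingKMeans

# hypothetical toy data
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8], [9.0, 0.0], [9.1, 0.2]])

# top-down: start with one big cluster and bisect it step by step
# until the requested number of clusters is reached
model = BisectingKMeans(n_clusters=3, random_state=0)
labels = model.fit_predict(X)
print(labels)
```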
Advantages:
You don’t need to know the number of clusters beforehand. (You can
decide after looking at the hierarchy.)
Shows relationships clearly in a dendrogram. (It’s like a family tree for your
data.)
Limitations:
Computationally expensive for large datasets. (Takes a long time with too
many data points.)
Can’t reassign points to different clusters later. (Once a point joins a group,
it’s stuck there.)
Worked example:
https://www.youtube.com/watch?v=8QCBl-xdeZI&ab_channel=DATAtab
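To try a worked example yourself, you can plot the dendrogram with SciPy and matplotlib; the data below is made up and the point labels are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# hypothetical toy data
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

# build the hierarchy with Ward's linkage and draw the tree of merges
Z = linkage(X, method="ward")
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()
```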
Single Linkage:
Good for non-elliptical shapes (long or stretched clusters).
Sensitive to noise and outliers (can be easily affected by weird points).
Complete Linkage:
Less affected by noise and outliers.
Can break larger clusters into smaller ones.
Works best with globular (round) shapes.