Unsupervised Learning Algorithm: Clustering
Md. Apu Hosen
Lecturer
Dept. of CSE, NUBTK
Clustering
• Clustering divides a data set into a set of clusters (groups) where instances
of the same cluster have high similarity while instances of different
clusters have high dissimilarity.
• The aim of the clustering process is to segregate instances with similar
traits and assign them to clusters.
• A cluster of data objects can be treated as one group.
• While doing cluster analysis, we first partition the set of data into groups
based on data similarity and then assign the labels to the groups.
Types of Clustering
Broadly speaking, clustering can be divided into two subgroups:
• Hard Clustering: Each input data point either belongs to a cluster
completely or does not belong to it at all.
• Soft Clustering: Instead of assigning each input data point to exactly
one cluster, a probability or likelihood of that data point belonging to
each of the clusters is assigned.
Types of Clustering Algorithm
There are different types of clustering algorithms, each suited to different
kinds of data. Some of them are:
▪ Centroid-based: These algorithms separate data points based on multiple centroids
in the data. Each data point is assigned to the cluster whose centroid is nearest,
typically using the squared Euclidean distance. This is the most commonly used type of clustering.
▪ Hierarchical clustering: It is a type of clustering algorithm that creates a hierarchy
or tree-like structure of clusters. It groups data points into clusters in a way that
preserves a nested or hierarchical relationship among the clusters.
▪ Density Based: In density-based clustering, data is grouped by areas of high
concentrations of data points surrounded by areas of low concentrations of data
points. Basically, the algorithm finds the places that are dense with data points and
calls those clusters.
Clustering Algorithm
Based on these types of clustering, there are different clustering
algorithms. A suitable clustering algorithm helps in finding valuable
insights from data. Commonly used clustering algorithms in machine
learning are,
I. K-means Clustering
II. Fuzzy C-means Clustering
III. Hierarchical Agglomerative clustering
IV. Divisive Hierarchical Clustering
V. BIRCH Algorithm
VI. DBSCAN Algorithm
I. K-Means Clustering
K-Means clustering is an unsupervised, iterative learning algorithm that
groups an unlabeled dataset into K different clusters in such a way that each
data point belongs to exactly one group of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated with a centroid.
The main aim of this algorithm is to minimize the sum of distances between
each data point and the centroid of its cluster.
It can be used in Image Segmentation, Customer Segmentation, Fraud
Detection, Bioinformatics, Text Clustering, Recommendation Systems, and
Environmental Monitoring.
I. K-Means Clustering
K-Means algorithm
1. Choose the number of clusters (K) and obtain the data points.
2. Randomly initialize the centroids c_1, c_2, ..., c_K.
3. For each data point x_i:
   - find the nearest centroid among c_1, c_2, ..., c_K
   - assign the point to that centroid's cluster
4. For each cluster j = 1..K:
   - new centroid = mean of all points assigned to that cluster
5. Repeat steps 3 and 4 until convergence or until a fixed number of
iterations is reached.
6. End
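A minimal NumPy sketch of these steps is shown below. It is an illustrative implementation only (the function name k_means and the random-initialization choice are ours, not from the slides); library implementations such as scikit-learn's KMeans follow the same idea with a smarter k-means++ initialization.

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal K-Means: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick K data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 5: stop when the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels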
Example of k-Means
● Consider the following data set consisting of the scores of two
variables on each of seven individuals.
● This data set is to be grouped into two clusters.
Example of k-Means
• As a first step in finding a sensible initial partition, let the A and B
values of the two individuals that are furthest apart (using the Euclidean
distance measure) define the initial cluster means, giving:
Example of k-Means
• Now the initial partition has changed, and the two clusters at this
stage have the following characteristics:
Example of k-Means
• Only individual 3 is nearer to the mean of the opposite cluster
(Cluster 2) than to that of its own (Cluster 1). Thus, individual 3 is relocated to
Cluster 2, resulting in the new partition:
Practice Problem of K-Means
Problem-01:
Cluster the following eight points (with (x, y) representing locations)
into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centers are A1(2, 10), A4(5, 8), and A7(1, 2).
Use the K-Means Algorithm to find the three cluster centers after the
second iteration.
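As an optional check (not part of the original slides), the two iterations of Problem-01 can be reproduced with a few lines of NumPy using the points and initial centers given above:

import numpy as np

# the eight points and the three initial centers given in Problem-01
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centers = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)

for iteration in range(2):                       # run exactly two iterations
    dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
    labels = dists.argmin(axis=1)                # assign each point to its nearest center
    centers = np.array([X[labels == j].mean(axis=0) for j in range(3)])
    print(f"Centers after iteration {iteration + 1}:\n{centers}")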
Practice Problem of K-Means
Problem-02:
Use the K-Means algorithm to create two clusters.
Assume A(2, 2) and C(1, 1) are centers of the two clusters.
II. Fuzzy c-means (FCM)
Fuzzy c-means (FCM) clustering is a soft clustering technique in which
each data point is assigned a probability (membership) score for every
cluster, so that one element of data (or of an image) can belong to two or
more clusters at the same time.
II. Fuzzy c-means (FCM)
It starts with a random initial guess for the cluster centers, that is, the
mean location of each cluster.
Next, FCM assigns every data point a random membership grade for
each cluster.
By iteratively updating the cluster centers and the membership grades
for each data point, FCM moves the cluster centers to the correct
location within a data set and, for each data point, finds the degree of
membership in each cluster.
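A minimal NumPy sketch of this iterative procedure is given below, assuming Euclidean distance and the common fuzzifier value m = 2 (these choices are ours for illustration; they are not fixed by the slides):

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iters=100, seed=0):
    """Minimal fuzzy c-means: returns the cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iters):
        Um = U ** m
        # update centers as the membership-weighted mean of the data
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances from every point to every center (small epsilon avoids division by zero)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # update memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        if np.allclose(U_new, U, atol=1e-6):     # stop when memberships stabilize
            break
        U = U_new
    return centers, U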
K-Means vs Fuzzy c-means
The K-means clustering algorithm gives each data point a membership value of
either 0 or 1 for every cluster, i.e., the point either belongs to a cluster
or it does not. Fuzzy c-means instead gives each data point a fuzzy membership
value between 0 and 1 for each of the clusters.
Steps of Fuzzy c-means
Example of Fuzzy c-means
❖ The summation of the memberships of each data point over all clusters should be equal to one.
Practice problem of Fuzzy c-means
III. Hierarchical Agglomerative clustering
▪ A Hierarchical clustering method works via grouping data into a tree of
clusters.
▪ The tree-shaped structure is known as the dendrogram
▪ Agglomerative clustering is the most common type of hierarchical
clustering.
▪ In agglomerative clustering, each data point initially acts as an individual cluster, and
at each step, data objects are grouped in a bottom-up manner.
▪ Initially, each data object is in its own cluster.
▪ At each iteration, the closest clusters are merged, until a single cluster
containing all the data is formed.
III. Hierarchical Agglomerative clustering
Steps of Agglomerative Clustering
1. Initialization: Start with each data point as its own cluster.
2. Compute Pairwise Distances: Calculate distances between clusters.
3. Merge Closest Clusters: Combine the two closest clusters based on a
chosen linkage criterion.
4. Update Distance Matrix: Recalculate distances for the new cluster.
5. Repeat Steps 3 and 4: Continue merging clusters until all data points are in
a single cluster.
6. Hierarchical Representation: Build a dendrogram to visualize the process.
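For reference, these steps map directly onto SciPy's hierarchical-clustering utilities. The sketch below is illustrative only and uses a small made-up 2-D dataset:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# small made-up 2-D dataset for illustration
X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 4.8], [9, 1]])

# Steps 2-5: compute pairwise distances and repeatedly merge the closest clusters
Z = linkage(X, method="single", metric="euclidean")   # single-link criterion

# Step 6: dendrogram of the merge history
dendrogram(Z)
plt.show()

# optionally cut the tree into a chosen number of flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)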
Example-01
For the given dataset, find the clusters using the single-link technique.
Use Euclidean distance and draw the dendrogram.
IV. Divisive hierarchical clustering
Divisive clustering works in just the opposite way to agglomerative
clustering.
In divisive hierarchical clustering, all the data points start together in
one single cluster, and in every iteration, the data points that are not
similar to the rest are separated from the cluster.
The separated data points are treated as clusters of their own.
Finally, if the splitting is continued to the end, we are left with N clusters (one per data point).
IV. Divisive hierarchical clustering
Steps of Divisive Clustering:
1. Initialization: Start with all data points in a single cluster.
2. Choose Divisive Criterion: Define a criterion to split the cluster based on
dissimilarity.
3. Split the Cluster: Use the criterion to divide the cluster into smaller,
more dissimilar subclusters.
4. Update Structure: Update the cluster structure to include the subclusters
and remove the original cluster.
5. Repeat Steps 2-4: Continue splitting subclusters recursively until a
stopping criterion is met.
6. Hierarchical Representation: Build a dendrogram to visualize the
hierarchy of clusters.
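Divisive clustering is rarely available as a ready-made routine; one standard way to realize the steps above is bisecting k-means, which repeatedly splits the largest remaining cluster with 2-means. The sketch below is an illustrative implementation under that assumption (the helper name divisive_clustering is ours); newer scikit-learn versions also provide a BisectingKMeans estimator based on the same idea.

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters, seed=0):
    """Bisecting k-means: start from one cluster and keep splitting the largest one."""
    clusters = [np.arange(len(X))]             # step 1: all points in a single cluster
    while len(clusters) < n_clusters:
        # steps 2-3: pick the largest cluster and split it into two with 2-means
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[members])
        # step 4: replace the original cluster with its two subclusters
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters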
V. BIRCH Algorithm
▪ BIRCH stands for Balanced Iterative Reducing and Clustering using
Hierarchies.
▪ It is a hierarchical clustering algorithm that is designed for clustering
large amounts of data.
▪ It can cluster large datasets by first producing a brief summary of the
dataset that preserves as much information as possible.
▪ This brief summary is then clustered using other clustering algorithms, such as
agglomerative clustering, instead of clustering the whole dataset.
▪ It converts the data into a tree data structure, with centroids being read off the
leaf nodes.
▪ These centroids can be the final cluster centroid or the input for other
cluster algorithms like Agglomerative Clustering.
Stages of BIRCH Algorithm
The BIRCH clustering algorithm consists of two stages:
1. Building the CF Tree: BIRCH summarizes large datasets into smaller,
dense regions called Clustering Feature (CF) entries.
2. Global Clustering: Applies an existing clustering algorithm on the leaves
of the CF tree. A CF tree is a tree where each leaf node contains a
sub-cluster.
CF Tree
The CF tree is a height-balanced tree that gathers and manages
clustering features and holds the necessary information about the given
data for further hierarchical clustering.
This prevents the need to work with the whole data given as input.
Each sub-cluster of data points is summarized by a clustering feature (CF),
represented by three numbers (N, LS, SS):
• N = number of data points in the sub-cluster
• LS = linear sum of the data points
• SS = sum of the squared data points
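As a small illustration (using three of the points that appear in the BIRCH example later in this section, and keeping LS and SS per dimension): for the sub-cluster {(3,4), (2,6), (4,5)}, N = 3, LS = (3+2+4, 4+6+5) = (9, 15), and SS = (9+4+16, 16+36+25) = (29, 77). In some presentations SS is instead kept as a single scalar (the sum over all dimensions). The snippet below computes the same triple:

import numpy as np

points = np.array([[3, 4], [2, 6], [4, 5]])   # three of the example points

N  = len(points)                  # number of data points in the sub-cluster
LS = points.sum(axis=0)           # linear sum of the points
SS = (points ** 2).sum(axis=0)    # sum of the squared points (per dimension)
print(N, LS, SS)                  # 3 [9 15] [29 77]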
Parameters of the BIRCH Algorithm:
• threshold: the maximum radius a sub-cluster in a leaf node of the CF tree
may have; a new data point can be absorbed into a sub-cluster only if the
resulting radius stays below this threshold.
• branching_factor: this parameter specifies the maximum number of CF
sub-clusters in each (internal) node.
• n_clusters: the number of clusters to be returned after the entire BIRCH
algorithm is complete.
• radius: the radius of a sub-cluster measures how far its data points spread
around the sub-cluster centroid; it is the quantity compared against the
threshold to decide whether a data point may be merged into that sub-cluster.
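These parameter names correspond to scikit-learn's Birch estimator. A short usage sketch on the data of Example 1 below is shown here for illustration (the n_clusters value is illustrative, and the hand-worked CF tree in the example may differ in detail from scikit-learn's implementation):

import numpy as np
from sklearn.cluster import Birch

X = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8],
              [6, 2], [7, 2], [7, 4], [8, 4], [7, 9]], dtype=float)

# threshold        -> maximum radius a leaf sub-cluster may reach when absorbing a point
# branching_factor -> maximum number of CF sub-clusters per node
# n_clusters       -> global clustering applied to the leaf sub-clusters at the end
model = Birch(threshold=1.5, branching_factor=2, n_clusters=3)
labels = model.fit_predict(X)
print(labels)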
Example 1
Consider the following data:
x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8), x6=(6,2), x7=(7,2), x8=(7,4), x9=(8,4), x10=(7,9)
Cluster the above data using the BIRCH algorithm, considering threshold T < 1.5 and
maximum branching factor = 2.
Solution:-
VI. DBSCAN Algorithm
DBSCAN stands for Density-Based Spatial Clustering of Applications with
Noise.
It is a density-based clustering algorithm.
"Density" refers to the concentration or closeness of data points in a
particular region of the feature space.
It is a measure of how tightly data points are packed together within a
given area or neighborhood
It is a clustering algorithm that defines clusters as continuous regions of
high density and works well if all the clusters are dense enough and well
separated by low-density regions.
It can discover clusters of different shapes and sizes from a large amount
of data that contains noise and outliers.
VI. DBSCAN Algorithm
The DBSCAN algorithm uses two parameters:
• minPts: The minimum number of points (a threshold) clustered together
for a region to be considered dense.
• eps (ε): A distance measure that will be used to locate the points in the
neighborhood of any point.
VI. DBSCAN Algorithm
There are three types of points after the DBSCAN clustering is complete:
• Core: a point that has at least minPts points within distance ε of
itself.
• Border: a point that is not a core point itself but has at least one core point within distance ε.
• Noise: a point that is neither a core nor a border point; it has fewer
than minPts points within distance ε and no core point in its neighborhood.
Step of DBSCAN Algorithm
1. Core Point Identification:
For each data point, identify core points that have at least MinPts neighbors
within a distance of ε(eps).
2. Cluster Formation:
Create a cluster for each unassigned core point and connect it to its core
neighbors. Recursively expand these clusters.
3. Border Point Assignment:
Assign border points to the cluster of their nearest core point.
4. Noise Point Handling:
Any data points not assigned to a cluster are considered noise points or
outliers.
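The whole procedure is implemented in scikit-learn as DBSCAN. The sketch below uses made-up data and illustrative parameter values; points labeled -1 in the output are the noise points:

import numpy as np
from sklearn.cluster import DBSCAN

# made-up 2-D data: two dense blobs plus one isolated point
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3], [1.1, 0.8],
              [5, 5], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],
              [9, 9]])

model = DBSCAN(eps=0.6, min_samples=3)   # eps = ε neighborhood radius, min_samples = MinPts
labels = model.fit_predict(X)

print(labels)                            # -1 marks noise/outlier points
print(model.core_sample_indices_)        # indices of the core points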
Example 1
Solution:-
Point   Status
P1      Noise → Border
P2      Core
P3      Noise → Border
P4      Noise → Border
P5      Core
P6      Noise → Border
P7      Noise → Border
P8      Noise → Border
P9      Noise
P10     Noise → Border
P11     Core
P12     Noise → Border
(Points first marked as Noise that turn out to lie within ε of a core point are reclassified as Border points.)
Thank You