Unsupervised Learning Algorithm: Clustering
Md. Apu Hosen
Lecturer
Dept. of CSE, NUBTK
Clustering
• Clustering divides a data set into a set of clusters (groups) where instances
of the same cluster have high similarity while instances of different
clusters have high dissimilarity.
• The aim of the clustering process is to segregate instances with similar
traits and assign them to clusters.
• A cluster of data objects can be treated as one group.
• While doing cluster analysis, we first partition the set of data into groups
based on data similarity and then assign the labels to the groups.
Types of Clustering
Broadly speaking, clustering can be divided into two subgroups:
• Hard Clustering: Each input data point either belongs to a cluster
completely or does not belong to it at all.
• Soft Clustering: Instead of assigning each input data point to exactly
one cluster, a probability or likelihood of that data point belonging to
each of the clusters is assigned.
Types of Clustering Algorithm
There are different types of clustering algorithms, each suited to different
kinds of data. Some of them are:
▪ Centroid-based: These algorithms separate data points based on multiple centroids
in the data. Each data point is assigned to the cluster whose centroid is nearest,
typically using the squared Euclidean distance. This is the most commonly used type of clustering.
▪ Hierarchical clustering: It is a type of clustering algorithm that creates a hierarchy
or tree-like structure of clusters. It groups data points into clusters in a way that
preserves a nested or hierarchical relationship among the clusters.
▪ Density Based: In density-based clustering, data is grouped by areas of high
concentrations of data points surrounded by areas of low concentrations of data
points. Basically, the algorithm finds the places that are dense with data points and
calls those clusters.
Clustering Algorithm
Based on these types of clustering, there are different clustering
algorithms. A suitable clustering algorithm helps in finding valuable
insights from data. Commonly used clustering algorithms in machine
learning are,
I. K-means Clustering
II. Fuzzy C-means Clustering
III. Hierarchical Agglomerative clustering
IV. Divisive Hierarchical Clustering
V. BIRCH Algorithm
VI. DBSCAN Algorithm
I. K-Means Clustering
K-Means clustering is an unsupervised, iterative learning algorithm that
groups an unlabeled dataset into K different clusters in such a way that each
data point belongs to exactly one group of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated with a centroid.
The main aim of this algorithm is to minimize the sum of distances between
each data point and the centroid of its cluster.
It can be used in Image Segmentation, Customer Segmentation, Fraud
Detection, Bioinformatics, Text Clustering, Recommendation Systems, and
Environmental Monitoring.
I. K-Means Clustering
K-Means algorithm
1. Choose the number of clusters (K) and obtain the data points.
2. Randomly initialize the centroids c_1, c_2, ..., c_K.
3. For each data point x_i:
   - find the nearest centroid among c_1, c_2, ..., c_K
   - assign the point to that centroid's cluster
4. For each cluster j = 1..K:
   - new centroid = mean of all points assigned to that cluster
5. Repeat steps 3 and 4 until convergence or until a fixed number of
iterations is reached.
6. End
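A minimal NumPy sketch of these steps is shown below. It is an illustrative implementation only (the function name k_means and the random-initialization choice are ours, not from the slides); library implementations such as scikit-learn's KMeans follow the same idea with a smarter k-means++ initialization.

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal K-Means: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick K data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 5: stop when the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels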
Example of k-Means
● Consider the following data set consisting of the scores of two
variables on each of seven individuals.
● This data set is to be grouped into two clusters.
Example of k-Means
• As a first step in finding a sensible initial partition, let the A and B
values of the two individuals that are furthest apart (using the Euclidean
distance measure) define the initial cluster means, giving:
Example of k-Means
• Now the initial partition has changed, and the two clusters at this
stage have the following characteristics:
Example of k-Means
• Only individual 3 is nearer to the mean of the opposite cluster
(Cluster 2) than to that of its own (Cluster 1). Thus, individual 3 is relocated to
Cluster 2, resulting in the new partition:
Practice Problem of K-Means
Problem-01:
Cluster the following eight points (with (x, y) representing locations)
into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centers are A1(2, 10), A4(5, 8), and A7(1, 2).
Use the K-Means Algorithm to find the three cluster centers after the
second iteration.
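As an optional check (not part of the original slides), the two iterations of Problem-01 can be reproduced with a few lines of NumPy using the points and initial centers given above:

import numpy as np

# the eight points and the three initial centers given in Problem-01
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centers = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)

for iteration in range(2):                       # run exactly two iterations
    dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
    labels = dists.argmin(axis=1)                # assign each point to its nearest center
    centers = np.array([X[labels == j].mean(axis=0) for j in range(3)])
    print(f"Centers after iteration {iteration + 1}:\n{centers}")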
Practice Problem of K-Means
Problem-02:
Use the K-Means algorithm to create two clusters.
Assume A(2, 2) and C(1, 1) are centers of the two clusters.
II. Fuzzy c-means (FCM)
Fuzzy c-means (FCM) clustering is a soft clustering technique in which
each data point is assigned a probability (membership) score for every
cluster, so that one element of data (or of an image) can belong to two or
more clusters at the same time.
II. Fuzzy c-means (FCM)
It starts with a random initial guess for the cluster centers, that is, the
mean location of each cluster.
Next, FCM assigns every data point a random membership grade for
each cluster.
By iteratively updating the cluster centers and the membership grades
for each data point, FCM moves the cluster centers to the correct
location within a data set and, for each data point, finds the degree of
membership in each cluster.
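A minimal NumPy sketch of this iterative procedure is given below, assuming Euclidean distance and the common fuzzifier value m = 2 (these choices are ours for illustration; they are not fixed by the slides):

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iters=100, seed=0):
    """Minimal fuzzy c-means: returns the cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iters):
        Um = U ** m
        # update centers as the membership-weighted mean of the data
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances from every point to every center (small epsilon avoids division by zero)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # update memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        if np.allclose(U_new, U, atol=1e-6):     # stop when memberships stabilize
            break
        U = U_new
    return centers, U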
K-Means vs Fuzzy c-means
The K-means clustering algorithm gives each data point a membership value of
either 0 or 1 for every cluster, i.e., the point either belongs to a cluster
or it does not. Fuzzy c-means instead gives each data point a fuzzy membership
value between 0 and 1 for each of the clusters.
Steps of Fuzzy c-means
Example of Fuzzy c-means
❖ The summation of the memberships of each data point over all clusters should be equal to one.
Practice problem of Fuzzy c-means
III. Hierarchical Agglomerative clustering
▪ A Hierarchical clustering method works via grouping data into a tree of
clusters.
▪ The tree-shaped structure is known as the dendrogram
▪ Agglomerative clustering is the most common type of hierarchical
clustering.
▪ In agglomerative clustering, each data point initially acts as an individual cluster, and
at each step, data objects are grouped in a bottom-up manner.
▪ Initially, each data object is in its own cluster.
▪ At each iteration, the closest clusters are merged, until a single cluster
containing all the data is formed.
III. Hierarchical Agglomerative clustering
Steps of Agglomerative Clustering
1. Initialization: Start with each data point as its own cluster.
2. Compute Pairwise Distances: Calculate distances between clusters.
3. Merge Closest Clusters: Combine the two closest clusters based on a
chosen linkage criterion.
4. Update Distance Matrix: Recalculate distances for the new cluster.
5. Repeat Steps 3 and 4: Continue merging clusters until all data points are in
a single cluster.
6. Hierarchical Representation: Build a dendrogram to visualize the process.
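For reference, these steps map directly onto SciPy's hierarchical-clustering utilities. The sketch below is illustrative only and uses a small made-up 2-D dataset:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# small made-up 2-D dataset for illustration
X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 4.8], [9, 1]])

# Steps 2-5: compute pairwise distances and repeatedly merge the closest clusters
Z = linkage(X, method="single", metric="euclidean")   # single-link criterion

# Step 6: dendrogram of the merge history
dendrogram(Z)
plt.show()

# optionally cut the tree into a chosen number of flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)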
Example-01
For the given dataset, find the clusters using the single-link technique.
Use Euclidean distance and draw the dendrogram.
IV. Divisive hierarchical clustering
Divisive clustering works in just the opposite way to agglomerative
clustering.
In divisive hierarchical clustering, all the data points start together in
one single cluster, and in every iteration, the data points that are not
similar to the rest are separated from the cluster.
The separated data points are treated as clusters of their own.
Finally, if the splitting is continued to the end, we are left with N clusters (one per data point).
IV. Divisive hierarchical clustering
Steps of Divisive Clustering:
1. Initialization: Start with all data points in a single cluster.
2. Choose Divisive Criterion: Define a criterion to split the cluster based on
dissimilarity.
3. Split the Cluster: Use the criterion to divide the cluster into smaller,
more dissimilar subclusters.
4. Update Structure: Update the cluster structure to include the subclusters
and remove the original cluster.
5. Repeat Steps 2-4: Continue splitting subclusters recursively until a
stopping criterion is met.
6. Hierarchical Representation: Build a dendrogram to visualize the
hierarchy of clusters.
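Divisive clustering is rarely available as a ready-made routine; one standard way to realize the steps above is bisecting k-means, which repeatedly splits the largest remaining cluster with 2-means. The sketch below is an illustrative implementation under that assumption (the helper name divisive_clustering is ours); newer scikit-learn versions also provide a BisectingKMeans estimator based on the same idea.

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters, seed=0):
    """Bisecting k-means: start from one cluster and keep splitting the largest one."""
    clusters = [np.arange(len(X))]             # step 1: all points in a single cluster
    while len(clusters) < n_clusters:
        # steps 2-3: pick the largest cluster and split it into two with 2-means
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[members])
        # step 4: replace the original cluster with its two subclusters
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters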
V. BIRCH Algorithm
▪ BIRCH stands for Balanced Iterative Reducing and Clustering using
Hierarchies.
▪ It is a hierarchical clustering algorithm that is designed for clustering
large amounts of data.
▪ It can cluster large datasets by first producing a brief summary of the
dataset that preserves as much information as possible.
▪ This brief summary is then clustered using other clustering algorithms, such as
agglomerative clustering, instead of clustering the whole dataset.
▪ It converts the data into a tree data structure, with centroids being read off the
leaf nodes.
▪ These centroids can be the final cluster centroid or the input for other
cluster algorithms like Agglomerative Clustering.
Stages of BIRCH Algorithm
The BIRCH clustering algorithm consists of two stages:
1. Building the CF Tree: BIRCH summarizes large datasets into smaller,
dense regions called Clustering Feature (CF) entries.
2. Global Clustering: Applies an existing clustering algorithm on the leaves
of the CF tree. A CF tree is a tree where each leaf node contains a
sub-cluster.
CF Tree
The CF tree is a height-balanced tree that gathers and manages
clustering features and holds the necessary information about the given
data for further hierarchical clustering.
This prevents the need to work with the whole data given as input.
Each sub-cluster of data points is summarized by a clustering feature (CF),
represented by three numbers (N, LS, SS):
• N = number of data points in the sub-cluster
• LS = linear sum of the data points
• SS = sum of the squared data points
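As a small illustration (using three of the points that appear in the BIRCH example later in this section, and keeping LS and SS per dimension): for the sub-cluster {(3,4), (2,6), (4,5)}, N = 3, LS = (3+2+4, 4+6+5) = (9, 15), and SS = (9+4+16, 16+36+25) = (29, 77). In some presentations SS is instead kept as a single scalar (the sum over all dimensions). The snippet below computes the same triple:

import numpy as np

points = np.array([[3, 4], [2, 6], [4, 5]])   # three of the example points

N  = len(points)                  # number of data points in the sub-cluster
LS = points.sum(axis=0)           # linear sum of the points
SS = (points ** 2).sum(axis=0)    # sum of the squared points (per dimension)
print(N, LS, SS)                  # 3 [9 15] [29 77]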
Parameters of the BIRCH Algorithm:
• threshold: the maximum radius a sub-cluster in a leaf node of the CF tree
may have; a new data point can be absorbed into a sub-cluster only if the
resulting radius stays below this threshold.
• branching_factor: this parameter specifies the maximum number of CF
sub-clusters in each (internal) node.
• n_clusters: the number of clusters to be returned after the entire BIRCH
algorithm is complete.
• radius: the radius of a sub-cluster measures how far its data points spread
around the sub-cluster centroid; it is the quantity compared against the
threshold to decide whether a data point may be merged into that sub-cluster.
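These parameter names correspond to scikit-learn's Birch estimator. A short usage sketch on the data of Example 1 below is shown here for illustration (the n_clusters value is illustrative, and the hand-worked CF tree in the example may differ in detail from scikit-learn's implementation):

import numpy as np
from sklearn.cluster import Birch

X = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8],
              [6, 2], [7, 2], [7, 4], [8, 4], [7, 9]], dtype=float)

# threshold        -> maximum radius a leaf sub-cluster may reach when absorbing a point
# branching_factor -> maximum number of CF sub-clusters per node
# n_clusters       -> global clustering applied to the leaf sub-clusters at the end
model = Birch(threshold=1.5, branching_factor=2, n_clusters=3)
labels = model.fit_predict(X)
print(labels)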
Example 1
Consider the following data:
x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8), x6=(6,2), x7=(7,2), x8=(7,4), x9=(8,4), x10=(7,9)
Cluster the above data using the BIRCH algorithm, considering threshold T < 1.5 and
maximum branching factor = 2.
Solution:-
VI. DBSCAN Algorithm
DBSCAN stands for Density-Based Spatial Clustering of Applications with
Noise.
It is a density-based clustering algorithm.
"Density" refers to the concentration or closeness of data points in a
particular region of the feature space.
It is a measure of how tightly data points are packed together within a
given area or neighborhood
It is a clustering algorithm that defines clusters as continuous regions of
high density and works well if all the clusters are dense enough and well
separated by low-density regions.
It can discover clusters of different shapes and sizes from a large amount
of data that contains noise and outliers.
VI. DBSCAN Algorithm
The DBSCAN algorithm uses two parameters:
• minPts: The minimum number of points (a threshold) clustered together
for a region to be considered dense.
• eps (ε): A distance measure that will be used to locate the points in the
neighborhood of any point.
VI. DBSCAN Algorithm
There are three types of points after the DBSCAN clustering is complete:
• Core: a point that has at least minPts points within distance ε of
itself.
• Border: a point that is not a core point itself but has at least one core point within distance ε.
• Noise: a point that is neither a core nor a border point; it has fewer
than minPts points within distance ε and no core point in its neighborhood.
Step of DBSCAN Algorithm
1. Core Point Identification:
For each data point, identify core points that have at least MinPts neighbors
within a distance of ε(eps).
2. Cluster Formation:
Create a cluster for each unassigned core point and connect it to its core
neighbors. Recursively expand these clusters.
3. Border Point Assignment:
Assign border points to the cluster of their nearest core point.
4. Noise Point Handling:
Any data points not assigned to a cluster are considered noise points or
outliers.
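The whole procedure is implemented in scikit-learn as DBSCAN. The sketch below uses made-up data and illustrative parameter values; points labeled -1 in the output are the noise points:

import numpy as np
from sklearn.cluster import DBSCAN

# made-up 2-D data: two dense blobs plus one isolated point
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3], [1.1, 0.8],
              [5, 5], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],
              [9, 9]])

model = DBSCAN(eps=0.6, min_samples=3)   # eps = ε neighborhood radius, min_samples = MinPts
labels = model.fit_predict(X)

print(labels)                            # -1 marks noise/outlier points
print(model.core_sample_indices_)        # indices of the core points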
Example 1
Solution:-
Point   Status
P1      Noise → Border
P2      Core
P3      Noise → Border
P4      Noise → Border
P5      Core
P6      Noise → Border
P7      Noise → Border
P8      Noise → Border
P9      Noise
P10     Noise → Border
P11     Core
P12     Noise → Border
(Points first marked as Noise that turn out to lie within ε of a core point are reclassified as Border points.)
Thank You