Unsupervised Machine Learning

Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data.

“Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.”

The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Why use Unsupervised Learning?

o Unsupervised learning is helpful for finding useful insights from data.
o Unsupervised learning closely resembles how a human learns to think from their own
  experiences, which makes it closer to real AI.
o In the real world, we do not always have input data with corresponding outputs, so to
  solve such cases we need unsupervised learning.

Types of Unsupervised Learning algorithms:


Clustering:

• Clustering is the process of dividing a dataset into groups consisting of similar data points.

• It means grouping objects based on information found in the data that describes the objects or their relationships.

Association:

• Used for finding relationships between variables in large databases.

• Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rules is Market Basket Analysis.

Advantages of Unsupervised Learning


• Unsupervised learning can be used for more complex tasks than supervised learning, because it does not require labeled input data.

• Unsupervised learning is often preferable because unlabeled data is much easier to obtain than labeled data.

Disadvantages of Unsupervised Learning


• Unsupervised learning is more difficult than supervised learning, as there is no corresponding output to learn from.

• The result of an unsupervised learning algorithm may be less accurate, since the input data is not labeled and the algorithm does not know the exact output in advance.
Clustering in Machine Learning

• Clustering is the task of dividing data points into a number of groups, such that data points in the same group are more similar to each other and dissimilar to data points in other groups.
• It is basically a collection of objects grouped on the basis of similarity and dissimilarity between them.

Why clustering is important?

• Through the use of clusters, it is very easy to sort data and analyze
specific groups.
• Clustering enables businesses to approach customer segments
differently based on their attributes and similarities, which helps in
maximizing profits.
• It can help in dimensionality reduction if the dataset comprises
too many variables: irrelevant clusters can be identified more easily
and removed from the dataset.
Where is it used?
• City Planning: It is used to group houses and study their values based on
their geographical locations and other factors.

• Earthquake studies: By identifying earthquake-affected areas, we can
determine the dangerous zones.

• Image Processing: Clustering can be used to group similar images
together, classify images based on content, and identify patterns in
image data.

• Manufacturing: Clustering is used to group similar products together,
optimize production processes, and identify defects in manufacturing
processes.

• Medical diagnosis: Clustering is used to group patients with similar
symptoms or diseases, which helps in making accurate diagnoses and
identifying effective treatments.

• Fraud detection: Clustering is used to identify suspicious patterns or
anomalies in financial transactions, which can help in detecting fraud or
other financial crimes.

Types of Clustering:-

❖ Exclusive Clustering
• k-Means Clustering
❖ Overlapping Clustering
• Fuzzy c-Means Clustering
❖ Hierarchical Clustering

Exclusive Clustering:-
• Exclusive clustering, also known as hard clustering, is a type of clustering
in unsupervised machine learning where each data point is assigned to
exactly one cluster.
• In other words, there is a clear and exclusive assignment of each data
point to a single cluster, and no overlapping memberships are allowed.

• The most well-known exclusive clustering algorithm is k-means.

k-Means Clustering:-
• K-means clustering is a popular unsupervised machine learning
algorithm used for partitioning a dataset into a set of k groups.

• There is no overlapping of subgroups or clusters.

• The algorithm's objective is to group data points into k clusters, where
each data point belongs to the cluster with the nearest centroid.

• Here, K defines the number of pre-defined clusters that need to be
created in the process: if K=2, there will be two clusters; for K=3,
there will be three clusters; and so on.

• It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of
distances between the data points and their corresponding cluster centroids.

How does the K-Means Algorithm Work?

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)

Step-3: Calculate the distance between each data point and each centroid, and
assign each data point to its closest centroid; this forms the predefined K
clusters.

Step-4: Recalculate each cluster center by taking the average of that cluster's
data points.

Step-5: Repeat steps 3 and 4 until the recalculated cluster centers are the
same as before, or no data points are reassigned.
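As a minimal sketch, the five steps above can be implemented with NumPy as follows (the function name and the simple random initialization are illustrative, not a standard library API):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch following the steps above."""
    rng = np.random.default_rng(seed)
    # Step-2: pick K distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step-3: assign every point to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

On two well-separated groups of points, this sketch converges in a few iterations and recovers the groups regardless of which points are picked as initial centroids.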

How to decide the number of clusters?


Elbow Method :
• The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the WCSS value. WCSS stands
for Within-Cluster Sum of Squares, which measures the total variation
within the clusters.
• The formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²

• In the above formula of WCSS,

∑Pi in Cluster1 distance(Pi, C1)² is the sum of the squared distances
between each data point Pi in Cluster1 and its centroid C1; the same
applies to the other two terms.

• To measure the distance between data points and centroids, we can use
any distance metric, such as Euclidean distance.

• To find the optimal number of clusters, the elbow method follows the
steps below:
(1) It executes K-means clustering on a given dataset for different K
values (e.g., ranging from 1 to 10).

(2) For each value of K, it calculates the WCSS value.

(3) It plots a curve between the calculated WCSS values and the number of
clusters K.

(4) The sharp point of bend, where the plot looks like an arm, is
considered the best value of K.

• Since the graph shows a sharp bend that looks like an elbow, this
technique is known as the elbow method.
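The four steps above can be sketched in Python, assuming scikit-learn is available (the synthetic three-blob dataset is illustrative); scikit-learn's KMeans exposes the WCSS directly as its `inertia_` attribute:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative synthetic data: three well-separated blobs of 50 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in [(0, 0), (5, 5), (0, 5)]])

wcss = []
for k in range(1, 11):                                   # step (1): K = 1..10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                             # step (2): inertia_ is the WCSS

# Steps (3)-(4): plot wcss against k (e.g. with matplotlib) and pick the K at
# the "elbow"; for this data the curve flattens sharply after K = 3.
```

Here the WCSS drops steeply up to K = 3 (the true number of blobs) and only slightly afterwards, which is exactly the elbow shape the method looks for.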

Overlapping Clustering:-
• Overlapping clustering, also known as soft clustering, is a type of
clustering in which a data point can belong to more than one cluster.

• In traditional (non-overlapping) clustering, each data point is assigned to
exactly one cluster. In overlapping clustering, however, a data point may
have membership in more than one cluster, indicating that it exhibits
characteristics of multiple clusters.

Fuzzy C-Means Clustering:-


• Fuzzy C-Means (FCM) clustering is a type of unsupervised machine
learning algorithm used for clustering, and it's an extension of the classic
K-Means algorithm.
• The key difference between K-Means and FCM lies in the assignment of
data points to clusters. In K-Means, each data point is assigned to a
single cluster, while in FCM a data point can belong to more than one
cluster with different degrees of membership.

• Fuzzy clustering assigns each data point a membership degree between 0
and 1 for each cluster.
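A minimal NumPy sketch of FCM, assuming the standard alternating updates of centroids and memberships (function and parameter names are illustrative; m > 1 is the fuzziness exponent):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal FCM sketch: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # Centroids are membership-weighted means of all points
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every point to every centroid (epsilon avoids 0-division)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:      # stop when memberships stabilize
            U = U_new
            break
        U = U_new
    return U, centroids
```

Each row of `U` gives one data point's degrees of membership across all clusters; taking the row-wise argmax recovers a hard (k-means-style) assignment when one is needed.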

Advantages of Fuzzy Clustering:-

• Flexibility: Fuzzy clustering allows for overlapping clusters, which can
be useful when the data has a complex structure.

• Interpretability: Fuzzy clustering provides a more detailed
representation of the relationships between data points and clusters.

Disadvantages of Fuzzy Clustering:-


• Complexity: Fuzzy clustering algorithms can be computationally more
expensive than traditional clustering algorithms.

Hierarchical Clustering:-
• Hierarchical clustering is a type of clustering algorithm that organizes
data points into a tree-like structure, known as a dendrogram.

• The basic idea behind hierarchical clustering is to build a hierarchy of
clusters, where clusters at one level of the hierarchy are formed by
merging or splitting clusters at the preceding level.

Advantages:-
• Hierarchy Representation: This hierarchical structure can be useful for
understanding the organization of the data.

• No Need to Specify the Number of Clusters in Advance: The number of
clusters can be chosen after the hierarchy is built, by cutting the
dendrogram at a suitable level.
Disadvantages:
• Computational Complexity
• Sensitive to Noise
• Difficulty in Handling Large Datasets
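As a sketch, agglomerative hierarchical clustering can be run with SciPy (assuming SciPy is available; the toy data is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)

# Build the merge tree bottom-up; "ward" merges, at each step, the pair of
# clusters whose union least increases the within-cluster variance
Z = linkage(X, method="ward")

# Cut the dendrogram into flat clusters -- the number of clusters is chosen
# *after* the hierarchy is built, unlike in k-means
labels = fcluster(Z, t=2, criterion="maxclust")
```

`scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree itself, which is how the cut level is usually chosen by inspection.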

Association in Machine Learning


• Association in unsupervised machine learning generally refers to
the discovery of interesting relationships, patterns, or associations
within a dataset without predefined labels or outcomes.
Applications of Association:
• Market Basket Analysis:
Discovering relationships between products that are frequently
purchased together.
• Healthcare Data Analysis:
Identifying associations between symptoms and diseases.

• Fraud Detection:
Discovering unusual patterns of transactions that may indicate
fraudulent activity.
Pros and Cons:
Pros:
• Discover Hidden Patterns:
Association rules can reveal hidden patterns or relationships within
the data.
• Applicability:
Used in various domains, including retail, healthcare, finance, etc.
Cons:
• Data Quality:
Sensitive to noise and irrelevant information in the dataset.
• Scalability:
Computationally expensive for large datasets.
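As a minimal sketch of the Market Basket Analysis idea above, plain Python can count item and pair frequencies and derive the support and confidence of a rule such as bread → butter (the transactions are hypothetical):

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

item_counts = Counter()
pair_counts = Counter()
for t in transactions:
    item_counts.update(t)
    pair_counts.update(combinations(sorted(t), 2))  # each pair counted once per basket

n = len(transactions)
# Support of {bread, butter}: fraction of baskets containing both items
support = pair_counts[("bread", "butter")] / n
# Confidence of the rule bread -> butter: P(butter | basket contains bread)
confidence = pair_counts[("bread", "butter")] / item_counts["bread"]
# Here support = 3/5 and confidence = 3/4
```

Real association-rule miners (e.g., Apriori) prune the search using these same support and confidence thresholds, rather than enumerating all pairs as this toy version does.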
