Unsupervised Machine Learning
Unsupervised learning is a machine learning technique in which models are not supervised
using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the
given data.
“Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.”
The goal of unsupervised learning is to find the underlying structure of a dataset, group
the data according to similarities, and represent the dataset in a compressed
format.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to the way a human learns to think from their own
experiences, which makes it closer to true AI.
o In the real world, we do not always have input data with corresponding outputs, so
unsupervised learning is needed to handle such cases.
Association:
• Used for finding relationships between variables in a large database.
• Association rules make marketing strategies more effective. For example, people who buy
item X (say, bread) also tend to purchase item Y (butter or jam). A typical
example of association rule mining is Market Basket Analysis.
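Association rules such as "bread → butter" are commonly scored with two measures: support (how often the items appear together across all transactions) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both over a small, made-up list of baskets; the transactions and the helper names `count`, `support`, and `confidence` are hypothetical, chosen only for illustration.

```python
# Hypothetical market-basket data: each transaction is the set of items in one purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"bread", "butter"}, transactions))       # 0.6
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```

Here 3 of the 5 baskets contain both bread and butter (support 0.6), and 3 of the 4 baskets with bread also contain butter (confidence 0.75).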
Clustering:
• Grouping data into clusters makes it easy to sort the data and analyze
specific groups.
• Clustering enables businesses to approach customer segments
differently based on their attributes and similarities, which can help
increase profits.
• It can help with dimensionality reduction when a dataset has
too many variables: irrelevant clusters can be identified more easily
and removed from the dataset.
Where is it used?
• City Planning: It is used to group houses and study their
values based on their geographical location and other factors.
Types of Clustering:-
❖ Exclusive Clustering
• k-Means Clustering
❖ Overlapping Clustering
• Fuzzy c-Means Clustering
❖ Hierarchical Clustering
Exclusive Clustering:-
• Exclusive clustering, also known as hard clustering, is a type of clustering
in unsupervised machine learning where each data point is assigned to
exactly one cluster.
• In other words, there is a clear and exclusive assignment of each data
point to a single cluster, and no overlapping memberships are allowed.
k-Means Clustering:-
• K-means clustering is a popular unsupervised machine learning
algorithm used for partitioning a dataset into K groups (clusters), where K is chosen in advance.
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points
from the input dataset.)
Step-3: Calculate the distance between each data point and each centroid. Assign
each data point to its closest centroid, which forms the predefined K
clusters.
Step-4: Recalculate each cluster center by taking the average of the cluster's data
points.
Step-5: Repeat Step-3 and Step-4 until the recalculated cluster centers are the
same as the previous ones or no data point is reassigned.
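The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes points are tuples of numbers, uses Euclidean distance, and picks K random data points as the starting centroids (one common way to do the random initialization). The function name `k_means` and its parameters `max_iter` and `seed` are hypothetical.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves depend on the random initialization, but the first three points end up together and the last three together. Because each point receives exactly one label, k-means is an example of exclusive (hard) clustering.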
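WCSS (Within-Cluster Sum of Squares) measures how compact the clusters are. For K = 3 clusters:
$$\text{WCSS} = \sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} \operatorname{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} \operatorname{distance}(P_i, C_3)^2$$
The first term, $\sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2$, is the sum of the squared distances
between each data point in Cluster 1 and its centroid $C_1$. The other two terms are defined
the same way for Clusters 2 and 3.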
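The within-cluster sum of squares can be computed directly from cluster labels and centroids. A minimal sketch, assuming points and centroids are tuples and `labels[i]` gives the cluster of `points[i]`. The helper name `wcss` and the sample data are hypothetical.

```python
import math


def wcss(points, labels, centroids):
    """Sum, over all points, of the squared distance to the point's own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical example: two clusters on a line, each point 1 unit from its centroid.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]
print(wcss(points, labels, centroids))  # 4.0
```

Summing per point is equivalent to computing one sum per cluster and adding the cluster terms, so the same function works for any K.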
• To measure the distance between data points and centroids, we can use
a distance metric such as Euclidean distance.
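For two points $p$ and $q$ with $n$ coordinates, Euclidean distance is $\sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$. A minimal sketch follows. The function name `euclidean` is illustrative; Python's standard `math.dist` computes the same quantity.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length coordinate sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((1, 2), (4, 6)))  # 5.0
```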
• To find the optimal number of clusters, the elbow method follows the
steps below:
(1) It runs K-means clustering on the given dataset for different values of K
(ranging from 1 to 10).
(2) For each value of K, it calculates the WCSS value.
(3) It plots a curve of the calculated WCSS values against the number of
clusters K.
(4) The point where the curve bends sharply, like the elbow of an arm, is
taken as the best value of K.
• Because the graph shows a sharp bend that looks like an elbow, this
is known as the elbow method.
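The elbow procedure can be sketched by running k-means for K = 1 to 10 and recording the WCSS of each run. This sketch assumes a compact k-means (random data points as initial centroids, Euclidean distance) and a small synthetic dataset of three hypothetical groups. It keeps the best WCSS over a few random restarts, a common safeguard against poor initializations. Finding the bend is left to inspecting or plotting the printed values, as the method describes.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Compact k-means (random data points as initial centroids); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: no point was reassigned
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def wcss(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical data: three well-separated groups of 20 points each.
data_rng = random.Random(42)
points = [
    (cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1))
    for cx, cy in [(0, 0), (10, 0), (5, 9)]
    for _ in range(20)
]

# Run k-means for K = 1..10, keeping the lowest WCSS over 5 random restarts per K.
scores = {
    k: min(wcss(points, *k_means(points, k, seed=s)) for s in range(5))
    for k in range(1, 11)
}

# A text version of the WCSS-versus-K curve; plot these values to look for the elbow.
for k, score in scores.items():
    print(f"K={k:2d}  WCSS={score:9.2f}")
```

For three well-separated groups like these, WCSS typically drops steeply up to K = 3 and flattens afterwards, which is the bend the method looks for.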
Overlapping Clustering:-
• Overlapping clustering, also known as soft clustering, is a type of
clustering in which a data point can belong to more than one cluster.
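In fuzzy c-means, each point gets a membership degree in every cluster, and a point's degrees sum to 1. The sketch below shows only the membership step for fixed, hypothetical centroids. It uses the standard fuzzy c-means update $u_j = 1 / \sum_{k} (d_j / d_k)^{2/(m-1)}$ with fuzzifier $m = 2$. A full algorithm would alternate this step with recomputing centroids as membership-weighted means. The function name `fuzzy_memberships` is illustrative.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Membership degree of `point` in each cluster (standard fuzzy c-means update, m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    for j, d in enumerate(dists):
        if d == 0:
            return [1.0 if i == j else 0.0 for i in range(len(centroids))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
        for d_j in dists
    ]


centroids = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical cluster centers
for p in [(2.0, 0.0), (5.0, 0.0)]:
    print(p, [round(u, 3) for u in fuzzy_memberships(p, centroids)])
# (2.0, 0.0) [0.941, 0.059]
# (5.0, 0.0) [0.5, 0.5]
```

The point midway between the two centers belongs equally to both clusters. A hard-clustering method like k-means would have to assign it to only one.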
Hierarchical Clustering:-
• Hierarchical clustering is a type of clustering algorithm that organizes
data points into a tree-like structure, known as a dendrogram.
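Agglomerative (bottom-up) hierarchical clustering builds a dendrogram by starting with every point in its own cluster and repeatedly merging the two closest clusters. A minimal sketch, assuming single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance. The function name `agglomerative` is hypothetical. The returned merge history, with the distance at which each merge happened, is what a dendrogram draws.

```python
import math
from itertools import combinations


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns a list of (left, right, distance) merges."""
    # Each cluster is a tuple of point indices; start with one cluster per point.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        (a, b), dist = min(
            (
                ((x, y), min(math.dist(points[i], points[j]) for i in x for j in y))
                for x, y in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, dist))
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
for left, right, d in agglomerative(points):
    print(f"merge {left} + {right} at distance {d:.2f}")
# merge (0,) + (1,) at distance 1.00
# merge (2,) + (3,) at distance 1.50
# merge (0, 1) + (2, 3) at distance 5.00
# merge (4,) + (0, 1, 2, 3) at distance 15.00
```

Cutting the tree at a chosen distance (for example, between 1.5 and 5.0) yields a flat clustering. This naive version recomputes all pair distances at each step, so it suits only small datasets.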
Advantages:-
• Hierarchy Representation: The dendrogram's hierarchical structure can be useful for
understanding how the data is organized.
• Fraud Detection:
Discovering unusual patterns of transactions that may indicate
fraudulent activity.
Pros and Cons:
Pros:
• Discover Hidden Patterns:
Association rules can reveal hidden patterns or relationships within
the data.
• Applicability:
Used in various domains, including retail, healthcare, and finance.
Cons:
• Data Quality:
Sensitive to noise and irrelevant information in the dataset.
• Scalability:
Computationally expensive for large datasets.