Assignment 5
1. Explain the k-means clustering algorithm and give its advantages and disadvantages.
Ans)
K-means clustering is an unsupervised learning algorithm used to solve clustering problems in
machine learning and data science. It groups an unlabelled dataset into clusters, which makes it a
convenient way to discover the categories in the data without any labelled training examples.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of squared distances between the data points and the centroids of
their clusters.
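The objective is usually written as the within-cluster sum of squared distances. The notation below is an assumption introduced for illustration: C_j is the j-th cluster, mu_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```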
The algorithm takes the unlabelled dataset as input, divides it into k clusters, and repeats the
assignment and update steps until the clusters stop changing. The value of k must be chosen in
advance.
The k-means algorithm mainly performs two tasks:
o Determines the best positions for the k centre points (centroids) by an iterative process.
o Assigns each data point to its closest centroid. The data points nearest to a particular
centroid form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the other clusters.
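The two tasks above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function name kmeans, the random choice of initial centroids, and the sample points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this sample the first three points end up in one cluster and the last three in the other, whichever two points are drawn as the starting centroids.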
Advantages of k-means:
Disadvantages of k-means:
Choosing k manually:
Use a “Loss vs. Clusters” plot to find the optimal k.
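As a rough sketch of how the values behind such a plot could be produced, the code below runs a one-dimensional k-means for k = 1 to 5 and records the loss (sum of squared distances to the assigned centroid). The data, the helper names kmeans_1d and loss, and the evenly spread starting centroids are assumptions chosen for the example.

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Sketch of 1-D k-means starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            # Assign x to the nearest centroid (ties go to the lower index).
            j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[j].append(x)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

def loss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((x - m) ** 2 for m, c in zip(centroids, clusters) for x in c)

# Hypothetical data with three obvious groups.
data = [1, 2, 3, 11, 12, 13, 21, 22, 23]
ordered = sorted(data)
losses = {}
for k in range(1, 6):
    # Assumption: spread the starting centroids evenly over the sorted data.
    init = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    centroids, clusters = kmeans_1d(data, init)
    losses[k] = loss(centroids, clusters)
    print(k, round(losses[k], 2))
```

On this made-up data the loss falls steeply up to k = 3 and then barely changes, which is the "elbow" one looks for when plotting loss against the number of clusters.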
Being dependent on initial values.
For a low k, you can mitigate this dependence by running k-means several times with different initial
values and picking the best result. As k increases, you need advanced versions of k-means to pick
better values of the initial centroids (called k-means seeding).
Clustering data of varying sizes and density.
k-means has trouble clustering data where clusters are of varying sizes and density. To cluster such
data, you need a generalized form of k-means, for example one that allows clusters of different widths.
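The idea of running k-means several times with different initial values and keeping the best result can be sketched as follows. The helper names, the number of restarts, the fixed random seed, and the choice of k = 3 are assumptions for illustration; the data is the set D from question 2.

```python
import random

def kmeans_1d(data, centroids, max_iter=100):
    """Sketch of 1-D k-means starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[j].append(x)
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

def loss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((x - m) ** 2 for m, c in zip(centroids, clusters) for x in c)

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs = []
for _ in range(10):
    init = rng.sample(data, 3)  # a different random start on every run
    centroids, clusters = kmeans_1d(data, init)
    runs.append((loss(centroids, clusters), centroids, clusters))

# Keep the run with the lowest loss.
best_loss, best_centroids, best_clusters = min(runs, key=lambda r: r[0])
print(best_loss, best_clusters)
```

Different starts can settle in different local optima, so comparing the loss across runs is what lets the best clustering be picked.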
Clustering outliers.
Centroids can be dragged by outliers, or outliers might get their own cluster instead of being ignored.
Consider removing or clipping outliers before clustering.
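A small sketch of how one outlier drags a centroid, and how clipping limits the effect. The outlier value 300 and the clipping bounds 0 and 40 are made up for the example; in practice the bounds would come from the data, for instance from percentiles.

```python
def mean(values):
    """Centroid of a 1-D cluster."""
    return sum(values) / len(values)

def clip(values, low, high):
    """Limit every value to the range [low, high]."""
    return [min(max(v, low), high) for v in values]

cluster = [20, 25, 30, 300]          # 300 is a hypothetical outlier
dragged = mean(cluster)              # centroid pulled towards the outlier
clipped = mean(clip(cluster, 0, 40)) # centroid after clipping to [0, 40]
print(dragged, clipped)              # 93.75 28.75
```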
2. Use the k-means clustering algorithm to divide the following data into clusters.
D = {2,3,4,10,11,12,20,25,30}
K=2
M1 = 4, M2 = 12
Ans)
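One way to work this out is to repeat the two k-means steps by hand: assign each value to the nearer mean, then recompute the means. Starting from M1 = 4 and M2 = 12:
Iteration 1: C1 = {2,3,4}, C2 = {10,11,12,20,25,30}; new means M1 = 3, M2 = 18.
Iteration 2: C1 = {2,3,4,10}, C2 = {11,12,20,25,30}; new means M1 = 4.75, M2 = 19.6.
Iteration 3: C1 = {2,3,4,10,11,12}, C2 = {20,25,30}; new means M1 = 7, M2 = 25.
Iteration 4: no value changes cluster, so the algorithm stops.

The same trace can be checked with a short script. The function name kmeans_1d and the tie-breaking rule (ties go to the lower-numbered cluster) are assumptions; no tie actually occurs for this data.

```python
def kmeans_1d(data, means, max_iter=100):
    """Trace 1-D k-means from the given initial means."""
    means = list(means)
    for step in range(1, max_iter + 1):
        clusters = [[] for _ in means]
        for x in data:
            # Assign x to the nearest mean (ties go to the lower index).
            j = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[j].append(x)
        # Recompute the mean of every cluster.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        print(f"Iteration {step}: clusters={clusters}, means={new_means}")
        if new_means == means:
            break  # means unchanged, so the clusters are final
        means = new_means
    return means, clusters

D = [2, 3, 4, 10, 11, 12, 20, 25, 30]
means, clusters = kmeans_1d(D, [4, 12])
```

Final clusters: {2,3,4,10,11,12} with mean 7 and {20,25,30} with mean 25.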
3. Use the k-means clustering algorithm to divide the following data into clusters.
D = {2,4,6,9,12,16,20,24,26}
K=2
M1 = 4, M2 = 12
Ans)
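Working through the assignment and update steps by hand from M1 = 4 and M2 = 12 gives:
Iteration 1: C1 = {2,4,6}, C2 = {9,12,16,20,24,26}; new means M1 = 4, M2 = 17.83 (approximately).
Iteration 2: C1 = {2,4,6,9}, C2 = {12,16,20,24,26}; new means M1 = 5.25, M2 = 19.6.
Iteration 3: C1 = {2,4,6,9,12}, C2 = {16,20,24,26}; new means M1 = 6.6, M2 = 21.5.
Iteration 4: no value changes cluster, so the algorithm stops.

A short script to check the trace is sketched below. The function name kmeans_1d and the tie-breaking rule (ties go to the lower-numbered cluster) are assumptions; no tie occurs for this data.

```python
def kmeans_1d(data, means, max_iter=100):
    """Trace 1-D k-means from the given initial means."""
    means = list(means)
    for step in range(1, max_iter + 1):
        clusters = [[] for _ in means]
        for x in data:
            # Assign x to the nearest mean (ties go to the lower index).
            j = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[j].append(x)
        # Recompute the mean of every cluster.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        print(f"Iteration {step}: clusters={clusters}, means={new_means}")
        if new_means == means:
            break  # means unchanged, so the clusters are final
        means = new_means
    return means, clusters

D = [2, 4, 6, 9, 12, 16, 20, 24, 26]
means, clusters = kmeans_1d(D, [4, 12])
```

Final clusters: {2,4,6,9,12} with mean 6.6 and {16,20,24,26} with mean 21.5.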