K-means Clustering
Meghana Tribhuwan
K-means
• K-means groups the data points into K clusters, where K is the
number of groups we choose
• Points are reassigned to their nearest centroid, and the centroids
are recomputed, till you reach a point where no reassignment is needed
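The loop described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the sample points, the `seed` and `max_iter` parameters, and the choice to initialize by picking K data points at random are all assumptions made for this example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) once no point changes cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no reassignment needed, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The first three points should share one label and the last three the other; which group is called 0 and which 1 depends on the random start.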
2nd example: Random Initialization
We take more appropriate clusters
End result
What will happen if we have a bad random
initialization of the centroids?
• Assume that we perform all of the steps again.
Final result after performing the said steps
• Before After
The selection of the centroids has a huge impact on the
clusters, and the centroids are assigned at random… then
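To make the impact of the starting centroids concrete, here is a small sketch under stated assumptions. The four points and the two starting positions are invented for illustration, and the starts are fixed by hand (standing in for a lucky and an unlucky random draw) so that the result is reproducible. The quality of each result is scored as the sum of squared distances from every point to its centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_k_means(points, start, max_iter=100):
    """Run k-means from the given starting centroids."""
    centroids = list(start)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing was reassigned
            break
        labels = new_labels
        # Move every centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels

def total_squared_distance(points, centroids, labels):
    """Sum of squared distances from each point to its own centroid."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))

# Hypothetical data: corners of a wide, flat rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good_start = [(0.0, 0.5), (4.0, 0.5)]  # one centroid per short side
bad_start = [(2.0, 0.0), (2.0, 1.0)]   # both centroids in the middle

good = total_squared_distance(points, *run_k_means(points, good_start))
bad = total_squared_distance(points, *run_k_means(points, bad_start))
print(good, bad)  # 1.0 16.0
```

Both runs stop because no reassignment is needed, yet the bad start ends with a left/right pair split across the top and bottom edges and a much larger total distance.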
Algorithm to identify (decide) the right
number of clusters
• If we set the number of clusters to 3, then after
applying the K-means algorithm
How do we know which will perform better,
whether 3 or 4 or 10 clusters?
• Formula to choose the right number of
clusters
• Within-Cluster-Sum-of-Squares (WCSS)
Within-Cluster-Sum-of-Squares (WCSS)
• Calculate each point's distance from its
centroid and square it, then add up these squares over all points
• What if we take only one big cluster?
• The distances between the points and the centroid
will be larger, and so will the WCSS value
• WCSS decreases when we make 2 clusters
• WCSS decreases further as we add more clusters
• The question is: how many clusters can we have?
• The maximum number of clusters is the number of data
points you have, e.g. 50 points give 50 clusters
• Pause the video, think, and tell me: what will be
the value of WCSS?
• The answer is that it will be zero
• Every point will be its own centroid, therefore the
distance between each point and its centroid will be
0. Squared, the value is still 0, and after
adding them up it is also 0.
• The lower the WCSS, the better our goodness of
fit will be.
• But how do we find the optimum goodness of
fit?
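The WCSS behaviour described in the bullets above can be checked with a short sketch. The four sample points and the way they are split into clusters are assumptions chosen so the arithmetic is easy to follow by hand.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(x) / n for x in zip(*points))

def wcss(clusters):
    """Sum over all clusters of squared point-to-centroid distances."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(squared_distance(p, c) for p in cluster)
    return total

# Hypothetical data: two points on the left, two on the right.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

one_cluster = [points]                   # a single big cluster
two_clusters = [points[:2], points[2:]]  # left pair and right pair
own_clusters = [[p] for p in points]     # every point is its own cluster

print(wcss(one_cluster))   # 68.0
print(wcss(two_clusters))  # 4.0
print(wcss(own_clusters))  # 0.0
```

With one cluster the centroid sits at (5, 2), so each point contributes 16 + 1 = 17. With two clusters each point is 1 unit from its centroid. With one cluster per point every distance is 0.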
Elbow method
• But it is an arbitrary method, not a very precise one.
• Elbow method is a hint; ultimately you have to choose 2, then 3, and
then 4, and you have to decide for yourself, as you are the one who is
analysing the data
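A rough way to produce the numbers behind an elbow plot is to run K-means for each candidate number of clusters and record the WCSS. The sketch below is one possible approach, not a prescribed procedure: the nine sample points, the helper name `k_means_wcss`, and the choice to keep the best of ten random starts per K (to soften the effect of a bad initialization) are all assumptions.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means_wcss(points, k, seed, max_iter=100):
    """Run k-means from one random start and return the final WCSS."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no reassignment needed
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))

# Hypothetical data: three loose groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
          (1.0, 8.0), (2.0, 9.0), (1.0, 9.0)]

wcss_by_k = {}
for k in range(1, len(points) + 1):
    # Keep the best of several random starts for each k.
    wcss_by_k[k] = min(k_means_wcss(points, k, seed) for seed in range(10))
    print(k, round(wcss_by_k[k], 2))
```

Plotting these values against K gives the elbow curve. The values only suggest where adding clusters stops paying off; the final choice of K still rests with the analyst.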