Chapter 9

K-Means Clustering
• An unsupervised machine learning algorithm used to partition a dataset
into a pre-defined number of clusters.
• The goal is to group similar data points together and discover underlying
patterns or structures within the data.
• The first property of clusters: the points within a cluster should be
similar to each other.
• So, our aim here is to minimize the distance between the points within a
cluster.
• K-means does this by minimizing the distance of the points in a cluster
from their centroid.
• K-means is a centroid-based (distance-based) algorithm: it calculates
distances to decide which cluster each point is assigned to.
• In K-Means, each cluster is associated with a centroid.
• It is a popular method for grouping data by assigning each observation to
the cluster whose center is closest.
Objective of the K-Means Algorithm
• The main objective of the K-Means algorithm is
to minimize the sum of squared distances between the
points and their respective cluster centroids.
• Optimization plays a crucial role in the k-means
clustering algorithm.
• The goal of the optimization process is to find
the best set of centroids that minimizes the sum
of squared distances between each data point
and its closest centroid.
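In symbols, using notation of our own choosing (K clusters C_1, …, C_K with centroids μ_1, …, μ_K), the quantity K-means minimizes is the within-cluster sum of squared distances:

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

For a fixed assignment of points, the mean of a cluster's points is the choice of μ_j that minimizes that cluster's inner sum, which is why the update step recomputes each centroid as a mean.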
How Does K-Means Clustering Work?
• Initialization: Start by randomly selecting K points from the dataset.
These points will act as the initial cluster centroids.
• Assignment: For each data point in the dataset, calculate the distance
between that point and each of the K centroids. Assign the data point to
the cluster whose centroid is closest to it. This step effectively forms K
clusters.
• Update centroids: Once all data points have been assigned to clusters,
recalculate the centroids of the clusters by taking the mean of all data
points assigned to each cluster.
• Repeat: Repeat steps 2 and 3 (assignment and centroid update) until
convergence. The loop stops when the centroids no longer change
significantly or when a specified maximum number of iterations is reached.
• Final Result: Once convergence is achieved, the algorithm outputs the
final cluster centroids and the assignment of each data point to a cluster.
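The steps above can be sketched in Python with NumPy. This is a minimal version of the standard Lloyd iteration under our own assumptions: Euclidean distance, initialization by sampling K distinct data points, convergence when no centroid moves more than `tol`, and an empty cluster simply keeping its previous centroid. The function names `assign` and `kmeans` and all parameter names and defaults are hypothetical, not from the source.

```python
import numpy as np


def assign(X, centroids):
    """Assignment step: index of the nearest centroid for every row of X."""
    # Squared Euclidean distances, shape (n_points, k); squaring does not change the argmin.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-means sketch. Returns (centroids, labels, iterations_run)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: K distinct data points become the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for it in range(1, max_iter + 1):
        labels = assign(X, centroids)
        # Update: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence check: largest distance any centroid moved this pass.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment so the returned labels match the returned centroids.
    return centroids, assign(X, centroids), it
```

For example, on two well-separated groups of four points each, `kmeans(X, k=2)` should return one centroid at the mean of each group. Production libraries such as scikit-learn add refinements on top of this loop, such as k-means++ initialization and several restarts.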
Stopping Criteria for K-Means Clustering
• There are essentially three stopping criteria
that can be adopted to stop the K-means algorithm:
– Centroids of newly formed clusters do not change
– Points remain in the same cluster
– Maximum number of iterations is reached
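The three criteria can be checked together in a small helper. This is a sketch under our own assumptions: the function name `should_stop`, the tolerance `tol` used to decide that centroids "do not change", and the default `max_iter` are hypothetical choices, not from the source.

```python
import numpy as np


def should_stop(old_centroids, new_centroids, old_labels, new_labels,
                iteration, max_iter=300, tol=1e-4):
    """Return the name of the first stopping criterion met, or None to keep going."""
    # Criterion 1: centroids of the newly formed clusters (almost) do not change.
    if np.allclose(old_centroids, new_centroids, atol=tol, rtol=0.0):
        return "centroids unchanged"
    # Criterion 2: every point stayed in the same cluster as in the previous pass.
    if np.array_equal(old_labels, new_labels):
        return "assignments unchanged"
    # Criterion 3: the iteration budget is used up.
    if iteration >= max_iter:
        return "max iterations reached"
    return None
```

In the basic algorithm, criteria 1 and 2 usually fire together: if no point changes cluster, recomputing the means gives exactly the same centroids.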
How to Apply the K-Means Clustering Algorithm: Example

• Initial stage: a dataset of 8 points.

• Step-1: Choose the number of clusters k. Let k = 2.

• Step-2: Select k random points from the data as the initial centroids.

Here, the red and green circles represent the centroids of these clusters.
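Steps 1 and 2 can be sketched in Python with NumPy. The figure's coordinates are not given, so the eight points below are hypothetical, as is the random seed; the sketch only shows that the initial centroids are k distinct rows drawn from the data.

```python
import numpy as np

# Hypothetical coordinates standing in for the 8 points in the figure.
points = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
                   [6, 6], [7, 6], [6, 7], [7, 7]], dtype=float)

k = 2  # Step-1: number of clusters

# Step-2: draw k distinct row indices at random; those rows become the centroids.
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility
init_idx = rng.choice(len(points), size=k, replace=False)
centroids = points[init_idx]

print("initial centroid indices:", init_idx)
print("initial centroids:\n", centroids)
```

Sampling without replacement (`replace=False`) guarantees the two starting centroids are different points. A different seed can give different starting centroids, and K-means can end in different final clusterings from different starts.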
• Step-3: Assign each point to the cluster with the closest centroid

The points closer to the red point are assigned to the red cluster,
whereas the points closer to the green point are assigned to the green
cluster.
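Step-3 can be sketched as a nearest-centroid lookup. The eight points are hypothetical (the figure's coordinates are not given), and the starting centroids, (1, 1) for red and (2, 1) for green, are an arbitrary, deliberately poor choice rather than values from the figure.

```python
import numpy as np

# Hypothetical coordinates standing in for the 8 points in the figure.
points = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
                   [6, 6], [7, 6], [6, 7], [7, 7]], dtype=float)
# Hypothetical starting centroids: row 0 plays "red", row 1 plays "green".
centroids = np.array([[1, 1], [2, 1]], dtype=float)


def assign_to_closest(points, centroids):
    """Step-3: label each point with the index of its nearest centroid."""
    # dists[i, j] = Euclidean distance from point i to centroid j
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


labels = assign_to_closest(points, centroids)
print(labels)  # 0 = red cluster, 1 = green cluster → [0 1 0 1 1 1 1 1]
```

Because green starts slightly closer to the far group, all four far points join the green cluster on this first pass.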

• Step-4: Recompute the centroids of the newly formed clusters

Now that every point has been assigned to one of the two clusters, the next
step is to compute the centroids of the newly formed clusters. Here, the red
and green crosses are the new centroids.
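Step-4 is a per-cluster mean. The sketch below uses hypothetical points (not the figure's), with the labels that a nearest-centroid assignment produces for them when the starting centroids are (1, 1) for red (label 0) and (2, 1) for green (label 1). It assumes no cluster is empty; the mean of an empty selection would be NaN.

```python
import numpy as np

# Hypothetical coordinates standing in for the 8 points in the figure.
points = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
                   [6, 6], [7, 6], [6, 7], [7, 7]], dtype=float)
# Cluster labels after the first assignment (0 = red, 1 = green).
labels = np.array([0, 1, 0, 1, 1, 1, 1, 1])


def recompute_centroids(points, labels, k):
    """Step-4: each new centroid is the mean of the points assigned to it."""
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])


new_centroids = recompute_centroids(points, labels, k=2)
print(new_centroids)  # red cross at (1.0, 1.5), green cross at about (5.0, 4.83)
```

The green centroid moves a long way toward the far group, which sets up the reassignment in the next pass.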
• Repeat steps 3 and 4 until one of the stopping criteria is met.
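Putting steps 3 and 4 in a loop gives the whole procedure. This sketch uses hypothetical points and starting centroids (not taken from the figure) and stops when no point changes cluster, which is the "points remain in the same cluster" criterion; the 100-pass cap is an arbitrary safety limit standing in for the maximum-iterations criterion.

```python
import numpy as np

# Hypothetical coordinates standing in for the 8 points in the figure.
points = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
                   [6, 6], [7, 6], [6, 7], [7, 7]], dtype=float)
centroids = np.array([[1, 1], [2, 1]], dtype=float)  # hypothetical red, green starts

labels = None
for iteration in range(1, 101):  # arbitrary cap of 100 passes
    # Step-3: assign each point to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    # Stop as soon as no point changes cluster.
    if labels is not None and np.array_equal(labels, new_labels):
        break
    labels = new_labels
    # Step-4: move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(2)])
    print(f"pass {iteration}: labels={labels.tolist()}, "
          f"centroids={centroids.round(2).tolist()}")

print("stopped on pass", iteration)
```

With these made-up points, the points (2, 1) and (2, 2) move from green to red on the second pass, the centroids settle at (1.5, 1.5) and (6.5, 6.5), and the loop stops on the third pass when no assignment changes.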

