Chapter 9
K-Means Clustering
• Unsupervised machine learning algorithm used for partitioning a dataset
into a pre-defined number of clusters.
• The goal is to group similar data points together and discover underlying
patterns or structures within the data.
• The first property of clusters: the points within a cluster should be
similar to each other.
• So, our aim here is to minimize the distance between the points within a
cluster.
• K-means does this by minimizing the distance between the points in a
cluster and that cluster's centroid.
• K-means is a centroid-based algorithm or a distance-based algorithm,
where we calculate the distances to assign a point to a cluster.
• In K-Means, each cluster is associated with a centroid.
• It is a popular method for grouping data by assigning observations to
clusters based on proximity to the cluster’s center.
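As a small illustration of distance-based assignment, the sketch below picks the closest centroid for a point using Euclidean distance. The function name nearest_centroid and the sample coordinates are hypothetical, chosen only for this example.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


# Hypothetical centroids and points, for illustration only.
centroids = [(2.0, 2.0), (8.0, 8.0)]
cluster_of_a = nearest_centroid((3.0, 1.0), centroids)  # closer to (2, 2)
cluster_of_b = nearest_centroid((7.0, 9.0), centroids)  # closer to (8, 8)
```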
Objective of K-Means Algorithm
• The main objective of the K-Means algorithm is
to minimize the sum of squared distances between
the points and their respective cluster centroid.
• Optimization plays a crucial role in the k-means
clustering algorithm.
• The goal of the optimization process is to find
the best set of centroids that minimizes the sum
of squared distances between each data point
and its closest centroid.
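The quantity being minimized can be computed directly. The sketch below is a minimal illustration, assuming Euclidean distance and points stored as tuples; the function names and the toy data are hypothetical. It shows that a well-placed set of centroids gives a lower sum of squared distances than a poorly placed one.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_sum_of_squares(points, centroids):
    """Sum, over all points, of the squared distance to the closest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: two tight groups on a line.
points = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
good_centroids = [(1.5, 0.0), (9.5, 0.0)]  # one centroid per group
poor_centroids = [(0.0, 0.0), (5.0, 0.0)]  # neither sits inside a group

good_cost = within_cluster_sum_of_squares(points, good_centroids)
poor_cost = within_cluster_sum_of_squares(points, poor_centroids)
```

Here good_cost is smaller than poor_cost, which is why the optimization prefers the first set of centroids.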
How K-Means Clustering Works
• Initialization: Start by randomly selecting K points from the dataset.
These points will act as the initial cluster centroids.
• Assignment: For each data point in the dataset, calculate the distance
between that point and each of the K centroids. Assign the data point to
the cluster whose centroid is closest to it. This step effectively forms K
clusters.
• Update centroids: Once all data points have been assigned to clusters,
recalculate the centroids of the clusters by taking the mean of all data
points assigned to each cluster.
• Repeat: Repeat the assignment and update steps until the algorithm
stops. It stops when the centroids no longer change significantly
(convergence) or when a specified maximum number of iterations is reached.
• Final Result: Once convergence is achieved, the algorithm outputs the
final cluster centroids and the assignment of each data point to a cluster.
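The steps listed here can be sketched in plain Python. This is a minimal illustration rather than a reference implementation. It assumes Euclidean distance and points stored as tuples; the names k_means, max_iterations, tolerance, and seed are hypothetical, and the choice to leave an empty cluster's centroid where it was is an assumption of this sketch.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, tolerance=1e-9, seed=0):
    rng = random.Random(seed)
    # Initialization: pick K distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Stop once no centroid moved by more than the tolerance.
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    # Final result: the centroids and the cluster index of every point.
    return centroids, labels


# Hypothetical toy data: two visibly separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
final_centroids, final_labels = k_means(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.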
Stopping Criteria for K-Means Clustering
• Three stopping criteria can be used to end
the K-means algorithm:
– Centroids of newly formed clusters do not change
– Points remain in the same cluster
– Maximum number of iterations is reached
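The three criteria can be checked after each iteration. The helper below is a sketch under stated assumptions: the function name should_stop, its parameters, and the default tolerance and iteration limit are hypothetical, and "do not change" is interpreted as "no coordinate moves by more than a small tolerance".

```python
def should_stop(old_centroids, new_centroids, old_labels, new_labels,
                iteration, max_iterations=100, tolerance=1e-6):
    """Return the name of the first stopping criterion that holds, or None."""
    # Criterion 1: no centroid coordinate moved by more than the tolerance.
    centroids_stable = all(
        abs(a - b) <= tolerance
        for old, new in zip(old_centroids, new_centroids)
        for a, b in zip(old, new)
    )
    if centroids_stable:
        return "centroids unchanged"
    # Criterion 2: every point stayed in the same cluster.
    if old_labels == new_labels:
        return "assignments unchanged"
    # Criterion 3: the iteration budget is used up.
    if iteration >= max_iterations:
        return "max iterations reached"
    return None
```

Returning the name of the criterion, rather than a plain boolean, makes it easy to report why a run ended.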
How to Apply the K-Means Clustering Algorithm: Example