contrib/machine-learning/K-Means_Clustering.md
58 additions & 48 deletions
@@ -1,19 +1,25 @@
1
1
# K-Means Clustering
2
2
Unsupervised Learning Algorithm for Grouping Similar Data.
3
+
3
4
## Introduction
4
5
K-means clustering is a fundamental unsupervised machine learning algorithm that excels at grouping similar data points together. It's a popular choice due to its simplicity and efficiency in uncovering hidden patterns within unlabeled datasets.
6
+
5
7
## Unsupervised Learning
6
8
Unlike supervised learning algorithms that rely on labeled data for training, unsupervised algorithms, like K-means, operate solely on input data (without predefined categories). Their objective is to discover inherent structures or groupings within the data.
9
+
7
10
## The K-Means Objective
8
11
Organize similar data points into clusters to unveil underlying patterns. The main objective is to minimize the total intra-cluster variance, that is, the sum of squared distances between each data point and the centroid of its cluster.
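
Written out, the objective is commonly stated as follows. The notation is assumed here rather than taken from the original text: $C_j$ is the set of points assigned to cluster $j$, $\mu_j$ is the centroid of that cluster, and $k$ is the number of clusters.

$$
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

A smaller $J$ means the points sit closer to their assigned centroids.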
9
12
10
13

11
14
## Clusters and Centroids
12
15
A cluster represents a collection of data points that share similar characteristics. K-means identifies a pre-determined number (k) of clusters within the dataset. Each cluster is represented by a centroid, which acts as its central point (imaginary or real).
16
+
13
17
## Minimizing In-Cluster Variation
14
18
The K-means algorithm strategically assigns each data point to a cluster such that the total variation within each cluster (measured by the sum of squared distances between points and their centroid) is minimized. In simpler terms, K-means strives to create clusters where data points are close to their respective centroids.
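
As a rough illustration of the quantity being minimized, the sketch below computes the sum of squared distances for a given assignment. The function name `within_cluster_ss` and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np

def within_cluster_ss(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    # centroids[labels] picks, for every point, the centroid of its cluster
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

# Toy data (assumed for illustration): two tight groups far apart
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

good_labels = np.array([0, 0, 1, 1])  # each point with its nearby centroid
bad_labels = np.array([0, 1, 0, 1])   # two points sent to the far centroid

wcss = within_cluster_ss(X, good_labels, centroids)
print(wcss)                                             # 4.0
print(within_cluster_ss(X, bad_labels, centroids))      # 204.0
```

The assignment that keeps points near their centroids gives the much smaller value, which is the behavior K-means is looking for.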
19
+
15
20
## The Meaning Behind "K-Means"
16
21
The "means" in K-means refers to the averaging process used to compute the centroid, essentially finding the center of each cluster.
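
The averaging step can be sketched in a few lines of NumPy. The helper name `compute_centroids` is hypothetical, and the sketch assumes every cluster has at least one point assigned to it.

```python
import numpy as np

def compute_centroids(X, labels, k):
    """Return one centroid per cluster: the mean of the points assigned to it.

    Assumes each of the k clusters contains at least one point.
    """
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

# Toy data (assumed for illustration)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

centroids = compute_centroids(X, labels, k=2)
print(centroids)  # rows are (0, 1) and (10, 1)
```

Note that a centroid computed this way need not coincide with any actual data point.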
22
+
17
23
## K-Means Algorithm in Action
18
24

19
25
The K-means algorithm follows an iterative approach to optimize cluster formation:
@@ -24,62 +30,66 @@ The K-means algorithm follows an iterative approach to optimize cluster formatio
24
30
4. **Iteration Until Convergence:** Steps 2 and 3 are repeated iteratively until a stopping criterion is met. This criterion can be either:
25
31
- **Centroid Stability:** No significant change occurs in the centroids' positions, indicating successful clustering.
26
32
- **Reaching Maximum Iterations:** A predefined number of iterations is completed.
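
The iterative loop above can be sketched from scratch with NumPy. This is a minimal illustration, not the implementation used later in this document. It assumes the usual form of the earlier steps (random initialization from the data, assignment to the nearest centroid, centroid update by averaging). The names `kmeans`, `assign`, `max_iters`, `tol`, and `seed` are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization (assumed): use k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):  # stopping criterion: maximum iterations
        labels = assign(X, centroids)
        # Update: mean of assigned points; an empty cluster keeps its old centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stopping criterion: centroid stability
            break
    return centroids, assign(X, centroids)

# Toy data (assumed for illustration): two pairs of points on a line
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
centroids, labels = kmeans(X, k=2)
print(centroids)  # one centroid near x = 0.5, the other near x = 10.5 (order may vary)
print(labels)
```

On less well-behaved data the result can depend on the starting centroids, which is why library implementations typically run several initializations and keep the best one.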
27
-
## Code
28
-
Following is a simple implementation of K-Means.
33
+
34
+
## Code
35
+
Following is a simple implementation of K-Means.
29
36
30
-
31
-
# Generate and Visualize Sample Data
32
-
# import the necessary Libraries
33
-
34
-
import numpy as np
35
-
import matplotlib.pyplot as plt
37
+
```python
38
+
# Generate and Visualize Sample Data
39
+
# import the necessary Libraries
36
40
37
-
# Create data points for cluster 1 and cluster 2
38
-
X = -2 * np.random.rand(100, 2)
39
-
X1 = 1 + 2 * np.random.rand(50, 2)
40
-
41
-
# Combine data points from both clusters
42
-
X[50:100, :] = X1
43
-
44
-
# Plot data points and display the plot
45
-
plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
46
-
plt.show()
47
-
48
-
# K-Means Model Creation and Training
49
-
from sklearn.cluster import KMeans
50
-
51
-
# Create KMeans object with 2 clusters
52
-
kmeans = KMeans(n_clusters=2)
53
-
kmeans.fit(X) # Train the model on the data
54
-
55
-
# Visualize Data Points with Centroids
56
-
centroids = kmeans.cluster_centers_ # Get centroids (cluster centers)
57
-
58
-
plt.scatter(X[:, 0], X[:, 1], s=50, c='b') # Plot data points again
**K-Means** can be applied to data that is numeric, continuous, and has a relatively small number of dimensions. It can also be used to find groups that have not been explicitly labeled in the data. Example applications include document classification, delivery store optimization, and customer segmentation.
79
-
## Reference
80
-
[[Survey of Machine Learning and Data Mining Techniques used in Multimedia System](https://www.researchgate.net/publication/333457161_Survey_of_Machine_Learning_and_Data_Mining_Techniques_used_in_Multimedia_System?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)]
81
88
82
-
[[A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database](https://www.researchgate.net/publication/339267868_A_Clustering_Approach_for_Outliers_Detection_in_a_Big_Point-of-Sales_Database?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)]
89
+
## References
90
+
91
+
- [Survey of Machine Learning and Data Mining Techniques used in Multimedia System](https://www.researchgate.net/publication/333457161_Survey_of_Machine_Learning_and_Data_Mining_Techniques_used_in_Multimedia_System?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)
92
+
- [A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database](https://www.researchgate.net/publication/339267868_A_Clustering_Approach_for_Outliers_Detection_in_a_Big_Point-of-Sales_Database?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)