Lecture 3. Partitioning-Based Clustering Methods

The document discusses different partitioning-based clustering methods. It covers the basic concepts of partitioning algorithms including k-means clustering. The k-means method is described as assigning points to the closest centroid and recomputing centroids iteratively until convergence. Variations discussed include k-medoids clustering, which uses data points as centroids rather than means, making it more robust to outliers.

Lecture 3. Partitioning-Based Clustering Methods
 Basic Concepts of Partitioning Algorithms

 The K-Means Clustering Method

 Initialization of K-Means Clustering


 The K-Medoids Clustering Method
 The K-Medians and K-Modes Clustering Methods
 The Kernel K-Means Clustering Method
 Summary

Session 1: Basic Concepts of Partitioning Algorithms
Partitioning Algorithms: Basic Concepts
 Partitioning method: Discovering the groupings in the data by optimizing a specific objective function and iteratively improving the quality of partitions
 K-partitioning method: Partitioning a dataset D of n objects into a set of K clusters so that an objective function is optimized (e.g., the sum of squared distances is minimized, where c_k is the centroid or medoid of cluster C_k)
 A typical objective function: Sum of Squared Errors (SSE)

   SSE(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \| x_i - c_k \|^2
 Problem definition: Given K, find a partition of K clusters that optimizes the chosen
partitioning criterion
 Global optimum: finding it requires exhaustively enumerating all possible partitions
 Heuristic methods (i.e., greedy algorithms): K-Means, K-Medians, K-Medoids, etc.
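
Returning to the SSE objective above, the short Python/NumPy sketch below evaluates SSE(C) for a given assignment of points to clusters; the function and variable names are illustrative, not from the lecture.

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for k, c in enumerate(centroids):
        members = points[labels == k]          # all x_i assigned to cluster C_k
        total += np.sum((members - c) ** 2)    # sum of ||x_i - c_k||^2
    return total

# Tiny example: two clusters in 2-D
points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.5, 0.0], [10.5, 10.0]])
print(sse(points, labels, centroids))          # prints 1.0
```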

Session 2: The K-Means Clustering Method
The K-Means Clustering Method
 K-Means (MacQueen’67, Lloyd’57/’82)
 Each cluster is represented by the center of the cluster
 Given K, the number of clusters, the K-Means clustering algorithm is outlined as follows

 Select K points as initial centroids


 Repeat
 Form K clusters by assigning each point to its closest centroid
 Re-compute the centroids (i.e., mean point) of each cluster
 Until convergence criterion is satisfied
 Different kinds of measures can be used

 Manhattan distance (L1 norm), Euclidean distance (L2 norm), Cosine similarity
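
A minimal Python/NumPy sketch of the iteration outlined above (assign each point to its closest centroid under Euclidean distance, then re-compute the means). It is illustrative only and omits refinements such as empty-cluster handling.

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]    # select K points as initial centroids
    for _ in range(max_iter):
        # Form K clusters by assigning each point to its closest centroid (L2 distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute the centroid (mean point) of each cluster
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):           # convergence criterion
            break
        centroids = new_centroids
    return labels, centroids
```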
Example: K-Means Clustering
[Figure: execution of K-Means on sample data. The original data points and K = 2 randomly selected centroids; points are assigned to clusters, cluster centers are recomputed, and point assignment is redone.]
Execution of the K-Means Clustering Algorithm


Select K points as initial centroids
Repeat
• Form K clusters by assigning each point to its closest centroid
• Re-compute the centroids (i.e., mean point) of each cluster
Until convergence criterion is satisfied
Discussion on the K-Means Method
 Efficiency: O(tKn) where n: # of objects, K: # of clusters, and t: # of iterations
 Normally, K, t << n; thus, an efficient method
 K-means clustering often terminates at a local optimum
 Initialization can be important to find high-quality clusters
 Need to specify K, the number of clusters, in advance
 There are ways to automatically determine the “best” K
 In practice, one often runs the algorithm over a range of K values and selects the “best” one
 Sensitive to noisy data and outliers
 Variations: Using K-medians, K-medoids, etc.
 K-means is applicable only to objects in a continuous n-dimensional space
 Using the K-modes for categorical data
 Not suitable for discovering clusters with non-convex shapes
 Using density-based clustering, kernel K-means, etc.
Variations of K-Means
 There are many variants of the K-Means method, varying in different aspects

 Choosing better initial centroid estimates

 K-means++, Intelligent K-Means, Genetic K-Means (to be discussed in this lecture)

 Choosing different representative prototypes for the clusters

 K-Medoids, K-Medians, K-Modes (to be discussed in this lecture)

 Applying feature transformation techniques

 Weighted K-Means, Kernel K-Means (to be discussed in this lecture)

Session 3: Initialization of K-Means Clustering
Initialization of K-Means
 Different initializations may generate rather different clustering
results (some could be far from optimal)
 Original proposal (MacQueen’67): Select K seeds randomly

 Need to run the algorithm multiple times using different seeds


 There are many methods proposed for better initialization of k seeds

 K-Means++ (Arthur & Vassilvitskii’07):


 The first centroid is selected at random
 The next centroid is chosen from the remaining points with probability proportional to its squared distance from the closest centroid already selected (a weighted probability score that favors far-away points)
 The selection continues until K centroids are obtained
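
A minimal sketch of the K-Means++ seeding procedure described above, assuming NumPy; the first centroid is random and each subsequent one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. Names are illustrative.

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]                   # first centroid selected at random
    for _ in range(K - 1):
        # Squared distance from each point to its nearest already-chosen centroid
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()                                # weighted probability score
        centroids.append(X[rng.choice(len(X), p=probs)])     # far points are more likely to be picked
    return np.array(centroids)
```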

Example: Poor Initialization May Lead to Poor Clustering

[Figure: another random selection of K centroids for the same data points; points are assigned to clusters and cluster centers are recomputed.]

 Rerun of K-Means using another set of randomly selected K seeds

 This run of K-Means generates a poor-quality clustering

Session 4: The K-Medoids Clustering Method
Handling Outliers: From K-Means to K-Medoids
 The K-Means algorithm is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data
 K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used: the most centrally located object in the cluster
 The K-Medoids clustering algorithm:
 Select K points as the initial representative objects (i.e., as initial K medoids)
 Repeat
 Assign each point to the cluster with the closest medoid
 Randomly select a non-representative object oi
 Compute the total cost S of swapping the medoid m with oi
 If S < 0, then swap m with oi to form the new set of medoids
 Until convergence criterion is satisfied
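
The following simplified sketch illustrates the swap-based loop above (it is not the full PAM procedure): the clustering cost is the total distance of points to their closest medoid, and a randomly chosen medoid m is swapped with a random non-medoid o_i whenever the cost change S is negative. The names and the trial budget are illustrative.

```python
import numpy as np

def clustering_cost(X, medoid_idx):
    # Total distance of every point to its closest medoid
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def k_medoids(X, K, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), K, replace=False))     # initial K medoids
    cost = clustering_cost(X, medoids)
    for _ in range(n_trials):
        m = rng.integers(K)                                   # position of the medoid to replace
        o = rng.integers(len(X))                              # candidate non-medoid o_i
        if o in medoids:
            continue
        candidate = medoids.copy()
        candidate[m] = o
        new_cost = clustering_cost(X, candidate)              # total cost S of the swap
        if new_cost - cost < 0:                               # S < 0: the swap improves quality
            medoids, cost = candidate, new_cost
    return medoids
```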
PAM: A Typical K-Medoids Algorithm
[Figure: execution of PAM with K = 2. Arbitrarily choose K objects as initial medoids; assign each remaining object to its nearest medoid; randomly select a non-medoid object O_random; compute the total cost of swapping a medoid m with O_random; swap them if the clustering quality improves; repeat until the convergence criterion is satisfied.]
Discussion on K-Medoids Clustering
 K-Medoids Clustering: Find representative objects (medoids) in clusters
 PAM (Partitioning Around Medoids: Kaufmann & Rousseeuw 1987)
 Starts from an initial set of medoids, and
 Iteratively replaces one of the medoids by one of the non-medoids if it improves
the total sum of the squared errors (SSE) of the resulting clustering
 PAM works effectively for small data sets but does not scale well for large data
sets (due to the computational complexity)
 Computational complexity: PAM: O(K(n − K)²) (quite expensive!)
 Efficiency improvements on PAM
 CLARA (Kaufmann & Rousseeuw, 1990):
 PAM on samples: O(Ks² + K(n − K)), where s is the sample size
 CLARANS (Ng & Han, 1994): Randomized re-sampling, ensuring efficiency + quality
Session 5: The K-Medians and K-Modes Clustering Methods
K-Medians: Handling Outliers by Computing Medians
 Medians are less sensitive to outliers than means
 Think of the median salary vs. mean salary of a large firm when adding a few top
executives!
 K-Medians: Instead of taking the mean value of the object in a cluster as a reference
point, medians are used (L1-norm as the distance measure)
 The criterion function for the K-Medians algorithm:

   S = \sum_{k=1}^{K} \sum_{x_i \in C_k} \sum_{j} | x_{ij} - \mathrm{med}_{kj} |

 The K-Medians clustering algorithm:

 Select K points as the initial representative objects (i.e., as initial K medians)


 Repeat
 Assign every point to its nearest median
 Re-compute the median using the median of each individual feature
 Until convergence criterion is satisfied
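
A minimal sketch of the K-Medians iteration above; relative to K-Means, the only changes are the L1 (Manhattan) distance in the assignment step and the per-feature median in the update step. Empty-cluster handling is omitted and the names are illustrative.

```python
import numpy as np

def k_medians(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    medians = X[rng.choice(len(X), K, replace=False)]        # initial K medians
    for _ in range(max_iter):
        # Assign every point to its nearest median (L1 distance)
        d1 = np.abs(X[:, None, :] - medians[None, :, :]).sum(axis=2)
        labels = d1.argmin(axis=1)
        # Re-compute each cluster representative feature-by-feature as the median
        new_medians = np.array([np.median(X[labels == k], axis=0) for k in range(K)])
        if np.allclose(new_medians, medians):                 # convergence criterion
            break
        medians = new_medians
    return labels, medians
```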
K-Modes: Clustering Categorical Data
 K-Means cannot handle non-numerical (categorical) data
 Mapping categorical values to 1/0 cannot generate quality clusters for high-dimensional data
 K-Modes: An extension to K-Means by replacing means of clusters with modes
 Dissimilarity measure between object X and the center Z_l of cluster l
 Φ(x_j, z_j) = 1 − n_jr / n_l when x_j = z_j; 1 when x_j ≠ z_j
 where z_j is the categorical value of attribute j in Z_l, n_l is the number of objects in cluster l, and n_jr is the number of objects in cluster l whose attribute j takes value r
 This dissimilarity measure (distance function) is frequency-based
 Algorithm is still based on iterative object cluster assignment and centroid update
 A fuzzy K-Modes method is proposed to calculate a fuzzy cluster membership
value for each object to each cluster
 A mixture of categorical and numerical data: Using a K-Prototype method
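
As an illustration of the frequency-based dissimilarity defined above, the sketch below computes Φ between a categorical object and a cluster mode; the objects, attribute values, and counts are made up for the example.

```python
from collections import Counter

def kmodes_dissimilarity(x, z, cluster):
    """Frequency-based distance between object x and cluster center (mode) z.

    cluster: list of objects currently in the cluster (defines n_l and n_jr).
    """
    n_l = len(cluster)
    dist = 0.0
    for j, (xj, zj) in enumerate(zip(x, z)):
        if xj != zj:
            dist += 1.0                                        # mismatch contributes 1
        else:
            n_jr = Counter(obj[j] for obj in cluster)[xj]      # objects in the cluster with this value on attribute j
            dist += 1.0 - n_jr / n_l                           # frequent values count as "closer"
    return dist

# Example: a cluster of 4 categorical objects and its mode
cluster = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
mode = ("red", "small")
print(kmodes_dissimilarity(("red", "large"), mode, cluster))   # 0.25 + 1.0 = 1.25
```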
Session 6: Kernel K-Means Clustering
Kernel K-Means Clustering
 Kernel K-Means can be used to detect non-convex clusters
 K-Means can only detect clusters that are linearly separable
 Idea: Project data onto the high-dimensional kernel space, and
then perform K-Means clustering
 Map data points in the input space onto a high-dimensional feature
space using the kernel function
 Perform K-Means on the mapped feature space
 Computational complexity is higher than K-Means
 Need to compute and store n x n kernel matrix generated from the
kernel function on the original data
 The widely studied spectral clustering can be considered as a variant of
Kernel K-Means clustering
Kernel Functions and Kernel K-Means Clustering
 Typical kernel functions:
 Polynomial kernel of degree h: K(X_i, X_j) = (X_i \cdot X_j + 1)^h
 Gaussian radial basis function (RBF) kernel: K(X_i, X_j) = e^{-\|X_i - X_j\|^2 / 2\sigma^2}
 Sigmoid kernel: K(X_i, X_j) = \tanh(\kappa X_i \cdot X_j - \delta)
 The formula for the kernel matrix K for any two points x_i, x_j \in C_k: K_{x_i x_j} = \varphi(x_i) \cdot \varphi(x_j)
 The SSE criterion of kernel K-Means:

   SSE(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \| \varphi(x_i) - c_k \|^2

 The formula for the cluster centroid:

   c_k = \frac{\sum_{x_i \in C_k} \varphi(x_i)}{|C_k|}

 Clustering can be performed without the actual individual projections \varphi(x_i) and \varphi(x_j) for the data points x_i, x_j \in C_k
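
This is exactly why the projections are not needed: expanding \| \varphi(x_i) - c_k \|^2 with the centroid formula gives K(x_i, x_i) − (2/|C_k|) Σ_{x_j ∈ C_k} K(x_i, x_j) + (1/|C_k|²) Σ_{x_j, x_l ∈ C_k} K(x_j, x_l), which uses only kernel matrix entries. The NumPy sketch below performs the assignment step this way on a precomputed kernel matrix; it is a simplified illustration, not a reference implementation.

```python
import numpy as np

def kernel_kmeans(K_mat, k, max_iter=100, seed=0):
    """Cluster n objects given their n x n kernel matrix K_mat."""
    n = K_mat.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)                      # random initial assignment
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue                                     # empty cluster: leave its column at infinity
            # ||phi(x_i) - c_k||^2 expressed only through kernel entries
            dist[:, c] = (np.diag(K_mat)
                          - 2.0 * K_mat[:, mask].sum(axis=1) / size
                          + K_mat[np.ix_(mask, mask)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):               # convergence criterion
            break
        labels = new_labels
    return labels
```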
Example: Kernel Functions and Kernel K-Means Clustering
 Gaussian radial basis function (RBF) kernel: K(X_i, X_j) = e^{-\|X_i - X_j\|^2 / 2\sigma^2}
 Suppose there are 5 original 2-dimensional points:
 x1(0, 0), x2(4, 4), x3(-4, 4), x4(-4, -4), x5(4, -4)
 If we set σ to 4, we will have the following points in the kernel space
 E.g., \|x_1 - x_2\|^2 = (0 - 4)^2 + (0 - 4)^2 = 32, therefore K(x_1, x_2) = e^{-32 / (2 \cdot 4^2)} = e^{-1}

          Original Space          RBF Kernel Space (σ = 4)
          x      y       K(xi,x1)   K(xi,x2)   K(xi,x3)   K(xi,x4)   K(xi,x5)
   x1     0      0       e^0        e^{-1}     e^{-1}     e^{-1}     e^{-1}
   x2     4      4       e^{-1}     e^0        e^{-2}     e^{-4}     e^{-2}
   x3    -4      4       e^{-1}     e^{-2}     e^0        e^{-2}     e^{-4}
   x4    -4     -4       e^{-1}     e^{-4}     e^{-2}     e^0        e^{-2}
   x5     4     -4       e^{-1}     e^{-2}     e^{-4}     e^{-2}     e^0
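
The table above can be reproduced with a few lines of NumPy (σ = 4 as in the example); note that the diagonal entries are e^0 = 1.

```python
import numpy as np

X = np.array([[0, 0], [4, 4], [-4, 4], [-4, -4], [4, -4]], dtype=float)
sigma = 4.0

# Pairwise squared Euclidean distances ||x_i - x_j||^2
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

# Gaussian RBF kernel matrix K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
K = np.exp(-sq_dists / (2 * sigma**2))
print(np.log(K))   # exponents: 0 on the diagonal, -1 in the first row/column, -2 or -4 elsewhere
```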
Example: Kernel K-Means Clustering

[Figure: the original data set, the result of K-Means clustering, and the result of Gaussian Kernel K-Means clustering.]

 K-Means cannot generate quality clusters on the above data set since it contains non-convex clusters
 The Gaussian RBF kernel transformation maps the data to a kernel matrix K with entries K_{x_i x_j} = \varphi(x_i) \cdot \varphi(x_j) for any two points x_i, x_j, using the Gaussian kernel K(X_i, X_j) = e^{-\|X_i - X_j\|^2 / 2\sigma^2}

 K-Means clustering is conducted on the mapped data, generating quality clusters


Session 7: Summary
Summary: Partitioning-Based Clustering Methods
 Basic Concepts of Partitioning Algorithms

 The K-Means Clustering Method

 Initialization of K-Means Clustering


 The K-Medoids Clustering Method
 The K-Medians and K-Modes Clustering Methods
 The Kernel K-Means Clustering Method
 Summary

Recommended Readings
 J. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. In Proc.
of the 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967
 S. Lloyd. Least Squares Quantization in PCM. IEEE Trans. on Information Theory, 28(2), 1982
 A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988
 L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John
Wiley & Sons, 1990
 R. Ng and J. Han. Efficient and Effective Clustering Method for Spatial Data Mining. VLDB'94
 B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue
Problem. Neural computation, 10(5):1299–1319, 1998
 I. S. Dhillon, Y. Guan, and B. Kulis. Kernel K-Means: Spectral Clustering and Normalized Cuts. KDD’04
 D. Arthur and S. Vassilvitskii. K-means++: The Advantages of Careful Seeding. SODA’07
 C. K. Reddy and B. Vinzamuri. A Survey of Partitional and Hierarchical Clustering Algorithms, in
(Chap. 4) Aggarwal and Reddy (eds.), Data Clustering: Algorithms and Applications. CRC Press, 2014
 M. J. Zaki and W. Meira, Jr.. Data Mining and Analysis: Fundamental Concepts and Algorithms.
Cambridge Univ. Press, 2014
