EM and K-means Relations
Machine Learning
CMU-10715
Clustering and EM
Barnabás Póczos
Contents
Clustering
K-means
Mixture of Gaussians
Expectation Maximization
Variational Methods
Clustering
K-means clustering
What is clustering?
Clustering:
The process of grouping a set of objects into classes of similar objects
–high intra-class similarity
–low inter-class similarity
–It is the most common form of unsupervised learning
Clustering is Subjective
K-means clustering
What is Similarity?
The K-means Clustering Problem
K-means Clustering Problem
Example with K = 3 clusters.
K-means Clustering Problem
The problem is NP-hard, but there are good heuristic algorithms that seem to work well in practice:
• K-means algorithm
• mixture of Gaussians
K-means Clustering Alg: Step 1
• Given n objects.
• Guess the cluster centers k1, k2, k3 (they were μ1, μ2, μ3 in the previous slide).
K-means Clustering Alg: Step 2
• Build a Voronoi diagram based on the cluster centers k1, k2, k3.
• Decide the class memberships of the n objects by assigning them to the nearest of the cluster centers k1, k2, k3.
K-means Clustering Alg: Step 3
• Re-estimate each cluster center as the mean of the objects assigned to it, and repeat the assignment and update steps until the memberships stop changing.
If these two steps are each done once per iteration, then l iterations cost O(lKn).
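A minimal sketch of Steps 1-3 in Python/NumPy (the function and variable names here are illustrative, not from the slides):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # Cluster the rows of X (n objects, d features) into K groups.
    rng = np.random.default_rng(seed)
    # Step 1: guess the cluster centers by picking K random objects.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each object to its nearest center
        # (the Voronoi partition induced by the current centers).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: re-estimate each center as the mean of its assigned objects;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):  # memberships have stabilized
            break
        centers = new_centers
    return centers, labels

Each pass performs the assignment and update steps once, so l passes cost O(lKn) distance computations, matching the bound above.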
K-means clustering
Seed Choice
The results of the K-means algorithm can vary based on random seed selection.
Alternating Optimization
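A standard way to make this precise (a sketch; the symbols r_{nk} and \mu_k are mine, not necessarily the slides'): K-means minimizes the distortion

J(r, \mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2, \qquad r_{nk} \in \{0,1\}, \ \sum_k r_{nk} = 1,

by alternating two partial minimizations: with the centers \mu fixed, J is minimized over the assignments r by sending each x_n to its nearest center; with the assignments fixed, setting \partial J / \partial \mu_k = 0 gives \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}, the mean of the points assigned to cluster k.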
K-means clustering
K-means Algorithm (more formally)
K-means algorithm:
(1) Holding the centers fixed, assign each object to the cluster with the nearest center.
(2) Holding the assignments fixed, re-estimate each center as the mean of the objects assigned to it.
Each of the two steps can only decrease (or leave unchanged) the distortion J above.
Density Estimation
Generative approach
What if the basic model (e.g. a Gaussian) doesn’t fit all data?
⇒ Mixture modelling, partitioning algorithms: different parameters for different parts of the domain.
K-means clustering
Partitioning Algorithms
• K-means
– hard assignment: each object xj belongs to only one cluster
• Mixture modeling
– soft assignment: probability that an object belongs to a cluster
K-means clustering
Gaussian Mixture Model
Mixture of K Gaussian distributions (multi-modal distribution):
• There are K components
• Component i has an associated mean vector μi
Gaussian Mixture Model
Mixture of K Gaussian distributions (multi-modal distribution):
Hidden variable
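In symbols (a sketch in my notation, with \pi_i denoting the weight of component i):

p(x) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \qquad \pi_i \ge 0, \ \sum_{i=1}^{K} \pi_i = 1.

Equivalently, introduce a hidden variable y \in \{1,\dots,K\} with P(y=i) = \pi_i and x \mid y=i \sim \mathcal{N}(\mu_i, \Sigma_i); marginalizing out y gives back the mixture density above.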
Mixture of Gaussians Clustering
Assume that
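A common version of the assumption (a sketch, not necessarily the exact form used on the slides): each component is a spherical Gaussian with a shared, known variance,

P(y=i) = \pi_i, \qquad p(x \mid y=i) = \mathcal{N}(x \mid \mu_i, \sigma^2 I).

Clustering then means computing the posterior class probabilities by Bayes' rule,

P(y=i \mid x) = \frac{\pi_i \exp(-\|x-\mu_i\|^2 / 2\sigma^2)}{\sum_j \pi_j \exp(-\|x-\mu_j\|^2 / 2\sigma^2)},

and assigning x to the class with the largest posterior. Because the covariances are equal, the log-posterior ratio of any two classes is linear in x, which is why the decision boundary below is piecewise linear.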
Piecewise linear decision boundary
MLE for GMM
What if we don't know the parameters?
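The objective being maximized is the log-likelihood of the mixture parameters \theta = (\pi_i, \mu_i, \Sigma_i)_{i=1}^{K} on i.i.d. data x_1, \dots, x_n (a sketch in my notation):

\ell(\theta) = \sum_{j=1}^{n} \log \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x_j \mid \mu_i, \Sigma_i).

The sum inside the logarithm couples the components, so there is no closed-form maximizer; this is the situation EM is designed for.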
K-means and GMM
MLE:
Same as K-means!!! (With known mixing weights and a common spherical covariance, the maximum-likelihood update for each mean given hard class assignments is the average of the points assigned to it, which is exactly the K-means center update.)
General GMM
General GMM – Gaussian Mixture Model
(Each Gaussian in the sum is a mixture component; its weight is the mixture proportion.)
General GMM
Assume that
The EM algorithm
What is EM in the general case, and why does it work?
Expectation-Maximization (EM)
A general algorithm to deal with hidden data, but we will study it in
the context of unsupervised learning (hidden class labels =
clustering) first.
• EM is an optimization strategy for objective functions that can be interpreted
as likelihoods in the presence of missing data.
• EM is “simpler” than gradient methods:
No need to choose step size.
• EM is an iterative algorithm with two linked steps:
o E-step: fill in hidden values using inference
o M-step: apply standard MLE/MAP method to completed data
• We will prove that this procedure monotonically improves the likelihood (or
leaves it unchanged). EM always converges to a local optimum of the
likelihood.
General EM algorithm
Notation
Observed data:
Unknown variables:
For example in clustering:
Parameters:
Goal:
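A sketch of the standard notation (my symbols): observed data x = (x_1, \dots, x_n); unknown (hidden) variables z = (z_1, \dots, z_n), e.g. the cluster labels in clustering; parameters \theta, e.g. the means, covariances and mixture proportions of a GMM. The goal is maximum likelihood with the hidden variables marginalized out:

\theta^{*} = \arg\max_{\theta} \log p(x \mid \theta) = \arg\max_{\theta} \log \sum_{z} p(x, z \mid \theta).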
General EM algorithm
Other Examples: Hidden Markov Models
Observed data:
Unknown variables:
Parameters:
Initial probabilities:
Transition probabilities:
Emission probabilities:
Goal:
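For the HMM example, a sketch of the corresponding objects (my notation): observed data x_{1:T} (the emission sequence); unknown variables z_{1:T} (the hidden state sequence); parameters \theta = (\pi, A, B), with initial probabilities \pi_i = P(z_1 = i), transition probabilities A_{ij} = P(z_{t+1} = j \mid z_t = i), and emission probabilities B_i(x) = P(x_t = x \mid z_t = i). The complete-data likelihood factorizes as

p(x_{1:T}, z_{1:T} \mid \theta) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t),

and the goal is again \max_\theta \log p(x_{1:T} \mid \theta) = \max_\theta \log \sum_{z_{1:T}} p(x_{1:T}, z_{1:T} \mid \theta); EM specialized to this model is the Baum-Welch algorithm.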
General EM algorithm
Goal:
Free energy:
E Step:
M Step:
Free energy:
M Step:
We maximize only here in θ!!!
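The quantities referred to above, written out in their standard form (a sketch; q denotes a distribution over the hidden variables z):

Goal: \max_\theta \log p(x \mid \theta)

Free energy: F(q, \theta) = \sum_{z} q(z) \log \frac{p(x, z \mid \theta)}{q(z)} = \mathbb{E}_{q}[\log p(x, z \mid \theta)] + H(q)

E step: q^{t+1} = \arg\max_{q} F(q, \theta^{t}) = p(z \mid x, \theta^{t})

M step: \theta^{t+1} = \arg\max_{\theta} F(q^{t+1}, \theta) = \arg\max_{\theta} \mathbb{E}_{q^{t+1}}[\log p(x, z \mid \theta)]

In the M step the entropy term H(q^{t+1}) does not depend on \theta, so only the expected complete-data log-likelihood is maximized; that is the content of the remark above.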
General EM algorithm
Free energy:
Proof:
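A sketch of the intended argument:

F(q, \theta) = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} = \sum_z q(z) \log \frac{p(z \mid x, \theta)\, p(x \mid \theta)}{q(z)} = \log p(x \mid \theta) - \mathrm{KL}(q(z) \,\|\, p(z \mid x, \theta)).

Since the KL divergence is non-negative, F(q, \theta) \le \log p(x \mid \theta) for every q, with equality exactly when q(z) = p(z \mid x, \theta); this is what justifies the E step above.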
General EM algorithm
Goal:
E Step:
M Step:
Convergence of EM
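The monotonicity argument, sketched with the notation above:

\log p(x \mid \theta^{t}) = F(q^{t+1}, \theta^{t}) \le F(q^{t+1}, \theta^{t+1}) \le \log p(x \mid \theta^{t+1}).

The equality holds because the E step sets q^{t+1} to the exact posterior, so the KL term vanishes; the first inequality is the M step (\theta^{t+1} maximizes F over \theta); the second is the bound F \le \log p(x \mid \theta) proved above. Hence the likelihood never decreases along the iterations, and for well-behaved models it converges.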
Variational methods
Free energy:
Partial E Step:
Alternate between filling in the latent variables using the best guess (posterior)
and updating the parameters based on this guess:
E Step:
M Step:
Expectation-Maximization (EM)
A simple case:
• We have unlabeled data x1, x2, …, xn
• We know there are K classes
• We know P(y=1)=π1, P(y=2)=π2, …, P(y=K)=πK
• We know the common variance σ2
• We don’t know μ1, μ2, …, μK, and we want to learn them
We can write
Independent data
E step
M-step
Compute the maximum of the function Q (i.e., update μ given our data’s class membership distributions (weights)).
We want to learn:
Our estimator at the end of iteration t-1:
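A minimal sketch of this simple case in Python/NumPy, with known mixing weights pi and known common variance sigma2, and only the means estimated (names and initialization are illustrative):

import numpy as np

def em_means(X, pi, sigma2, n_iters=50, seed=0):
    # EM for a GMM with known weights pi and shared covariance sigma2 * I;
    # only the K mean vectors are updated.
    n, d = X.shape
    K = len(pi)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, size=K, replace=False)]    # initial guess of the means
    for _ in range(n_iters):
        # E step: soft memberships w[j, i] = P(y = i | x_j, mu)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logw = np.log(pi)[None, :] - sq / (2.0 * sigma2)
        logw -= logw.max(axis=1, keepdims=True)     # for numerical stability
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M step: maximize Q, i.e. set each mean to the weighted average of the data
        mu = (w.T @ X) / w.sum(axis=0)[:, None]
    return mu, w

Replacing the soft weights w with hard 0/1 assignments to the most probable class turns these updates into exactly the K-means updates, which is the "Same as K-means!!!" observation made earlier.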
EM for general GMMs
At iteration t, construct the function Q (E step) and maximize it in θt (M step)
E-step
Compute “expected” classes of all datapoints for each class
M-step
Compute MLEs given our data’s class membership distributions (weights)
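In formulas (a sketch in my notation; the superscript t indexes the iteration):

E step (responsibilities): w_{ji}^{t} = P(y_j = i \mid x_j, \theta^{t}) = \frac{\pi_i^{t} \, \mathcal{N}(x_j \mid \mu_i^{t}, \Sigma_i^{t})}{\sum_{k} \pi_k^{t} \, \mathcal{N}(x_j \mid \mu_k^{t}, \Sigma_k^{t})}

M step (weighted MLEs):
\pi_i^{t+1} = \frac{1}{n} \sum_{j} w_{ji}^{t}, \qquad
\mu_i^{t+1} = \frac{\sum_j w_{ji}^{t} x_j}{\sum_j w_{ji}^{t}}, \qquad
\Sigma_i^{t+1} = \frac{\sum_j w_{ji}^{t} (x_j - \mu_i^{t+1})(x_j - \mu_i^{t+1})^{\top}}{\sum_j w_{ji}^{t}}.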
EM for general GMMs: Example
(Figures: the fitted mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations.)
GMM for Density Estimation
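In practice one rarely codes this by hand; for example, scikit-learn's GaussianMixture runs EM for a general GMM and exposes the fitted density (a small illustrative snippet; the data below is made up):

import numpy as np
from sklearn.mixture import GaussianMixture

# toy 1-D data drawn from two bumps (illustrative only)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
log_density = gmm.score_samples(X)   # log p(x) under the fitted mixture
labels = gmm.predict(X)              # hard cluster assignments
resp = gmm.predict_proba(X)          # soft (EM) responsibilities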
Thanks for your attention!