
Clustering

Lecture 5: Mixture Model

Jing Gao
SUNY Buffalo

Outline
• Basics
– Motivation, definition, evaluation
• Methods
– Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics
– Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering, etc.

Using Probabilistic Models for Clustering

• Hard vs. soft clustering


– Hard clustering: Every point belongs to exactly one cluster
– Soft clustering: Every point belongs to several clusters, each to a certain degree
• Probabilistic clustering
– Each cluster is mathematically represented by a parametric distribution
– The entire data set is modeled by a mixture of these distributions

Gaussian Distribution

(Figure: a bell curve f(x) centered at μ with spread σ.)

• Changing μ shifts the distribution left or right
• Changing σ increases or decreases the spread
• The probability density function f(x) is a function of x given μ and σ:

N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)
Likelihood

(Figure: two candidate Gaussian curves f(x) over the same data points on the x-axis.)

• Which Gaussian distribution is more likely to have generated the data?
• Define the likelihood as a function of μ and σ given x_1, x_2, …, x_n:

L(\mu, \sigma) = \prod_{i=1}^{n} N(x_i \mid \mu, \sigma^2)
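A sketch of the likelihood viewed as a function of the parameters, computed in log space for numerical stability; `log_likelihood` and the sample data are assumptions for illustration, reusing `gaussian_pdf` from the previous sketch:

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    # sum over data points of log N(x_i | mu, sigma^2)
    return np.sum(np.log(gaussian_pdf(x, mu, sigma)))

x = np.array([1.2, 0.8, 1.1, 0.9])
# a Gaussian centered near the data is more likely to have generated it
print(log_likelihood(x, mu=1.0, sigma=0.2) > log_likelihood(x, mu=3.0, sigma=0.2))  # True
```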
Gaussian Distribution

• Multivariate Gaussian:

N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

where μ is the mean vector and Σ is the covariance matrix
• Log likelihood

L(\mu, \Sigma) = \sum_{i=1}^{n} \ln N(x_i \mid \mu, \Sigma) = \sum_{i=1}^{n} \left( -\frac{1}{2} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) - \frac{1}{2} \ln |\Sigma| \right) + \text{const}
Maximum Likelihood Estimate

• MLE
– Find the model parameters μ, Σ that maximize the log likelihood L(μ, Σ)

• MLE for Gaussian (closed form):

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T
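A sketch of the closed-form MLE for a multivariate Gaussian, assuming the rows of `X` are data points; note the 1/n (biased) covariance estimator, which is what maximizes the likelihood:

```python
import numpy as np

def gaussian_mle(X):
    n = X.shape[0]
    mu = X.mean(axis=0)                # mu_hat = (1/n) * sum_i x_i
    centered = X - mu
    Sigma = centered.T @ centered / n  # Sigma_hat = (1/n) * sum_i (x_i - mu)(x_i - mu)^T
    return mu, Sigma
```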
Gaussian Mixture

• Linear combination of Gaussians:

p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)

where the mixing weights satisfy 0 \le \pi_k \le 1 and \sum_{k=1}^{K} \pi_k = 1

– Parameters to be estimated: \pi_k, \mu_k, \Sigma_k for k = 1, …, K
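A sketch of evaluating the mixture density p(x) = Σ_k π_k N(x | μ_k, Σ_k) row-wise over a data matrix; the helper name is an assumption, and SciPy is used for the component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(X, pi, mus, sigmas):
    # p(x) = sum_k pi_k * N(x | mu_k, Sigma_k), evaluated for every row of X
    return sum(pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
               for k in range(len(pi)))
```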
Gaussian Mixture
• To generate a data point:
– first pick one of the K components, with probability π_k
– then draw a sample from that component's distribution
• Each data point is generated by one of the K components; a latent variable indicating the component is associated with each point
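A sketch of the generative process just described: draw a latent component z with probabilities π, then sample from that component's Gaussian; all parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                  # mixing weights (sum to 1)
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
sigmas = np.array([np.eye(2)] * 3)

def sample_gmm(n):
    z = rng.choice(len(pi), size=n, p=pi)       # latent component per point
    X = np.array([rng.multivariate_normal(mus[k], sigmas[k]) for k in z])
    return X, z

X, z = sample_gmm(500)
```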
Gaussian Mixture

• Maximize the log likelihood:

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k \, N(x_i \mid \mu_k, \Sigma_k) \right)

• Without knowing the values of the latent variables, we have to maximize this incomplete log likelihood directly
Expectation-Maximization (EM) Algorithm

• E-step: for given parameter values we can compute the expected values of the latent variables, i.e., the responsibilities of each component for each data point:

\gamma(z_{ik}) = \frac{\pi_k \, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x_i \mid \mu_j, \Sigma_j)}

– Note that \gamma(z_{ik}) \in [0, 1] instead of \{0, 1\}, but we still have \sum_{k=1}^{K} \gamma(z_{ik}) = 1
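A sketch of the E-step: each entry gamma[i, k] is the responsibility of component k for point i, and each row sums to 1; the function name is an assumption:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mus, sigmas):
    # unnormalized responsibilities: pi_k * N(x_i | mu_k, Sigma_k)
    gamma = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
        for k in range(len(pi))
    ])
    return gamma / gamma.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
```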
Expectation-Maximization (EM) Algorithm

• M-step: maximize the expected complete log likelihood

• Parameter update, with N_k = \sum_{i=1}^{n} \gamma(z_{ik}):

\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(z_{ik}) \, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(z_{ik}) (x_i - \mu_k)(x_i - \mu_k)^T, \qquad \pi_k = \frac{N_k}{n}
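A sketch of the M-step updates above, re-estimating π_k, μ_k, Σ_k from the responsibilities; shapes assume X is (n, d) and gamma is (n, K):

```python
import numpy as np

def m_step(X, gamma):
    n, K = gamma.shape
    Nk = gamma.sum(axis=0)             # N_k = sum_i gamma_ik
    pi = Nk / n                        # pi_k = N_k / n
    mus = (gamma.T @ X) / Nk[:, None]  # mu_k = (1/N_k) sum_i gamma_ik x_i
    sigmas = []
    for k in range(K):
        d = X - mus[k]
        sigmas.append((gamma[:, k, None] * d).T @ d / Nk[k])  # weighted covariance
    return pi, mus, np.array(sigmas)
```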
EM Algorithm

• Iterate the E-step and M-step until the log likelihood of the data no longer increases
– Converges to a local optimum
– May need to restart the algorithm with different initial parameter guesses (as in K-means)

• Relation to K-means
– Consider a GMM in which all components share a common covariance \Sigma_k = \epsilon I
– As \epsilon \to 0, the responsibilities become hard assignments and the two methods coincide
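Putting the two steps together, a sketch of the full EM loop with a log-likelihood convergence check; it reuses the hypothetical `e_step`, `m_step`, and `gmm_density` helpers from the earlier sketches:

```python
import numpy as np

def fit_gmm(X, pi, mus, sigmas, max_iter=100, tol=1e-6):
    prev_ll = -np.inf
    for _ in range(max_iter):
        gamma = e_step(X, pi, mus, sigmas)  # E-step: compute responsibilities
        pi, mus, sigmas = m_step(X, gamma)  # M-step: update parameters
        ll = np.sum(np.log(gmm_density(X, pi, mus, sigmas)))
        if ll - prev_ll < tol:              # stop when the likelihood plateaus
            break
        prev_ll = ll
    return pi, mus, sigmas
```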
K-means vs GMM

• K-means
– Objective function: minimize the sum of squared Euclidean distances
– Can be optimized by an EM-style algorithm (E-step: assign points to clusters; M-step: optimize cluster centers)
– Performs hard assignment during the E-step
– Assumes spherical clusters with equal probability for each cluster

• GMM
– Objective function: maximize the log-likelihood
– Optimized by the EM algorithm (E-step: compute posterior probabilities of membership; M-step: optimize parameters)
– Performs soft assignment during the E-step
– Can model non-spherical clusters
– Can generate clusters with different probabilities
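To see the contrast in practice, a short sketch using scikit-learn (a library choice of this note, not prescribed by the lecture): KMeans returns one hard label per point, while GaussianMixture exposes soft posterior memberships:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data

hard = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # one label per point
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft = gmm.predict_proba(X)  # (n, 3) posterior memberships; each row sums to 1
```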
Mixture Model

• Strengths
– Give probabilistic cluster assignments
– Have probabilistic interpretation
– Can handle clusters with varying sizes, variance etc.
• Weaknesses
– Initialization matters
– Requires choosing an appropriate distribution family
– Prone to overfitting
Take-away Message

• Probabilistic clustering
• Maximum likelihood estimate
• Gaussian mixture model for clustering
• EM algorithm that alternates between assigning points to clusters and estimating model parameters
• Strengths and weaknesses
