
EXPECTATION MAXIMIZATION (EM) ALGORITHM
Parameter estimation is a fundamental concept in statistics and
machine learning, involving the process of determining the values of
parameters that best describe a statistical model or distribution. The
goal is to find the most likely values for the parameters based on
observed data.
In real-world applications of machine learning, it is very common that many relevant features are available for learning, but only a small subset of them is observable.

So, for variables that are sometimes observable and sometimes not, we can use the instances in which the variable is observed for the purpose of learning, and then predict its value in the instances in which it is not observable.
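As a minimal sketch of this idea (not from the slides), suppose one variable is observed for some instances and missing for others; we can fit a model on the observed instances and then predict the variable where it is missing. The simulated data, the missingness pattern, and the use of LinearRegression below are illustrative assumptions.

# Fit on observed instances of a variable, predict it where it is unobserved.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # always-observed features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
y[rng.random(100) < 0.3] = np.nan                  # this variable is sometimes unobserved

observed = ~np.isnan(y)
model = LinearRegression().fit(X[observed], y[observed])   # learn from observed instances
y_filled = y.copy()
y_filled[~observed] = model.predict(X[~observed])          # predict it where unobserved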
Maximum likelihood estimation is an approach to density estimation for a dataset by searching across probability distributions and their parameters.

It is a general and effective approach that underlies many machine learning algorithms, although it requires that the training dataset is complete, e.g. all relevant interacting random variables are present.

Maximum likelihood becomes intractable if there are variables that interact with those in the dataset but were hidden or not observed, so-called latent variables. (An intractable problem is one that can be solved in theory, e.g. given large but finite resources, especially time, but for which in practice any solution takes too many resources to be useful.)
MAXIMUM LIKELIHOOD ESTIMATION (MLE)

Maximum Likelihood Estimation (MLE) is a method used for estimating the parameters of a statistical model. The basic idea behind MLE is to find the parameter values that maximize the likelihood function, which measures how well the model explains the observed data.
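For a single Gaussian, the likelihood-maximizing parameters have well-known closed forms (the sample mean and sample standard deviation), which the following minimal sketch illustrates; the simulated data is an assumption for illustration only.

# MLE for a single Gaussian: the sample mean and standard deviation maximize the likelihood.
import numpy as np
from scipy.stats import norm

data = np.random.normal(loc=20, scale=5, size=1000)   # observed sample (assumed for illustration)

mu_hat = data.mean()        # MLE of the Gaussian mean
sigma_hat = data.std()      # MLE of the Gaussian standard deviation
log_likelihood = norm.logpdf(data, mu_hat, sigma_hat).sum()
print(mu_hat, sigma_hat, log_likelihood)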
The expectation-maximization algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables.

It does this by first estimating the values for the latent variables, then optimizing the model, then repeating these two steps until convergence.

It is an effective and general approach and is most commonly used for density estimation with missing data, such as clustering algorithms like the Gaussian Mixture Model.
PROBLEM OF LATENT VARIABLES FOR MAXIMUM
LIKELIHOOD

A common modeling problem involves how to estimate a joint probability distribution for a dataset.

Density estimation involves selecting a probability distribution function and the parameters of that distribution that best explain the joint probability distribution of the observed data.

There are many techniques for solving this problem, although a common approach is called maximum likelihood estimation, or simply "maximum likelihood."

Maximum Likelihood Estimation involves treating the problem as an optimization or search problem, where we seek a set of parameters that results in the best fit for the joint probability of the data sample.
LATENT VARIABLES.

A limitation of maximum likelihood estimation is that it assumes that the dataset is complete, or fully observed.

This does not mean that the model has access to all data; instead, it assumes that all variables that are relevant to the problem are present.

This is not always the case. There may be datasets where only some of the relevant variables can be observed, and some cannot, and although they influence other random variables in the dataset, they remain hidden.

More generally, these unobserved or hidden variables are referred to as latent variables.
PLATO’S ALLEGORY OF THE CAVE

[Diagram: observed feature vectors (e.g. [8.67, 12.8564, 0.44875, 874.22, …] and [4.59, 13.2548, 1.14569, 148.25, …]) are the observable variables; the underlying source that generated them is the latent (hidden) variable.]

In the allegory, the prisoners have data they can observe, but that data is actually produced by something they cannot observe: a higher-level, abstract representation of the data. We want to learn something about that abstract representation.
LIMITATION OF MAXIMUM LIKELIHOOD
ESTIMATION

Conventional maximum likelihood estimation does not work well in the presence of latent variables.

If we have missing data and/or latent variables, then computing the [maximum likelihood] estimate becomes hard.

Instead, an alternate formulation of maximum likelihood is required for searching for the appropriate model parameters in the presence of latent variables.

The Expectation-Maximization algorithm is one such approach.
EXPECTATION-MAXIMIZATION ALGORITHM
The EM algorithm is an iterative approach that cycles between two modes. The first mode attempts to estimate the missing or latent variables, called the estimation-step or E-step.

The second mode attempts to optimize the parameters of the model to best explain the data, called the maximization-step or M-step.

E-Step. Estimate the missing variables in the dataset.

M-Step. Maximize the parameters of the model in the presence of the data.

The EM algorithm can be applied quite widely, although it is perhaps most well known in machine learning for use in unsupervised learning problems, such as density estimation and clustering.

Perhaps the most discussed application of the EM algorithm is for clustering with a mixture model.
GAUSSIAN MIXTURE MODEL AND THE EM
ALGORITHM

A mixture model is a model comprised of an unspecified combination of multiple probability distribution functions.

A statistical procedure or learning algorithm is used to estimate the parameters of the probability distributions to best fit the density of a given training dataset.

The Gaussian Mixture Model, or GMM for short, is a mixture model that uses a combination of Gaussian (Normal) probability distributions and requires the estimation of the mean and standard deviation parameters for each.

There are many techniques for estimating the parameters for a GMM, although a maximum likelihood estimate is perhaps the most common.
Consider the case where a dataset is comprised of many points that happen to be generated by two different processes.

The points for each process have a Gaussian probability distribution, but the data is combined and the distributions are similar enough that it is not obvious to which distribution a given point may belong.

The process used to generate each data point represents a latent variable, e.g. process 0 and process 1. It influences the data but is not observable.

As such, the EM algorithm is an appropriate approach to use to estimate the parameters of the distributions.
EM EXPLAINED
In the EM algorithm, the estimation-step would estimate a value for the process latent variable for each data point, and the maximization-step would optimize the parameters of the probability distributions in an attempt to best capture the density of the data.

The process is repeated until a good set of latent values and a maximum likelihood is achieved that fits the data.

E-Step. Estimate the expected value for each latent variable.

M-Step. Optimize the parameters of the distribution using maximum likelihood.

We can imagine how this optimization procedure could be constrained to just the distribution means, or generalized to a mixture of many different Gaussian distributions.
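To make the two steps concrete, the following is a minimal hand-rolled sketch of EM for a two-component, one-dimensional Gaussian mixture; the function name, the initialization strategy, and the fixed number of iterations are illustrative choices, not from the slides.

# Minimal EM for a two-component, 1D Gaussian mixture (illustrative sketch).
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=50):
    # crude initial guesses for the means, standard deviations and mixing weights
    mus = np.array([x.min(), x.max()], dtype=float)
    sigmas = np.array([x.std(), x.std()])
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        dens = np.stack([w * norm.pdf(x, m, s)
                         for w, m, s in zip(weights, mus, sigmas)])
        resp = dens / dens.sum(axis=0)
        # M-step: re-estimate the parameters from the responsibility-weighted data
        nk = resp.sum(axis=1)
        mus = (resp @ x) / nk
        sigmas = np.sqrt((resp * (x - mus[:, None]) ** 2).sum(axis=1) / nk)
        weights = nk / len(x)
    return weights, mus, sigmas

Applied to data drawn from two overlapping Gaussians, the recovered means, standard deviations, and weights should approach those of the generating processes.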
On the other hand, the Expectation-Maximization algorithm can be used for latent variables (variables that are not directly observable and are instead inferred from the values of other observed variables) in order to predict their values, on the condition that the general form of the probability distribution governing those latent variables is known to us.

This algorithm is actually at the base of many unsupervised clustering algorithms in the field of machine learning.
ALGORITHM:

Given a set of incomplete data, consider a set of starting parameters.

Expectation step (E-step): using the observed available data of the dataset, estimate (guess) the values of the missing data.

Maximization step (M-step): the complete data generated after the expectation (E) step is used to update the parameters.

Repeat step 2 and step 3 until convergence.

The essence of the Expectation-Maximization algorithm is to use the available observed data of the dataset to estimate the missing data, and then use that estimated data to update the values of the parameters.
EM ALGORITHM EXPLAINED
Initially, a set of initial values of the parameters is considered. A set of incomplete observed data is given to the system with the assumption that the observed data comes from a specific model.

The next step is known as the "Expectation" step, or E-step. In this step, we use the observed data in order to estimate or guess the values of the missing or incomplete data. It is basically used to update the variables.

The next step is known as the "Maximization" step, or M-step. In this step, we use the complete data generated in the preceding "Expectation" step in order to update the values of the parameters. It is basically used to update the hypothesis.

Now, in the fourth step, it is checked whether the values are converging or not; if yes, then stop, otherwise repeat step 2 and step 3, i.e. the "Expectation" and "Maximization" steps, until convergence occurs. A generic sketch of this loop is given below.
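The following generic skeleton of the loop, including the convergence check in step 4, is only a sketch; the e_step and m_step callables are problem-specific placeholders, and the tolerance-based stopping rule is an illustrative assumption.

# Generic EM loop with a simple convergence check (step 4).
import numpy as np

def run_em(data, theta_init, e_step, m_step, tol=1e-6, max_iter=200):
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iter):
        expectations = e_step(data, theta)       # step 2: guess the missing/latent values
        theta_new = m_step(data, expectations)   # step 3: update the parameters (hypothesis)
        if np.max(np.abs(theta_new - theta)) < tol:   # step 4: converged?
            return theta_new
        theta = theta_new
    return theta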
FLOWCHART
EXAMPLE OF GAUSSIAN MIXTURE MODEL

We can make the application of the EM algorithm to a Gaussian Mixture Model concrete with a worked example.

First, let's contrive a problem where we have a dataset where points are generated from one of two Gaussian processes.

The points are one-dimensional, the mean of the first distribution is 20, the mean of the second distribution is 40, and both distributions have a standard deviation of 5.

We will draw 3,000 points from the first process and 7,000 points from the second process and mix them together.

We can then plot a histogram of the points to give an intuition for the dataset. We expect to see a bimodal distribution with a peak for each of the means of the two distributions.
EXAMPLE PLOT
# example of a bimodal sample constructed from two gaussian processes
from numpy import hstack
from numpy.random import normal
from matplotlib import pyplot
# generate a sample
X1 = normal(loc=20, scale=5, size=3000)
X2 = normal(loc=40, scale=5, size=7000)
X = hstack((X1, X2))
# plot the histogram
pyplot.hist(X, bins=50, density=True)
pyplot.show()
HISTOGRAM
Running the example creates the dataset and then creates a histogram plot for the data points.

The plot clearly shows the expected bimodal distribution, with a peak for the first process around 20 and a peak for the second process around 40.

We can see that for many of the points in the middle of the two peaks it is ambiguous as to which distribution they were drawn from.
We can model the problem of estimating the density of this dataset using a Gaussian Mixture Model.

The GaussianMixture scikit-learn class can be used to model this problem and estimate the parameters of the distributions using the expectation-maximization algorithm.

The class allows us to specify the suspected number of underlying processes used to generate the data via the n_components argument when defining the model. We will set this to 2 for the two processes or distributions.
If the number of processes was not known, a range of different numbers of components could be tested and the model with the best fit could be chosen, where models could be evaluated using scores such as the Akaike or Bayesian Information Criterion (AIC or BIC), as in the sketch below.

There are also many ways we can configure the model to incorporate other information we may know about the data, such as how to estimate initial values for the distributions.

In this case, we will randomly guess the initial parameters by setting the init_params argument to 'random'.
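As a minimal sketch of the model-selection idea mentioned above (not part of the original worked example), one could sweep n_components and keep the model with the lowest BIC; this assumes the sample X generated earlier.

# Illustrative sketch: choose the number of mixture components by BIC.
from sklearn.mixture import GaussianMixture

X2d = X.reshape((len(X), 1))      # scikit-learn expects a 2D array
bic_scores = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X2d)
    bic_scores[k] = gm.bic(X2d)   # lower BIC = better fit/complexity trade-off
best_k = min(bic_scores, key=bic_scores.get)
print(bic_scores, best_k)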
EXAMPLE CODE
...
# fit model
from sklearn.mixture import GaussianMixture
# reshape the 1D sample into a column vector, as scikit-learn expects 2D input
X = X.reshape((len(X), 1))
model = GaussianMixture(n_components=2, init_params='random')
model.fit(X)
Once the model is fit, we can access the learned parameters via attributes on the model, such as the means, covariances, mixing weights, and more.

More usefully, we can use the fit model to estimate the latent parameters for existing and new data points.

For example, we can estimate the latent variable for the points in the training dataset, and we would expect the first 3,000 points to belong to one process (e.g. value=1) and the next 7,000 data points to belong to a different process (e.g. value=0).
EXAMPLE CODE

...
# predict latent values
yhat = model.predict(X)
# check latent value for first few points
print(yhat[:100])
# check latent value for last few points
print(yhat[-100:])

Running the example fits the Gaussian mixture model on the prepared dataset using the EM algorithm. Once fit, the model is used to predict the latent variable values for the examples in the training dataset.
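As a small follow-up sketch (not from the slides), the learned parameters can also be inspected directly; the attribute names below are the standard scikit-learn ones.

# Inspect the learned parameters; the means should be close to 20 and 40,
# and the mixing weights close to 0.3 and 0.7 (in some order).
print(model.weights_)       # mixing weights
print(model.means_)         # component means
print(model.covariances_)   # component covariances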
USAGE OF EM ALGORITHM –

It can be used to fill the missing data in a sample.

It can be used as the basis of unsupervised learning of clusters.

It can be used for the purpose of estimating the parameters of the Hidden Markov Model (HMM).

It can be used for discovering the values of latent variables.
ADVANTAGES OF EM ALGORITHM –

It is always guaranteed that likelihood will increase with each iteration.

The E-step and M-step are often pretty easy for many problems in terms of implementation.

Solutions to the M-steps often exist in the closed form; for the Gaussian mixture case these closed-form updates are sketched below.
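As an illustration of that last point (notation mine, not from the slides): for a one-dimensional Gaussian mixture, with responsibilities r_{nk} computed in the E-step, the M-step updates have the closed forms

\[
\pi_k = \frac{1}{N}\sum_{n=1}^{N} r_{nk}, \qquad
\mu_k = \frac{\sum_{n=1}^{N} r_{nk}\, x_n}{\sum_{n=1}^{N} r_{nk}}, \qquad
\sigma_k^2 = \frac{\sum_{n=1}^{N} r_{nk}\,(x_n - \mu_k)^2}{\sum_{n=1}^{N} r_{nk}}.
\]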
DISADVANTAGES OF EM ALGORITHM –

It has slow convergence.

It converges to a local optimum only.

It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).
