CS448: Topics in Computer Graphics Lecture #11

Mathematical Models for Computer Graphics


Stanford University Tuesday, 18 November 1997

The Metropolis Sampling Algorithm


Lecture #11: Tuesday, 4 November 1997
Lecturer: Eric Veach
Scribe: Homan Igehy

1 Applications of Monte Carlo Methods


Monte Carlo methods were developed in order to solve difficult integration problems of
high dimensionality. The techniques created came out of practical needs rather than
theoretical exercises. Before going into the details of the Metropolis Sampling Algorithm, we
shall first look at a few typical problems that require the use of Monte Carlo techniques.
Monte Carlo is definitely not a solution in wait of a problem.

1.1 Computer Graphics


A large part of computer graphics deals with rendering and the transport of light. The
light transport problem is a high dimensional problem: light leaving an emitter makes
its way through several bounces on several surfaces (each with its own surface reflectance
properties) before hitting the image plane. To make matters more complicated, discontinuities
show up in a variety of ways. Within a BRDF, a singularity is present in the
case of mirror reflections. A discontinuity in the surface normal of an object causes a
discontinuous change in the amount of light reaching the object (e.g. along the edge of
a cube). Even if all the BRDFs and surface normals are smooth, object boundaries
form discontinuities. On the image plane, these discontinuities manifest themselves as
silhouette edges. On a surface, these discontinuities form shadow boundaries. In the
case of point light sources, the shadows are discontinuous in value, whereas in the case
of area light sources, the shadows are discontinuous in their first or second derivatives.
The presence of discontinuities in such a high dimensional problem accentuates the need
for Monte Carlo techniques. Let us take a look at the dimensions involved in typical
rendering methods.
1.1.1 Distribution Ray Tracing

Distribution ray tracing is similar to standard ray tracing in the sense that rays are traced
from the eye towards the light source. The difference is that at each step of the light
transport problem, an integral is evaluated. As illustrated in Figure 1, distribution ray
tracing can be thought of as a simultaneous integration over:

    Variable             Dimensions
    time                 1
    pixel area           2
    lens aperture        2
    area light source    2
    surface reflections  2 each

Thus, in distribution ray tracing we must evaluate a (7 + 2r)-dimensional integral,
where r is the number of surface reflections considered.

1.1.2 Path Tracing

Path tracing is a generalization of distribution ray tracing in that it includes light
transport paths of all lengths. A path can be started in any direction and can be represented as
a sequence of points x0, x1, ..., xk on the scene surfaces. Each xi can either be chosen
directly on a surface (e.g. sampling a point on a light source), or by choosing a random
direction and casting a ray. In either case, each xi has two degrees of freedom. If we
include an integration over time as well, we get a problem of dimensionality 2(k + 1) + 1
for a path of length k. In path tracing, k is not bounded, and we need to integrate over
an infinite-dimensional space.

Figure 1: Distribution Ray Tracing. (Figure labels: image plane, lens, surface, area light source, moving surface.)



1.1.3 Radiosity

The form factor computation of radiosity seeks to derive the fraction of light leaving
patch A that falls onto patch B, assuming a Lambertian reflectance model. Although
an analytic expression has been derived for two arbitrary polygonal patches, this expression
is slow to evaluate and does not take into account the effects of visibility. Thus, this
discontinuous 4-dimensional integration problem is amenable to Monte Carlo techniques.
Similarly, finite element-based techniques for radiance (rather than radiosity) need to
evaluate coefficients that represent the fraction of light that leaves patch A, falls onto
patch B, and then gets reflected onto patch C. This is a 6-dimensional integral.

1.2 Finance

Monte Carlo techniques are widely used in the financial world for valuing securities. One
class of securities traded is mortgage-backed securities. The mortgages backing these
securities have a fixed or variable interest rate and typically have a maturity period of
30 years. The value of a mortgage-backed security depends on a complicated mix of
the prime interest rate, the default rate, the remortgaging rate, and a variety of other
interdependent factors that vary over time. By modeling these factors over time, a basis
for valuation is provided. Solutions to these models involve integrating functions of 20
or 30 or even 360 dimensions. Since these functions are typically smooth, quasi-Monte
Carlo techniques work well.

1.3 Physics and Chemistry


The proliferation of Monte Carlo techniques was due to the invention of electronic
computers and their application to problems of physics and chemistry. Monte Carlo
techniques have been used to solve problems in neutron transport (i.e., bombs), radiation
shielding (i.e., reactors), and particle transport (e.g., electron beam lithography).
Another area where Monte Carlo methods are used is predicting the properties of
materials. This is the application that originally led to the Metropolis Sampling Algorithm.
Problems in this field include the calculation of the density of a liquid at a given
temperature, the magnetic properties of alloys, and the modeling of phenomena such as
superfluidity.

2 A Typical Physics Problem


Before getting into the Metropolis Sampling Algorithm, let us motivate it with a typical
physics problem. Imagine a bunch of particles inside a wrap-around box which is at
thermal equilibrium, as illustrated in Figure 2. The state of this system can be described
by a vector

    X = (x1, x2, ..., xm)

Figure 2: A Typical Physics Problem: Particles with extent σ.

where xi is the position of particle i, and we let Ω represent the space of all possible
particle configurations X. (The momenta of the particles are sometimes also required, but
we will ignore them in this formulation.) If Φ(X) represents the energy of the system, then
the probability p(X) that the system will be in state X obeys the Boltzmann distribution:

    p(X) = f(X) / ∫Ω f(X) dX,   where f(X) = e^(−Φ(X)/(kT))

Here T is the temperature, and k is Boltzmann's constant. Note that f(X) is just the
unnormalized p(X).
One model for the energy Φ(X) is to sum up all possible pair potentials:

    Φ(X) = Σ_{1 ≤ i < j ≤ m} V(xi, xj)

The hard sphere potential model treats the particles as solid balls of diameter σ that
cannot interpenetrate, and any configuration which satisfies this requirement is equally
likely. This can be achieved by setting V as follows:

    V(xi, xj) = { 0   if rij > σ
                { ∞   otherwise

where rij is the distance between particles i and j:

    rij = |xi − xj|

Alternatively, one could use the Lennard-Jones potential model (illustrated in Figure 3):

    V(xi, xj) = 4ε [ (σ/rij)^12 − (σ/rij)^6 ]

Figure 3: The Lennard-Jones Potential. (Axes: V(r) against r, with σ marked.)
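As a concrete illustration of the energy model above, here is a small Python sketch (not from the original notes) that evaluates Φ(X) under the Lennard-Jones pair potential; the function names and the default σ = ε = 1 are our own choices:

```python
import math

def lennard_jones(r, sigma=1.0, epsilon=1.0):
    """Lennard-Jones pair potential: V(r) = 4e((s/r)^12 - (s/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(X, pair_potential=lennard_jones):
    """Energy of a configuration: sum of V(x_i, x_j) over pairs 1 <= i < j <= m."""
    m = len(X)
    phi = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            phi += pair_potential(math.dist(X[i], X[j]))  # r_ij = |x_i - x_j|
    return phi
```

Note that V(σ) = 0 and the potential well has depth ε at r = 2^(1/6) σ, which is a quick sanity check on the implementation.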

In any case, suppose there is some function g(X) in which we are interested. We want
to find the expected value of g(X), where X is distributed proportionally to f(X). In
other words, the quantity of interest G is:

    G = E[g(X)] = ∫Ω g(X) p(X) dX

If we choose X1, X2, ..., XN according to the distribution specified by p(X), an estimator
for G is given by:

    Ĝ = (1/N) Σ_{t=1}^{N} g(Xt)

Since f(X) (and consequently p(X)) is a complicated function of a huge number of
variables (in our case proportional to the number of particles), finding an appropriate collection
of samples X1, X2, ..., XN poses a formidable task. The Metropolis Sampling Algorithm
solves precisely this problem.
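The estimator Ĝ itself is only a few lines of Python, assuming we already have samples drawn from p(X); the uniform toy target below is our own choice, picked so that E[g(X)] has a known value of 1/3:

```python
import random

def estimate_G(g, samples):
    """Monte Carlo estimator: G_hat = (1/N) * sum over t of g(X_t),
    where the samples X_1, ..., X_N are assumed to be drawn from p(X)."""
    return sum(g(x) for x in samples) / len(samples)

# Toy check with a known answer: p uniform on [0, 1), g(x) = x^2,
# so G = E[g(X)] = 1/3.
random.seed(0)
samples = [random.random() for _ in range(100_000)]
```

The hard part, of course, is producing the samples when p(X) is only known up to the normalization of f(X), which is exactly what the rest of the lecture addresses.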

3 The Metropolis Sampling Algorithm


The Metropolis Sampling Algorithm was developed in 1953 by Metropolis, Rosenbluth,
Rosenbluth, Teller, and Teller (it is also called the M(RT)2 algorithm for obvious reasons).
The algorithm takes a random walk to generate a set of samples that are distributed
according to some arbitrary given function. Specifically, suppose we have a domain Ω
such that X ∈ Ω. Furthermore, let f(X) be some scalar function. The goal of
M(RT)2 is to provide a series of samples X1, X2, ..., XN which have a probability
density function proportional to f(X). In other words:

    Given: A function f(X) where X ∈ Ω.
    Goal:  Generate Xt ~ p(X) where p(X) ∝ f(X) and ∫Ω p(X) dX = 1.
3.1 Random Walks

M(RT)2 generates a set of samples by taking a random walk across Ω. A random walk
is simply a sequence of samples X1, X2, ..., XN where each Xt is chosen with some
probability distribution pt(X). Furthermore, the random walk generated by M(RT)2 is a
Markov chain, meaning that the choice of Xt depends only on Xt−1. If the random walk
is independent of the actual value of t, it is called a time-homogeneous Markov chain,
which can be characterized by a single transition function for all t:

    K(Y → X) = {density that Xt = X given that Xt−1 = Y}

In other words, K(Y → X) is the probability density of going from state Y to state X.
Since it is a pdf, it satisfies:

    ∫Ω K(Y → X) dX = 1   for all Y

Thus, for a random walk, we choose an initial state X1 according to some initial
distribution p1(X). Then p2(X | X1) is set to K(X1 → X), and X2 is chosen according to
that distribution. By repeating this process, pt(X | Xt−1) is set to K(Xt−1 → X) and
Xt is chosen according to that distribution. Now since each Xt is a random variable, we
get the following distributions for each step of the random walk:

    p1(X) = initial distribution
    p2(X) = ∫Ω p1(Y) K(Y → X) dY
    p3(X) = ∫Ω p2(Y) K(Y → X) dY
    ...
    pt(X) = ∫Ω pt−1(Y) K(Y → X) dY
    ...

Now if our random walk, which is embodied by K(Y → X), is ergodic, then there exists
a stationary distribution p(X) such that:

    p(X) = ∫Ω p(Y) K(Y → X) dY                                   (1)

That means that for some p(X), the application of a step in our random walk will give
back the exact same distribution. Furthermore, ergodicity guarantees that the random
walk will converge to p(X) for any initial distribution p1(X):

    lim_{t→∞} pt(X) = p(X)

Figure 4: A Random Walk Across a Grid. (Panels show pt at t = 1, 2, 3, ...)

M(RT)2 constructs a K(Y → X) so that its stationary distribution p(X) is proportional
to the desired f(X). Thus, the samples X1, X2, ..., XN taken along a random walk will
eventually obey the desired distribution p(X) no matter what initial distribution p1(X)
is used to pick X1. Although we will not get into the details of ergodicity, it is a relatively
weak condition: there must exist a possibility of getting to any state from any other state
in a finite number of steps in non-periodic fashion.
Before getting into the M(RT)2 construction of K(Y → X), we will first give an
example of an ergodic random walk for clarity. Imagine a wrap-around grid of size
M × M, as illustrated in Figure 4. Each node on the grid represents a distinct point in
the space Ω:

    Ω = {X = [i, j] | 1 ≤ i, j ≤ M} = {[1, 1], ..., [M, M]},   |Ω| = M²

We want to generate samples for Ω with a uniform distribution:

    f([i, j]) = 1,   p([i, j]) = 1/M²

We propose a transition function K(Y → X) that will keep us at the current node or
move us to an adjacent node with equal probability:

    K([i′, j′] → [i, j]) = { 1/5   if |i′ − i| + |j′ − j| ≤ 1
                           { 0     otherwise

(with the differences taken modulo M because the grid wraps around).
Figure 4 shows what happens with our random walk after a few steps, assuming an
initial distribution p1([i, j]) which puts X1 at the center. The intensity of a location
corresponds to the value of the density function pt([i, j]) at step t of the random walk.
By observation, we can see that pt([i, j]) converges to p([i, j]), and that p([i, j]) is indeed
a stationary distribution.
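The grid walk can be checked numerically by evolving the distribution pt([i, j]) itself rather than simulating individual walks. This Python sketch (our own, with M = 5) applies the transition rule repeatedly; the distribution converges to the uniform 1/M²:

```python
def step_distribution(p, M):
    """One transition of the grid walk: p_t -> p_{t+1}.
    Each node keeps its probability mass or passes it to one of its four
    wrap-around neighbours, each of the five options with probability 1/5."""
    q = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            share = p[i][j] / 5.0
            for di, dj in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                q[(i + di) % M][(j + dj) % M] += share
    return q

M = 5
# Initial distribution p_1: all mass on the centre node, as in Figure 4.
p = [[0.0] * M for _ in range(M)]
p[M // 2][M // 2] = 1.0
for _ in range(200):
    p = step_distribution(p, M)
# Every entry of p is now very close to 1/M^2 = 1/25.
```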
Note that if Ω is a discrete domain as in the above example, then p(X) may be
replaced by a column vector p with height equal to |Ω|. Each entry represents the
discrete probability of being in a given state of Ω. Similarly, pt(X) may be replaced with
a column vector pt. Furthermore, K(Y → X) may be replaced by a matrix K of size
|Ω| × |Ω|. Each entry (l, k) represents the probability of the random walk moving from
state k to state l, and the sum of any column is equal to 1. Given this matrix notation, the
distribution for our random walk becomes:

    pt = K pt−1 = K^(t−1) p1

Now if K is ergodic, then for any p1 we get:

    lim_{t→∞} K^t p1 = p

And the equation satisfied by the stationary distribution becomes:

    p = Kp

Thus p is an eigenvector of K with eigenvalue 1. In discrete domains, finding a
transition function whose stationary distribution is proportional to an arbitrary function
is thus equivalent to constructing a matrix with a specific eigenvector (modulo ergodicity).
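The matrix view can be illustrated with a quick numerical sketch (ours, not from the notes), using the convention that entry (l, k) of K holds the probability of moving from state k to state l, so that columns sum to 1 and pt = K pt−1; the 3-state chain below is made up for the example:

```python
def mat_vec(K, p):
    """One step p_t = K p_{t-1}; K[l][k] is the probability of moving from
    state k to state l, so each column of K sums to 1."""
    n = len(p)
    return [sum(K[l][k] * p[k] for k in range(n)) for l in range(n)]

# A small made-up 3-state ergodic chain (all entries positive, columns sum to 1).
K = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.3],
     [0.2, 0.2, 0.4]]

p = [1.0, 0.0, 0.0]  # p_1: all mass in state 0
for _ in range(200):
    p = mat_vec(K, p)
# p now approximates the stationary distribution: the eigenvector with p = K p.
```

Repeated application of K is exactly power iteration, which converges to the eigenvector of eigenvalue 1 for any starting p1 when K is ergodic.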

3.2 The Tentative Transition Function


In order to construct K(Y → X), we will break it up into two parts. A tentative transition
is first used to propose a transition from the current state into some other state according
to some chosen distribution. For example, in the aforementioned m-particle problem, a
tentative transition could be to move a single particle by some small random distance.
The pdf of this tentative transition function is denoted by T(Y → X). Now, once this
transition is proposed, we either choose to move to this new state for the next step of
the random walk, or else we stay in the same state for the next step of the random walk.
This acceptance probability, denoted by A(Y → X), is chosen in a manner
such that the combination of T(Y → X) and A(Y → X) forms a K(Y → X) whose
stationary distribution is the desired function p(X).

3.3 Detailed Balance


The question we ask ourselves at this point is as follows: what is the relationship between
T (Y ! X), A(Y ! X), and p(X)? To answer this, we borrow an idea from physical
equilibria. Imagine a box lled with a gas and another box that contains nothing. A
clamped tube connects the two boxes. Once the tube is unclamped, gas begins to ow
from the rst box into the second box. However, once the second box starts holding some
gas, some of that gas ows back into the rst box. After some time, the two boxes reach
a state of equilibrium. Even though gas ows back and forth between the two boxes and
CS448: Lecture #11 9

the state of the system is not static, the amount that goes in each direction is equal, and
thus the system is at equilibrium.
Let us examine what happens when we apply the notion of detailed balance to our
problem:
p(X)T (X ! Y)A(X ! Y) = p(Y)T (Y ! X)A(Y ! X) (2)
This equation says that once a stationary distribution p(X) is reached, the chance
of being in state X and then taking a proposed transition into state Y is equal to the
chance of being in state Y and then taking a proposed transition into state X. To see
how this yields a stationary distribution, we rewrite Equation (1) in terms
of T(Y → X) and A(Y → X):

    p′(X) = ∫Ω p(Y) T(Y → X) A(Y → X) dY + p(X) [ 1 − ∫Ω T(X → Y) A(X → Y) dY ]

The first term of the above equation expresses the chance of starting in state Y and
taking a transition into X. The second term expresses the chance of starting in state X
and rejecting a transition into another state Y. These are all the ways that one can end
up in state X after a step of the random walk. Rearranging the terms and moving p(X)
into the integral (since it is constant with respect to Y), we get:

    p′(X) = p(X) + ∫Ω [ p(Y) T(Y → X) A(Y → X) − p(X) T(X → Y) A(X → Y) ] dY

Thus in order to get a stationary distribution, the integral in the above equation must
evaluate to zero. One easy way to do this is to enforce the detailed balance of Equation
(2) in our choice of T(Y → X) and A(Y → X). Thus, we are left with:

    p′(X) = p(X)

3.4 The Acceptance Probability


Since p(X) is given to us (by way of f(X)), and T(Y → X) is some guess for transitions,
our only choice is to set A(Y → X) appropriately in order to meet the condition of
detailed balance. From Equation (2), we know that we must have:

    A(Y → X) / A(X → Y) = p(X) T(X → Y) / (p(Y) T(Y → X))        (3)

One way to achieve this is:

    A(Y → X) = min( 1, f(X) T(X → Y) / (f(Y) T(Y → X)) )         (4)

Noting that f(X)/f(Y) = p(X)/p(Y) since p(X) ∝ f(X), it is clear from substitution
that Equation (4) satisfies Equation (3). Note that all of the above scalar values can
be readily evaluated for any pair of points. Thus we are now ready for the M(RT)2
algorithm.

3.5 The M(RT)2 Algorithm


Assuming we have a way of evaluating f(X) at a point, and a way of sampling from and
evaluating T(Y → X), the algorithm for generating samples with a pdf proportional to
f(X) is as follows:

    X1 ~ p1(X)
    for t = 2 to N
        Xtent ~ T(Xt−1 → X)
        if random() < A(Xt−1 → Xtent)
            Xt = Xtent
        else
            Xt = Xt−1

Note that one may want to reject the first several samples, since it takes a while for the
random walk to reach the stationary distribution.
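The pseudocode above translates directly into Python. In this sketch (ours, not from the notes), the tentative transition is split into a sampler `propose` and its density `T`, and the toy target is an unnormalized Gaussian with a symmetric uniform proposal, so the T-ratio in Equation (4) cancels to 1:

```python
import math
import random

def metropolis(f, propose, T, x1, N, burn_in=1000):
    """M(RT)^2 random walk: returns N samples whose density is proportional to f.

    f       -- unnormalized target density f(X)
    propose -- draws a tentative state X_tent from T(X_{t-1} -> X)
    T       -- T(Y, X), density of proposing X from state Y (used in Equation (4))
    x1      -- initial state X_1
    """
    samples, x = [], x1
    for t in range(burn_in + N):
        x_tent = propose(x)
        # Acceptance probability A(Y -> X) from Equation (4)
        a = min(1.0, f(x_tent) * T(x_tent, x) / (f(x) * T(x, x_tent)))
        if random.random() < a:
            x = x_tent
        if t >= burn_in:  # discard early samples: the walk has not yet converged
            samples.append(x)
    return samples

# Toy run: unnormalized Gaussian target, symmetric uniform proposal.
random.seed(1)
f = lambda x: math.exp(-x * x / 2.0)
propose = lambda y: y + random.uniform(-1.0, 1.0)
T = lambda y, x: 0.5 if abs(x - y) <= 1.0 else 0.0
xs = metropolis(f, propose, T, x1=0.0, N=50_000)
```

The `burn_in` parameter implements the note about rejecting the first several samples; its default of 1000 is an arbitrary choice for the toy problem, not a general prescription.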
