AM207 2 Transforms Sampling

The document covers advanced scientific computing techniques, focusing on stochastic methods for data analysis, inference, and optimization. It discusses the generation and properties of random numbers, including pseudo-random numbers, and various sampling methods such as the inverse transform and Monte Carlo methods. Additionally, it highlights statistical tests for evaluating random number generators and provides examples of sampling from different distributions.

APMTH 207:

Advanced Scientific Computing:


Stochastic Methods for Data
Analysis, Inference and Optimization

Petros Koumoutsakos

Applied Mathematics

cse-lab.seas.harvard.edu
TODAY’s CLASS - STOCHASTIC VARIABLES

Sampling of Variables
Random Numbers
Inverse Transform
Simple Monte Carlo
Monte Carlo Variations
Importance Sampling
Rejection Sampling
What is a Random Number? (answer by ChatGPT+)

A random number is a value that lacks a discernible pattern or predictability, often generated
using algorithms or physical processes to introduce randomness into various applications.

In the context of computer science and mathematics, a random number is typically generated
using algorithms or physical processes that introduce an element of unpredictability. These
numbers are often used in various applications, such as simulations, cryptography, games, and
statistical analysis, to introduce randomness or uncertainty.

It's important to note that true randomness is difficult to achieve in a computer program, as
computers operate based on deterministic algorithms. Instead, computer-generated random
numbers are often pseudo-random, meaning they appear random but are generated using a
deterministic process that starts from an initial value called a seed.



Generating random numbers

Real random numbers are hard to obtain. Sources include:

classical or thermal chaos (atmospheric noise)
quantum mechanics

Commercial products: quantum random number generators based on photons and a
semi-transparent mirror
4 Mbit/s from a USB device, too slow for most MC simulations

http://www.idquantique.com/




Pseudo Random numbers

Are generated by an algorithm

Not random at all, but completely deterministic

Look nearly random, however, when the algorithm is not known, and may be good enough
for our purposes

Never trust pseudo-random numbers, however!






Linear congruential generators
are of the simple form x_{n+1} = f(x_n)

A reasonably good choice is the GGL generator

    x_{n+1} = (a x_n + c) mod m

with a = 16807, c = 0, m = 2^31 - 1
Quality is sensitive to a good choice of a, c, m

Periodicity is a problem with such 32-bit generators: the sequence repeats
identically after 2^31 - 1 iterations
With 500 million numbers per second that is just 4 seconds!

Should not be used anymore except to seed other generators
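As an illustration, here is a minimal sketch of the GGL generator in Python (parameter names as above; the class name and seed are arbitrary choices):

```python
# Minimal sketch of the GGL linear congruential generator (a = 16807, c = 0, m = 2^31 - 1).
# For illustration only: never use an LCG like this for production Monte Carlo.

class GGL:
    A, C, M = 16807, 0, 2**31 - 1

    def __init__(self, seed=12345):
        self.state = seed

    def next_uniform(self):
        """Advance x_{n+1} = (a x_n + c) mod m and return a float in (0, 1)."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M

rng = GGL(seed=42)
print([round(rng.next_uniform(), 6) for _ in range(5)])
```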









More advanced generators

As well-established generators fail new tests, better and better generators get developed

Mersenne Twister (Matsumoto & Nishimura, 1997)
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

WELL generator (Panneton and L'Ecuyer, 2004)
http://www.iro.umontreal.ca/~panneton/WELLRNG.html

Special Efforts Needed to Parallelize and Vectorize Random Number Generators




Are these numbers really random?

No!
Are they random enough? Maybe?

Statistical tests for distribution and correlations. Are these tests enough? No!
Your calculation could depend in a subtle way on hidden correlations!

What is the ultimate test?

Run your simulation with various random number generators and compare the results
Marsaglia's diehard tests: https://en.wikipedia.org/wiki/Diehard_tests

The diehard tests are a battery of statistical tests for measuring the quality of a random number generator. They were
developed by George Marsaglia over several years and first published in 1995 on a CD-ROM of random numbers.[1]

Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically Poisson
distributed. The name is based on the birthday paradox.

Overlapping permutations: Analyze sequences of five consecutive random numbers. The 120 possible orderings should occur with
statistically equal probability.

Ranks of matrices: Select some number of bits from some number of random numbers to form a matrix over {0,1}, then
determine the rank of the matrix. Count the ranks.

Monkey tests: Treat sequences of some number of bits as "words". Count the overlapping words in a stream. The number of
"words" that don't appear should follow a known distribution. The name is based on the infinite monkey theorem.

Count the 1s: Count the 1 bits in each of either successive or chosen bytes. Convert the counts to "letters", and count the
occurrences of five-letter "words".

Parking lot test: Randomly place unit circles in a 100 x 100 square. If the circle overlaps an existing one, try again. After 12,000
tries, the number of successfully "parked" circles should follow a certain normal distribution.






Marsaglia’s diehard tests (cont.)
Minimum distance test: Randomly place 8,000 points in a 10,000 x 10,000 square, then find the minimum distance between the
pairs. The square of this distance should be exponentially distributed with a certain mean.

Random spheres test: Randomly choose 4,000 points in a cube of edge 1,000. Center a sphere on each point, whose radius is the
minimum distance to another point. The smallest sphere's volume should be exponentially distributed with a certain mean.

The squeeze test: Multiply 2^31 by random floats on [0,1) until you reach 1. Repeat this 100,000 times. The number of floats needed
to reach 1 should follow a certain distribution.

Overlapping sums test: Generate a long sequence of random floats on [0,1). Add sequences of 100 consecutive floats. The sums
should be normally distributed with characteristic mean and sigma.

Runs test: Generate a long sequence of random floats on [0,1). Count ascending and descending runs. The counts should follow a
certain distribution.

The craps test: Play 200,000 games of craps, counting the wins and the number of throws per game. Each count should follow
a certain distribution.
Inverse Transform - Derivation 1

Consider random variable X

We have p_X(x) (usually a uniform or Gaussian distribution)

Consider random variable Y with Y = f(X)

What is p_Y(y)?
Inverse Transform - Derivation 1

    p_Y(y) = p_X(f^{-1}(y)) |dx/dy| = p_X(f^{-1}(y)) |(f^{-1})'(y)|

Note: Here y = f(x) is an increasing function.
If we have a decreasing function we still get a positive
probability density function. In general we set:

    Prob(y ≤ Y ≤ y + dy) = Prob(x ≤ X ≤ x + dx)

    p_Y(y) dy = p_X(x) dx  →  p_Y(y) = p_X(f^{-1}(y)) |dx/dy|

dx
Inverse Transform - Example

Let X be a Gaussian variable with

    p_X(x) = 1/sqrt(2πσ^2) · e^{-(x - x_0)^2 / (2σ^2)}

Define a new variable Y so that y = f(x) = e^x. What is p_Y(y)?

First we invert: x = f^{-1}(y) = ln y    Then compute: dx/dy = d(ln y)/dy = 1/y

    p_Y(y) = p_X(f^{-1}(y)) |dx/dy|  →  p_Y(y) = 1/sqrt(2πσ^2) · e^{-(ln y - x_0)^2 / (2σ^2)} · (1/y)

    p_Y(y) = 1/(y sqrt(2πσ^2)) · e^{-(ln y - x_0)^2 / (2σ^2)}

i.e. Y is log-normally distributed.
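A quick numerical sanity check of this transform is sketched below (the values of x_0 and σ are arbitrary illustration choices):

```python
# Sanity check: samples of Y = exp(X), X ~ N(x0, sigma^2), should match the derived
# log-normal density p_Y(y) = exp(-(ln y - x0)^2 / (2 sigma^2)) / (y sqrt(2 pi sigma^2)).
import numpy as np

rng = np.random.default_rng(0)
x0, sigma = 0.5, 0.3            # arbitrary illustration values
y = np.exp(rng.normal(x0, sigma, size=200_000))

def p_Y(y):
    return np.exp(-(np.log(y) - x0)**2 / (2 * sigma**2)) / (y * np.sqrt(2 * np.pi * sigma**2))

# Compare a histogram estimate of the density with the formula at a few points.
hist, edges = np.histogram(y, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centers[::50], hist[::50]):
    print(f"y = {c:5.2f}   histogram = {h:.3f}   formula = {p_Y(c):.3f}")
```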
HOW TO SAMPLE? Inverse Transform: Uniform probabilities to the rescue

Let X be a random variable (RV) with a probability density function (PDF) p_X(x) and cumulative distribution
function (CDF) F_X(x). TASK: How to sample from p_X(x)?

Note that F_X(x) = ∫_{-∞}^{x} p_X(s) ds

A transformation between the values of the random variables X and U is introduced as follows: x = g(u)

INVERSE "DESIGN": we seek the function g(u) such that the
random variable X follows the desired distribution p_X(x).

Given x = g(u), it is known that the PDFs p_X(x) and p_U(u) are related by:

    p_X(x) dx = p_U(u) du

or, equivalently, the PDF p_X(x) is:

    p_X(x) = p_U(u) |du/dx| = p_U(u) |dg^{-1}(x)/dx| = p_U(u) |dg(u)/du|^{-1}

Since U is uniform, one has that p_U(u) = 1 for u ∈ [0,1],
and integration of p_X(x) dx = p_U(u) du yields

    ∫_{-∞}^{x} p_X(s) ds = ∫_{0}^{u} p_U(s) ds = u  →  F_X(x) = u

Thus the following transformation holds:

    x = F_X^{-1}(u)

which means that the values of x are given by the inverse of the CDF F_X(x); this inverse is
exactly the function g(u).

This transformation allows us to draw samples x^(i), i = 1,...,N from
the PDF p_X(x) as x^(i) = F_X^{-1}(u^(i)), where the u^(i) are samples from
the uniform distribution over the interval [0, 1].

SAMPLING USING INVERSE TRANSFORMS - TAKE II

Let X be a real random variable with probability density function (PDF) p(x)
and a corresponding cumulative distribution function (CDF):

    F_X(x) = ∫_{-∞}^{x} p(τ) dτ     (probability of measuring a value less than x)

EXAMPLE:
CDF of an unbiased die
SAMPLING USING INVERSE TRANSFORMS - TAKE II

The random variable Y defined as Y = F_X(X) is uniformly distributed in
the interval 0 ≤ Y ≤ 1, irrespective of the nature of p(x)

PROOF:
Y = F_X(X) is monotonic and continuous, so

    F_Y(y) = P(Y ≤ y) = P(F_X(X) ≤ y)
           = P(X ≤ F_X^{-1}(y))
           = F_X(F_X^{-1}(y)) = y = ∫_{-∞}^{y} U(τ) dτ

So F_Y(y) = ∫_{-∞}^{y} U(τ) dτ, and F_Y(y) is the CDF of a uniform U(0,1) random variable

So Y = F_X(X) has a uniform PDF U(0,1) irrespective of p_X(x)
SAMPLING USING INVERSE TRANSFORMS - TAKE II

We can sample a PDF p_X(x) with a corresponding CDF F_X(x) by
sampling a uniform distribution u ∼ U(0,1)
and then taking x = F_X^{-1}(u)

[Figure: u sampled on the vertical (CDF) axis is mapped through F_X^{-1} to x on the horizontal axis.]
EXAMPLE 1 - Uniform(0,1) to Uniform(a,b)

Let X have p_X(x) ∼ U(a, b). How can we sample an x?

Then:
    F_X(x) = 0                   for x < a
             (x - a)/(b - a)     for a ≤ x ≤ b
             1                   for x > b

Then x = F_X^{-1}(u), so that x = a + (b - a) u, where u ∼ U(0,1)
EXAMPLE 2 - Uniform(0,1) to Exponential

Let X be an exponential random variable: p_X(x) = (1/β) e^{-x/β}

Then:
    F_X(x) = 0                for x < 0
             1 - e^{-x/β}     for x ≥ 0

Then: u = F_X(x) = 1 - e^{-x/β}  →  1 - u = e^{-x/β}  →  x = -β log(1 - u)

Recall that if u ∼ U(0,1) then 1 - u = v ∼ U(0,1), so that

    x = -β log v,   with v ∼ U(0,1)
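A minimal sketch of this inverse-transform sampler in Python (β is an arbitrary illustration value; both the mean and the standard deviation of the exponential should equal β):

```python
# Inverse-transform sampling of the exponential distribution:
# u ~ U(0,1), then x = -beta * log(u) has PDF (1/beta) exp(-x/beta).
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                      # arbitrary illustration value
u = rng.uniform(size=100_000)
x = -beta * np.log(u)           # uses v = u, since 1 - u is also U(0,1)

print("sample mean:", x.mean(), " (expected beta =", beta, ")")
print("sample std :", x.std(),  " (expected beta =", beta, ")")
```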
What if p_X(x) is complex?
"complex" meaning that F_X(x) is not invertible
Monte Carlo Methods
Monte Carlo Methods use random numbers to solve two types of problems:

1. Generate samples {x^(r)}_{r=1}^{R} from a given distribution p_X(x)

2. Estimate expectations of functions under the distribution p_X(x):

    Φ = ⟨ϕ(x)⟩ = ∫ ϕ(x) p(x) d^N x    for an N-dimensional space
Monte Carlo Methods

Recall that an integral of the form I = ∫_a^b f(x) dx
can be expressed as the average value of f(x) under
the uniform probability distribution U(x):

    I = ⟨f(x)⟩ = ∫_a^b f(x) U(x) dx

We focus on the sampling problem: if we solve it, we can then solve the second
problem using random samples x^(r) from p to get the estimate:

    Φ̂ = (1/R) Σ_r ϕ(x^(r))
Monte Carlo Methods: Two interpretations

    I_n = (b - a)/n · Σ_{i=1}^{n} f(x_i)

Interpretation 1: computing the mean value (or height) of
the function and multiplying by the interval length:

    I_n = (b - a) × [ (1/n) Σ_{i=1}^{n} f(x_i) ]

Interpretation 2: computing the average of several
rectangle rules with random evaluation locations:

    I_n = (1/n) Σ_{i=1}^{n} [ f(x_i) × (b - a) ]

[Figure: the two interpretations of the Monte Carlo estimator, mean height (top) and averaged random rectangles (bottom).]
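A minimal sketch of this estimator, using the arbitrary test integral ∫_0^π sin(x) dx = 2:

```python
# Monte Carlo estimate of I = integral_a^b f(x) dx as (b - a) times the mean of f
# at n uniformly random evaluation points.
import numpy as np

rng = np.random.default_rng(2)

def mc_integrate(f, a, b, n):
    x = rng.uniform(a, b, size=n)
    return (b - a) * f(x).mean()

for n in (100, 10_000, 1_000_000):
    print(n, mc_integrate(np.sin, 0.0, np.pi, n))   # exact value: 2
```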
Monte Carlo Methods: Estimating the Error

    I = |Ω| ⟨f⟩,    ⟨f⟩ = (1/|Ω|) ∫_Ω f(x) dx,    |Ω| = ∫_Ω dx

    ⟨f⟩ ≈ ⟨f⟩_M = (1/M) Σ_{i=1}^{M} f(x_i)

Due to the random nature of Monte Carlo sampling (changing the random number sequence would give a slightly different
result), the estimate ⟨f⟩_M itself is a random variable. However, assuming that all samples x_i are independent, we obtain that
the expected value of this estimate ⟨f⟩_M is the exact integral:

    E[⟨f⟩_M] = E[(1/M) Σ_{i=1}^{M} f(x_i)] = (1/M) Σ_{i=1}^{M} E[f(x_i)] = (1/M) Σ_{i=1}^{M} ⟨f⟩ = (1/M) M ⟨f⟩ = ⟨f⟩

Such estimators are called unbiased ("true") estimators.
Monte Carlo Methods: Estimating the Error
The MC estimate is stochastic, and we want to estimate the error
as a function of the number of samples M:

    ε_M^2 = Var[⟨f⟩_M] = E[⟨f⟩_M^2] - (E[⟨f⟩_M])^2
          = (1/M^2) Σ_{i,j=1}^{M} E[f(x_i) f(x_j)] - ⟨f⟩^2
          = (1/M^2) Σ_{i=1}^{M} (E[f(x_i)^2] - ⟨f⟩^2) + (1/M^2) Σ_{i≠j} (E[f(x_i)] E[f(x_j)] - ⟨f⟩^2)
          = (1/M^2) Σ_{i=1}^{M} (⟨f^2⟩ - ⟨f⟩^2)        (the i≠j terms vanish since E[f(x_i)] E[f(x_j)] = ⟨f⟩^2)
          = (1/M^2) M (⟨f^2⟩ - ⟨f⟩^2) = Var[f] / M

We have obtained that the variance of the Monte Carlo estimator ⟨f⟩_M is M times
smaller than the initial variance of the integrand f. Hence, the Monte Carlo
method is 1/2-order accurate, i.e.

    ε_M = (Var[f] / M)^{1/2} = O(M^{-1/2})

Notice that order 1/2 is independent of the dimension d of the integrand f.
Monte Carlo sampling has a significant advantage over quadrature methods for very
high-dimensional integrals.

EXAMPLE: Compare Monte Carlo sampling to Simpson's rule
with order of accuracy 4/d. MC quadrature is more efficient
(meaning 1/2 > 4/d) for any dimension d > 8.
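A short sketch illustrating the M^{-1/2} decay empirically (same test integral as above; the number of repeated trials is an arbitrary choice):

```python
# Empirical check that the Monte Carlo error decays like M^(-1/2):
# the RMSE times sqrt(M) should stay roughly constant.
import numpy as np

rng = np.random.default_rng(3)
exact = 2.0                               # integral of sin on [0, pi]

for M in (100, 10_000, 1_000_000):
    estimates = [np.pi * np.sin(rng.uniform(0, np.pi, M)).mean() for _ in range(50)]
    rmse = np.sqrt(np.mean((np.array(estimates) - exact) ** 2))
    print(f"M = {M:>8}  RMSE = {rmse:.5f}  RMSE * sqrt(M) = {rmse * np.sqrt(M):.3f}")
```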
SAMPLING PROBABILITY DISTRIBUTIONS

Assume we are given a high-dimensional random vector X with PDF p_X(x).

p_X(x) can be evaluated at each x.

Why is sampling from p_X(x) difficult?

Difficulty I:
In many cases, p_X(x) is only known up to a multiplicative constant:

    p_X(x) = p̂_X(x) / c

Computing c is very costly, as a high-dimensional integral needs to be solved:

    c = ∫ p̂_X(x) dx

Example:
Bayesian inference:
The posterior for the parameters θ given data D is proportional to the product
of likelihood and prior: p(θ | D) ∝ p(D | θ) p(θ).
An exact computation would involve the normalisation constant:

    c = ∫ p(D | θ) p(θ) dθ
Difficulty II:
It is not obvious how to sample from a function p(x) without
enumerating most or all possible states.

By definition, most samples from p(x) should come from areas where p_X(x) is large.

-> How can we find these areas without evaluating p(x) everywhere?

Example:
Assume we wish to draw samples from the density p(x) = p*(x)/c with

    p*(x) = exp[0.4 (x - 0.4)^2 - 0.08 x^4]

See Difficulty I: we don't know how to compute the normalizing constant c,
but p*(x) is a density which we can evaluate.

Sampling via Rejection Sampling

We want to sample from the probability distribution p_X(x),
but we do not know how to sample from p.

Idea: Sample instead from a simpler proposal density Q_X(x) that we can evaluate
(possibly up to a constant, Q_X(x) = Q*(x)/c), together with a constant γ such that

    γ Q_X(x) > p_X(x)    for all x

ALGORITHM:

1. Generate a sample x^(i) from the given distribution Q_X(x) using a random number
   generator

2. Evaluate γ Q_X(x^(i))

3. Sample u from the uniform distribution U[0, γ Q_X(x^(i))]

4. If u ≤ p_X(x^(i)) the sample x^(i) is accepted, else rejected

Repeat until the desired number of samples is obtained.
Non-normalized p̂_X(x) is also possible!
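A minimal sketch of this algorithm under illustrative assumptions: the unnormalized target is the p*(x) from the earlier example, the proposal Q is a Gaussian N(0, 2^2), and the envelope constant γ is hand-tuned and checked numerically on a grid rather than derived:

```python
# Minimal sketch of rejection sampling with an unnormalized target p*(x),
# a Gaussian proposal Q, and a hand-tuned envelope constant gamma.
import numpy as np

rng = np.random.default_rng(4)

def p_star(x):                              # unnormalized target density
    return np.exp(0.4 * (x - 0.4)**2 - 0.08 * x**4)

sigma = 2.0
def q(x):                                   # proposal density N(0, sigma^2)
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

gamma = 25.0                                # must satisfy gamma * q(x) >= p*(x) everywhere
grid = np.linspace(-10, 10, 10_001)
assert np.all(gamma * q(grid) >= p_star(grid))   # numerical check of the envelope

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.normal(0.0, sigma)               # 1. draw x from Q
        u = rng.uniform(0, gamma * q(x))         # 2-3. draw u ~ U[0, gamma * Q(x)]
        if u <= p_star(x):                       # 4. accept if u lies under p*(x)
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
print("mean of accepted samples:", xs.mean())
```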
Sampling via Rejection Sampling (cont.)

We sample from p_X(x) only by evaluating the proposed samples from Q_X(x).

If p_X(x) is close to Q_X(x) the acceptance chance is high.

-> If Q_X(x) is a good approximation for p_X(x) this works well.

-> If not, γ needs to be large and we spend a lot of compute on rejecting samples.
Sampling via Importance Sampling

Importance Sampling is a technique to evaluate integrals,
not to generate samples.

We want to compute the value of an N-dimensional integral

    I = E[ϕ(X)] = ∫ ϕ(x) p_X(x) d^N x

But we do not know how to sample from p.

Idea: Sample from a simpler distribution q instead!
Sampling via Importance Sampling

We want to compute the value of an N-dimensional integral

    I = E[ϕ(X)] = ∫ ϕ(x) p_X(x) d^N x

But we do not know how to sample from p. Idea: Sample from q instead!

1. Generate independent samples {x^(i)}_{i=1}^{M} from a given distribution q_X(x) using
   a random number generator

2. Use the estimator:

    I_M = (1/M) Σ_{i=1}^{M} ϕ(x^(i)) p(x^(i)) / q(x^(i))

q should be a good approximation of p! Relevant areas should be represented!
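A minimal sketch of this estimator under illustrative assumptions: the target p is a standard normal, q is a wider normal, and ϕ(x) = x^2, so the exact answer E[x^2] = 1 is known:

```python
# Importance sampling estimate of I = E_p[phi(X)] using samples from q:
# I_M = (1/M) * sum phi(x_i) p(x_i) / q(x_i),  x_i ~ q.
import numpy as np

rng = np.random.default_rng(5)

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

M = 100_000
sigma_q = 3.0                            # wider proposal covering the target
x = rng.normal(0.0, sigma_q, size=M)     # samples from q = N(0, sigma_q^2)

w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, sigma_q)   # importance weights p/q
I_M = np.mean(x**2 * w)
print("importance sampling estimate of E[x^2]:", I_M, "(exact: 1)")
```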
Importance Sampling: Why does it work?

We want to compute the value of an N-dimensional integral

    I = E[ϕ(X)] = ∫ ϕ(x) p_X(x) d^N x

But we do not know how to sample from p. Idea: Sample from q instead!

    I = E[ϕ(X)] = ∫ ϕ(x) (p_X(x) / q_X(x)) q_X(x) d^N x

We only change the function that is evaluated!

Problem: Inaccuracies in case q_X(x) is small where ϕ(x) p_X(x) is large.
Sampling via Importance Sampling
We can also use importance sampling if p and/or q are only known up
to a constant, i.e. q = q̂_X(x)/c_q and p = p̂_X(x)/c_p.

Importance Sampling with self-normalization:

1. Generate independent samples {x^(i)}_{i=1}^{M} from a given distribution q_X(x) using
   a random number generator

2. Define w^i = p̂(x^(i)) / q̂(x^(i))

3. Use the estimator:

    I_M = Σ_{i=1}^{M} ϕ(x^(i)) w^i / Σ_{i=1}^{M} w^i
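The same sketch with self-normalization, assuming p is known only up to a constant (here the unnormalized p̂(x) = exp(-x^2/2), i.e. a standard normal with the constant dropped):

```python
# Self-normalized importance sampling: weights use the unnormalized p_hat,
# and the estimator divides by the sum of the weights.
import numpy as np

rng = np.random.default_rng(6)

M, sigma_q = 100_000, 3.0
x = rng.normal(0.0, sigma_q, size=M)                 # samples from q = N(0, sigma_q^2)

p_hat = np.exp(-x**2 / 2)                            # unnormalized target (constant dropped)
q_pdf = np.exp(-x**2 / (2 * sigma_q**2)) / np.sqrt(2 * np.pi * sigma_q**2)

w = p_hat / q_pdf                                    # weights known only up to a constant
I_M = np.sum(x**2 * w) / np.sum(w)                   # self-normalized estimator
print("self-normalized estimate of E[x^2]:", I_M, "(exact: 1)")
```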
Importance Sampling in Many Dimensions

Assume we want to sample from p_X(x),
which is a uniform distribution inside a sphere:

    p̂_X(x) = 1    if 0 ≤ ρ(x) ≤ R_p
             0    if ρ(x) > R_p

with

    ρ(x) = (Σ_i x_i^2)^{1/2}
Importance Sampling in Many Dimensions

Instead we sample from q_X(x), which is a normal distribution:

    q_X(x) = Π_{i=1}^{N} N(x_i; 0, σ^2)

Thus, the squared distance from the origin is roughly distributed as

    ρ^2 ∼ N σ^2 ± sqrt(2N) σ^2

So most samples are within distance sqrt(N) σ from the origin.
Importance Sampling in Many Dimensions

If we choose σ such that the typical set of q lies inside the sphere with radius R_p,
then it can be shown that the importance sampling weights take values:

    w_max / w_min ∝ exp(sqrt(2N))

Thus, the weights can have significantly different values!

Importance Sampling has two problems in high dimension:

- q needs to be a good approximation of p
- Weights in the approximation may vary widely
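A short sketch illustrating how the spread of the log-weights grows with dimension N (uniform-ball target, Gaussian proposal; the sample count is an arbitrary choice, and R_p is assumed large enough that essentially all samples land inside the ball):

```python
# For a uniform-ball target with Gaussian proposal, the weights inside the ball are
# w ∝ 1/q(x) ∝ exp(rho^2 / (2 sigma^2)); the std of log w grows like sqrt(2N)/2.
import numpy as np

rng = np.random.default_rng(7)
sigma, M = 1.0, 20_000

for N in (10, 100, 1000):
    x = rng.normal(0.0, sigma, size=(M, N))
    rho2 = np.sum(x**2, axis=1)             # squared distance from the origin
    log_w = rho2 / (2 * sigma**2)           # log-weight, up to an additive constant
    print(f"N = {N:>4}  std(log w) = {log_w.std():6.2f}   sqrt(2N)/2 = {np.sqrt(2*N)/2:6.2f}")
```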
Generate Samples in High Dimensions

Both Importance Sampling and Rejection Sampling perform poorly
for sampling from high-dimensional distributions!

In both cases we generate a lot of samples that are either directly rejected
(Rejection Sampling) or not useful for the estimator (Importance Sampling).
Monte Carlo: Metropolis-Hastings
We want to sample from the probability distribution p_X(x)!
Instead of a priori selecting q_X(x) as in Rejection or Importance Sampling, we define a
proposal density based on our latest accepted sample.

We define the transition density q(x'; x^n) for x' given x^n.

1. Start at the current sample x^n

2. Generate a new sample x' using q(x'; x^n)

3. Compute α = min(1, [p(x') q(x^n; x')] / [p(x^n) q(x'; x^n)])

4. Accept the new state with probability α

5. x^{n+1} = x' in case of acceptance, x^{n+1} = x^n in case of rejection
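A minimal sketch of this algorithm with a symmetric Gaussian random-walk proposal, so the q-ratio cancels (the target is the unnormalized p*(x) from the earlier example; the step size and chain length are arbitrary choices):

```python
# Metropolis-Hastings with a symmetric random-walk proposal q(x'; x) = N(x'; x, step^2).
# Because q is symmetric, alpha reduces to min(1, p*(x') / p*(x)).
import numpy as np

rng = np.random.default_rng(8)

def p_star(x):                                      # unnormalized target density
    return np.exp(0.4 * (x - 0.4)**2 - 0.08 * x**4)

def metropolis_hastings(n_steps, x0=0.0, step=1.0):
    chain = np.empty(n_steps)
    x = x0
    for n in range(n_steps):
        x_new = x + rng.normal(0.0, step)            # propose from q(x'; x^n)
        alpha = min(1.0, p_star(x_new) / p_star(x))  # acceptance probability
        if rng.uniform() < alpha:                    # accept with probability alpha
            x = x_new
        chain[n] = x                                 # on rejection, x^{n+1} = x^n
    return chain

chain = metropolis_hastings(50_000)
print("chain mean:", chain.mean(), " chain std:", chain.std())
```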







Monte Carlo: Metropolis Algorithm

In case a symmetric proposal distribution q(x'; x^n) for x' given x^n is used, the
algorithm can be simplified to:

1. Start at the current sample x^n

2. Generate a new sample x' using q(x'; x^n)

3. Compute α = min(1, p(x') / p(x^n))

4. Accept the new state with probability α

5. x^{n+1} = x' in case of acceptance, x^{n+1} = x^n in case of rejection

Are these samples independent?







Why is sampling from p(x) hard?

A recap of the two difficulties, and the cost of the naive approach:

We could discretize the space X and ask for samples from the discrete probability
distribution over a finite set of uniformly spaced points {x_i}. If we evaluate p*(x_i)
at each point we get c = Σ_i p*(x_i) and p_i = p*(x_i)/c, and samples can then be
generated by various standard methods.

WHAT IS THE COST? HOW DOES IT SCALE WITH DIMENSIONS?
To compute c we must visit every point in space. In 1-d assume we need 50 points;
in N dimensions we then need 50^N evaluations of p*(x). We usually care about systems
with much larger N; N = 1000 is not unusual.
THIS IS THE CURSE OF DIMENSIONALITY.
Why does Rejection Sampling work?

Each proposed pair (x, u) is a uniformly distributed point under the curve γ Q*(x).
The point is accepted exactly if it also lies under p*(x), so the accepted values of x
are distributed according to p(x).

Rejection sampling works well if Q is a good approximation of p. If Q and p are very
different, then γ must be large and there will be a lot of rejections.

NOTE: In high dimensions one may estimate that the required γ grows exponentially
with N, so rejection sampling is not very useful there.
METROPOLIS-HASTINGS METHOD

Note: Importance and rejection sampling succeed only if the proposal q(x) is close to
p(x). How can we tailor q in a systematic fashion?

KEY IDEA: Use a proposal density q(x'; x^n) that depends on the current sample x^n: the
location of the proposal density changes according to the current state. As before, we
assume that we can evaluate p*(x) for any x. This is the Metropolis-Hastings method.

For example, q(x'; x^n) can be a density from which we can draw samples and which is
centered on x^n. Notation: q(x'; x^n) is centered on x^n, while q(x^n; x') is centered
on x'. If q is symmetric, then q(x'; x^n) = q(x^n; x') and the acceptance ratio reduces
to α = p*(x') / p*(x^n).
The visualization presents monthly global temperature
anomalies between the years 1880-2021. These
temperatures are based on the GISS Surface
Temperature Analysis (GISTEMP v4), an estimate of
global surface temperature change. Anomalies are
defined relative to a base period of 1951-1980. The
data file used to create this visualization can be
accessed here.

The Goddard Institute of Space Studies (GISS) is a
NASA laboratory managed by the Earth Sciences
Division of the agency's Goddard Space Flight Center
in Greenbelt, Maryland. The laboratory is affiliated
with Columbia University's Earth Institute and School
of Engineering and Applied Science in New York.

The 'climate spiral' is a visualization designed by
climate scientist Ed Hawkins from the National Centre
for Atmospheric Science, University of Reading.
Climate spiral visualizations have been widely
distributed, a version was even part of the opening
ceremony of the Rio de Janeiro Olympics.
