AM207 2 Transforms Sampling
Petros Koumoutsakos
Applied Mathematics
cse-lab.seas.harvard.edu
TODAY’s CLASS - STOCHASTIC VARIABLES
Sampling of Variables
Random Numbers
Inverse Transform
Simple Monte Carlo
Monte Carlo Variations
Importance Sampling
Rejection Sampling
What is a Random Number? (answer by ChatGPT+)
A random number is a value that lacks a discernible pattern or predictability, often generated
using algorithms or physical processes to introduce randomness into various applications.
In the context of computer science and mathematics, a random number is typically generated
using algorithms or physical processes that introduce an element of unpredictability. These
numbers are often used in various applications, such as simulations, cryptography, games, and
statistical analysis, to introduce randomness or uncertainty.
It's important to note that true randomness is difficult to achieve in a computer program, as
computers operate based on deterministic algorithms. Instead, computer-generated random
numbers are often pseudo-random, meaning they appear random but are generated using a
deterministic process that starts from an initial value called a seed.
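To make the role of the seed concrete, here is a minimal sketch with NumPy's generator (the seed value 42 is an arbitrary choice): seeding two generators identically reproduces the exact same "random" stream.

```python
import numpy as np

# Pseudo-random numbers are deterministic given the seed: two generators
# seeded with the same value produce identical "random" streams.
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

print(rng1.random(3))  # three "random" numbers
print(rng2.random(3))  # the exact same three numbers
```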
Generating random numbers
http://www.idquantique.com/
Pseudo Random numbers
• Look nearly random when the algorithm is not known, and may be good enough for our purposes.
• Are they random enough? Maybe? Statistical tests check their distribution and correlations. Are these tests enough? No!
• Your calculation could depend in a subtle way on hidden correlations!
• As well-established generators fail new tests, better and better generators get developed.
Marsaglia's diehard tests

The diehard tests are a battery of statistical tests for measuring the quality of a random number generator. They were
developed by George Marsaglia over several years and first published in 1995 on a CD-ROM of random numbers.[1]
Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically Poisson
distributed. The name is based on the birthday paradox.
Overlapping permutations: Analyze sequences of five consecutive random numbers. The 120 possible orderings should occur with
statistically equal probability.
Ranks of matrices: Select some number of bits from some number of random numbers to form a matrix over {0,1}, then
determine the rank of the matrix. Count the ranks.
Monkey tests: Treat sequences of some number of bits as "words". Count the overlapping words in a stream. The number of
"words" that don't appear should follow a known distribution. The name is based on the infinite monkey theorem.
Count the 1s: Count the 1 bits in each of either successive or chosen bytes. Convert the counts to "letters", and count the
occurrences of five-letter "words".
Parking lot test: Randomly place unit circles in a 100 x 100 square. If the circle overlaps an existing one, try again. After 12,000
tries, the number of successfully "parked" circles should follow a certain normal distribution.
Marsaglia’s diehard tests (cont.)
Minimum distance test: Randomly place 8,000 points in a 10,000 x 10,000 square, then find the minimum distance between the
pairs. The square of this distance should be exponentially distributed with a certain mean.
Random spheres test: Randomly choose 4,000 points in a cube of edge 1,000. Center a sphere on each point, whose radius is the
minimum distance to another point. The smallest sphere's volume should be exponentially distributed with a certain mean.
The squeeze test: Multiply 2^31 by random floats on [0,1) until you reach 1. Repeat this 100,000 times. The number of floats needed
to reach 1 should follow a certain distribution.
Overlapping sums test: Generate a long sequence of random floats on [0,1). Add sequences of 100 consecutive floats. The sums
should be normally distributed with characteristic mean and sigma.
Runs test: Generate a long sequence of random floats on [0,1). Count ascending and descending runs. The counts should follow a
certain distribution.
The craps test: Play 200,000 games of craps, counting the wins and the number of throws per game. Each count should follow
a certain distribution.
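To make one of these tests concrete, here is a minimal sketch of the birthday-spacings idea in Python (one common variant of the statistic; the parameters m = 2^9 points on an interval of n = 2^24 are the commonly quoted ones, for which the duplicate-spacing count is asymptotically Poisson with mean m^3/(4n) = 2):

```python
import numpy as np

def birthday_spacings_duplicates(rng, m=2**9, n=2**24):
    """One trial: draw m 'birthdays' on [0, n), sort them, and count
    duplicate values among the m-1 spacings."""
    birthdays = np.sort(rng.integers(0, n, size=m))
    spacings = np.diff(birthdays)
    _, counts = np.unique(spacings, return_counts=True)
    return int(np.sum(counts - 1))  # number of repeated spacings

rng = np.random.default_rng(0)
trials = np.array([birthday_spacings_duplicates(rng) for _ in range(500)])
# For m = 2^9, n = 2^24 the count should be ~ Poisson(lambda = m^3/(4n) = 2).
print("observed mean:", trials.mean(), " (Poisson mean should be ~2)")
```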
Inverse Transform - Derivation 1

p_Y(y) = p_X(f^{-1}(y)) \frac{dx}{dy}

Note: Here y = f(x) is an increasing function.
If we have a decreasing function we still get a positive
probability density function. In general we set:

p_Y(y) = p_X(f^{-1}(y)) \left| \frac{dx}{dy} \right|

EXAMPLE: Let p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - x_0)^2}{2\sigma^2}}.

Define a new variable Y so that y = f(x) = e^x. What is p_Y(y)?
First we invert:  x = f^{-1}(y) = \ln y

Then compute:  \frac{dx}{dy} = \left[ f^{-1}(y) \right]' = \frac{d \ln y}{dy} = \frac{1}{y}

p_Y(y) = p_X(f^{-1}(y)) \left| \frac{dx}{dy} \right|
\;\rightarrow\;
p_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(\ln y - x_0)^2}{2\sigma^2}} \cdot \frac{1}{y}

p_Y(y) = \frac{1}{y \sqrt{2\pi\sigma^2}} e^{-\frac{(\ln y - x_0)^2}{2\sigma^2}}
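As a quick numerical sanity check (a sketch, not part of the original slides; x_0 = 0 and sigma = 0.5 are arbitrary test values), we can sample X, exponentiate, and compare a histogram of Y = e^X against the derived density:

```python
import numpy as np

x0, sigma = 0.0, 0.5
rng = np.random.default_rng(1)
y = np.exp(rng.normal(x0, sigma, size=100_000))  # Y = e^X with X ~ N(x0, sigma^2)

# Derived density p_Y(y) = exp(-(ln y - x0)^2 / (2 sigma^2)) / (y * sqrt(2 pi sigma^2))
grid = np.linspace(0.05, 5.0, 200)
p_y = np.exp(-(np.log(grid) - x0) ** 2 / (2 * sigma ** 2)) \
      / (grid * np.sqrt(2 * np.pi * sigma ** 2))

hist, edges = np.histogram(y, bins=100, range=(0.05, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# The empirical histogram should track the analytic density closely.
print("max abs deviation:", np.max(np.abs(hist - np.interp(centers, grid, p_y))))
```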
HOW TO SAMPLE? Inverse Transform - Uniform Probabilities to the rescue

Let X be a random variable (RV) with a probability density function (PDF) p_X(x) and cumulative distribution
function (CDF) F_X(x). TASK: How to sample from p_X(x)?

Note that  F_X(x) = \int_{-\infty}^{x} p_X(s)\, ds

A transformation between the values of the random variables X and U is introduced as follows:  x = g(u)

or, equivalently, the PDF p_X(x) is:

p_X(x) = p_U(u) \left| \frac{du}{dx} \right| = p_U(u) \left| \frac{dg^{-1}(x)}{dx} \right| = p_U(u) \Big/ \left| \frac{dg(u)}{du} \right|
Inverse Transform - Uniform Probabilities to the rescue

\int_{-\infty}^{x} p_X(s)\, ds = \int_{0}^{u} p_U(s)\, ds = u
\;\rightarrow\;
F_X(x) = u

Thus the following transformation is true:

x = F_X^{-1}(u)

which means that the values of x are given by the inverse of the CDF F_X(x); this inverse is
exactly the function g(u).
Let X be a real random variable with probability density function (PDF) p(x):

F_X(x) = \int_{-\infty}^{x} p(\tau)\, d\tau   (probability of measuring a value less than x)

EXAMPLE: CDF of an unbiased die (figure)
SAMPLING USING INVERSE TRANSFORMS - TAKE II

PROOF: Y = F_X(X) is monotonic and continuous, so

F_Y(y) = P(Y \le y) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y = \int_{-\infty}^{y} U(\tau)\, d\tau

So F_Y(y) = \int_{-\infty}^{y} U(\tau)\, d\tau, and F_Y(y) is the CDF of a uniform U(0,1) random variable.

ALGORITHM: Sample u \sim U(0,1) and then take x = F_X^{-1}(u).
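An empirical check of this proof (a minimal sketch assuming SciPy is available; the standard normal X is an arbitrary choice):

```python
import numpy as np
from scipy.special import erf  # CDF of N(0,1): Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
u = 0.5 * (1.0 + erf(x / np.sqrt(2.0)))  # u = F_X(x)

# If F_Y is the CDF of U(0,1), then mean(u) ~ 0.5 and var(u) ~ 1/12 ~ 0.0833.
print("mean:", u.mean(), " var:", u.var())
```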
EXAMPLE 1 - Uniform (0,1) to Uniform(a,b)

F_X(x) =
  0,                    x < a
  (x - a) / (b - a),    a \le x \le b
  1,                    x > b

Then:  x = F_X^{-1}(u) = a + (b - a)\, u
EXAMPLE 2 - Uniform (0,1) to Exponential(\beta)

F_X(x) =
  0,                    x < 0
  1 - e^{-x/\beta},     x \ge 0

Then:  u = F_X(x) = 1 - e^{-x/\beta} \;\rightarrow\; 1 - u = e^{-x/\beta} \;\rightarrow\; x = -\beta \log(1 - u)

Since 1 - u is also uniform on (0,1):  x = -\beta \log v  with  v \sim U(0,1)
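The derivation translates directly into code; a minimal sketch (beta = 2 is an arbitrary test value):

```python
import numpy as np

def sample_exponential(beta, size, rng):
    """Inverse-transform sampling of Exponential(beta): x = -beta * log(v)."""
    v = 1.0 - rng.uniform(size=size)  # v in (0, 1], avoids log(0)
    return -beta * np.log(v)

rng = np.random.default_rng(3)
beta = 2.0
x = sample_exponential(beta, 100_000, rng)
# The mean of Exponential(beta) is beta; compare against the sample mean.
print("sample mean:", x.mean(), " expected:", beta)
```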
What if p_X(x) is complex?
“complex” meaning that F_X(x) is not invertible
Monte Carlo Methods

Monte Carlo Methods use random numbers to solve two types of problems:

1. Generate samples \{x^{(r)}\}_{r=1}^{R} from a given distribution p_X(x).

2. Estimate expectations

\Phi = \langle \phi(x) \rangle = \int \phi(x)\, p(x)\, d^N x   for an N-dimensional space.
Monte Carlo Methods

Recall that an integral of the form

I = \int_a^b f(x)\, dx

can be expressed through the average value of f(x) under the uniform probability
distribution U(x) = 1/(b - a) on [a, b]:

I = (b - a)\, \langle f(x) \rangle_U = (b - a) \int_a^b f(x)\, U(x)\, dx

We focus on the sampling problem: if we solve it, then we can solve the second
problem using random samples x^{(r)} from p to get the estimate:

\hat{\Phi} \equiv \frac{1}{R} \sum_r \phi(x^{(r)})
Monte Carlo Methods : Two interpretations

I_n = \frac{b - a}{n} \sum_{i=1}^{n} f(x_i)

1. Computing the mean value (or height) of the function and multiplying by the interval length:

I_n = (b - a) \times \left[ \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right]

2. Averaging the areas (b - a)\, f(x_i) of n rectangles:

I_n = \left[ \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right] \times (b - a)

More generally, for an expectation:

\langle f \rangle \approx \langle f \rangle_M = \frac{1}{M} \sum_{i=1}^{M} f(\vec{x}_i)
Due to the random nature of Monte Carlo sampling (changing the random number sequence would give a slightly different
result), the estimate \langle f \rangle_M is itself a random variable. However, assuming that all samples \vec{x}_i are independent, we obtain that
the expected value of this estimate \langle f \rangle_M is the exact integral:

\mathbb{E}\left[ \langle f \rangle_M \right]
= \mathbb{E}\left[ \frac{1}{M} \sum_{i=1}^{M} f(\vec{x}_i) \right]
= \frac{1}{M} \sum_{i=1}^{M} \mathbb{E}\left[ f(\vec{x}_i) \right]
= \frac{1}{M} \sum_{i=1}^{M} \langle f \rangle
= \frac{1}{M} M \langle f \rangle
= \langle f \rangle
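A minimal sketch of this estimator (the integrand sin on [0, pi], whose exact integral is 2, is just a test case):

```python
import numpy as np

def mc_integrate(f, a, b, n, rng):
    """Estimate int_a^b f(x) dx as (b - a) * mean(f(x_i)), x_i ~ U(a, b)."""
    x = rng.uniform(a, b, size=n)
    return (b - a) * f(x).mean()

rng = np.random.default_rng(4)
estimate = mc_integrate(np.sin, 0.0, np.pi, 100_000, rng)
print("MC estimate:", estimate, " exact: 2.0")
```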
Often we only know p_X(x) up to a constant: p_X(x) = \hat{p}_X(x)/c with

c = \int \hat{p}_X(x)\, dx

Example: Bayesian Inference

The posterior for the parameters \theta given data D is proportional to the product
of likelihood and prior:  p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta).

An exact computation would involve the normalisation constant:

c = \int p(D \mid \theta)\, p(\theta)\, d\theta
Difficulty #2: It is not obvious how to draw samples when the space is large.
How can we find the regions of high probability without evaluating p_X(x) everywhere?
Example: Assume we wish to draw samples from the density

p(x) = p^*(x)/Z,   with   p^*(x) = \exp\left[\, 0.4\,(x - 0.4)^2 - 0.08\, x^4 \,\right]

(figure: p^*(x) plotted over x)
Difficulty #1: We don't know how to compute the normalising constant c.

Assume instead that we have a simpler proposal density which we can evaluate;
call it Q(x) = Q^*(x)/c_Q. Assume we also know the value of a constant \gamma such that

\gamma\, Q^*(x) > p^*(x)   for all x.
REJECTION SAMPLING

We want to sample from the probability distribution p_X(x)!

ALGORITHM:
1. Generate x^{(i)} from Q using a random number generator.
2. Generate a uniformly distributed variable u in the interval [0, \gamma Q^*(x^{(i)})].
3. If u > p^*(x^{(i)}), x^{(i)} is rejected, else it is accepted.

We sample from p_X(x) while evaluating only p^*(x) and Q_X(x).

→ If Q_X(x) is a good approximation of p_X(x) this works well.
→ If not, \gamma needs to be large and we spend a lot of compute on rejecting samples.
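A minimal sketch of this algorithm for the example density above (the Gaussian proposal with sigma = 2 and the grid-based choice of gamma are my illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)

def p_star(x):
    """Unnormalised target density from the lecture example."""
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_pdf(x, sigma=2.0):
    """Gaussian proposal density Q(x) (our choice of Q)."""
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Choose gamma on a grid so that gamma * Q(x) >= p*(x) (with a 10% margin).
grid = np.linspace(-6.0, 6.0, 4001)
gamma = 1.1 * np.max(p_star(grid) / q_pdf(grid))

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.normal(0.0, 2.0)                 # 1. generate x from Q
        u = rng.uniform(0.0, gamma * q_pdf(x))   # 2. u ~ U(0, gamma * Q(x))
        if u <= p_star(x):                       # 3. accept if u <= p*(x)
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
print("gamma =", gamma, " sample mean:", xs.mean())
```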
Sampling via Importance Sampling

I = \mathbb{E}[\phi(X)] = \int \phi(x)\, p_X(x)\, d^N x

But we do not know how to sample from p. Idea: Sample from q instead!

1. Generate M independent samples \{x^{(i)}\}_{i=1}^{M} from a given distribution q_X(x) using
a random number generator.

2. Use the estimator:

I_M = \frac{1}{M} \sum_{i=1}^{M} \phi(x^{(i)})\, \frac{p(x^{(i)})}{q(x^{(i)})}
I = \mathbb{E}[\phi(X)] = \int \phi(x)\, p_X(x)\, d^N x = \int \phi(x)\, \frac{p_X(x)}{q_X(x)}\, q_X(x)\, d^N x

But we do not know how to sample from p. Idea: Sample from q instead!

1. Generate M independent samples \{x^{(i)}\}_{i=1}^{M} from a given distribution q_X(x) using
a random number generator.

2. Define  w^i = \frac{\hat{p}(x^{(i)})}{\hat{q}(x^{(i)})}

3. Use the estimator:

I_M = \frac{\sum_{i=1}^{M} \phi(x^{(i)})\, w^i}{\sum_{i=1}^{M} w^i}
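A minimal sketch of the self-normalised estimator (phi(x) = x^2, the Gaussian proposal, and the example target p* are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def p_star(x):
    """Unnormalised target (the lecture's example density)."""
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_pdf(x, sigma=2.0):
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

M = 100_000
x = rng.normal(0.0, 2.0, size=M)    # samples from q
w = p_star(x) / q_pdf(x)            # weights w_i = p*(x_i) / q(x_i)
phi = x ** 2                        # phi(x) = x^2 as a test function
I_M = np.sum(phi * w) / np.sum(w)   # self-normalised estimator
print("E_p[X^2] estimate:", I_M)
```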
Importance Sampling in Many Dimensions

\hat{p}_X(x) =
  1,   0 \le \rho(x) \le R_p
  0,   \rho(x) > R_p

with

\rho(x) = \left( \sum_{i}^{N} x_i^2 \right)^{1/2}
Importance Sampling in Many Dimensions

Take as proposal a product of Gaussians:

q_X(x) = \prod_{i=1}^{N} \mathcal{N}(x_i;\, 0, \sigma^2)

Under q the radius concentrates:  \rho^2 \sim N\sigma^2 \pm \sqrt{2N}\,\sigma^2

If we choose \sigma such that a typical set of q lies inside the sphere with radius R_p,
then it can be shown that the importance sampling weights take values:

\frac{w_{\max}}{w_{\min}} \propto \exp\left( \sqrt{2N} \right)

In both cases we generate a lot of samples that are directly rejected
(Rejection Sampling) or not useful for the estimator (Importance Sampling).
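The weight degeneracy can be observed numerically; a sketch (sigma, R_p, and the effective-sample-size diagnostic are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)

# Target: uniform on a ball of radius R_p; proposal: product of N(0, sigma^2).
# As N grows the log-weights spread out and a few samples dominate.
for N in [2, 10, 50, 200]:
    sigma = 1.0
    R_p = 2.0 * np.sqrt(N) * sigma               # ball contains q's typical set
    x = rng.normal(0.0, sigma, size=(100_000, N))
    rho2 = np.sum(x ** 2, axis=1)
    inside = rho2 <= R_p ** 2
    # Inside the ball, w = p*/q is proportional to exp(rho^2 / (2 sigma^2)).
    log_w = np.where(inside, rho2 / (2 * sigma ** 2), -np.inf)
    w = np.exp(log_w - log_w.max())               # stabilised weights
    ess = w.sum() ** 2 / np.sum(w ** 2)           # effective sample size
    print(f"N = {N:4d}   effective sample size: {ess:10.1f} of 100000")
```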
Monte Carlo: Metropolis Hastings

We want to sample from the probability distribution p_X(x)!

Instead of a priori selecting q_X(x) as in Rejection or Importance Sampling, we define a
proposal density based on our latest accepted sample.

We define the transition density q(x'; x^n) for x' given x^n.

1. Start at the current sample x^n.
2. Generate a new sample x' using q(x'; x^n).
3. Compute  \alpha = \min\left(1,\; \frac{p(x')\, q(x^n; x')}{p(x^n)\, q(x'; x^n)}\right)
4. Accept x' with probability \alpha (set x^{n+1} = x'); otherwise set x^{n+1} = x^n.

In case a symmetric proposal distribution q(x'; x^n) for x' given x^n is used, the
algorithm can be simplified to:

1. Start at the current sample x^n.
2. Generate a new sample x' using q(x'; x^n).
3. Compute  \alpha = \min\left(1,\; \frac{p(x')}{p(x^n)}\right)
4. Accept x' with probability \alpha; otherwise set x^{n+1} = x^n.
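A minimal random-walk Metropolis sketch targeting the example density p* (the Gaussian step size and the burn-in length are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)

def p_star(x):
    """Unnormalised target; MH only needs p up to a constant."""
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def metropolis(n_steps, x0=0.0, step=1.0):
    """Random-walk Metropolis: symmetric Gaussian proposal, so q cancels."""
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        x_prime = x + step * rng.normal()            # propose from q(x'; x^n)
        alpha = min(1.0, p_star(x_prime) / p_star(x))
        if rng.uniform() < alpha:                    # accept with probability alpha
            x = x_prime                              # otherwise keep x^n
        chain[n] = x
    return chain

chain = metropolis(50_000)
print("mean estimate:", chain[5_000:].mean())        # discard burn-in samples
```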
EVALUATING THE NORMALISING CONSTANT

To evaluate c we usually must compute the integral

c = \int p^*(x)\, d^N x

This can be very costly.

Example: Assume we wish to draw samples from the density

p(x) = p^*(x)/Z,   with   p^*(x) = \exp\left[\, 0.4\,(x - 0.4)^2 - 0.08\, x^4 \,\right]

In Bayesian inference, specifying the full probability would have required computing the
normalisation factor \int p^*(x)\, dx. Back to the first difficulty: we do not know how to compute c.
GRID EVALUATION AND THE CURSE OF DIMENSIONALITY

Assume p^*(x) can be evaluated everywhere. To sample from p we could evaluate p^* on a grid of
equally spaced points \{x_i\}, compute c \approx \sum_i p^*(x_i), set p_i = p^*(x_i)/c, and then sample
from this discrete distribution by generating random bits.

WHAT IS THE COST? HOW DOES IT SCALE WITH DIMENSIONS?

To compute c we visit every point in space. In 1-d assume we need 50 points. If we have N dimensions,
we then need 50^N evaluations. For the systems we care about, N = 50 up to N = 1000 is not unusual.
THIS IS THE CURSE OF DIMENSIONALITY.

→ What if we have a function q(x) that is easy enough to sample from, and for which we can evaluate
q^*(x) = c_q\, q(x) everywhere?
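The arithmetic of this scaling, as a one-line illustration:

```python
# Grid-based evaluation of c needs 50^N points; the cost explodes with N.
for N in [1, 2, 5, 10, 50]:
    print(f"N = {N:3d}:  50^N = {float(50 ** N):.3e} grid points")
```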
IMPORTANCE SAMPLING - DERIVATION

Generate R samples \{x^{(r)}\}_{r=1}^{R} from q(x). When we perform the sampling we let

w_r = \frac{p^*(x^{(r)})}{q^*(x^{(r)})}

so that

\int \phi(x)\, p(x)\, dx = \int \phi(x)\, \frac{p^*(x)}{q^*(x)}\, q(x)\, dx   (up to the normalising constants)

① Where q^*(x) > p^*(x): points are overrepresented.
② Where q^*(x) < p^*(x): points are underrepresented.

The weights w_r correct for this. With the normalisation we sample:

\hat{\Phi} = \frac{\sum_r \phi(x^{(r)})\, w_r}{\sum_r w_r}

Problems: When |\phi(x)\, p^*(x)| is large and q^*(x) is small there is no proper sampling
→ the estimator can be drastically wrong.
IMPORTANCE SAMPLING IN MANY DIMENSIONS

Assume now that the target is

p^*(x) =
  1,   0 \le \rho(x) \le R_p
  0,   \rho(x) > R_p

and that \sigma is chosen so that the typical set of the Gaussian proposal, \rho \sim \sqrt{N}\,\sigma,
lies inside the sphere. Then w_r = p^*/q^* will typically have values in a range that can be very large.
An importance sampling estimator will be troubled if it is dominated by a few large weights.

SO IMPORTANCE SAMPLING HAS TWO ISSUES:
① The weight values w_r can vary widely.
② q needs to be a good approximation of p.
REJECTION SAMPLING - SUMMARY

→ Assume we have a simpler proposal density which we can evaluate as before; call it
Q(x) = Q^*(x)/c_Q.

→ Assume we also know the value of a constant \gamma such that \gamma\, Q^*(x) > p^*(x) for all x.

• Generate x from Q, and generate a uniformly distributed variable u in the interval [0, \gamma Q^*(x)].
• The point (x, u) lies under \gamma Q^*(x); if it also lies under p^*(x) then x is accepted.
• If u > p^*(x) → x is rejected, else accepted.

Rejection sampling works well if Q is a good approximation of p. If Q and p are very different,
there are many rejections. In many dimensions \gamma typically grows exponentially with N,
so almost every sample is rejected.
METROPOLIS HASTINGS METHOD

Note: Importance and Rejection sampling imply that q(x) must be close to p(x) to succeed.
Can we generate q in a systematic fashion?

We introduce a proposal density q(x'; x^n) from which we can sample and which we can evaluate;
it changes according to the current location x^n. As before, we assume that we can evaluate
p^*(x) for any x. This is the METROPOLIS HASTINGS ALGORITHM.

If the proposal is symmetric, then q(x'; x^n) = q(x^n; x').