Week 7 Lecture
Monte Carlo Integration
Let g(x) be a function and suppose that we want to compute ∫_a^b g(x) dx
(assuming that this integral exists). Recall that if X is a random variable with
density f(x), then the mathematical expectation of the random variable
Y = g(X) is

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx
Simple Monte Carlo estimator
Compute a Monte Carlo estimate of

θ = ∫_0^1 e^{-x} dx
Simple Monte Carlo estimator
m <- 10000
x <- runif(m)
theta.hat <- mean(exp(-x))
print(theta.hat)
## [1] 0.6339235
print(1 - exp(-1))
## [1] 0.6321206
Simple Monte Carlo estimator
Compute a Monte Carlo estimate of

θ = ∫_2^4 e^{-x} dx

and compare the estimate with the exact value of the integral.
Simple Monte Carlo estimator
m <- 10000
x <- runif(m, min=2, max=4)
theta.hat <- mean(exp(-x)) * 2   # scale by the interval length (b - a) = 2
print(theta.hat)
## [1] 0.1169499
print(exp(-2) - exp(-4))
## [1] 0.1170196
Exercise 1
Compute a Monte Carlo estimate of

∫_0^{π/3} sin t dt

and compare your estimate with the exact value of the integral.
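A possible solution sketch (not from the slides; variable names are
illustrative): sample uniformly on (0, π/3), average sin(t), and scale by the
interval length. The exact value is 1 − cos(π/3) = 0.5.

m <- 10000
t <- runif(m, min = 0, max = pi/3)   # sample from Uniform(0, pi/3)
theta.hat <- mean(sin(t)) * (pi/3)   # scale by the interval length
print(theta.hat)
print(1 - cos(pi/3))                 # exact value: 0.5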
Simple Monte Carlo estimator
Use the Monte Carlo approach to estimate the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt
First, notice that we cannot apply the algorithm above directly because the
limits of integration cover an unbounded interval. However, we can break
this problem into two cases: x ≥ 0 and x < 0, and use the symmetry of the
normal density to handle the second case.
Simple Monte Carlo estimator
Next, estimate θ = ∫_0^x e^{-t²/2} dt for x > 0. Substituting y = t/x, so that
dt = x dy, gives

θ = ∫_0^1 x e^{-(xy)²/2} dy

Generate iid Uniform(0,1) numbers u_1, …, u_m and compute the sample mean

θ̂ = ḡ_m(u) = (1/m) ∑_{i=1}^m x e^{-(u_i x)²/2}

If x > 0, the estimate of Φ(x) is 0.5 + θ̂/√(2π). For x < 0, use
Φ(x) = 1 − Φ(−x).
Simple Monte Carlo estimator
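The code that produced the estimates below is not shown on the slide; a
minimal sketch consistent with the output that follows (the names x and
cdf match the next slide, the rest are assumptions):

x <- seq(.1, 2.5, length = 10)
m <- 10000
u <- runif(m)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
  g <- x[i] * exp(-(u * x[i])^2 / 2)      # integrand after the substitution
  cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5  # Phi estimate for x[i] > 0
}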
Simple Monte Carlo estimator
Now the estimates θ̂ for ten values of x are stored in the vector cdf. Compare
the estimates with the value Φ(x) computed (numerically) by the pnorm
function.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## cdf 0.54 0.643 0.737 0.816 0.879 0.925 0.957 0.978 0.991 0.998
## Phi 0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994
Simple Monte Carlo estimator
Results for several values x > 0 are shown compared with the value of the
normal cdf function pnorm. The Monte Carlo estimates appear to be very
close to the pnorm values. (The estimates will be worse in the extreme upper
tail of the distribution.)
Exercise 2
Compute a Monte Carlo estimate of the standard normal cdf by generating
from the Uniform(0, x) distribution. Compare your estimates with the normal
cdf function pnorm. Compute an estimate of the variance of your Monte
Carlo estimate of Φ(2), and a 95% confidence interval for Φ(2).
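A possible solution sketch (not from the slides; variable names are
illustrative), using Φ(x) = 0.5 + x E[e^{-U²/2}]/√(2π) with U ~ Uniform(0, x):

x <- 2
m <- 10000
u <- runif(m, 0, x)                        # sample from Uniform(0, x)
h <- x * exp(-u^2 / 2) / sqrt(2 * pi)      # Phi(x) = 0.5 + E[h(U)]
phi.hat <- 0.5 + mean(h)
v <- var(h) / m                            # estimated variance of phi.hat
ci <- phi.hat + c(-1, 1) * 1.96 * sqrt(v)  # 95% confidence interval
c(phi.hat, v)
ci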
Simple Monte Carlo estimator
Let I(⋅) be the indicator function, and Z ∼ N(0, 1). Then for any constant
x we have E[I(Z ≤ x)] = P(Z ≤ x) = Φ(x), the standard normal cdf
evaluated at x.
Simple Monte Carlo estimator
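The code for this slide is likewise not shown; a sketch consistent with the
output below, using the indicator-function approach (the name p matches
the next slide, the rest are assumptions):

x <- seq(.1, 2.5, length = 10)
m <- 10000
z <- rnorm(m)
p <- numeric(length(x))
for (i in 1:length(x)) {
  p[i] <- mean(z < x[i])   # sample proportion of z values below x[i]
}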
Simple Monte Carlo estimator
Now the estimates in p for the sequence of x values can be compared to the
result of the R normal cdf function pnorm.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.100 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## p 0.536 0.648 0.741 0.821 0.880 0.923 0.954 0.975 0.986 0.993
## Phi 0.540 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994
It appears that we have better agreement with pnorm in the upper tail, but
worse agreement near the center.
Variance
The variance of the Monte Carlo estimator can be estimated by

σ̂²/m = (1/m²) ∑_{i=1}^m [g(x_i) − ḡ(x)]²
Exercise:
Error bounds for MC integration
x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)                     # the indicator function I(Z < x)
v <- mean((g - mean(g))^2) / m   # estimated variance of the sample proportion
cdf <- mean(g)
c(cdf, v)
Error bounds for MC integration
The probability P(I(Z < x) = 1) is Φ(2) ≈ 0.977.

Here g(X) has the distribution of the sample proportion of 1's in m = 10000
Bernoulli trials with p ≈ 0.977, and the variance of g(X) is therefore
(0.977)(1 − 0.977)/10000 = 2.223e-06.
For x = 2.5 the output is
x <- 2.5
m <- 10000
z <- rnorm(m)
g <- (z < x)                     # the indicator function I(Z < x)
v <- mean((g - mean(g))^2) / m   # estimated variance of the sample proportion
cdf <- mean(g)
c(cdf, v)
For x = 2.5 the output is
The probability P(I(Z < x) = 1) is Φ(2.5) ≈ 0.995, so the variance of g(X) is
approximately (0.995)(1 − 0.995)/10000 ≈ 4.975e-07, smaller than at x = 2
because p is closer to 1.
Variance Reduction
We have seen that Monte Carlo integration can be applied to estimate
quantities of the form E[g(X)].
Variance Reduction
The variance of the Monte Carlo estimator is

Var(θ̂) = Var(Ȳ) = Var_f(g(X))/m
Variance Reduction
Thus, although variance can always be reduced by increasing the number of
Monte Carlo replicates, the computational cost is high. Other methods for
reducing the variance can be applied that are less computationally expensive
than simply increasing the number of replicates:
· Antithetic variables
· Control variates
· Importance sampling
· Stratified sampling
Antithetic Variates
By using antithetic variates, the negative correlation between the pairs helps
cancel out some of the random fluctuations, leading to a reduction in
variance. This reduction in variance allows for more accurate estimation
using the same number of samples or fewer samples to achieve a desired
level of precision.
Antithetic Variates
· When U1, U2 are independent,

Var[(U1 + U2)/2] = (1/4)[Var(U1) + Var(U2)]

· When U1, U2 are negatively correlated,

Var[(U1 + U2)/2] = (1/4)[Var(U1) + Var(U2) + 2 Cov(U1, U2)]

which is smaller than in the independent case; see the numerical
illustration below.
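A quick numerical illustration (a sketch, not from the slides) for g(U) = e^U
with U ~ Uniform(0,1): the antithetic pair (U, 1 − U) gives a much smaller
variance than an independent pair.

m <- 10000
u <- runif(m)
v <- runif(m)                    # independent of u
var((exp(u) + exp(v)) / 2)       # independent pair: about 0.12
var((exp(u) + exp(1 - u)) / 2)   # antithetic pair: about 0.004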
Antithetic Variates
1. Generate a random sample from the underlying distribution of interest.
2. For each draw, generate its antithetic counterpart (for example, pair each
uniform U with 1 − U), so the pairs are negatively correlated.
3. Evaluate the estimator on both the original draws and their antithetic
counterparts.
4. Calculate the average of the estimators obtained from the original and
antithetic pairs.
Antithetic Variates
Consider the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt
Antithetic Variates
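The definition of MC.Phi is not shown on these slides; a minimal sketch
consistent with how it is called below (the antithetic draws 1 − u are an
assumption):

MC.Phi <- function(x, R = 10000, anti = TRUE) {
  u <- runif(R / 2)
  if (!anti) v <- runif(R / 2) else v <- 1 - u   # antithetic counterpart
  u <- c(u, v)
  cdf <- numeric(length(x))
  for (i in 1:length(x)) {
    g <- x[i] * exp(-(u * x[i])^2 / 2)
    cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
  }
  cdf
}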
Antithetic Variates
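As a usage illustration (a sketch, not from the slides), compare the two
estimators with pnorm at a few points:

x <- seq(.1, 2.5, length = 5)
Phi <- pnorm(x)
MC1 <- MC.Phi(x, anti = FALSE)   # simple Monte Carlo
MC2 <- MC.Phi(x)                 # antithetic variates
round(rbind(x, MC1, MC2, Phi), 5)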
Antithetic Variates
The approximate reduction in variance can be estimated for a given x by a
simulation under both methods: the simple Monte Carlo integration
approach and the antithetic variable approach.
Antithetic Variates
m <- 1000
MC1 <- MC2 <- numeric(m)
x <- 1.95
for (i in 1:m) {
  MC1[i] <- MC.Phi(x, R = 1000, anti = FALSE)   # simple MC
  MC2[i] <- MC.Phi(x, R = 1000)                 # antithetic (default)
}
Antithetic Variates
print(sd(MC1))
## [1] 0.006874616
print(sd(MC2))
## [1] 0.0004392972
print((var(MC1) - var(MC2))/var(MC1))
## [1] 0.9959166
Control Variates
· Involves introducing a known and correlated random variable, called a
control variate, to reduce the variance of the estimator.
· By choosing an appropriate control variate, which has a similar behavior to
the variable being estimated, the variance of the estimator can be
significantly reduced.
· The idea is to estimate the difference between the variable of interest and
the control variate, as this difference has lower variance than estimating
the variable directly.
Control Variates
1. Define the variable of interest.
2. Choose a control variate: a correlated random variable whose expectation
is known.
3. Generate samples and evaluate both the variable of interest and the
control variate on each sample.
4. Estimate the variance of the control variate and its covariance with the
variable of interest.
5. Compute the control variate coefficient c* = −Cov(g, f)/Var(f).
6. Combine the estimator for the variable of interest and the estimator for
the control variate using the control variate coefficient.
7. Use the control variate estimator to obtain the final estimate of the
variable of interest.
Control Variates
Apply the control variate approach to compute:

θ = E[e^U] = ∫_0^1 e^u du, where U ~ Uniform(0,1)

Var(g(U)) = Var(e^U) = E[e^{2U}] − θ²
          = (e² − 1)/2 − (e − 1)² = 0.2420351
Control Variates
A natural choice of control variate is f(U) = U, where U ~ Uniform(0,1);
then E[U] = 1/2, Var(U) = 1/12, and

Cov(e^U, U) = 1 − (1/2)(e − 1) = 0.1408591

Hence

c* = −Cov(e^U, U)/Var(U) = −12 + 6(e − 1) = −1.690309
Control Variates
For m replicates:

m Var(θ̂_{c*}) = Var(e^U) − [Cov(e^U, U)]²/Var(U)
             = (e² − 1)/2 − (e − 1)² − 12[1 − (e − 1)/2]²
             = 0.2420356 − 12(0.1408591)² = 0.003940175

The percent reduction in variance using the control variate compared with
the simple Monte Carlo estimate is 100(1 − 0.003940175/0.2420356) =
98.3721%.
Control Variates
m <- 10000
a <- -12 + 6 * (exp(1) - 1)    # the optimal coefficient c*
U <- runif(m)
T1 <- exp(U)                   # simple MC
T2 <- exp(U) + a * (U - 1/2)   # controlled
Control Variates
mean(T1)
## [1] 1.711419
mean(T2)
## [1] 1.717676
(var(T1) - var(T2)) / var(T1)
## [1] 0.9837252
The observed percent reduction in variance using the control variate
compared with the simple Monte Carlo estimate is about 98.37%, in close
agreement with the theoretical value of 98.3721%.
Importance sampling
The average value of a function g(x) over an interval (a, b) is usually defined
(in calculus) by

(1/(b − a)) ∫_a^b g(x) dx

Here a uniform weight function is applied over the entire interval (a, b).
Importance sampling
If X is a random variable uniformly distributed on (a, b), then

E[g(X)] = ∫_a^b g(x) (1/(b − a)) dx = (1/(b − a)) ∫_a^b g(x) dx

which is simply the average value of the function g(x) over the interval (a, b)
with respect to a uniform weight function. The simple Monte Carlo method
generates a large number of replicates X1, . . . , Xm uniformly distributed
on [a, b] and estimates ∫_a^b g(x) dx by the sample mean

((b − a)/m) ∑_{i=1}^m g(Xi)

which converges to ∫_a^b g(x) dx with probability 1 by the strong law of large
numbers.
Importance sampling
· One limitation of this method is that it does not apply to unbounded
intervals.
· Another drawback is that it can be inefficient to draw samples uniformly
across the interval if the function g(x) is far from uniform, since many
draws then fall where g contributes little to the integral.
· However, once we view the integration problem as an expected value
problem, it seems reasonable to consider other weight functions (other
densities) than uniform.
· This leads us to a general method called importance sampling; the
estimator is written out below.
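Concretely (a standard identity, spelled out here for reference): choose a
density f whose support covers {x : g(x) ≠ 0}. Then

θ = ∫ g(x) dx = ∫ [g(x)/f(x)] f(x) dx = E_f[g(X)/f(X)]

so θ can be estimated by the sample mean (1/m) ∑_{i=1}^m g(Xi)/f(Xi),
where X1, . . . , Xm are drawn from f.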
Importance sampling
Consider the integral

∫_0^1 e^{-x}/(1 + x²) dx

Candidate importance functions:

f1(x) = e^{-x}, 0 < x < ∞
f2(x) = (1/π)(1 + x²)^{-1}, −∞ < x < ∞
f3(x) = e^{-x}/(1 − e^{-1}), 0 < x < 1
f4(x) = (4/π)(1 + x²)^{-1}, 0 < x < 1
Importance sampling
The integrand is

g(x) = e^{-x}/(1 + x²) if 0 < x < 1, and g(x) = 0 otherwise.
Importance sampling
m <- 10000
theta.hat <- se <- numeric(5)   # estimates and standard errors
g <- function(x) {
  exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}
Importance sampling
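The estimation code for this slide is not shown; a sketch under the
assumption that the five slots of theta.hat hold the plain uniform weight
plus the four candidate densities above, with inverse-transform sampling
for f3 and f4:

x <- runif(m)                           # plain uniform weight on (0,1)
fg <- g(x)
theta.hat[1] <- mean(fg); se[1] <- sd(fg)

x <- rexp(m, 1)                         # f1
fg <- g(x) / exp(-x)
theta.hat[2] <- mean(fg); se[2] <- sd(fg)

x <- rcauchy(m)                         # f2
x[x < 0 | x > 1] <- 2                   # g is 0 outside (0,1); avoid Inf * 0
fg <- g(x) / dcauchy(x)
theta.hat[3] <- mean(fg); se[3] <- sd(fg)

u <- runif(m)                           # f3, by inverse transform
x <- -log(1 - u * (1 - exp(-1)))
fg <- g(x) / (exp(-x) / (1 - exp(-1)))
theta.hat[4] <- mean(fg); se[4] <- sd(fg)

u <- runif(m)                           # f4, by inverse transform
x <- tan(pi * u / 4)
fg <- g(x) / (4 / ((1 + x^2) * pi))
theta.hat[5] <- mean(fg); se[5] <- sd(fg)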
Importance sampling
rbind(theta.hat, se)
Stratified sampling
· Another approach to variance reduction is stratified sampling, which aims
to reduce the variance of the estimator by dividing the interval into strata
and estimating the integral on each stratum with smaller variance.
· Linearity of the integral operator and the strong law of large numbers
imply that the sum of these estimates converges to ∫ g(x)dx with
probability 1.
· In stratified sampling, the number of replicates m and the number of
replicates mj to be drawn from each of k strata are fixed so that m = m1 +
· · · + mk, with the goal that

Var(θ̂_k(m1, . . . , mk)) < Var(θ̂)
Stratified sampling
Variance reduction for ∫_0^1 e^{-x}(1 + x²)^{-1} dx
M <- 10000   # number of replicates
k <- 10      # number of strata
r <- M / k   # replicates per stratum
N <- 50      # number of times to repeat the estimation
Stratified sampling
T2 <- numeric(k)
estimates <- matrix(0, N, 2)
g <- function(x) {
  exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}
for (i in 1:N) {
  estimates[i, 1] <- mean(g(runif(M)))           # simple MC
  for (j in 1:k)
    T2[j] <- mean(g(runif(M/k, (j-1)/k, j/k)))   # stratum j estimate
  estimates[i, 2] <- mean(T2)                    # stratified estimate
}
Stratified sampling
apply(estimates, 2, mean)   # column means: simple MC vs stratified
apply(estimates, 2, var)    # column variances: simple MC vs stratified
THE END