Week 7 Lecture


AA037014 Statistical Computing

7: Monte Carlo Integration & Variance Reduction


Dr. Diana Abdul Wahab
Monte Carlo Integration
· Monte Carlo integration is a numerical method used to estimate the value
of definite integrals, especially in cases where analytical solutions are
difficult or impossible to obtain.
· It is based on the principles of probability and random sampling.
· The basic idea: simulate a large number of random points within a specific
region, and then use these points to estimate the integral.
· The region of integration is defined by the limits of the integral and is often
referred to as the integration domain.
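
As a small illustration of this idea (not from the original slides; the integrand x^2 and the sample size are arbitrary choices), the following R sketch scatters uniform random points in the unit square and uses the proportion falling below the curve y = x^2 to estimate $\int_0^1 x^2\,dx = 1/3$:

set.seed(1)
n <- 100000
x <- runif(n)      # random points in the unit square
y <- runif(n)
hit <- y < x^2     # TRUE when the point falls below the curve
mean(hit)          # proportion of hits estimates the integral, here 1/3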

Monte Carlo Integration
Let $g(x)$ be a function and suppose that we want to compute $\int_a^b g(x)\,dx$ (assuming that this integral exists). Recall that if $X$ is a random variable with density $f(x)$, then the mathematical expectation of the random variable $Y = g(X)$ is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

If a random sample is available from the distribution of X, an unbiased estimator of E[g(X)] is the sample mean.

Simple Monte Carlo estimator
Compute a Monte Carlo estimate of
$$\theta = \int_0^1 e^{-x}\,dx$$

and compare the estimate with the exact value

Simple Monte Carlo estimator

m <- 10000
x <- runif(m)
theta.hat <- mean(exp(-x))
print(theta.hat)

## [1] 0.6339235

print(1 - exp(-1))

## [1] 0.6321206

The estimate is $\hat\theta \approx 0.6339$ and $\theta = 1 - e^{-1} \approx 0.6321$.

Simple Monte Carlo estimator
Compute a Monte Carlo estimate of
$$\theta = \int_2^4 e^{-x}\,dx$$

and compare the estimate with the exact value of the integral.

Simple Monte Carlo estimator

m <- 10000
x <- runif(m, min=2, max=4)
theta.hat <- mean(exp(-x)) * 2
print(theta.hat)

## [1] 0.1169499

print(exp(-2) - exp(-4))

## [1] 0.1170196

The estimate is $\hat\theta \approx 0.1169$ and $\theta = e^{-2} - e^{-4} \approx 0.1170$.

Exercise 1
Compute a Monte Carlo estimate of

$$\int_0^{\pi/3} \sin t\,dt$$

and compare your estimate with the exact value of the integral.
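
One possible solution sketch in the same style as the previous examples (the sample size m = 10000 is an arbitrary choice; the exact value is $1 - \cos(\pi/3) = 0.5$):

m <- 10000
x <- runif(m, min = 0, max = pi/3)
theta.hat <- mean(sin(x)) * (pi/3)   # (b - a) times the mean of g(X)
print(theta.hat)
print(1 - cos(pi/3))                 # exact value of the integral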

Simple Monte Carlo estimator
Use the Monte Carlo approach to estimate the standard normal cdf
$$\phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

First, notice that we cannot apply the algorithm above directly because the
limits of integration cover an unbounded interval. However, we can break
this problem into two cases: x ≥ 0 and x < 0, and use the symmetry of the
normal density to handle the second case.

Simple Monte Carlo estimator
Next, estimate $\theta = \int_0^x e^{-t^2/2}\,dt$ for $x > 0$. Making the change of variables $y = t/x$, so that $dt = x\,dy$,

$$\theta = \int_0^1 x e^{-(xy)^2/2}\,dy$$

With $u_1, \dots, u_m$ generated from Uniform(0, 1), the sample-mean estimator is

$$\hat\theta = \overline{g(u)} = \frac{1}{m}\sum_{i=1}^{m} x e^{-(u_i x)^2/2}$$

If $x > 0$, then $\phi(x) = 0.5 + \hat\theta/\sqrt{2\pi}$. For $x < 0$, use $\phi(x) = 1 - \phi(-x)$.

Simple Monte Carlo estimator

x <- seq(.1, 2.5, length = 10)


m <- 10000
u <- runif(m)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
g <- x[i] * exp(-(u * x[i])^2 / 2)
cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}

Simple Monte Carlo estimator

Now the estimates $\hat\phi(x)$ for ten values of x are stored in the vector cdf. Compare the estimates with the values ϕ(x) computed (numerically) by the pnorm function.

Phi <- pnorm(x)


print(round(rbind(x, cdf, Phi), 3))

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## cdf 0.54 0.643 0.737 0.816 0.879 0.925 0.957 0.978 0.991 0.998
## Phi 0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

Simple Monte Carlo estimator
Results for several values x > 0 are shown compared with the value of the
normal cdf function pnorm. The Monte Carlo estimates appear to be very
close to the pnorm values. (The estimates will be worse in the extreme upper
tail of the distribution.)

Exercise 2
Compute a Monte Carlo estimate of the standard normal cdf, by generating
from the Uniform(0,x) distribution. Compare your estimates with the normal
cdf function pnorm. Compute an estimate of the variance of your Monte
Carlo estimate of ϕ(2), and a 95% confidence interval for ϕ(2).
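
A possible sketch for the ϕ(2) part of this exercise (m = 10000 is an arbitrary choice). With $U \sim \text{Uniform}(0, x)$ we have $\phi(x) = 0.5 + x\,E[\mathrm{dnorm}(U)]$, where dnorm is the standard normal density, which suggests the estimator below; the variance estimate and confidence interval follow from the sample variance of the replicates.

x <- 2
m <- 10000
u <- runif(m, min = 0, max = x)      # sample from Uniform(0, x)
g <- x * dnorm(u)                    # x times the standard normal density at u
phi.hat <- 0.5 + mean(g)             # Monte Carlo estimate of phi(2)
v <- var(g) / m                      # estimated variance of the estimate
c(phi.hat, pnorm(2))                 # compare with the numerical value
phi.hat + c(-1.96, 1.96) * sqrt(v)   # approximate 95% confidence interval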

Simple Monte Carlo estimator
Let I (⋅) be the indicator function, and Z ∼ N (0, 1). Then for any constant
x we have E[I (Z ≤ x)] = P (Z ≤ x) = ϕ(x) , the standard normal cdf

evaluated at x.

Generate a random sample $z_1, \dots, z_m$ from the standard normal distribution. Then the sample mean

$$\hat\phi(x) = \frac{1}{m}\sum_{i=1}^{m} I(z_i \le x)$$

converges with probability one to its expected value $E[I(Z \le x)] = P(Z \le x) = \phi(x)$.

Simple Monte Carlo estimator

x <- seq(.1, 2.5, length = 10)


m <- 10000
z <- rnorm(m)
dim(x) <- length(x)
p <- apply(x, MARGIN = 1,
FUN = function(x, z) {mean(z < x)}, z = z)

Simple Monte Carlo estimator
Now the estimates in p for the sequence of x values can be compared to the
result of the R normal cdf function pnorm.

Phi <- pnorm(x)


print(round(rbind(x, p, Phi), 3))

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.100 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## p 0.536 0.648 0.741 0.821 0.880 0.923 0.954 0.975 0.986 0.993
## Phi 0.540 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

It appears that we have better agreement with pnorm in the upper tail, but
worse agreement near the center.

Variance

The variance of $\hat\theta$ can be estimated by

$$\frac{\hat\sigma^2}{m} = \frac{1}{m^2}\sum_{i=1}^{m}\bigl[g(x_i) - \overline{g(x)}\bigr]^2$$

Exercise:

Estimate the variance of the estimator and construct approximate 95% confidence intervals for the estimates of ϕ(2) and ϕ(2.5).

Error bounds for MC integration

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)    # the indicator function
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf, v)

## [1] 9.774000e-01 2.208924e-06

c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

## [1] 0.974487 0.980313

Error bounds for MC integration
The probability $P(I(Z < x) = 1)$ is $\phi(2) \approx 0.9772$.

Here the sample mean of $g(Z) = I(Z < x)$ is the proportion of 1's in m = 10000 Bernoulli trials with $p \approx 0.9772$, so the variance of the estimator is approximately $(0.97725)(1 - 0.97725)/10000 \approx 2.223\text{e-}06$.

The MC estimate of the variance, 2.209e-06, is quite close to this theoretical value.

For x = 2.5 the output is

x <- 2.5
m <- 10000
z <- rnorm(m)
g <- (z < x)    # the indicator function
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf, v)

## [1] 9.92900e-01 7.04959e-07

c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

## [1] 0.9912543 0.9945457

For x = 2.5 the output is
The probability $P(I(Z < x) = 1)$ is $\phi(2.5) \approx 0.9938$.

The Monte Carlo estimate of the variance, 7.05e-07, is of the same order as the theoretical value $(0.9938)(1 - 0.9938)/10000 \approx 6.2\text{e-}07$.

Variance Reduction
We have seen that Monte Carlo integration can be applied to estimate
functions of the type E[g(X)].

In this section we consider several approaches to reducing the variance in the sample mean estimator of θ = E[g(X)].

Variance Reduction
The variance of the Monte Carlo estimator is

$$\operatorname{Var}(\hat\theta) = \operatorname{Var}(\bar Y) = \frac{\operatorname{Var}_f\, g(X)}{m}$$

· Increasing the number of replicates m clearly reduces the variance of the Monte Carlo estimator.
· However, a large increase in m is needed to get even a small improvement
in standard error.
· To reduce the standard error from 0.01 to 0.0001, we would need
approximately 10000 times the number of replicates.
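
A quick empirical check of this square-root scaling (an illustrative sketch, reusing θ = ∫₀¹ e^(−x) dx from the earlier example; the number of repetitions is an arbitrary choice):

se.mc <- function(m, nrep = 200) {
  est <- replicate(nrep, mean(exp(-runif(m))))
  sd(est)                # empirical standard error of the MC estimator
}
se.mc(100)               # roughly 0.018
se.mc(10000)             # roughly 0.0018: 100 times more replicates, about 10 times smaller SE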

Variance Reduction
Thus, although variance can always be reduced by increasing the number of
Monte Carlo replicates, the computational cost is high. Other methods for
reducing the variance can be applied that are less computationally expensive
than simply increasing the number of replicates:

· Antithetic variables
· Control variates
· Importance sampling
· Stratified sampling

Antithetic variates
By using antithetic variates, the negative correlation between the pairs helps
cancel out some of the random fluctuations, leading to a reduction in
variance. This reduction in variance allows for more accurate estimation
using the same number of samples or fewer samples to achieve a desired
level of precision.

Antithetic Variates
· When $U_1, U_2$ are independent, $\operatorname{Var}[(U_1 + U_2)/2] = \frac{1}{4}\bigl[\operatorname{Var}(U_1) + \operatorname{Var}(U_2)\bigr]$.
· But if $U_1, U_2$ are correlated, $\operatorname{Var}[(U_1 + U_2)/2] = \frac{1}{4}\bigl[\operatorname{Var}(U_1) + \operatorname{Var}(U_2) + 2\operatorname{Cov}(U_1, U_2)\bigr]$.
· Instead of $m$ independent replications $u_1, u_2, \dots, u_m$, we generate $m/2$ negatively correlated pairs $(u_1, 1 - u_1), (u_2, 1 - u_2), \dots, (u_{m/2}, 1 - u_{m/2})$, so that within each pair $\operatorname{Cov}(u_j, 1 - u_j) < 0$.
· By exploiting the negative correlation, the variance of the estimator can be reduced.

Antithetic Variates
1. Generate a random sample from the underlying distribution of interest.

2. Create pairs of random variables, where each pair consists of a variable and its “antithetic” counterpart. The antithetic variable is generated by taking the negation or complement of the original variable.

3. Use the pairs of variables (original and antithetic) to calculate the estimator for the quantity of interest by combining the original and antithetic variables.

4. Calculate the average of the estimators obtained from the original and antithetic pairs.
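
A minimal sketch of these four steps (illustrative only), applied to θ = E[e^U] = e − 1 with U ~ Uniform(0, 1), the same integral used in the control variates example later:

m <- 10000
u <- runif(m / 2)                   # step 1: original uniforms
v <- 1 - u                          # step 2: antithetic counterparts
g1 <- exp(u)                        # step 3: evaluate the integrand on each member of a pair
g2 <- exp(v)
theta.anti <- mean((g1 + g2) / 2)   # step 4: average over the antithetic pairs
theta.mc <- mean(exp(runif(m)))     # simple MC with the same total number of evaluations
c(theta.anti, theta.mc, exp(1) - 1)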

Antithetic Variates
Consider the standard normal cdf
$$\phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

The Monte Carlo estimation of the integral ϕ(x) is implemented in the function MC.Phi. Optionally, MC.Phi will compute the estimate with or without antithetic sampling.

Antithetic Variates

MC.Phi <- function(x, R = 10000, antithetic = TRUE) {


u <- runif(R/2)
if (!antithetic) v <- runif(R/2) else
v <- 1 - u
u <- c(u, v)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
g <- x[i] * exp(-(u * x[i])^2 / 2)
cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}
cdf
}

Antithetic Variates

x <- seq(.1, 2.5, length=5)


Phi <- pnorm(x)
set.seed(123)
MC1 <- MC.Phi(x, anti = FALSE)
set.seed(123)
MC2 <- MC.Phi(x)
print(round(rbind(x, MC1, MC2, Phi), 5))

## [,1] [,2] [,3] [,4] [,5]


## x 0.10000 0.70000 1.30000 1.90000 2.50000
## MC1 0.53983 0.75825 0.90418 0.97311 0.99594
## MC2 0.53983 0.75805 0.90325 0.97132 0.99370
## Phi 0.53983 0.75804 0.90320 0.97128 0.99379

Antithetic Variates
The approximate reduction in variance can be estimated for given x by a
simulation under both methods, the simple Monte Carlo integration
approach and the antithetic variable approach.

Antithetic Variates

m <- 1000
MC1 <- MC2 <- numeric(m)
x <- 1.95
for (i in 1:m) {
MC1[i] <- MC.Phi(x, R = 1000, anti = FALSE)
MC2[i] <- MC.Phi(x, R = 1000)
}

Antithetic Variates

print(sd(MC1))

## [1] 0.006874616

print(sd(MC2))

## [1] 0.0004392972

print((var(MC1) - var(MC2))/var(MC1))

## [1] 0.9959166

The antithetic variable approach achieved approximately a 99.6% reduction in variance at x = 1.95.

Control Variates
· Involves introducing a known and correlated random variable, called a
control variate, to reduce the variance of the estimator.
· By choosing an appropriate control variate, which has a similar behavior to
the variable being estimated, the variance of the estimator can be
significantly reduced.
· The idea is to estimate the difference between the variable of interest and
the control variate, as this difference has lower variance than estimating
the variable directly.

Control Variates
1. Define the variable of interest.

2. Identify a suitable control variate (look for a correlated variable that is known or can be easily estimated).

3. Simulate both the variable of interest and the control variate.

4. Calculate the control variate estimator.

5. Determine the control variate coefficient.

6. Combine the estimator for the variable of interest and the estimator for
the control variate using the control variate coefficient.

7. Use the control variate estimator to obtain the final estimate of the
variable of interest.
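
The coefficient in step 5 is c* = −Cov(g(X), f(X)) / Var(f(X)), where f(X) is the control variate with known mean. When it cannot be derived analytically, it can be estimated from a pilot sample, as in the hedged sketch below (the pilot size and the choice g(U) = e^U, f(U) = U are illustrative only; the worked example on the next slides uses the exact analytic coefficient instead):

n.pilot <- 1000
u <- runif(n.pilot)
g <- exp(u)                           # variable of interest
f <- u                                # control variate with known mean E[U] = 1/2
c.star <- -cov(g, f) / var(f)         # estimated control variate coefficient
u2 <- runif(10000)
mean(exp(u2) + c.star * (u2 - 0.5))   # controlled estimate of E[e^U]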

Control Variates
Apply the control variate approach to compute

$$\theta = E[e^U] = \int_0^1 e^u\,du, \qquad U \sim \text{Uniform}(0, 1)$$

Note that $\theta = e - 1 \approx 1.718282$ by integration, and

$$\operatorname{Var}(g(U)) = \operatorname{Var}(e^U) = E[e^{2U}] - \theta^2 = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.2420351$$

Control Variates
A natural choice of control variate is $U \sim \text{Uniform}(0, 1)$; then $E[U] = 1/2$ and $\operatorname{Var}(U) = 1/12$, so

$$\operatorname{Cov}(e^U, U) = 1 - \tfrac{1}{2}(e - 1) \approx 0.1408591$$

Hence

$$c^{\ast} = \frac{-\operatorname{Cov}(e^U, U)}{\operatorname{Var}(U)} = -12 + 6(e - 1) \approx -1.690309$$

Control Variates

The controlled estimator is $\hat\theta_{c^{\ast}} = e^U - 1.690309\,(U - 0.5)$.

For $m$ replicates,

$$m \operatorname{Var}(\hat\theta_{c^{\ast}}) = \operatorname{Var}(e^U) - \frac{[\operatorname{Cov}(e^U, U)]^2}{\operatorname{Var}(U)} = \frac{e^2 - 1}{2} - (e - 1)^2 - 12\left(1 - \frac{e - 1}{2}\right)^2 \approx 0.2420351 - 12(0.1408591)^2 \approx 0.003940175$$

The theoretical percent reduction in variance using the control variate compared with the simple Monte Carlo estimate is $100(1 - 0.003940175/0.2420351) \approx 98.37\%$.

Control Variates

m <- 10000
a <- - 12 + 6 * (exp(1) - 1)
U <- runif(m)
T1 <- exp(U)                   # simple MC
T2 <- exp(U) + a * (U - 1/2)   # controlled

Control Variates

mean(T1)

## [1] 1.711419

mean(T2)

## [1] 1.717676

(var(T1) - var(T2)) / var(T1)

## [1] 0.9837252

The observed percent reduction in variance using the control variate compared with the simple Monte Carlo estimate is approximately 98.37%, in close agreement with the theoretical value.

Importance sampling
The average value of a function g(x) over an interval (a, b) is usually defined
(in calculus) by

$$\frac{1}{b - a}\int_a^b g(x)\,dx$$

Here a uniform weight function is applied over the entire interval (a, b).

Importance sampling
If $X$ is a random variable uniformly distributed on $(a, b)$, then

$$E[g(X)] = \int_a^b g(x)\,\frac{1}{b - a}\,dx = \frac{1}{b - a}\int_a^b g(x)\,dx$$

which is simply the average value of the function $g(x)$ over the interval $(a, b)$ with respect to a uniform weight function. The simple Monte Carlo method generates a large number of replicates $X_1, \dots, X_m$ uniformly distributed on $[a, b]$ and estimates $\int_a^b g(x)\,dx$ by the sample mean

$$\frac{b - a}{m}\sum_{i=1}^{m} g(X_i)$$

which converges to $\int_a^b g(x)\,dx$ with probability 1 by the strong law of large numbers.

Importance sampling
· One limitation of this method is that it does not apply to unbounded
intervals.
· Another drawback is that it can be inefficient to draw samples uniformly across the interval when g(x) is far from constant, since many draws then land where g(x) contributes little to the integral.
· However, once we view the integration problem as an expected value
problem, it seems reasonable to consider other weight functions (other
densities) than uniform.
· This leads us to a general method called importance sampling.
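
Concretely, suppose f(x) is a density (the importance function) that is positive wherever g(x) ≠ 0. Then θ = ∫ g(x) dx can be written as an expectation with respect to f and estimated from a sample $X_1, \dots, X_m$ drawn from f:

$$\theta = \int \frac{g(x)}{f(x)}\, f(x)\,dx = E_f\!\left[\frac{g(X)}{f(X)}\right], \qquad \hat\theta = \frac{1}{m}\sum_{i=1}^{m}\frac{g(X_i)}{f(X_i)}$$

This ratio g(x)/f(x) is exactly the quantity fg computed in the code on the following slides.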

Importance sampling
$$\int_0^1 \frac{e^{-x}}{1 + x^2}\,dx$$

The candidates for the importance functions are

$$f_0(x) = 1, \quad 0 < x < 1$$

$$f_1(x) = e^{-x}, \quad 0 < x < \infty$$

$$f_2(x) = \frac{(1 + x^2)^{-1}}{\pi}, \quad -\infty < x < \infty$$

$$f_3(x) = \frac{e^{-x}}{1 - e^{-1}}, \quad 0 < x < 1$$

$$f_4(x) = \frac{4(1 + x^2)^{-1}}{\pi}, \quad 0 < x < 1$$

Importance sampling
The integrand is

$$g(x) = \frac{e^{-x}}{1 + x^2}, \quad 0 < x < 1, \qquad g(x) = 0 \ \text{otherwise}$$

Importance sampling

m <- 10000
theta.hat <- se <- numeric(5)
g <- function(x) {
exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}

x <- runif(m) #using f0


fg <- g(x)
theta.hat[1] <- mean(fg)
se[1] <- sd(fg)

x <- rexp(m, 1) #using f1


fg <- g(x) / exp(-x)
theta.hat[2] <- mean(fg)
se[2] <- sd(fg)

Importance sampling

x <- rcauchy(m) #using f2


i <- c(which(x > 1), which(x < 0))
x[i] <- 2 #to catch overflow errors in g(x)
fg <- g(x) / dcauchy(x)
theta.hat[3] <- mean(fg)
se[3] <- sd(fg)

u <- runif(m) #f3, inverse transform method


x <- - log(1 - u * (1 - exp(-1)))
fg <- g(x) / (exp(-x) / (1 - exp(-1)))
theta.hat[4] <- mean(fg)
se[4] <- sd(fg)

Importance sampling

u <- runif(m) #f4, inverse transform method


x <- tan(pi * u / 4)
fg <- g(x) / (4 / ((1 + x^2) * pi))
theta.hat[5] <- mean(fg)
se[5] <- sd(fg)

rbind(theta.hat, se)

## [,1] [,2] [,3] [,4] [,5]


## theta.hat 0.5270117 0.5262871 0.5330293 0.52550028 0.5253863
## se 0.2450074 0.4188069 0.9605489 0.09657368 0.1418119

Stratified sampling
· Another approach to variance reduction is stratified sampling, which aims to reduce the variance of the estimator by dividing the interval into strata and estimating the integral on each stratum, where the variance is smaller.
· Linearity of the integral operator and the strong law of large numbers imply that the sum of these estimates converges to $\int g(x)\,dx$ with probability 1.
· In stratified sampling, the total number of replicates $m$ and the number of replicates $m_j$ to be drawn from each of the $k$ strata are fixed so that $m = m_1 + \cdots + m_k$, with the goal that

$$\operatorname{Var}(\hat\theta_k(m_1, \dots, m_k)) < \operatorname{Var}(\hat\theta)$$

Stratified sampling
Variance reduction for $\int_0^1 e^{-x}(1 + x^2)^{-1}\,dx$:

M <- 10000    # number of replicates
k <- 10       # number of strata
r <- M / k    # replicates per stratum
N <- 50       # number of times to repeat the estimation

Stratified sampling

T2 <- numeric(k)
estimates <- matrix(0, N, 2)
g <- function(x) {
exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}
for (i in 1:N) {
estimates[i, 1] <- mean(g(runif(M)))
for (j in 1:k)
T2[j] <- mean(g(runif(M/k, (j-1)/k, j/k)))
estimates[i, 2] <- mean(T2)
}

Stratified sampling

apply(estimates, 2, mean)

## [1] 0.5245936 0.5248098

apply(estimates, 2, var)

## [1] 5.778212e-06 5.739439e-08

This represents roughly a 99% reduction in variance compared with the simple Monte Carlo estimate.

53/54
End
THE END

