Week 7 Lecture


AA037014 Statistical Computing

7: Monte Carlo Integration & Variance Reduction


Dr. Diana Abdul Wahab
Monte Carlo Integration
· Monte Carlo integration is a numerical method used to estimate the value
of definite integrals, especially in cases where analytical solutions are
difficult or impossible to obtain.
· It is based on the principles of probability and random sampling.
· The basic idea: simulate a large number of random points within a specific
region, and then use these points to estimate the integral.
· The region of integration is defined by the limits of the integral and is often
referred to as the integration domain.
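
As a small illustration of this idea (not from the original slides; the integrand x^2 and the sample size are arbitrary choices), the following R sketch scatters uniform random points in the unit square and uses the proportion falling below the curve y = x^2 to estimate $\int_0^1 x^2\,dx = 1/3$:

set.seed(1)
n <- 100000
x <- runif(n)      # random points in the unit square
y <- runif(n)
hit <- y < x^2     # TRUE when the point falls below the curve
mean(hit)          # proportion of hits estimates the integral, here 1/3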

Monte Carlo Integration
Let $g(x)$ be a function and suppose that we want to compute $\int_a^b g(x)\,dx$ (assuming that this integral exists). Recall that if $X$ is a random variable with density $f(x)$, then the mathematical expectation of the random variable $Y = g(X)$ is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

If a random sample is available from the distribution of X, an unbiased estimator of E[g(X)] is the sample mean.

Simple Monte Carlo estimator
Compute a Monte Carlo estimate of
$$\theta = \int_0^1 e^{-x}\,dx$$

and compare the estimate with the exact value

Simple Monte Carlo estimator

m <- 10000
x <- runif(m)
theta.hat <- mean(exp(-x))
print(theta.hat)

## [1] 0.6339235

print(1 - exp(-1))

## [1] 0.6321206

The estimate is $\hat\theta \approx 0.6339$ and $\theta = 1 - e^{-1} \approx 0.6321$.

Simple Monte Carlo estimator
Compute a Monte Carlo estimate of
$$\theta = \int_2^4 e^{-x}\,dx$$

and compare the estimate with the exact value of the integral.

Simple Monte Carlo estimator

m <- 10000
x <- runif(m, min=2, max=4)
theta.hat <- mean(exp(-x)) * 2
print(theta.hat)

## [1] 0.1169499

print(exp(-2) - exp(-4))

## [1] 0.1170196

The estimate is $\hat\theta \approx 0.1169$ and $\theta = e^{-2} - e^{-4} \approx 0.1170$.

Exercise 1
Compute a Monte Carlo estimate of

$$\int_0^{\pi/3} \sin t\,dt$$

and compare your estimate with the exact value of the integral.
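
One possible solution sketch in the same style as the previous examples (the sample size m = 10000 is an arbitrary choice; the exact value is $1 - \cos(\pi/3) = 0.5$):

m <- 10000
x <- runif(m, min = 0, max = pi/3)
theta.hat <- mean(sin(x)) * (pi/3)   # (b - a) times the mean of g(X)
print(theta.hat)
print(1 - cos(pi/3))                 # exact value of the integral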

Simple Monte Carlo estimator
Use the Monte Carlo approach to estimate the standard normal cdf
$$\phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

First, notice that we cannot apply the algorithm above directly because the
limits of integration cover an unbounded interval. However, we can break
this problem into two cases: x ≥ 0 and x < 0, and use the symmetry of the
normal density to handle the second case.

Simple Monte Carlo estimator
Next, estimate $\theta = \int_0^x e^{-t^2/2}\,dt$ for $x > 0$. Making the change of variables $y = t/x$, so that $dt = x\,dy$,

$$\theta = \int_0^1 x e^{-(xy)^2/2}\,dy$$

With $u_1, \dots, u_m$ generated from Uniform(0, 1), the sample-mean estimator is

$$\hat\theta = \overline{g(u)} = \frac{1}{m}\sum_{i=1}^{m} x e^{-(u_i x)^2/2}$$

If $x > 0$, then $\phi(x) = 0.5 + \hat\theta/\sqrt{2\pi}$. For $x < 0$, use $\phi(x) = 1 - \phi(-x)$.

Simple Monte Carlo estimator

x <- seq(.1, 2.5, length = 10)


m <- 10000
u <- runif(m)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
g <- x[i] * exp(-(u * x[i])^2 / 2)
cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}

Simple Monte Carlo estimator

Now the estimates $\hat\phi(x)$ for ten values of x are stored in the vector cdf. Compare the estimates with the values ϕ(x) computed (numerically) by the pnorm function.

Phi <- pnorm(x)


print(round(rbind(x, cdf, Phi), 3))

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## cdf 0.54 0.643 0.737 0.816 0.879 0.925 0.957 0.978 0.991 0.998
## Phi 0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

Simple Monte Carlo estimator
Results for several values x > 0 are shown compared with the value of the
normal cdf function pnorm. The Monte Carlo estimates appear to be very
close to the pnorm values. (The estimates will be worse in the extreme upper
tail of the distribution.)

Exercise 2
Compute a Monte Carlo estimate of the standard normal cdf, by generating
from the Uniform(0,x) distribution. Compare your estimates with the normal
cdf function pnorm. Compute an estimate of the variance of your Monte
Carlo estimate of ϕ(2), and a 95% confidence interval for ϕ(2).
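
A possible sketch for the ϕ(2) part of this exercise (m = 10000 is an arbitrary choice). With $U \sim \text{Uniform}(0, x)$ we have $\phi(x) = 0.5 + x\,E[\mathrm{dnorm}(U)]$, where dnorm is the standard normal density, which suggests the estimator below; the variance estimate and confidence interval follow from the sample variance of the replicates.

x <- 2
m <- 10000
u <- runif(m, min = 0, max = x)      # sample from Uniform(0, x)
g <- x * dnorm(u)                    # x times the standard normal density at u
phi.hat <- 0.5 + mean(g)             # Monte Carlo estimate of phi(2)
v <- var(g) / m                      # estimated variance of the estimate
c(phi.hat, pnorm(2))                 # compare with the numerical value
phi.hat + c(-1.96, 1.96) * sqrt(v)   # approximate 95% confidence interval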

Simple Monte Carlo estimator
Let I (⋅) be the indicator function, and Z ∼ N (0, 1). Then for any constant
x we have E[I (Z ≤ x)] = P (Z ≤ x) = ϕ(x) , the standard normal cdf

evaluated at x.

Generate a random sample $z_1, \dots, z_m$ from the standard normal distribution. Then the sample mean

$$\hat\phi(x) = \frac{1}{m}\sum_{i=1}^{m} I(z_i \le x)$$

converges with probability one to its expected value $E[I(Z \le x)] = P(Z \le x) = \phi(x)$.

Simple Monte Carlo estimator

x <- seq(.1, 2.5, length = 10)


m <- 10000
z <- rnorm(m)
dim(x) <- length(x)
p <- apply(x, MARGIN = 1,
FUN = function(x, z) {mean(z < x)}, z = z)

Simple Monte Carlo estimator
Now the estimates in p for the sequence of x values can be compared to the
result of the R normal cdf function pnorm.

Phi <- pnorm(x)


print(round(rbind(x, p, Phi), 3))

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 0.100 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## p 0.536 0.648 0.741 0.821 0.880 0.923 0.954 0.975 0.986 0.993
## Phi 0.540 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994

It appears that we have better agreement with pnorm in the upper tail, but
worse agreement near the center.

Variance

The variance of $\hat\theta$ can be estimated by

$$\frac{\hat\sigma^2}{m} = \frac{1}{m^2}\sum_{i=1}^{m}\bigl[g(x_i) - \overline{g(x)}\bigr]^2$$

Exercise:

Estimate the variance of the estimator and construct approximate 95% confidence intervals for the estimates of ϕ(2) and ϕ(2.5).

Error bounds for MC integration

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)    # the indicator function
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf, v)

## [1] 9.774000e-01 2.208924e-06

c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

## [1] 0.974487 0.980313

Error bounds for MC integration
The probability $P(I(Z < x) = 1)$ is $\phi(2) \approx 0.9772$.

Here the sample mean of $g(Z) = I(Z < x)$ is the proportion of 1's in m = 10000 Bernoulli trials with $p \approx 0.9772$, so the variance of the estimator is approximately $(0.97725)(1 - 0.97725)/10000 \approx 2.223\text{e-}06$.

The MC estimate of the variance, 2.209e-06, is quite close to this theoretical value.

For x = 2.5 the output is

x <- 2.5
m <- 10000
z <- rnorm(m)
g <- (z < x)    # the indicator function
v <- mean((g - mean(g))^2) / m
cdf <- mean(g)
c(cdf, v)

## [1] 9.92900e-01 7.04959e-07

c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))

## [1] 0.9912543 0.9945457

For x = 2.5 the output is
The probability $P(I(Z < x) = 1)$ is $\phi(2.5) \approx 0.9938$.

The Monte Carlo estimate of the variance, 7.05e-07, is of the same order as the theoretical value $(0.9938)(1 - 0.9938)/10000 \approx 6.2\text{e-}07$.

Variance Reduction
We have seen that Monte Carlo integration can be applied to estimate
functions of the type E[g(X)].

In this section we consider several approaches to reducing the variance in the sample mean estimator of θ = E[g(X)].

Variance Reduction
The variance of the Monte Carlo estimator is

$$\operatorname{Var}(\hat\theta) = \operatorname{Var}(\bar Y) = \frac{\operatorname{Var}_f\, g(X)}{m}$$

· Increasing the number of replicates m clearly reduces the variance of the Monte Carlo estimator.
· However, a large increase in m is needed to get even a small improvement
in standard error.
· To reduce the standard error from 0.01 to 0.0001, we would need
approximately 10000 times the number of replicates.
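
A quick empirical check of this square-root scaling (an illustrative sketch, reusing θ = ∫₀¹ e^(−x) dx from the earlier example; the number of repetitions is an arbitrary choice):

se.mc <- function(m, nrep = 200) {
  est <- replicate(nrep, mean(exp(-runif(m))))
  sd(est)                # empirical standard error of the MC estimator
}
se.mc(100)               # roughly 0.018
se.mc(10000)             # roughly 0.0018: 100 times more replicates, about 10 times smaller SE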

Variance Reduction
Thus, although variance can always be reduced by increasing the number of
Monte Carlo replicates, the computational cost is high. Other methods for
reducing the variance can be applied that are less computationally expensive
than simply increasing the number of replicates:

· Antithetic variables
· Control variates
· Importance sampling
· Stratified sampling

Antithetic variates
By using antithetic variates, the negative correlation between the pairs helps
cancel out some of the random fluctuations, leading to a reduction in
variance. This reduction in variance allows for more accurate estimation
using the same number of samples or fewer samples to achieve a desired
level of precision.

Antithetic Variates
· When $U_1, U_2$ are independent, $\operatorname{Var}[(U_1 + U_2)/2] = \frac{1}{4}\bigl[\operatorname{Var}(U_1) + \operatorname{Var}(U_2)\bigr]$.
· But if $U_1, U_2$ are correlated, $\operatorname{Var}[(U_1 + U_2)/2] = \frac{1}{4}\bigl[\operatorname{Var}(U_1) + \operatorname{Var}(U_2) + 2\operatorname{Cov}(U_1, U_2)\bigr]$.
· Instead of $m$ independent replications $u_1, u_2, \dots, u_m$, we generate $m/2$ negatively correlated pairs $(u_1, 1 - u_1), (u_2, 1 - u_2), \dots, (u_{m/2}, 1 - u_{m/2})$, so that within each pair $\operatorname{Cov}(u_j, 1 - u_j) < 0$.
· By exploiting the negative correlation, the variance of the estimator can be reduced.

Antithetic Variates
1. Generate a random sample from the underlying distribution of interest.

2. Create pairs of random variables, where each pair consists of a variable and its “antithetic” counterpart. The antithetic variable is generated by taking the negation or complement of the original variable.

3. Use the pairs of variables (original and antithetic) to calculate the estimator for the quantity of interest by combining the original and antithetic variables.

4. Calculate the average of the estimators obtained from the original and antithetic pairs.
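
A minimal sketch of these four steps (illustrative only), applied to θ = E[e^U] = e − 1 with U ~ Uniform(0, 1), the same integral used in the control variates example later:

m <- 10000
u <- runif(m / 2)                   # step 1: original uniforms
v <- 1 - u                          # step 2: antithetic counterparts
g1 <- exp(u)                        # step 3: evaluate the integrand on each member of a pair
g2 <- exp(v)
theta.anti <- mean((g1 + g2) / 2)   # step 4: average over the antithetic pairs
theta.mc <- mean(exp(runif(m)))     # simple MC with the same total number of evaluations
c(theta.anti, theta.mc, exp(1) - 1)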

Antithetic Variates
Consider the standard normal cdf
$$\phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

The Monte Carlo estimation of the integral ϕ(x) is implemented in the function MC.Phi. Optionally, MC.Phi will compute the estimate with or without antithetic sampling.

Antithetic Variates

MC.Phi <- function(x, R = 10000, antithetic = TRUE) {


u <- runif(R/2)
if (!antithetic) v <- runif(R/2) else
v <- 1 - u
u <- c(u, v)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
g <- x[i] * exp(-(u * x[i])^2 / 2)
cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}
cdf
}

Antithetic Variates

x <- seq(.1, 2.5, length=5)


Phi <- pnorm(x)
set.seed(123)
MC1 <- MC.Phi(x, anti = FALSE)
set.seed(123)
MC2 <- MC.Phi(x)
print(round(rbind(x, MC1, MC2, Phi), 5))

## [,1] [,2] [,3] [,4] [,5]


## x 0.10000 0.70000 1.30000 1.90000 2.50000
## MC1 0.53983 0.75825 0.90418 0.97311 0.99594
## MC2 0.53983 0.75805 0.90325 0.97132 0.99370
## Phi 0.53983 0.75804 0.90320 0.97128 0.99379

Antithetic Variates
The approximate reduction in variance can be estimated for given x by a
simulation under both methods, the simple Monte Carlo integration
approach and the antithetic variable approach.

Antithetic Variates

m <- 1000
MC1 <- MC2 <- numeric(m)
x <- 1.95
for (i in 1:m) {
MC1[i] <- MC.Phi(x, R = 1000, anti = FALSE)
MC2[i] <- MC.Phi(x, R = 1000)
}

Antithetic Variates

print(sd(MC1))

## [1] 0.006874616

print(sd(MC2))

## [1] 0.0004392972

print((var(MC1) - var(MC2))/var(MC1))

## [1] 0.9959166

The antithetic variable approach achieved approximately a 99.6% reduction in variance at x = 1.95.

Control Variates
· Involves introducing a known and correlated random variable, called a
control variate, to reduce the variance of the estimator.
· By choosing an appropriate control variate, which has a similar behavior to
the variable being estimated, the variance of the estimator can be
significantly reduced.
· The idea is to estimate the difference between the variable of interest and
the control variate, as this difference has lower variance than estimating
the variable directly.

Control Variates
1. Define the variable of interest.

2. Identify a suitable control variate (look for a correlated variable that is known or can be easily estimated).

3. Simulate both the variable of interest and the control variate.

4. Calculate the control variate estimator.

5. Determine the control variate coefficient.

6. Combine the estimator for the variable of interest and the estimator for
the control variate using the control variate coefficient.

7. Use the control variate estimator to obtain the final estimate of the
variable of interest.
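
The coefficient in step 5 is c* = −Cov(g(X), f(X)) / Var(f(X)), where f(X) is the control variate with known mean. When it cannot be derived analytically, it can be estimated from a pilot sample, as in the hedged sketch below (the pilot size and the choice g(U) = e^U, f(U) = U are illustrative only; the worked example on the next slides uses the exact analytic coefficient instead):

n.pilot <- 1000
u <- runif(n.pilot)
g <- exp(u)                           # variable of interest
f <- u                                # control variate with known mean E[U] = 1/2
c.star <- -cov(g, f) / var(f)         # estimated control variate coefficient
u2 <- runif(10000)
mean(exp(u2) + c.star * (u2 - 0.5))   # controlled estimate of E[e^U]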

Control Variates
Apply the control variate approach to compute

$$\theta = E[e^U] = \int_0^1 e^u\,du, \qquad U \sim \text{Uniform}(0, 1)$$

Note that $\theta = e - 1 \approx 1.718282$ by integration, and

$$\operatorname{Var}(g(U)) = \operatorname{Var}(e^U) = E[e^{2U}] - \theta^2 = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.2420351$$

Control Variates
A natural choice of control variate is $U \sim \text{Uniform}(0, 1)$; then $E[U] = 1/2$ and $\operatorname{Var}(U) = 1/12$, so

$$\operatorname{Cov}(e^U, U) = 1 - \tfrac{1}{2}(e - 1) \approx 0.1408591$$

Hence

$$c^{\ast} = \frac{-\operatorname{Cov}(e^U, U)}{\operatorname{Var}(U)} = -12 + 6(e - 1) \approx -1.690309$$

Control Variates

The controlled estimator is $\hat\theta_{c^{\ast}} = e^U - 1.690309\,(U - 0.5)$.

For $m$ replicates,

$$m \operatorname{Var}(\hat\theta_{c^{\ast}}) = \operatorname{Var}(e^U) - \frac{[\operatorname{Cov}(e^U, U)]^2}{\operatorname{Var}(U)} = \frac{e^2 - 1}{2} - (e - 1)^2 - 12\left(1 - \frac{e - 1}{2}\right)^2 \approx 0.2420351 - 12(0.1408591)^2 \approx 0.003940175$$

The theoretical percent reduction in variance using the control variate compared with the simple Monte Carlo estimate is $100(1 - 0.003940175/0.2420351) \approx 98.37\%$.

Control Variates

m <- 10000
a <- - 12 + 6 * (exp(1) - 1)
U <- runif(m)
T1 <- exp(U)                   # simple MC
T2 <- exp(U) + a * (U - 1/2)   # controlled

Control Variates

mean(T1)

## [1] 1.711419

mean(T2)

## [1] 1.717676

(var(T1) - var(T2)) / var(T1)

## [1] 0.9837252

The observed percent reduction in variance using the control variate compared with the simple Monte Carlo estimate is approximately 98.37%, in close agreement with the theoretical value.

Importance sampling
The average value of a function g(x) over an interval (a, b) is usually defined
(in calculus) by

$$\frac{1}{b - a}\int_a^b g(x)\,dx$$

Here a uniform weight function is applied over the entire interval (a, b).

Importance sampling
If $X$ is a random variable uniformly distributed on $(a, b)$, then

$$E[g(X)] = \int_a^b g(x)\,\frac{1}{b - a}\,dx = \frac{1}{b - a}\int_a^b g(x)\,dx$$

which is simply the average value of the function $g(x)$ over the interval $(a, b)$ with respect to a uniform weight function. The simple Monte Carlo method generates a large number of replicates $X_1, \dots, X_m$ uniformly distributed on $[a, b]$ and estimates $\int_a^b g(x)\,dx$ by the sample mean

$$\frac{b - a}{m}\sum_{i=1}^{m} g(X_i)$$

which converges to $\int_a^b g(x)\,dx$ with probability 1 by the strong law of large numbers.

Importance sampling
· One limitation of this method is that it does not apply to unbounded
intervals.
· Another drawback is that it can be inefficient to draw samples uniformly across the interval when g(x) is far from constant, since many draws then land where g(x) contributes little to the integral.
· However, once we view the integration problem as an expected value
problem, it seems reasonable to consider other weight functions (other
densities) than uniform.
· This leads us to a general method called importance sampling.
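
Concretely, suppose f(x) is a density (the importance function) that is positive wherever g(x) ≠ 0. Then θ = ∫ g(x) dx can be written as an expectation with respect to f and estimated from a sample $X_1, \dots, X_m$ drawn from f:

$$\theta = \int \frac{g(x)}{f(x)}\, f(x)\,dx = E_f\!\left[\frac{g(X)}{f(X)}\right], \qquad \hat\theta = \frac{1}{m}\sum_{i=1}^{m}\frac{g(X_i)}{f(X_i)}$$

This ratio g(x)/f(x) is exactly the quantity fg computed in the code on the following slides.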

Importance sampling
$$\int_0^1 \frac{e^{-x}}{1 + x^2}\,dx$$

The candidates for the importance functions are

$$f_0(x) = 1, \quad 0 < x < 1$$

$$f_1(x) = e^{-x}, \quad 0 < x < \infty$$

$$f_2(x) = \frac{(1 + x^2)^{-1}}{\pi}, \quad -\infty < x < \infty$$

$$f_3(x) = \frac{e^{-x}}{1 - e^{-1}}, \quad 0 < x < 1$$

$$f_4(x) = \frac{4(1 + x^2)^{-1}}{\pi}, \quad 0 < x < 1$$

Importance sampling
The integrand is

$$g(x) = \frac{e^{-x}}{1 + x^2}, \quad 0 < x < 1, \qquad g(x) = 0 \ \text{otherwise}$$

Importance sampling

m <- 10000
theta.hat <- se <- numeric(5)
g <- function(x) {
exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}

x <- runif(m) #using f0


fg <- g(x)
theta.hat[1] <- mean(fg)
se[1] <- sd(fg)

x <- rexp(m, 1) #using f1


fg <- g(x) / exp(-x)
theta.hat[2] <- mean(fg)
se[2] <- sd(fg)

Importance sampling

x <- rcauchy(m) #using f2


i <- c(which(x > 1), which(x < 0))
x[i] <- 2 #to catch overflow errors in g(x)
fg <- g(x) / dcauchy(x)
theta.hat[3] <- mean(fg)
se[3] <- sd(fg)

u <- runif(m) #f3, inverse transform method


x <- - log(1 - u * (1 - exp(-1)))
fg <- g(x) / (exp(-x) / (1 - exp(-1)))
theta.hat[4] <- mean(fg)
se[4] <- sd(fg)

Importance sampling

u <- runif(m) #f4, inverse transform method


x <- tan(pi * u / 4)
fg <- g(x) / (4 / ((1 + x^2) * pi))
theta.hat[5] <- mean(fg)
se[5] <- sd(fg)

rbind(theta.hat, se)

## [,1] [,2] [,3] [,4] [,5]


## theta.hat 0.5270117 0.5262871 0.5330293 0.52550028 0.5253863
## se 0.2450074 0.4188069 0.9605489 0.09657368 0.1418119

Stratified sampling
· Another approach to variance reduction is stratified sampling, which aims to reduce the variance of the estimator by dividing the interval into strata and estimating the integral on each stratum, where the variance is smaller.
· Linearity of the integral operator and the strong law of large numbers imply that the sum of these estimates converges to $\int g(x)\,dx$ with probability 1.
· In stratified sampling, the total number of replicates $m$ and the number of replicates $m_j$ to be drawn from each of the $k$ strata are fixed so that $m = m_1 + \cdots + m_k$, with the goal that

$$\operatorname{Var}(\hat\theta_k(m_1, \dots, m_k)) < \operatorname{Var}(\hat\theta)$$

Stratified sampling
Variance reduction for $\int_0^1 e^{-x}(1 + x^2)^{-1}\,dx$:

M <- 10000    # number of replicates
k <- 10       # number of strata
r <- M / k    # replicates per stratum
N <- 50       # number of times to repeat the estimation

Stratified sampling

T2 <- numeric(k)
estimates <- matrix(0, N, 2)
g <- function(x) {
exp(-x - log(1+x^2)) * (x > 0) * (x < 1)
}
for (i in 1:N) {
estimates[i, 1] <- mean(g(runif(M)))
for (j in 1:k)
T2[j] <- mean(g(runif(M/k, (j-1)/k, j/k)))
estimates[i, 2] <- mean(T2)
}

Stratified sampling

apply(estimates, 2, mean)

## [1] 0.5245936 0.5248098

apply(estimates, 2, var)

## [1] 5.778212e-06 5.739439e-08

This represents roughly a 99% reduction in variance compared with the simple Monte Carlo estimate.

53/54
End
THE END

