Odf-SC1 05 MonteCarlo
we can use crude Monte Carlo to estimate E_θ[g(X) − g*(X)]. That is,
\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\{g(x_i) - g^*(x_i)\} + \tau
          = \frac{1}{n}\sum_{i=1}^{n} g(x_i) - \frac{1}{n}\sum_{i=1}^{n} g^*(x_i) + \tau
\]
with
\[
\begin{aligned}
Var[\hat{\mu}] &= \frac{1}{n} Var_\theta[g(X) - g^*(X)] \\
               &= \frac{1}{n}\{Var_\theta[g(X)] + Var_\theta[g^*(X)] - 2\,Cov_\theta[g(X), g^*(X)]\}.
\end{aligned}
\]
If g*(X) mimics g(X), then Var_θ[g(X)] ≈ Var_θ[g*(X)] and
\[
Var[\hat{\mu}] \approx \frac{1}{n}\{2\,Var_\theta[g(X)] - 2\,Var_\theta[g(X)]\,Corr_\theta[g(X), g^*(X)]\}
< \frac{1}{n} Var_\theta[g(X)] \quad \text{if } Corr_\theta[g(X), g^*(X)] > \tfrac{1}{2}.
\]
Thus, the control variate reduces variability relative to crude MC whenever Corr_θ[g(X), g*(X)] > 1/2.
Since E[x̄] = ab ≡ τ is known, use µ̂ = (1/n) Σᵢ {Mᵢ − x̄ᵢ} + τ as the estimate. That is, use x̄ as a control variate for estimating E[M].
# Gamma(2, 4) distribution, with E[X]=a*b and Var[X]=a*b^2.
a <- 2
b <- 4
# true median
qgamma(0.5, a, scale=b)
## [1] 6.713
# tau = E[x.bar] = a*b, the known mean of the control variate
tau <- a * b
tau
## [1] 8
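The simulation that produces the replicate medians M and means x.bar is not shown above; a minimal sketch, where the number of replicates (10000) and the per-replicate sample size (25) are hypothetical choices:
# simulate replicate samples; record the sample median and sample mean of each
set.seed(1)
R <- 10000     # Monte Carlo replicates (assumed)
n.rep <- 25    # sample size per replicate (assumed)
sim <- replicate(R, {
  x <- rgamma(n.rep, shape=a, scale=b)
  c(median(x), mean(x))
})
M <- sim[1, ]      # sample medians, estimating the true median
x.bar <- sim[2, ]  # sample means, the control variate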
# correlation between M and x.bar comfortably exceeds 1/2
cor(M, x.bar)
## [1] 0.7612
# The control-variate estimate of mu, the true median, has lower
# variance than the crude estimate mean(M)
mu.hat <- mean(M - x.bar) + tau
c(mu.hat, var(M - x.bar))
Note that 0 · log_e(0) ≡ 0. In large samples, both P and G² ~ χ²_{k−1} when H₀ is true. One way to study the closeness of the χ²_{k−1} approximation is through the moments: how closely do the moments of P and G² match those of the χ²_{k−1} distribution? The moments of P are tractable, but the moments of G² are not. This suggests using P as a control variate for estimating moments of G². For example, suppose we wish to estimate E[G²] = µ. We know
\[
E[P] = E[\chi^2_{k-1}] = k - 1.
\]
From n samples generated under H₀, the control-variate estimate is
\[
\hat{\mu} = \frac{1}{n}\sum_{r=1}^{n}\{G^2_r - P_r\} + (k - 1),
\]
where G²_r and P_r are the values of G² and P from the rth sample.
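A minimal sketch of this P-as-control-variate estimate, where the multinomial null (k = 4 equiprobable cells, samples of size 30) and the number of replicates are hypothetical choices:
# simulate P and G2 under an assumed multinomial H0
set.seed(1)
k <- 4
p0 <- rep(1/k, k)   # equiprobable cells (assumed)
n.obs <- 30         # counts per sample (assumed)
R <- 10000          # Monte Carlo replicates (assumed)
stats <- replicate(R, {
  O <- as.vector(rmultinom(1, n.obs, p0))           # observed counts
  E <- n.obs * p0                                   # expected counts
  P <- sum((O - E)^2 / E)                           # Pearson statistic
  G2 <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))   # uses 0 * log(0) = 0
  c(P, G2)
})
P <- stats[1, ]
G2 <- stats[2, ]
mu.hat.G2 <- mean(G2 - P) + (k - 1)   # control-variate estimate of E[G2]
c(mu.hat.G2, var(G2 - P) / R)         # estimate and its variance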
The antithetic variates (AV) estimator ("antithetic" means "directly opposed or contrasted; mutually incompatible")
\[
\hat{\mu}_{AV} = \frac{\hat{\mu}_1 + \hat{\mu}_2}{2}
\]
has
\[
\begin{aligned}
Var[\hat{\mu}_{AV}] &= \frac{1}{4}\{Var[\hat{\mu}_1] + Var[\hat{\mu}_2] + 2\,Cov[\hat{\mu}_1, \hat{\mu}_2]\} \\
&= \frac{1}{4}\{Var[\hat{\mu}_1] + Var[\hat{\mu}_2] + 2\rho\sqrt{Var[\hat{\mu}_1]\,Var[\hat{\mu}_2]}\} \\
&= \frac{1}{4n}\{\sigma^2 + \sigma^2 + 2\rho\sigma^2\} \\
&= \frac{\sigma^2}{2n}(1 + \rho) \\
&< \frac{\sigma^2}{2n} \quad \text{if } \rho < 0,
\end{aligned}
\]
where the last term is the variance of either µ̂1 or µ̂2 based on a sample of
size 2n. That is, averaging the two estimators based on the same sample
of size n (necessary to make estimators correlated) is better than doubling
the sample size using either estimator individually.
Put another way, two negatively correlated estimators can be combined
to provide a more precise estimator than either estimate individually, even
when the combined estimator is based on half the number of samples.
The AV method is often difficult to implement since you need to find
negatively correlated estimators. This can often be done in situations with
certain symmetry constraints.
# define h(x)
f.h <- function(x) {
h <- x / (2^x - 1)
return(h)
}
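The sampling step that produces h.x, h.negx and the second independent stream h.x2, h.negx2 is not shown; a minimal sketch, assuming X ~ N(0, 1) and n = 10000 pairs (both assumptions):
# by the symmetry of N(0,1), (x, -x) gives a valid antithetic pair
set.seed(1)
n <- 10000
x <- rnorm(n)
h.x <- f.h(x)
h.negx <- f.h(-x)
# a second, independent stream for the crude-MC comparison below
x2 <- rnorm(n)
h.x2 <- f.h(x2)
h.negx2 <- f.h(-x2)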
# h(x) and h(-x) are strongly negatively correlated
cor(h.x, h.negx)
## [1] -0.9527
# AV estimate: average h over each antithetic pair, then average the pairs
combine.h.x <- (h.x + h.negx) / 2
mu.hat.AS <- mean(combine.h.x)
mu.hat.AS
## [1] 1.499
# sd of the averaged antithetic pairs
sd(combine.h.x)
## [1] 0.07764
# compare: the sd of individual draws of h is much larger
sd(c(h.x, h.x2))
## [1] 0.5086
sd(c(h.negx, h.negx2))
## [1] 0.507
- with two estimates of equal variance, the combined estimate is always the average, and
- depending on Σ, we could potentially reduce n and get the same precision as using an individual estimator µ̂_j (see the sketch below).
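The combination rule behind these remarks is not shown in the extract; a minimal sketch, assuming the standard minimum-variance weights w ∝ Σ⁻¹1 and a hypothetical Σ:
# minimum-variance combination of two unbiased estimators (sketch; Sigma is hypothetical)
Sigma <- matrix(c(1.0, -0.5,
                  -0.5, 1.0), 2, 2)   # covariance of (mu1.hat, mu2.hat)
w <- solve(Sigma, rep(1, 2))          # Sigma^{-1} 1
w <- w / sum(w)                       # normalize so weights sum to 1
w                                     # equal variances: w = (0.5, 0.5), the average
drop(t(w) %*% Sigma %*% w)            # variance of the combined estimator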
Remarks
The variance of the importance sampling estimator is
\[
Var(\hat{\mu}_{IS}) = \frac{\sigma^2_{IS}}{n},
\]
where
\[
\sigma^2_{IS} = \frac{1}{n-1}\sum_{i=1}^{n}\{g(x_i)\,w(x_i) - \hat{\mu}_{IS}\}^2.
\]
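As a concrete illustration (not from the text): estimating µ = E[X⁴] = 3 under a N(0, 1) target f using a heavier-tailed t₃ proposal q, with weights w(x) = f(x)/q(x):
# importance sampling sketch; target f = N(0,1), proposal q = t_3 (assumed example)
set.seed(1)
n <- 10000
x <- rt(n, df=3)               # draws from the proposal q
w <- dnorm(x) / dt(x, df=3)    # importance weights w = f/q
gw <- x^4 * w                  # g(x) w(x)
mu.hat.IS <- mean(gw)          # IS estimate of E[X^4] = 3
sig2.IS <- var(gw)             # the sigma^2_IS formula above
c(mu.hat.IS, sig2.IS / n)      # estimate and its estimated variance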
More generally, we would consider estimating Pr[T(X) ∈ C] for some set C. If we, for the moment, assume that t is fixed, then all we are doing is estimating the probability
\[
p = \Pr[T(X) \le t].
\]
For crude MC, we generate n copies X1, X2, . . . , Xn from the same dis-
tribution as X, and compute
\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(T(X_i) \le t)
        = \frac{\#\{T(X_i) \le t\}}{n}
        = \{\text{sample proportion} \le t\}.
\]
We know
\[
Var[\hat{p}] = \frac{1}{n} Var[\mathbf{1}(T(X_i) \le t)] = \frac{1}{n}\,p(1 - p),
\]
which can be estimated via
\[
\widehat{Var}[\hat{p}] = \frac{1}{n}\,\hat{p}(1 - \hat{p})
\]
or (a close approximation)
\[
\widehat{Var}[\hat{p}] \approx \frac{1}{n}\left(\frac{1}{n-1}\sum_{i=1}^{n}\{\mathbf{1}(T(X_i) \le t) - \hat{p}\}^2\right),
\]
which implies that p is within (approximately) 2√(p̂(1 − p̂)/n) of p̂ in 95% of samples. That is, the error of p̂ as an estimate of p is within
\[
MOE = 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
\]
in 95% of samples. Since p(1 − p) ≤ 0.25,
\[
MOE \le 2\sqrt{\frac{0.25}{n}} = \frac{1}{\sqrt{n}}.
\]
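A minimal sketch of p̂ and its MOE, where T(X) (the mean of 10 Exp(1) draws), t = 1.2, and n are hypothetical choices:
# crude MC estimate of p = Pr[T(X) <= t] with its margin of error
set.seed(1)
n <- 10000
Tstat <- replicate(n, mean(rexp(10)))   # T(X) for each replicate (assumed statistic)
t0 <- 1.2                               # threshold t (assumed)
p.hat <- mean(Tstat <= t0)
MOE <- 2 * sqrt(p.hat * (1 - p.hat) / n)
c(p.hat, MOE)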
If we pre-specify a desired MOE, then choosing
\[
\frac{1}{\sqrt{n}} = MOE \quad\text{implies}\quad n = \frac{1}{MOE^2}
\]
gives the desired result. For a MOE of 0.01, we need n = 1/0.01² = 100² = 10000. For a MOE of 0.05, we need n = 1/0.05² = 20² = 400. In general, decreasing the MOE by a factor of two requires quadrupling n.
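The same arithmetic in R:
# worst-case (p = 0.5) sample sizes for pre-specified margins of error
moe <- c(0.05, 0.01)
1 / moe^2
## [1]   400 10000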
Note that this is a worst-case scenario. If you know p ≈ 0.1, then
\[
MOE \approx 2\sqrt{\frac{0.1 \times 0.9}{n}} = \frac{2(0.3)}{\sqrt{n}} = \frac{0.6}{\sqrt{n}}
\]
or
\[
n \approx \frac{0.6^2}{MOE^2} = \frac{0.36}{MOE^2},
\]
which reduces the necessary sample size by a factor of approximately 3 relative to using p = 0.5.
Now suppose we wish to compare two statistics T and S by estimating
\[
p_T = \Pr[T(X) \le t], \qquad p_S = \Pr[S(X) \le t].
\]
This gives
\[
Var[\hat{p}_T] = \frac{p_T(1 - p_T)}{n}, \qquad
Var[\hat{p}_S] = \frac{p_S(1 - p_S)}{n}
\]
and, since the samples are independent,
\[
Var[\hat{p}_T - \hat{p}_S] = \frac{1}{n}\{p_T(1 - p_T) + p_S(1 - p_S)\}.
\]
This is a two-independent-proportions problem.
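A minimal sketch with independent streams, where T (sample mean of 10 N(0, 1) draws), S (sample median), and t = 0.2 are all hypothetical choices:
# independent streams: separate samples for T and for S
set.seed(1)
n <- 10000
t0 <- 0.2
XT <- matrix(rnorm(10 * n), nrow=n)    # n replicates of size 10 for T
XS <- matrix(rnorm(10 * n), nrow=n)    # an independent set for S
p.hat.T <- mean(apply(XT, 1, mean) <= t0)
p.hat.S <- mean(apply(XS, 1, median) <= t0)
# estimated variance of the difference (two independent proportions)
(p.hat.T * (1 - p.hat.T) + p.hat.S * (1 - p.hat.S)) / n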
1.3 Using the same stream of random numbers
With the same stream, both indicators are computed on the same draws X₁, . . . , X_n, and
\[
\hat{p}_T - \hat{p}_S = \frac{1}{n}\sum_{i=1}^{n}\{\mathbf{1}(T(X_i) \le t) - \mathbf{1}(S(X_i) \le t)\}
= \frac{1}{n}\sum_{i=1}^{n} \Delta_t\{T(X_i), S(X_i)\},
\]
where Δ_t{T(X_i), S(X_i)} = 1(T(X_i) ≤ t) − 1(S(X_i) ≤ t).
Let the joint distribution of the indicators be given by the following 2-by-2 contingency table
\[
\begin{array}{c|cc|c}
 & S \le t & S > t & \\
\hline
T \le t & p_{TS} & p_{T\bar{S}} & p_T \\
T > t & p_{\bar{T}S} & p_{\bar{T}\bar{S}} & p_{\bar{T}} \\
\hline
 & p_S & p_{\bar{S}} & 1
\end{array}
\]
where p_{TS} = Pr[T ≤ t, S ≤ t], p_{TS̄} = Pr[T ≤ t, S > t], etc. Then
\[
E[\Delta_t\{T(X_i), S(X_i)\}] = p_{T\bar{S}} - p_{\bar{T}S} = p_T - p_S,
\]
that is,
\[
E[\hat{p}_T - \hat{p}_S] = p_T - p_S,
\]
and
\[
Var[\Delta_t\{T(X_i), S(X_i)\}] = p_{T\bar{S}}(1 - p_{T\bar{S}}) + p_{\bar{T}S}(1 - p_{\bar{T}S}) + 2\,p_{T\bar{S}}\,p_{\bar{T}S}.
\]
Thus,
\[
Var[\hat{p}_T - \hat{p}_S] = \frac{1}{n} Var[\Delta_t\{T(X_i), S(X_i)\}]
= \frac{p_{T\bar{S}}(1 - p_{T\bar{S}}) + p_{\bar{T}S}(1 - p_{\bar{T}S}) + 2\,p_{T\bar{S}}\,p_{\bar{T}S}}{n}.
\]
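Continuing the sketch above: with the same stream, both statistics are computed on XT, and the variance of the difference is typically much smaller because the two indicators are positively correlated:
# same stream: evaluate both statistics on the same samples
d <- (apply(XT, 1, mean) <= t0) - (apply(XT, 1, median) <= t0)   # Delta_t values
mean(d)        # estimates p_T - p_S
var(d) / n     # typically much smaller than with independent streams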
Remarks
Related variance-reduction techniques include:
- conditioning swindles,
- Rao-Blackwellization, and
- stratified sampling.