
STAT-36700 Homework 4 - Solutions

Fall 2018
September 28, 2018

This contains solutions for Homework 4. Please note that we have included several additional comments and approaches to the problems to give you better insight.

Problem 1. Suppose that X1 , . . . , Xn ∼ Geom( p), i.e. the samples have
a geometric distribution with parameter p. A geometric distribution is the
distribution of the number of coin flips needed to see one head.

(a) Write down the likelihood as a function of the observed data X1 , . . . , Xn ,
and the unknown parameter p.

(b) Compute the MLE of p. In order to do this you need to find a zero of the
derivative of the likelihood, and also check that the second derivative of
the likelihood at the point is negative.

(c) Compute the method-of-moments estimator for p. Is this the same as the
MLE?

(d) Extra Credit: Are the estimators you have derived above unbiased? As a
hint: think about using Jensen’s inequality, and when Jensen’s inequality
is a strict inequality.

Solution 1. We derive each of the results below:


(a) We claim that \( L(p \mid x_1, x_2, \ldots, x_n) = p^n (1-p)^{\sum_{i=1}^n x_i - n} \).

Proof. We first note that for X ∼ Geom( p) the PMF is given by
\( P(X = k) = (1-p)^{k-1} p \). Now we have the likelihood as:
\begin{align*}
L(p \mid x_1, x_2, \ldots, x_n) &= \prod_{i=1}^n P(X_i = x_i) \\
&= \prod_{i=1}^n (1-p)^{x_i - 1} p \\
&= p^n (1-p)^{\sum_{i=1}^n x_i - n}
\end{align*}

(b) We claim that \( \hat{p}_{\mathrm{MLE}} = 1/\bar{X}_n \).

Proof. From the previous part we have that the likelihood of p is given by
\[
L(p \mid x_1, x_2, \ldots, x_n) = p^n (1-p)^{\sum_{i=1}^n x_i - n}.
\]

We can then derive the log-likelihood of p as follows:
\[
\ell(p \mid x_1, x_2, \ldots, x_n) = n \log(p) + \left( \sum_{i=1}^n x_i - n \right) \log(1-p)
\]
(we now maximize the log-likelihood).

Now we can differentiate the log-likelihood and set it to 0 to find the MLE as follows:
\begin{align*}
\frac{\partial \ell}{\partial p} = \frac{n}{p} - \frac{\sum_{i=1}^n x_i - n}{1-p} &= 0 && \text{(setting the partial derivative to 0)} \\
\implies n - np &= p \sum_{i=1}^n x_i - np \\
\implies n &= p \sum_{i=1}^n x_i \\
\implies p &= \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\bar{x}_n}
\end{align*}
We then verify that this is a maximum using the second derivative as follows:
\[
\frac{\partial^2 \ell}{\partial p^2} = -\frac{n}{p^2} - \frac{\sum_{i=1}^n x_i - n}{(1-p)^2}
= -\left( \frac{n}{p^2} + \frac{\sum_{i=1}^n x_i - n}{(1-p)^2} \right) < 0,
\]
where we use that \( \sum_{i=1}^n x_i \geq n \) since each geometric observation is at least 1. Since the log-likelihood has negative curvature for all p, we have \( \hat{p}_{\mathrm{MLE}} = 1/\bar{X}_n \) as required.

(c) We claim that \( \hat{p}_{\mathrm{MOM}} = 1/\bar{X}_n \).

Proof. For X ∼ Geom( p) we have that E( X ) = 1/p (Note: prove this as an exercise).
Now we use the method of moments as follows:
\begin{align*}
\frac{1}{\hat{p}_{\mathrm{MOM}}} &= \frac{1}{n} \sum_{i=1}^n x_i = \bar{x}_n \\
\implies \hat{p}_{\mathrm{MOM}} &= \frac{1}{\bar{x}_n}
\end{align*}
So the method-of-moments estimator is \( \hat{p}_{\mathrm{MOM}} = 1/\bar{X}_n \), which is equal to \( \hat{p}_{\mathrm{MLE}} \) in this case.

(d) Extra Credit: We claim that \( \hat{p}_{\mathrm{MLE}} = \hat{p}_{\mathrm{MOM}} = 1/\bar{X}_n \) are biased estimators.

Proof. Firstly we note that
\begin{align*}
E(\bar{X}_n) &= E\left( \frac{1}{n} \sum_{i=1}^n X_i \right) \\
&= E(X_1) && \text{(using linearity of expectation and IID of the } X_i\text{'s)} \\
&= \frac{1}{p} && \text{(using the expectation of a Geom(p) variable (Exercise: derive this))}
\end{align*}

Now, per the hint, using Jensen's inequality for the convex function \( x \mapsto \frac{1}{x} \) (convexity is checked by noting the second derivative \( \frac{\partial^2}{\partial x^2} \frac{1}{x} = \frac{2}{x^3} > 0 \; \forall x > 0 \)), we have:
\begin{align*}
E(\hat{p}_{\mathrm{MLE}}) = E\left( \frac{1}{\bar{X}_n} \right)
&> \frac{1}{E(\bar{X}_n)} && \text{(note the strict inequality since our convex map is not linear)} \\
&= \frac{1}{1/p} \\
&= p
\end{align*}
Since \( E(\hat{p}_{\mathrm{MLE}}) = E(\hat{p}_{\mathrm{MOM}}) > p \), our estimators are biased.
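As an optional illustration (our own addition, not required for the extra credit), a short R simulation makes the bias visible; the values of p, n, and the number of replications below are arbitrary choices:

# Simulate the bias of 1 / mean(X) for geometric samples
set.seed(36700)
p <- 0.3
n <- 20
reps <- 5000

p_hat <- replicate(reps, {
  # rgeom() counts failures before the first success, so add 1 to count flips
  x <- rgeom(n, prob = p) + 1
  1 / mean(x)
})

# The average estimate typically sits slightly above the true p,
# consistent with the strict Jensen inequality above
mean(p_hat)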

Problem 2. Suppose we have samples X1 , . . . , Xn ∼ Unif [0, θ ].

(a) Write down the likelihood as a function of the observed data X1 , . . . , Xn ,
and the unknown parameter θ.

(b) Compute the MLE of θ.

(c) Use the method of moments to derive an estimator of θ. Is this the same
as the MLE?

Solution 2. We derive each of the results below:

(a) We claim that \( L(\theta \mid x_1, x_2, \ldots, x_n) = \left( \frac{1}{\theta} \right)^n \mathbb{1}(0 \leq x_{(1)} \leq x_{(n)} \leq \theta) \).

Proof.
\begin{align*}
L(\theta \mid x_1, x_2, \ldots, x_n) &= \prod_{i=1}^n \frac{1}{\theta} \mathbb{1}(0 \leq x_i \leq \theta) \\
&= \left( \frac{1}{\theta} \right)^n \prod_{i=1}^n \mathbb{1}(0 \leq x_i \leq \theta) \\
&= \left( \frac{1}{\theta} \right)^n \mathbb{1}(0 \leq x_{(1)} \leq x_{(n)} \leq \theta)
\end{align*}

(b) We claim that \( \hat{\theta}_{\mathrm{MLE}} = X_{(n)} \). (Note: In this case it is critical to understand that the parameter we are seeking to estimate, i.e. θ, appears in the support of our likelihood. As such we can't just differentiate the log-likelihood as usual; we have to carefully consider the shape of the likelihood over its support and choose the θ value that maximizes it.)

Proof. We note from part (a) that the log-likelihood is:
\[
\ell(\theta \mid x_1, x_2, \ldots, x_n) = -n \log(\theta) \, \mathbb{1}(0 \leq x_{(1)} \leq x_{(n)} \leq \theta)
\]
\[
\implies \frac{\partial \ell}{\partial \theta} = -\frac{n}{\theta} \, \mathbb{1}(0 \leq x_{(1)} \leq x_{(n)} \leq \theta),
\]
so the likelihood is a decreasing function of θ over its support. It is therefore maximized at the smallest θ consistent with the data, namely θ = x(n) . We then have that \( \hat{\theta}_{\mathrm{MLE}} = X_{(n)} \) as required.

(c) We claim that \( \hat{\theta}_{\mathrm{MOM}} = 2\bar{X}_n \).

Proof.
\begin{align*}
E(X) &= \int_0^\theta \frac{x}{\theta} \, dx \\
&= \left[ \frac{x^2}{2\theta} \right]_0^\theta \\
&= \frac{\theta^2}{2\theta} \\
&= \frac{\theta}{2} \\
\implies \frac{\hat{\theta}_{\mathrm{MOM}}}{2} &= \frac{1}{n} \sum_{i=1}^n x_i \\
\implies \hat{\theta}_{\mathrm{MOM}} &= 2\bar{x}_n
\end{align*}
We then have \( \hat{\theta}_{\mathrm{MOM}} = 2\bar{X}_n \) as required. This is not necessarily the same as the MLE.
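As an aside (our own addition), a short R simulation contrasts the two estimators; theta, n, and the number of replications below are arbitrary choices:

# Compare the MLE max(x) and the MOM estimator 2 * mean(x) for Unif[0, theta]
set.seed(1)
theta <- 5
n <- 50
reps <- 2000

est <- replicate(reps, {
  x <- runif(n, min = 0, max = theta)
  c(mle = max(x), mom = 2 * mean(x))
})

# The MLE tends to sit just below theta (it is biased downwards),
# while the MOM estimator is unbiased but has larger variance here
rowMeans(est)
apply(est, 1, var)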

Problem 3. In this question we will explore what is known as Monte Carlo
integration. We often wish to calculate an integral of the form:
\[
I(f) = \int_0^1 f(x) \, dx,
\]

but cannot do so analytically. The Monte Carlo method is a way to approximate this integral.

(a) Re-express this integral as an expected value of the function f . What is
the distribution of the underlying random variable?

(b) The above suggests that we can generate random variables in a certain
way and approximate the integral of interest. Use the weak law of large
numbers (WLLN) to give a precise (asymptotic) guarantee for this
method.

(c) As a concrete example, we consider the evaluation of:
\[
I(f) = \int_0^1 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \, dx.
\]

Use R (or any other programming language) to approximate the above
integral using 1000 random samples. Report your result. Repeat this
100 times just to get a sense of the variability - you do not need to report
these 100 numbers.

(d) Use the standard Gaussian CDF to give a precise evaluation of this
integral. You should use R (or any other language) to evaluate the CDF.

Solution 3. We derive each of the results below:

(a) We claim that this is EX ( f ( X )) where X ∼ Unif [0, 1].

Proof. For X ∼ Unif [0, 1] we have the PDF
\[
p_X(x) = \mathbb{1}(0 \leq x \leq 1).
\]
So we have:
\begin{align*}
E_X(f(X)) &:= \int_{\mathcal{X}} f(x) \, p_X(x) \, dx \\
&= \int_{\mathcal{X}} f(x) \, \mathbb{1}(0 \leq x \leq 1) \, dx \\
&= \int_0^1 f(x) \, dx
\end{align*}
as required.

(b) Now that we have expressed the required integral as the expectation of a
function of a random variable X ∼ Unif [0, 1], we have that for IID samples
X1 , X2 , . . . , Xn ∼ Unif [0, 1], by the WLLN:
\[
\frac{1}{n} \sum_{i=1}^n f(X_i) \xrightarrow{p} E_X(f(X)) = \int_0^1 f(x) \, dx
\]

(c) As a concrete example, we consider the evaluation of:
\[
I(f) = \int_0^1 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \, dx.
\]
We can use R to approximate the above integral using 1000 random samples as follows:
# install.packages("tidyverse")
library(tidyverse)

# Set seed for reproducibility
set.seed(35842)

# Define the required function we want to integrate
f <- function(x) {
  base::return((1 / sqrt(2 * pi)) * exp(-(x^2 / 2)))
}

# Define the Monte Carlo simulation
monte_carlo <- function(inp_fn, n) {
  # Generate random samples
  samps <- purrr::map_dbl(.x = 1:n, ~ runif(n = 1, min = 0, max = 1))

  # Approximate the integral
  integl <- mean(inp_fn(samps))

  base::return(integl)
}

# Single run
one_sim <- monte_carlo(inp_fn = f, n = 1000)
one_sim

This returns a value of 0.3416494.

We can repeat this 100 times as follows:
# Let's replicate the simulation 100 times
num_replications <- 100
sims <- replicate(expr = monte_carlo(inp_fn = f, n = 1000),
                  n = num_replications)

# Let's plot a histogram
hist(sims)

(d) We can represent this exactly as a difference of CDF values of the standard
normal distribution as follows:
\begin{align*}
I(f) &= \int_0^1 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \, dx \\
&= \int_{-\infty}^1 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \, dx - \int_{-\infty}^0 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \, dx \\
&= \Phi(1) - \Phi(0)
\end{align*}

This can be numerically calculated (approximated) in R using the following command:
pnorm(1) - pnorm(0)

which evaluates to 0.3413447.
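As a further cross-check (our own addition), R's built-in numerical integrator gives essentially the same value:

# Deterministic numerical integration of the standard normal density over [0, 1]
integrate(function(x) dnorm(x), lower = 0, upper = 1)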

Problem 4. The Poisson distribution with parameter λ has pmf:
\[
P(X = k) = \frac{\exp(-\lambda) \lambda^k}{k!}.
\]
For large values of λ the Poisson distribution is well approximated by a
Gaussian distribution.

(a) Use moment matching to find the Gaussian that best approximates a
Poi(λ) distribution. In other words, we can use N (µ, σ2 ) to approximate
the Poisson, if we choose µ and σ2 to match the mean and variance of the
Poisson.

(b) Suppose that we have a system that emits a random number X of particles
according to a Poisson distribution with mean λ = 900 per hour. Use the
above approximation to calculate the probability P( X > 950). You should
express this in terms of an appropriate standard Gaussian quantile, i.e.,
express your answer in terms of the function Φ(z) = P( Z ≤ z) where Z
has a standard normal distribution.

(c) Use R to compute the value of P( X > 950), approximately using
the Gaussian quantile and exactly using the Poisson CDF. There are
functions built in to R for each of these.

Solution 4. We derive each of the results below:

(a) We are given that X ∼ Poiss(λ) and Y ∼ N (µ, σ²). We know that
E(X²) = Var(X) + (E(X))².
Equating first moments we have:
\[
E(X) = E(Y) \implies \mu = \lambda
\]
Equating second moments we have:
\begin{align*}
E(X^2) &= E(Y^2) \\
\implies \lambda^2 + \lambda &= \sigma^2 + \mu^2 = \sigma^2 + \lambda^2 \\
\implies \sigma^2 &= \lambda
\end{align*}
So the closest Gaussian approximation to the Poisson by matching the
first 2 moments is Y ∼ N (λ, λ).

(b) We have from part (a) that X is approximately distributed as Y ∼ N (µ = λ, σ² = λ). This shows that:
\begin{align*}
P(X > 950) &= 1 - P(X \leq 950) \\
&= 1 - P\left( \frac{X - \mu}{\sigma} \leq \frac{950 - \mu}{\sigma} \right) = 1 - P\left( \frac{X - \lambda}{\sqrt{\lambda}} \leq \frac{950 - \lambda}{\sqrt{\lambda}} \right) \\
&= 1 - P\left( Z \leq \frac{50}{30} \right) \\
&= 1 - \Phi\left( \frac{5}{3} \right)
\end{align*}

(c) We can use R to perform the approximation as follows:

# Setup parameters
lambd <- 900
mu <- lambd
sigma_sq <- lambd
t_val <- 950
t_val_std <- (t_val - mu) / sqrt(sigma_sq)

# Normal approximation to the Poisson
1 - pnorm(t_val_std)

which gives a value of 0.04779035.

If we calculate using the built-in Poisson functions we get the more
precise numerical estimate:
1 - ppois(q = 950, lambda = lambd)

which gives a value of 0.04711902.
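A direct simulation (our own addition, not required by the problem) gives a similar answer:

# Empirical estimate of P(X > 950) for X ~ Poisson(900) via simulation
set.seed(4)
mean(rpois(1e6, lambda = 900) > 950)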

Problem 5. A Binomial RV is the sum of independent Bernoullis and is thus
well approximated by a Gaussian by the central limit theorem. We will use
this to perform a rudimentary hypothesis test.
We believe a coin is fair and toss it 100 times. It lands heads up 60
times. Use moment matching (or the CLT) to approximate the probability
P( X ≥ 60) assuming the coin is fair in terms of the standard Gaussian
CDF.
Use R to give a numerical value. Roughly, our intuition is that if this
value is small then we should be “suspicious” of our initial hypothesis that
the coin is fair. Are you suspicious?

Solution 5. Let Y1 , ..., Yn ∼ Bernoulli ( p) with p = 1/2. Then
X = {number of times landing heads up} = \( \sum_{i=1}^n Y_i \). Since n = 100
is large, by applying the CLT we have \( \frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} \sim N(0, 1) \).

Equivalently,
\[
\frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} \sim N(0, 1)
\;\Rightarrow\; \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i \sim N\left( \mu, \frac{\sigma^2}{n} \right)
\;\Rightarrow\; \sum_{i=1}^n Y_i \sim N(n\mu, n\sigma^2)
\]
Since µ = E[Yi ] = p = 1/2 and σ² = Var(Yi ) = p(1 − p), we have
X = \( \sum_{i=1}^n Y_i \) ∼ N (np, np(1 − p)) = N (50, 25). Hence
\begin{align*}
P(X \geq 60) &= P\left( \frac{X - 50}{5} \geq \frac{60 - 50}{5} \right) \\
&= P(Z \geq 2) \\
&= 1 - F_Z(2) \\
&\approx 0.02275 && \text{(can be computed using 1 - pnorm(2) in R)}
\end{align*}

where Z is a standard Gaussian random variable. This value is small
enough that we should be suspicious.
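As an optional cross-check (our own addition), the exact binomial tail can be computed in R and compared with the CLT value; it comes out slightly larger (roughly 0.028 versus 0.023), but either way small enough to raise suspicion:

# Exact tail probability under the fair-coin model: P(X >= 60) for X ~ Bin(100, 0.5)
1 - pbinom(59, size = 100, prob = 0.5)

# CLT approximation used above
1 - pnorm(2)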

Problem 6. Suppose that X1 , . . . , Xn are repeated measurements of a
quantity µ, and that E[ Xi ] = µ, and Var( Xi ) = σ2 . Further, let us suppose
that each Xi ∈ [0, 1]. Let X n denote the average of the measurements.

(a) Use the fact that each Xi ∈ [0, 1] to give some bounds on µ and σ.
(b) Suppose that we take 16 measurements and that σ² = 1/12. Use the CLT to
approximate the probability that the average deviates from µ by more than
0.5.

(c) Use Chebyshev’s inequality to give an upper bound on the same quantity.

(d) Repeat the above calculation but now use Hoeffding’s inequality.

(e) Now use R to estimate this probability in the following way. Suppose
that each Xi is U [0, 1]. The mean is 0.5 and the variance is exactly 1/12.
Draw 16 measurements and track if the sample mean is within 0.5 of the
true mean. Repeat this 1000 times to get an accurate estimate. Compare
the answer to what you obtained analytically. Particularly, order the
confidence intervals by length.

Solution 6.(a) 0 ≤ Xi ≤ 1 ⇒ 0 ≤ E[ Xi ] ≤ 1 ⇒ 0 ≤ µ ≤ 1.
0 ≤ Var ( Xi ) = E[( Xi − E[ Xi ])2 ] ≤ (1 − 0)2 = 1 ⇒ 0 ≤ σ ≤ 1 (Note
that we can actually have Var ( Xi ) ≤ 1/4 by Popoviciu’s inequality).


(b) By the CLT, \( \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1) \), then
\begin{align*}
P(|\bar{X} - \mu| > 0.5) &= P\left( \frac{\sqrt{n} |\bar{X} - \mu|}{\sigma} > \frac{0.5 \sqrt{n}}{\sigma} \right) \\
&= P\left( |Z| > \frac{0.5 \times \sqrt{16}}{\sqrt{1/12}} \right) \\
&= P(|Z| > 4\sqrt{3}) \\
&= 2 \times P(Z > 4\sqrt{3}) \\
&\approx 4.26 \times 10^{-12} && \text{(R command: 2*(1-pnorm(4*sqrt(3))))}
\end{align*}

(c) First notice that E( X̄ ) = µ and Var ( X̄ ) = Var ( Xi )/n = σ²/n. Then
by Chebyshev’s inequality,
\begin{align*}
P(|\bar{X} - \mu| > 0.5) &\leq \frac{\mathrm{Var}(\bar{X})}{0.5^2} \\
&= \frac{\sigma^2}{0.25 n} \\
&= \frac{1}{12 \times 0.25 \times 16} \\
&= \frac{1}{48} \approx 0.0208
\end{align*}

(d) By Hoeffding’s inequality,
\begin{align*}
P(|\bar{X} - \mu| > 0.5) &\leq 2 \exp\left( -\frac{2 n^2 \cdot 0.5^2}{\sum_{i=1}^n (1 - 0)^2} \right) \\
&= 2 \exp(-2 \times 0.25 \times 16) \\
&\approx 0.00067
\end{align*}
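For reference (our own addition), the three analytic quantities from parts (b)-(d) can be computed side by side in R:

# CLT approximation, Chebyshev bound, and Hoeffding bound for P(|Xbar - mu| > 0.5)
n <- 16
sigma_sq <- 1 / 12

p_clt <- 2 * (1 - pnorm(0.5 * sqrt(n) / sqrt(sigma_sq)))
p_chebyshev <- (sigma_sq / n) / 0.5^2
p_hoeffding <- 2 * exp(-2 * n * 0.5^2)

c(CLT = p_clt, Chebyshev = p_chebyshev, Hoeffding = p_hoeffding)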

(e) Use R to simulate the samples:

# initialize
L = rep(0, 1000)

for (k in 1:1000) {
  # generate 16 uniformly distributed samples
  X = runif(16, min = 0, max = 1)
  # sample mean
  X.mean = mean(X)
  # check whether the sample mean deviates from the true mean by more than 0.5
  L[k] = (abs(X.mean - 0.5) > 0.5)
}
# estimated probability (equals 0 in this simulation)
prob = sum(L) / length(L)

Based on the R simulations, the estimated P(| X̄ − µ| > 0.5) is 0. Summarizing parts
(b)(c)(d)(e), we have
\[
P_{\mathrm{R}} < P_{\mathrm{CLT}} < P_{\mathrm{Hoeffding}} < P_{\mathrm{Chebyshev}}
\]
and hence the α-level confidence intervals for (b)(c)(d)(e) satisfy
\[
CI_{\mathrm{R}} \subset CI_{\mathrm{CLT}} \subset CI_{\mathrm{Hoeffding}} \subset CI_{\mathrm{Chebyshev}}
\]

Problem 7. Conjugate Priors: In lectures we have seen that the Beta and
Binomial are conjugate distributions, i.e. if we are estimating a Binomial
parameter and use a Beta prior then the posterior is a Beta distribution.
There is a similar relationship between Gamma and Poisson distributions.
You can use any reference (Wikipedia) for the distributions you need - for the
Gamma use the shape/rate parameterization.

(a) Suppose the data X1 , . . . , Xn ∼ Poi(λ). Write down the likelihood as a
function of the parameter λ.

(b) Compute the MLE for λ.

(c) Assume that λ ∼ Gamma(α, β), and write down the posterior distri-
bution over the parameter λ. Show that this posterior distribution is a
Gamma distribution, and compute its parameters.

(d) The mean of a Gamma distribution is α/β. Compute the posterior mean.
This will be our point estimate.

(e) Write the posterior mean as a convex combination of the prior mean and
the MLE. What happens if α, β are fixed and n → ∞?

Solution 7.(a)
\begin{align*}
L(\lambda; x_1, \ldots, x_n) = L(\lambda) &= \prod_{i=1}^n P(X_i = x_i) \\
&= \prod_{i=1}^n \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} \\
&= \frac{e^{-n\lambda} \lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}
\end{align*}

(b) Since L(λ) > 0 and log is monotone increasing, \( \lambda_{\mathrm{MLE}} = \arg\max_\lambda L(\lambda) = \arg\max_\lambda \log L(\lambda) \).
Such a λ satisfies \( \frac{\partial \log L(\lambda)}{\partial \lambda} = 0 \) and \( \frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} \leq 0 \).
\[
\log L(\lambda) = -n\lambda + \sum_{i=1}^n x_i \log \lambda - \log \prod_{i=1}^n x_i!
\]

\[
\frac{\partial \log L(\lambda)}{\partial \lambda} = -n + \frac{\sum_{i=1}^n x_i}{\lambda} = 0
\;\Rightarrow\; \lambda = \frac{\sum_{i=1}^n x_i}{n}
\]
\[
\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} = -\frac{\sum_{i=1}^n x_i}{\lambda^2} < 0.
\]
Therefore \( \lambda_{\mathrm{MLE}} = \frac{\sum_{i=1}^n x_i}{n} = \bar{x}_n \).

(c) From lecture notes 9, we have
\begin{align*}
p(\lambda \mid x_1, \ldots, x_n) &\propto L(\lambda) \, p(\lambda) \\
&\propto \frac{e^{-n\lambda} \lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \\
&\propto \frac{(\beta + n)^{\alpha + \sum_{i=1}^n x_i}}{\Gamma\left( \alpha + \sum_{i=1}^n x_i \right)} \lambda^{\alpha + \sum_{i=1}^n x_i - 1} e^{-(\beta + n)\lambda}
\end{align*}
Therefore \( \lambda \mid X_1, \ldots, X_n \sim \mathrm{Gamma}\left( \alpha + \sum_{i=1}^n X_i, \; \beta + n \right) \).

(d) This is
\[
\lambda_{\mathrm{Bayes}} = \frac{\alpha + \sum_{i=1}^n x_i}{\beta + n}
\]

(e)
\[
\lambda_{\mathrm{Bayes}} = \frac{\beta}{\beta + n} \cdot \frac{\alpha}{\beta} + \frac{n}{\beta + n} \cdot \frac{\sum_{i=1}^n x_i}{n},
\]
a convex combination of the prior mean α/β and the MLE. When n → ∞ with α, β fixed, the weight on the prior mean vanishes and \( \lambda_{\mathrm{Bayes}} \to \lambda_{\mathrm{MLE}} \).
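To illustrate the shrinkage (our own addition, not part of the graded solution), here is a small R sketch of the conjugate update; the prior hyperparameters and the true rate below are arbitrary choices:

# Gamma-Poisson conjugate update: the posterior mean moves from the prior mean toward the MLE
set.seed(7)
alpha <- 2
beta <- 1          # prior Gamma(shape = alpha, rate = beta); prior mean = alpha / beta
lambda_true <- 4

for (n in c(5, 50, 500)) {
  x <- rpois(n, lambda = lambda_true)
  post_shape <- alpha + sum(x)
  post_rate <- beta + n
  cat(sprintf("n = %3d: MLE = %.3f, posterior mean = %.3f\n",
              n, mean(x), post_shape / post_rate))
}

As n grows, the posterior mean approaches the MLE, in line with part (e).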

Problem 8. Computing a Bayes rule: Suppose that X ∼ N (µ, 1) is
one sample from a Normal distribution. Let µ ∼ N (0, 1) be a prior. Assume squared
error loss. Recall that the Bayes estimator is the estimator that minimizes
the expected posterior loss.

(a) Compute the posterior distribution for µ.

(b) For the squared loss, compute the Bayes estimator.



(c) Extra credit: Compute the Bayes estimator using the loss |µ̂ − µ|.

Solution 8.(a) From lecture notes 9, we have
\begin{align*}
f(\mu \mid x) &\propto L(\mu; x) f(\mu) \\
&\propto \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2} \right) \times \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{\mu^2}{2} \right) \\
&\propto \exp\left( -\frac{2\mu^2 - 2x\mu + x^2}{2} \right) \\
&= \exp\left( -\frac{(\mu - \frac{x}{2})^2 + \frac{x^2}{4}}{2 \cdot \frac{1}{2}} \right) \\
&\propto \exp\left( -\frac{(\mu - \frac{x}{2})^2}{2 \cdot \frac{1}{2}} \right)
\end{align*}
Therefore \( \mu \mid X \sim N\left( \frac{X}{2}, \frac{1}{2} \right) \).

(b) Let \( R_{L_2}(\mu, \hat{\mu}) = E_{\mu \mid X} (\hat{\mu} - \mu)^2 \) and
\( \hat{\mu}_{L_2} = \arg\min_{\hat{\mu}} R_{L_2}(\mu, \hat{\mu}) \). Such a \( \hat{\mu}_{L_2} \) satisfies
\( \frac{\partial R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}} = 0 \) and \( \frac{\partial^2 R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}^2} \geq 0 \).
\[
R_{L_2}(\mu, \hat{\mu}) = E_{\mu \mid X} (\hat{\mu} - \mu)^2 = \int_{-\infty}^{+\infty} (\hat{\mu} - \mu)^2 f_{\mu \mid X}(\mu) \, d\mu
\]
\begin{align*}
\frac{\partial R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}} &= \int_{-\infty}^{+\infty} \frac{\partial (\hat{\mu} - \mu)^2}{\partial \hat{\mu}} f_{\mu \mid X}(\mu) \, d\mu \\
&= \int_{-\infty}^{+\infty} 2(\hat{\mu} - \mu) f_{\mu \mid X}(\mu) \, d\mu = 0 \\
\Rightarrow \hat{\mu} \int_{-\infty}^{+\infty} f_{\mu \mid X}(\mu) \, d\mu &= \int_{-\infty}^{+\infty} \mu f_{\mu \mid X}(\mu) \, d\mu = E_{\mu \mid X}(\mu) \\
\Rightarrow \hat{\mu} &= E_{\mu \mid X}(\mu) = \frac{X}{2}
\end{align*}
And
\[
\frac{\partial^2 R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}^2} = \int_{-\infty}^{+\infty} 2 f_{\mu \mid X}(\mu) \, d\mu = 2 > 0.
\]
Therefore \( \hat{\mu}_{L_2} = \frac{X}{2} \).

(c) The idea is the same as in (b), and one can get the following. (Note: for \( f(x) = |x| \), the derivative is \( f'(x) = 1 \) for \( x > 0 \) and \( f'(x) = -1 \) for \( x < 0 \).)
\[
R_{L_1}(\mu, \hat{\mu}) = E_{\mu \mid X} |\hat{\mu} - \mu| = \int_{-\infty}^{+\infty} |\hat{\mu} - \mu| f_{\mu \mid X}(\mu) \, d\mu
\]

\begin{align*}
\frac{\partial R_{L_1}(\mu, \hat{\mu})}{\partial \hat{\mu}} &= \int_{-\infty}^{+\infty} \frac{\partial |\hat{\mu} - \mu|}{\partial \hat{\mu}} f_{\mu \mid X}(\mu) \, d\mu \\
&= \int_{-\infty}^{\hat{\mu}} f_{\mu \mid X}(\mu) \, d\mu - \int_{\hat{\mu}}^{+\infty} f_{\mu \mid X}(\mu) \, d\mu = 0 \\
\Rightarrow P(\mu \leq \hat{\mu}) &= P(\mu > \hat{\mu}) \\
\Rightarrow \hat{\mu} &= \frac{X}{2} && \text{(by symmetry of the normal distribution)}
\end{align*}
And the second-order condition holds, so this is indeed a minimizer:
\[
\frac{\partial^2 R_{L_1}(\mu, \hat{\mu})}{\partial \hat{\mu}^2} = 2 f_{\mu \mid X}(\hat{\mu}) > 0.
\]
Therefore \( \hat{\mu}_{L_1} = \frac{X}{2} \). (Note: In this case, for the absolute value loss, the Bayes estimator is the median of the posterior, not the mean as for the squared loss. For the normal distribution the median and the mean coincide, so the estimator is the same as in the previous part.)
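As a numerical sanity check (our own addition), a simple importance-sampling sketch in R recovers the posterior N(X/2, 1/2) from part (a); the observed value x_obs and the number of draws are arbitrary choices:

# Weight draws from the prior mu ~ N(0, 1) by the likelihood of the single observation
set.seed(11)
x_obs <- 1.7
n_draws <- 200000

mu_prior <- rnorm(n_draws, mean = 0, sd = 1)
w <- dnorm(x_obs, mean = mu_prior, sd = 1)   # likelihood weights

post_mean <- sum(w * mu_prior) / sum(w)
post_var <- sum(w * (mu_prior - post_mean)^2) / sum(w)

# Should be close to x_obs / 2 = 0.85 and 1 / 2 respectively
c(post_mean = post_mean, post_var = post_var)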
