STAT-36700 Homework 4 - Solutions: Fall 2018 September 28, 2018
(b) Compute the MLE of p. In order to do this you need to find a zero of the
derivative of the likelihood, and also check that the second derivative of
the likelihood at the point is negative.
(c) Compute the method-of-moments estimator for p. Is this the same as the
MLE?
(d) Extra Credit: Are the estimators you have derived above unbiased? As a
hint: think about using Jensen’s inequality, and when Jensen’s inequality
is a strict inequality.
(b) We claim that $\hat{p}_{MLE} = \frac{1}{\bar{X}_n}$.

(c) We claim that $\hat{p}_{MOM} = \frac{1}{\bar{X}_n}$.

(d) Extra Credit: We claim that $\hat{p}_{MLE} = \hat{p}_{MOM} = \frac{1}{\bar{X}_n}$ are biased estimators.
Now per the hint, using Jensen's inequality for the convex function $x \mapsto \frac{1}{x}$ (convexity is checked by noting the second derivative $\frac{\partial^2}{\partial x^2}\frac{1}{x} = \frac{2}{x^3} > 0 \;\forall\, x > 0$) we have:
$$E(\hat{p}_{MLE}) = E\left(\frac{1}{\bar{X}_n}\right) > \frac{1}{E(\bar{X}_n)} \quad \text{(note the strict inequality since our convex map is not linear)}$$
$$= \frac{1}{1/p} = p$$
Since $E(\hat{p}_{MLE}) = E(\hat{p}_{MOM}) > p$, our estimators are biased.
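As a quick numerical sanity check (in Python for convenience; the problem permits any language elsewhere), we can simulate the upward bias of $1/\bar{X}_n$. The choices below — $X_i \sim \text{Geometric}(p)$ so that $E(X_i) = 1/p$, with $p = 0.2$, $n = 10$, and 100,000 repetitions — are illustrative, not part of the problem:

```python
import numpy as np

# Sanity check: the estimator 1 / X̄_n for p overshoots p on average,
# as the strict Jensen inequality above predicts.
rng = np.random.default_rng(0)
p, n, reps = 0.2, 10, 100_000

# Each row is one sample of size n from Geometric(p) (support 1, 2, ...),
# which has mean 1/p.
samples = rng.geometric(p, size=(reps, n))
estimates = 1.0 / samples.mean(axis=1)

print(estimates.mean())  # noticeably above the true p = 0.2
```

The average of the 100,000 estimates sits visibly above $p = 0.2$, consistent with $E(\hat{p}_{MLE}) > p$.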
(c) Use the method of moments to derive an estimator of θ. Is this the same
as the MLE?
Proof.
$$L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\theta}\, \mathbb{1}(0 \le x_i \le \theta)$$
$$= \left(\frac{1}{\theta}\right)^n \prod_{i=1}^{n} \mathbb{1}(0 \le x_i \le \theta)$$
$$= \left(\frac{1}{\theta}\right)^n \mathbb{1}(0 \le x_{(1)} \le x_{(n)} \le \theta)$$
Proof. We note from part (a) that the log-likelihood is:
Proof.
$$E(X) = \int_0^{\theta} \frac{x}{\theta}\, dx = \left.\frac{x^2}{2\theta}\right|_0^{\theta} = \frac{\theta^2}{2\theta} = \frac{\theta}{2}$$
$$\implies \frac{\hat{\theta}_{MOM}}{2} = \frac{1}{n}\sum_{i=1}^{n} x_i \implies \hat{\theta}_{MOM} = 2\bar{x}_n$$
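A small simulation (Python for convenience; $\theta = 3$ and $n = 100{,}000$ are illustrative choices) makes the contrast concrete: the MLE is the sample maximum $x_{(n)}$, since the likelihood $(1/\theta)^n\,\mathbb{1}(x_{(n)} \le \theta)$ is decreasing in $\theta$ on $[x_{(n)}, \infty)$, while the MOM estimator is $2\bar{x}_n$, so the two generally differ:

```python
import numpy as np

# Compare the two estimators of θ for Uniform(0, θ):
# MLE = sample maximum, MOM = 2 * sample mean.
rng = np.random.default_rng(1)
theta, n = 3.0, 100_000

x = rng.uniform(0.0, theta, size=n)
theta_mle = x.max()        # never exceeds θ, approaches it from below
theta_mom = 2.0 * x.mean() # fluctuates around θ

print(theta_mle, theta_mom)  # both close to 3, but not equal
```

Both estimators converge to $\theta$, but they are distinct estimators: the MLE is always $\le \theta$, whereas the MOM estimate can land on either side of $\theta$.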
$$I(f) = \int_0^1 f(x)\, dx,$$
(b) The above suggests that we can generate random variables in a certain
way and approximate the integral of interest. Use the weak law of large
numbers (WLLN) to give a precise (asymptotic) guarantee for this
method.
(d) Use the standard Gaussian CDF to give a precise evaluation of this
integral. You should use R (or any other language) to evaluate the CDF.
The density of $X \sim \text{Unif}[0,1]$ is $p_X(x) = \mathbb{1}(0 \le x \le 1)$. So we have:
$$E_X(f(X)) := \int_{\mathcal{X}} f(x)\, p_X(x)\, dx = \int_{\mathcal{X}} f(x)\, \mathbb{1}(0 \le x \le 1)\, dx = \int_0^1 f(x)\, dx$$
as required.
(b) Now that we have expressed the required integral as the expectation of a
random variable X ∼ Unif [0, 1], we have that for X1 , X2 , . . . Xn ∼ X ∼
Unif [0, 1] by the WLLN that:
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \xrightarrow{p} E_X(f(X)) = \int_0^1 f(x)\, dx$$
# Set seed for reproducibility
set.seed(35842)

# Define the required function we want to integrate
f <- function(x) {
  base::return((1 / sqrt(2 * pi)) * exp(-(x^2 / 2)))
}

# Define the Monte Carlo simulation
monte_carlo <- function(inp_fn, n) {
  # Generate random samples
  samps <- purrr::map_dbl(.x = 1:n, ~ runif(n = 1, min = 0, max = 1))
  # Approximate the integral
  integl <- mean(inp_fn(samps))
  base::return(integl)
}

# Single run
one_sim <- monte_carlo(inp_fn = f, n = 1000)
one_sim

# Repeat the simulation many times to see the sampling distribution
sims <- purrr::map_dbl(.x = 1:1000, ~ monte_carlo(inp_fn = f, n = 1000))

# Let's plot a histogram
hist(sims)
(d) We can represent this exactly as the difference of CDF of the standard
normal distribution as follows:
$$I(f) = \int_0^1 \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\, dx$$
$$= \int_{-\infty}^{1} \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\, dx - \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\, dx$$
$$= \Phi(1) - \Phi(0)$$
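This is easy to evaluate numerically. A sketch in Python (the problem allows any language), using the identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$:

```python
import math

# Evaluate I(f) = Φ(1) − Φ(0) via the standard normal CDF,
# written in terms of the error function.
def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

value = Phi(1.0) - Phi(0.0)
print(value)  # ≈ 0.3413
```

This matches `pnorm(1) - pnorm(0)` in R, and the Monte Carlo estimates from part (c) should cluster around this value.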
$$P(X = k) = \frac{\exp(-\lambda)\lambda^k}{k!}.$$
For large values of λ the Poisson distribution is well approximated by a
Gaussian distribution.
(a) Use moment matching to find the Gaussian that best approximates a
Poi(λ) distribution. In other words, we can use N (µ, σ2 ) to approximate
the Poisson, if we choose µ and σ2 to match the mean and variance of the
Poisson.
(b) Suppose that we have a system that emits a random number X of particles
according to a Poisson distribution with mean λ = 900 per hour. Use the
above approximation to calculate the probability P( X > 950). You should
express this in terms of an appropriate standard Gaussian quantile, i.e.,
express your answer in terms of the function Φ(z) = P( Z ≤ z) where Z
has a standard normal distribution.
Matching the first moments (with $Y \sim N(\mu, \sigma^2)$):
$$E(X) = E(Y) \implies \mu = \lambda$$
Matching the second moments:
$$E(X^2) = E(Y^2) \implies \lambda^2 + \lambda = \sigma^2 + \mu^2 = \sigma^2 + \lambda^2 \implies \sigma^2 = \lambda$$
# Normal approximation to the Poisson
t_val_std <- (950 - 900) / sqrt(900)
1 - pnorm(t_val_std)
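The same computation can be sketched in Python (the standardization uses $\mu = \lambda = 900$ and $\sigma = \sqrt{\lambda} = 30$, both from the moment matching above):

```python
import math

# Normal approximation to P(X > 950) for X ~ Poisson(λ = 900):
# the z-value is (950 − 900) / √900 = 5/3.
def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = (950 - 900) / math.sqrt(900)
approx = 1.0 - Phi(z)
print(approx)  # ≈ 0.048
```

So the approximation gives $P(X > 950) \approx 1 - \Phi(5/3) \approx 0.048$.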
Equivalently,
$$\frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} \sim N(0, 1) \;\Rightarrow\; \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \sim N\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; \sum_{i=1}^{n} Y_i \sim N(n\mu, n\sigma^2)$$
$$P(X \ge 60) = P\left(\frac{X - 50}{5} \ge \frac{60 - 50}{5}\right) = P(Z \ge 2) = 1 - F_Z(2) \approx 0.02275$$
(can be computed using 1 - pnorm(2) in R)
(a) Use the fact that each Xi ∈ [0, 1] to give some bounds on µ and σ.
1
(b) Suppose that we take 16 measurements and that σ2 = 12 . Use the CLT to
approximate the probability that the average deviates from µ by more than
0.5.
(c) Use Chebyshev’s inequality to give an upper bound on the same quantity.
(d) Repeat the above calculation but now use Hoeffding’s inequality.
(e) Now use R to estimate this probability in the following way. Suppose
that each Xi is U [0, 1]. The mean is 0.5 and the variance is exactly 1/12.
Draw 16 measurements and track if the sample mean is within 0.5 of the
true mean. Repeat this 1000 times to get an accurate estimate. Compare
the answer to what you obtained analytically. Particularly, order the
confidence intervals by length.
Solution 6.(a) $0 \le X_i \le 1 \Rightarrow 0 \le E[X_i] \le 1 \Rightarrow 0 \le \mu \le 1$.
$0 \le Var(X_i) = E[(X_i - E[X_i])^2] \le (1-0)^2 = 1 \Rightarrow 0 \le \sigma \le 1$. (Note that we actually have $Var(X_i) \le 1/4$ by Popoviciu's inequality.)
(b) By the CLT, $\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$, then
$$P(|\bar{X} - \mu| > 0.5) = P\left(\frac{\sqrt{n}\,|\bar{X} - \mu|}{\sigma} > \frac{0.5\sqrt{n}}{\sigma}\right) = P\left(|Z| > \frac{0.5 \times \sqrt{16}}{\sqrt{1/12}}\right) = P(|Z| > 4\sqrt{3})$$
$$= 2 \times P(Z > 4\sqrt{3}) \approx 4.26 \times 10^{-12} \quad \text{(R command: 2*(1-pnorm(4*sqrt(3))))}$$
(c) First notice that E( X̄ ) = µ and Var ( X̄ ) = Var ( Xi )/n = σ2 /n. Then
by Chebyshev’s inequality,
$$P(|\bar{X} - \mu| > 0.5) \le \frac{Var(\bar{X})}{0.5^2} = \frac{\sigma^2}{0.25n} = \frac{1}{12 \times 0.25 \times 16} = \frac{1}{48} \approx 0.0208$$
(d) By Hoeffding's inequality,
$$P(|\bar{X} - \mu| > 0.5) \le 2\exp\left(-\frac{2n^2 \times 0.5^2}{\sum_{i=1}^{n}(1-0)^2}\right) = 2\exp(-2 \times 0.25 \times 16) \approx 0.00067$$
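The three answers can be computed side by side; a short Python sketch (using the same $n = 16$, $\sigma^2 = 1/12$, and threshold $0.5$ as above) makes the ordering explicit:

```python
import math

# Compare the CLT approximation, Chebyshev bound, and Hoeffding bound
# for P(|X̄ − µ| > 0.5) with n = 16, σ² = 1/12, X_i ∈ [0, 1].
def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, var, t = 16, 1.0 / 12.0, 0.5

clt = 2.0 * (1.0 - Phi(t * math.sqrt(n) / math.sqrt(var)))  # ≈ 4.26e-12
chebyshev = (var / n) / t**2                                # = 1/48 ≈ 0.0208
hoeffding = 2.0 * math.exp(-2.0 * n * t**2)                 # = 2e^(-8) ≈ 6.7e-4

print(clt, chebyshev, hoeffding)
```

Note the ordering: the CLT approximation is by far the smallest, Hoeffding is much tighter than Chebyshev, and both bounds are loose compared to the (approximate) truth.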
# initialize
L = rep(0, 1000)
for (k in 1:1000) {
  # generate 16 uniformly distributed samples
  X = runif(16, min = 0, max = 1)
  # sample mean
  X.mean = mean(X)
  # check if the sample mean deviates from the true mean by more than 0.5
  L[k] = (abs(X.mean - 0.5) > 0.5)
}
# estimated probability (comes out to 0 here: the event is extremely rare)
prob = sum(L) / length(L)
Problem 7. Conjugate Priors: In lectures we have seen that the Beta and
Binomial are conjugate distributions, i.e. if we are estimating a Binomial
parameter and use a Beta prior then the posterior is a Beta distribution.
There is a similar relationship between Gamma and Poisson distributions.
You can use any reference (Wikipedia) for the distributions you need - for the
Gamma use the shape/rate parameterization.
(c) Assume that λ ∼ Gamma(α, β), and write down the posterior distri-
bution over the parameter λ. Show that this posterior distribution is a
Gamma distribution, and compute its parameters.
(d) The mean of a Gamma distribution is α/β. Compute the posterior mean.
This will be our point estimate.
(e) Write the posterior mean as a convex combination of the prior mean and
the MLE. What happens if α, β are fixed and n → ∞?
Solution 7.(a)
$$L(\lambda; x_1, \ldots, x_n) = L(\lambda) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$
(b) Since $L(\lambda) \ge 0$, $\lambda_{MLE} = \arg\max_{\lambda} L(\lambda) = \arg\max_{\lambda} \log L(\lambda)$.
Such a $\lambda$ satisfies $\frac{\partial \log L(\lambda)}{\partial \lambda} = 0$ and $\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} \le 0$.
$$\log L(\lambda) = -n\lambda + \sum_{i=1}^{n} x_i \log \lambda - \log \prod_{i=1}^{n} x_i!$$
$$\frac{\partial \log L(\lambda)}{\partial \lambda} = -n + \frac{\sum_{i=1}^{n} x_i}{\lambda} = 0 \;\Rightarrow\; \lambda = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} = -\frac{\sum_{i=1}^{n} x_i}{\lambda^2} < 0.$$
Therefore $\lambda_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n}$.
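As a numerical check of this stationary-point argument (Python; the small data set and grid resolution below are illustrative choices, not from the problem), the log-likelihood is maximized at the sample mean:

```python
import math

# Check numerically that the Poisson log-likelihood peaks at λ = x̄_n.
xs = [3, 1, 4, 1, 5, 9, 2, 6]
n, s = len(xs), sum(xs)

def log_lik(lam: float) -> float:
    # log L(λ) = −nλ + (Σ x_i) log λ − log Π x_i!  (last term is a
    # constant in λ, so it is dropped here)
    return -n * lam + s * math.log(lam)

lam_mle = s / n
grid = [0.01 * k for k in range(1, 2001)]  # λ in (0, 20]
best = max(grid, key=log_lik)
print(lam_mle, best)  # grid maximizer agrees with x̄ up to grid resolution
```

The grid maximizer lands on the grid point(s) nearest $\bar{x}_n$, as the closed-form derivation predicts.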
(c) The posterior is proportional to the likelihood times the prior:
$$p(\lambda \mid x_1, \ldots, x_n) \propto e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i} \cdot \lambda^{\alpha - 1} e^{-\beta\lambda} = \lambda^{\alpha + \sum_{i=1}^{n} x_i - 1}\, e^{-(\beta + n)\lambda}$$
Therefore $\lambda \mid X_1, \ldots, X_n \sim \text{Gamma}\left(\alpha + \sum_{i=1}^{n} X_i,\; \beta + n\right)$.
(d) This is
$$\hat{\lambda}_{Bayes} = \frac{\alpha + \sum_{i=1}^{n} x_i}{\beta + n}$$
(e)
$$\hat{\lambda}_{Bayes} = \frac{\beta}{\beta + n} \cdot \frac{\alpha}{\beta} + \frac{n}{\beta + n} \cdot \frac{\sum_{i=1}^{n} x_i}{n}$$
When $n \to \infty$, the weight on the prior mean vanishes and $\hat{\lambda}_{Bayes} \to \hat{\lambda}_{MLE}$.
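The convex-combination identity is easy to verify numerically. A Python sketch (the prior parameters $\alpha = 2$, $\beta = 1$ and the data are illustrative choices):

```python
# Gamma–Poisson conjugate update: the posterior mean is a convex
# combination of the prior mean α/β and the MLE x̄_n.
alpha, beta = 2.0, 1.0
xs = [3, 1, 4, 1, 5, 9, 2, 6]
n, s = len(xs), sum(xs)

# Posterior Gamma(α + Σx_i, β + n), so its mean is:
post_mean = (alpha + s) / (beta + n)

# The same number as a weighted average of prior mean and MLE:
prior_mean = alpha / beta
mle = s / n
w = beta / (beta + n)  # weight on the prior mean; → 0 as n → ∞
combo = w * prior_mean + (1 - w) * mle

print(post_mean, combo)  # identical, by the algebra in part (e)
```

As $n$ grows, the weight $\beta/(\beta + n)$ on the prior mean shrinks to zero and the data term dominates, which is the limit statement above.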
(c) Extra credit: Compute the Bayes estimator using the loss |µ̂ − µ|.
Therefore $\mu \mid X \sim N\left(\frac{X}{2}, \frac{1}{2}\right)$.
For the squared-error loss, setting the derivative of the posterior risk to zero:
$$\frac{\partial R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}} = \int_{-\infty}^{+\infty} \frac{\partial (\hat{\mu} - \mu)^2}{\partial \hat{\mu}}\, f_{\mu \mid X}(\mu)\, d\mu = \int_{-\infty}^{+\infty} 2(\hat{\mu} - \mu)\, f_{\mu \mid X}(\mu)\, d\mu = 0$$
$$\Rightarrow \hat{\mu} \int_{-\infty}^{+\infty} f_{\mu \mid X}(\mu)\, d\mu = \int_{-\infty}^{+\infty} \mu\, f_{\mu \mid X}(\mu)\, d\mu = E_{\mu \mid X}(\mu)$$
$$\Rightarrow \hat{\mu} = E_{\mu \mid X}(\mu) = \frac{X}{2}$$
And
$$\frac{\partial^2 R_{L_2}(\mu, \hat{\mu})}{\partial \hat{\mu}^2} = \int_{-\infty}^{+\infty} 2\, f_{\mu \mid X}(\mu)\, d\mu = 2 > 0.$$
Therefore $\hat{\mu}_{L_2} = \frac{X}{2}$.
$$\frac{\partial R_{L_1}(\mu, \hat{\mu})}{\partial \hat{\mu}} = \int_{-\infty}^{+\infty} \frac{\partial |\hat{\mu} - \mu|}{\partial \hat{\mu}}\, f_{\mu \mid X}(\mu)\, d\mu = \int_{-\infty}^{\hat{\mu}} f_{\mu \mid X}(\mu)\, d\mu - \int_{\hat{\mu}}^{+\infty} f_{\mu \mid X}(\mu)\, d\mu = 0$$
$$\Rightarrow F_{\mu \mid X}(\hat{\mu}) = \frac{1}{2},$$
i.e. $\hat{\mu}_{L_1}$ is the posterior median, which for the symmetric posterior $N\left(\frac{X}{2}, \frac{1}{2}\right)$ is again $\frac{X}{2}$.