4 Comparison of Estimators: 4.1 Optimality Theory
Consider the absolute-error risk E|T(X) − θ|: if T ∼ N(θ, R(θ, T)), then

E|T(X) − θ| = √(2/π) · √R(θ, T).
For N(µ, σ²) data, take

µ̂ = X̄,  σ̂² = (1/n) Σ_{i=1}^n (X_i − X̄)².

Then

R(θ, X̄) = var(X̄) = σ²/n.
Since nσ̂²/σ² ∼ χ²_{n−1} (with E χ²_{n−1} = n − 1 and var χ²_{n−1} = 2(n − 1)),

b(θ, σ̂²) = E[(σ²/n)(nσ̂²/σ²)] − σ²
          = (σ²/n)(n − 1) − σ²
          = −σ²/n,

R(θ, σ̂²) = var[(σ²/n)(nσ̂²/σ²)] + σ⁴/n²
          = (σ⁴/n²) · 2(n − 1) + σ⁴/n²
          = (σ⁴/n²)[2(n − 1) + 1]
          = σ⁴(2n − 1)/n².
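The bias and risk formulas above are easy to sanity-check by simulation. A minimal Monte Carlo sketch (Python; not part of the notes, the function name is my own):

```python
import random

def mc_bias_risk(sigma, n, reps=20000, seed=1):
    """Monte Carlo bias and risk of sigma_hat^2 = (1/n) sum (Xi - Xbar)^2."""
    rng = random.Random(seed)
    s2 = sigma * sigma
    bias_acc = 0.0
    risk_acc = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2hat = sum((x - xbar) ** 2 for x in xs) / n
        bias_acc += s2hat - s2          # accumulates E[sigma_hat^2] - sigma^2
        risk_acc += (s2hat - s2) ** 2   # accumulates E[(sigma_hat^2 - sigma^2)^2]
    return bias_acc / reps, risk_acc / reps

bias, risk = mc_bias_risk(sigma=1.0, n=5)
# theory: bias = -sigma^2/n = -0.2, risk = sigma^4 (2n-1)/n^2 = 9/25 = 0.36
print(bias, risk)
```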
For the linear estimator aX̄ of µ,

R(θ, aX̄) = a²σ²/n + (a − 1)²µ².
Inadmissible: we say an estimator S is inadmissible if there exists another estimator T such that R(θ, T) ≤ R(θ, S) for all θ, with strict inequality for some θ.
• Difficulty: risk functions typically cross, so no single estimator minimizes R(θ, T) uniformly in θ. Two standard one-number summaries of the risk:
* Bayes risk: ∫ R(θ, T) π(θ) dθ
* Worst-case risk: max_θ R(θ, T)
4.2 UMVUEs
Ex. Consider two estimators of the standard deviation σ when data ∼ N(µ, σ²):

σ̂ = √[(1/n) Σ_i (X_i − X̄)²],
σ̃ = (1/n) Σ_i |X_i − X̄|.
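A quick simulation (Python; not part of the notes) already hints at the comparison for normal data. Note that σ̃ as written is not rescaled by √(π/2), so a sizable part of its error is bias:

```python
import math
import random

def sigma_hat(xs):
    """sqrt((1/n) sum (Xi - Xbar)^2)"""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)

def sigma_tilde(xs):
    """(1/n) sum |Xi - Xbar|"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) for x in xs) / n

def mc_mse(est, sigma, n, reps=20000, seed=2):
    """Monte Carlo MSE of an estimator of sigma under N(0, sigma^2) data."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        acc += (est(xs) - sigma) ** 2
    return acc / reps

print(mc_mse(sigma_hat, 1.0, 20), mc_mse(sigma_tilde, 1.0, 20))
```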
Theorem. (Rao-Blackwell) Suppose T(X) is sufficient for θ and E_θ(|S(X)|) < ∞. Then the estimator defined by T*(X) = E[S(X) | T(X)] satisfies
R(θ, T*) ≤ R(θ, S), ∀θ.
Strict inequality holds unless T*(X) = S(X) with probability one, provided var_θ(S(X)) < ∞.
p.f. Since R(θ, S) = E[(S(X) − q(θ))²], it is sufficient to show that
E[(E(S(X)|T(X)) − q(θ))²] ≤ E[(S(X) − q(θ))²],
with equality iff E(S(X)|T(X)) = S(X); but that is already a known result (Chapter 1).
Complete: a statistic T is complete iff the only function g, defined on the range of T, satisfying
E_θ[g(T)] = 0, ∀θ,
is g ≡ 0.
Ex. (X_1, …, X_n) ∼ Poisson(θ) ⇒ T = Σ_i X_i is sufficient for θ and T ∼ Poisson(nθ):

E(g(T)) = e^{−nθ} Σ_{i=0}^∞ g(i)(nθ)^i / i! = 0, ∀θ > 0
⇔ g(i) = 0, ∀i = 0, 1, 2, …

(a power series that vanishes for all θ > 0 has all coefficients zero). So T is complete.
Theorem. (Lehmann-Scheffé) If T(X) is a complete sufficient statistic and S(X) is an unbiased estimate of q(θ), then T* = E[S(X)|T(X)] is UMVUE of q(θ). If var_θ(T*(X)) < ∞,
var(T*) ≤ var(S)
(strictly smaller unless S is equal to T*).
p.f. We need to show that T* does not depend on the choice of S. Suppose g_1(T) and g_2(T) are both unbiased and obtained by Rao-Blackwell. Then
E(g_1(T) − g_2(T)) = 0,
so by completeness g_1(T) = g_2(T) with probability one.
How to find UMVUE: first find a complete sufficient statistic T, then either guess a function h(T) that is unbiased for q(θ), or take any unbiased estimator S and compute E[S | T].
Ex. Sampling without replacement: X ∼ Hypergeometric(N, Nθ, n), so that

E_θ(g(X)) = Σ_{k=0}^n g(k) C(Nθ, k) C(N − Nθ, n − k) / C(N, n) = 0, ∀θ = 0, 1/N, …, 1.

When θ = 0, E_θ(g(X)) = g(0), so g(0) = 0.
When θ = 1/N,

E_θ(g(X)) = [C(1, 0) C(N − 1, n) / C(N, n)] g(0) + [C(1, 1) C(N − 1, n − 1) / C(N, n)] g(1)
⇒ g(1) = 0.

So, by induction, g ≡ 0 and X is complete.
Theorem. Suppose {P_θ} belongs to a k-parameter exponential family, and suppose C = (C_1(θ), · · · , C_k(θ)) has range containing a k-dimensional open set. Then the natural sufficient statistic T = (T_1(X), …, T_k(X)) is complete.
Ex. (X_1, …, X_n) ∼ Uniform(0, θ):

p(X, θ) = 1/θ^n  if X_(n) < θ, X_(1) > 0.

Recall that the sufficient statistic for θ is X_(n), and the density of X_(n) is

f_{X_(n)}(t) = n t^{n−1} / θ^n,  0 < t < θ.
Note that, if

E[g(X_(n))] = (n/θ^n) ∫_0^θ g(t) t^{n−1} dt = 0, ∀θ > 0,

then ∫_0^θ g(t) t^{n−1} dt ≡ 0; differentiating in θ gives g(θ)θ^{n−1} = 0 for all θ > 0,
⇒ g(t) = 0. So X_(n) is complete.
E(X_(n)) = (n/θ^n) ∫_0^θ t^n dt = nθ/(n + 1).

We have that [(n + 1)/n] X_(n) is UMVUE for θ.
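A simulation confirms both that [(n+1)/n] X_(n) is unbiased and that it beats the unbiased moment estimator 2X̄ in variance. (Illustrative Python, not from the notes:)

```python
import random

def mc_uniform_umvue(theta, n, reps=20000, seed=3):
    """Compare T1 = (n+1)/n * max(X) with T2 = 2*Xbar for Uniform(0, theta) data."""
    rng = random.Random(seed)
    t1s, t2s = [], []
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        t1s.append((n + 1) / n * max(xs))
        t2s.append(2.0 * sum(xs) / n)
    def mean_var(ts):
        m = sum(ts) / len(ts)
        return m, sum((t - m) ** 2 for t in ts) / len(ts)
    return mean_var(t1s), mean_var(t2s)

(m1, v1), (m2, v2) = mc_uniform_umvue(theta=1.0, n=10)
# theory: both means = 1; var(T1) = theta^2/(n(n+2)) << var(T2) = theta^2/(3n)
print(m1, v1, m2, v2)
```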
Ex. (X_1, …, X_n) i.i.d. Exponential(λ) ⇒ T = Σ_{i=1}^n X_i is sufficient and complete for λ. We want to estimate the quantity

P_λ(X_1 ≤ x) = 1 − e^{−λx}.
Start from the unbiased estimator I(X_1 ≤ x) and Rao-Blackwellize:

E[I(X_1 ≤ x) | T = t]
= P(X_1 ≤ x | Σ_{i=1}^n X_i = t)
= P(X_1 / Σ_{i=1}^n X_i ≤ x / Σ_{i=1}^n X_i | Σ_{i=1}^n X_i = t).

∵ X_1 ∼ Γ(1, λ) and Σ_{i=1}^n X_i ∼ Γ(n, λ),
∴ X_1 / Σ_{i=1}^n X_i ∼ Beta(1, n − 1), independent of Σ_{i=1}^n X_i.

Hence

E[I(X_1 ≤ x) | Σ_{i=1}^n X_i = t]
= ∫_0^{x/t} β_{(1,n−1)}(u) du
= ∫_0^{x/t} (n − 1)(1 − u)^{n−2} du
= 1 − (1 − x / Σ_{i=1}^n X_i)^{n−1},  if Σ_{i=1}^n X_i > x;
= 1,  otherwise.
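The unbiasedness of this UMVUE, E[δ(T)] = 1 − e^{−λx}, can be sanity-checked by simulation (Python sketch; function names are mine):

```python
import math
import random

def umvue_cdf(xs, x):
    """UMVUE of P(X1 <= x) for Exponential data: 1-(1-x/T)^(n-1) if T > x, else 1."""
    t = sum(xs)
    if t > x:
        return 1.0 - (1.0 - x / t) ** (len(xs) - 1)
    return 1.0

def mc_mean_umvue(lam, n, x, reps=20000, seed=4):
    """Monte Carlo mean of the UMVUE; should match 1 - exp(-lam*x)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        acc += umvue_cdf([rng.expovariate(lam) for _ in range(n)], x)
    return acc / reps

print(mc_mean_umvue(lam=1.0, n=5, x=1.0), 1.0 - math.exp(-1.0))
```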
* Regularity Assumptions:
(1) The support A = {x : p(x, θ) > 0} does not depend on θ.
(2) ∀x ∈ A and θ ∈ Θ, ∂/∂θ log p(x, θ) exists and is finite.
(3) Differentiation and integration can be interchanged:
∂/∂θ ∫ T(x) p(x, θ) dx = ∫ T(x) ∂/∂θ p(x, θ) dx.
* Score function:
S(X, θ) = ∂/∂θ log p(X, θ).
* Properties of the score function (under regularity assumptions):
(1) E_θ(S(X, θ)) = 0.
p.f.
E_θ[∂/∂θ log p(X, θ)]
= ∫ [∂/∂θ log p(x, θ)] p(x, θ) dx
= ∫ ∂/∂θ p(x, θ) dx
= ∂/∂θ ∫ p(x, θ) dx
= ∂/∂θ 1
= 0.
(2) var_θ(S(X, θ)) = −E[∂²/∂θ² log p(X, θ)].
p.f. Since E_θ(S(X, θ)) = 0, var_θ(S(X, θ)) = E[S(X, θ)²] = E[(∂/∂θ log p(X, θ))²]. The result
follows by noting that
−E[∂²/∂θ² log p(X, θ)]
= −∫ [∂²/∂θ² log p(x, θ)] p(x, θ) dx
= −∫ ∂/∂θ{[∂/∂θ log p(x, θ)] p(x, θ)} dx + ∫ [∂/∂θ log p(x, θ)] [∂/∂θ p(x, θ)] dx
= −∂/∂θ ∫ [∂/∂θ log p(x, θ)] p(x, θ) dx + ∫ [∂/∂θ log p(x, θ)] [(∂p(x, θ)/∂θ) / p(x, θ)] p(x, θ) dx
= 0 + ∫ [∂/∂θ log p(x, θ)]² p(x, θ) dx
= E[(∂/∂θ log p(X, θ))²].
(3) cov_θ(S(X, θ), T(X)) = ψ′(θ), where ψ(θ) = E_θ(T(X)).
p.f. Since E_θ(S(X, θ)) = 0, cov_θ(S(X, θ), T(X)) = E[S(X, θ) T(X)], and

E[S(X, θ) T(X)]
= ∫ [∂/∂θ log p(x, θ)] T(x) p(x, θ) dx
= ∫ [∂/∂θ p(x, θ)] T(x) dx
= ∂/∂θ ∫ p(x, θ) T(x) dx
= ∂/∂θ E_θ(T(X)) = ψ′(θ).
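Properties (1) and (2) can be checked numerically. For Poisson(θ), S(X, θ) = X/θ − 1 and var(S) = 1/θ. (Illustrative Python with a simple inversion-style Poisson sampler, assuming small θ; names are mine:)

```python
import math
import random

def poisson_sample(rng, lam):
    """Knuth-style Poisson sampler: multiply uniforms until below exp(-lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def mc_score_moments(theta, reps=50000, seed=5):
    """Poisson(theta): score S = X/theta - 1; returns (mean, variance) of S."""
    rng = random.Random(seed)
    scores = [poisson_sample(rng, theta) / theta - 1.0 for _ in range(reps)]
    m = sum(scores) / reps
    v = sum((s - m) ** 2 for s in scores) / reps
    return m, v

m, v = mc_score_moments(theta=2.0)
# theory: E S = 0 and var S = I(theta) = 1/theta = 0.5
print(m, v)
```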
Theorem. (Information Inequality) Let T(X) be any statistic such that var_θ(T(X)) < ∞, ∀θ, and let E_θ(T(X)) = ψ(θ). Under regularity conditions and 0 < I(θ) < ∞:

var_θ(T(X)) ≥ [ψ′(θ)]² / I(θ), ∀ψ differentiable.

p.f. Let S = ∂/∂θ log p(X, θ) (the score). Then var(S) = I(θ) and ψ′(θ) = cov(T, S). The result follows from the Cauchy-Schwarz inequality [cov(T, S)]² ≤ var(T) var(S).
Corollary. (Cramér-Rao) If T is unbiased for θ (so ψ(θ) = θ), then var_θ(T) ≥ 1/I(θ).
Corollary. For unbiased T*, if
var(T*) = 1/I(θ),
then T* is UMVUE.
In general we have n observations, and the Fisher information I(θ) is defined for all n observations. If I_1(θ) is the Fisher information for a single observation, then
I(θ) = n I_1(θ).
Ex. (X_1, …, X_n) ∼ N(µ, σ²):

var(X̄) = σ²/n = 1/I(θ) = 1/(n I_1(θ)),

where I_1(θ) = 1/σ².
Theorem. Suppose {P_θ} is a one-parameter exponential family with density p(x, θ) = exp[c(θ)T*(x) + d(θ) + s(x)] I_A(x); then T*(X) achieves the information bound and is a UMVUE of E(T*(X)). Conversely, if a statistic T(X) achieves the information bound for all θ on Θ, then {P_θ} is of this exponential-family form with T*(x) = T(x).
* Consistency: T_n is consistent for θ iff ∀ε > 0, as n → ∞,
P[ |T_n(X_1, …, X_n) − θ| ≥ ε ] → 0.
Ex. Method of moments: with sample moments

m̂_j = (1/n) Σ_{i=1}^n X_i^j,

the law of large numbers and continuity of g give

T_n = g(m̂_1, …, m̂_r) →^p g(m_1(θ), …, m_r(θ)) = q(θ).
* Asymptotic Normality:
T_n is approximately normally distributed with mean µ_n(θ) and variance σ_n²(θ) iff

P(T_n(X) ≤ t) ≈ Φ((t − µ_n(θ)) / σ_n(θ)),

i.e., ∀z,

lim_{n→∞} P[(T_n(X_1, …, X_n) − µ_n(θ)) / σ_n(θ) ≤ z] = Φ(z).
* √n-consistent: µ_n(θ) is √n-consistent for q(θ) if
√n(µ_n(θ) − q(θ)) → 0.
Ex. For Bernoulli data, the delta method gives
√n(q(X̄_n) − q(θ)) →^d Z ∼ N(0, [q′(θ)]² θ(1 − θ)).
So we can take
µ_n = q(θ),  σ_n² = [q′(θ)]² θ(1 − θ) / n.
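The delta-method variance can be checked by simulation; here q(p) = p² is an arbitrary smooth choice for illustration (Python, not from the notes):

```python
import math
import random

def mc_delta_var(theta, n, reps=5000, seed=6):
    """Sample variance of sqrt(n)*(q(Xbar) - q(theta)) for Bernoulli data, q(p) = p^2."""
    rng = random.Random(seed)
    q = lambda p: p * p
    vals = []
    for _ in range(reps):
        xbar = sum(1 for _ in range(n) if rng.random() < theta) / n
        vals.append(math.sqrt(n) * (q(xbar) - q(theta)))
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / reps

target = (2 * 0.3) ** 2 * 0.3 * 0.7  # [q'(theta)]^2 theta (1-theta) = 0.0756
print(mc_delta_var(theta=0.3, n=200), target)
```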
Theorem. (a) Suppose that P = (P_1, …, P_k) are the population frequencies for k categories, and let T_n = h(P̂_1, …, P̂_k) with P̂_i the sample frequencies. Then

√n(T_n − h(P_1, …, P_k)) → N(0, σ_h²),

where

σ_h² = Σ_{i=1}^k P_i [∂h(P)/∂P_i]² − [Σ_{i=1}^k P_i ∂h(P)/∂P_i]².
(b) Suppose that m = (m_1, …, m_r) are the population moments, and let T_n = g(m̂_1, …, m̂_r). Then

√n(T_n − g(m_1, …, m_r)) → N(0, σ_g²),

where

σ_g² = Σ_{i=2}^{2r} b_i m_i − [Σ_{i=1}^r m_i ∂g(m)/∂m_i]²
     = var(Σ_{i=1}^r [∂g(m)/∂m_i] X^i),

b_i = Σ_{j+k=i, 1≤j,k≤r} [∂g(m)/∂m_j][∂g(m)/∂m_k].
Ex. (Hardy-Weinberg equilibrium) With genotype frequencies P_1 = θ², P_2 = 2θ(1 − θ), P_3 = (1 − θ)², consider three estimates of θ based on:

h_1(P_1, P_2, P_3) = √P_1
h_2(P_1, P_2, P_3) = 1 − √P_3
h_3(P_1, P_2, P_3) = P_1 + (1/2)P_2
σ_1² = [1/(2√P_1)]² · var(X_1) = [1/(4P_1)] · P_1(1 − P_1) = (1 − P_1)/4 = (1 − θ²)/4

σ_2² = [−1/(2√P_3)]² · var(X_3) = [1/(4P_3)] · P_3(1 − P_3) = (1 − P_3)/4 = [1 − (1 − θ)²]/4

σ_3² = 1² · var(X_1) + (1/2)² · var(X_2) + 2 · 1 · (1/2) · cov(X_1, X_2)
     = P_1(1 − P_1) + (1/4) P_2(1 − P_2) − P_1 P_2
     = θ²(1 − θ²) + (1/2) θ(1 − θ)[1 − 2θ(1 − θ)] − 2θ³(1 − θ)
     = θ/2 − θ²/2
     = θ(1 − θ)/2.
Ex. σ̂² = m̂_2 − m̂_1². Here g(m_1, m_2) = m_2 − m_1², with

∂g/∂m_1 = −2m_1,  ∂g/∂m_2 = 1.

∴ σ_g² = var(−2m_1 · X + 1 · X²)
       = var[(X − m_1)²]
       = E(X − m_1)⁴ − σ⁴.
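For a concrete check of E(X − m_1)⁴ − σ⁴: for Uniform(0,1) data it equals 1/80 − 1/144 = 1/180. (Python sketch, not from the notes:)

```python
import random

def mc_nvar_sigma2hat(n, reps=20000, seed=7):
    """n * var(m2_hat - m1_hat^2) for Uniform(0,1) data."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        xbar = sum(xs) / n
        vals.append(sum((x - xbar) ** 2 for x in xs) / n)  # sigma2_hat
    m = sum(vals) / reps
    return n * sum((v - m) ** 2 for v in vals) / reps

# theory: E(X - m1)^4 - sigma^4 = 1/80 - 1/144 = 1/180 ≈ 0.00556
print(mc_nvar_sigma2hat(n=100))
```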
* Relative efficiency: for estimators T^(1), T^(2) with asymptotic variances σ_1²/n and σ_2²/n,
e(θ, T^(1), T^(2)) = σ_2² / σ_1².
Ex. (HWE)

σ_1² = (1 − θ²)/4
σ_2² = [1 − (1 − θ)²]/4

e(T_1, T_2) = [1 − (1 − θ)²] / (1 − θ²)

T_1 is better when θ > 1/2, and T_1 ≈ T_2 when θ = 1/2.
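The efficiency curve is a one-liner to evaluate (Python; the function name is mine):

```python
def hwe_efficiency(theta):
    """e(T1, T2) = sigma2^2 / sigma1^2 = (1 - (1-theta)^2) / (1 - theta^2)."""
    return (1.0 - (1.0 - theta) ** 2) / (1.0 - theta ** 2)

print(hwe_efficiency(0.5))  # 1.0: the two estimators are equally efficient
print(hwe_efficiency(0.8))  # > 1: T1 has smaller asymptotic variance
print(hwe_efficiency(0.2))  # < 1: T2 has smaller asymptotic variance
```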
Does the asymptotic variance σ²(θ) satisfy

σ²(θ) ≥ [ψ′(θ)]² / I(θ)?

Under regularity conditions, this generally holds.
Ex. X ∼ Poisson(θ): S(X, θ) = X/θ − 1, so I(θ) = var(X)/θ² = 1/θ.
Ex. X ∼ Bernoulli(θ):

I(θ) = E[∂/∂θ log θ^X (1 − θ)^{1−X}]² = E[X/θ − (1 − X)/(1 − θ)]² = E[(X − θ)²] / [θ(1 − θ)]² = 1/[θ(1 − θ)].
Let ℓ_i(θ) = ∂/∂θ log p(X_i, θ), and let θ̂ be the MLE. Then

0 = Σ ℓ_i(θ̂) ≈ Σ ℓ_i(θ) + [Σ ℓ_i′(θ*)](θ̂ − θ),

so

√n(θ̂ − θ) ≈ [−(1/√n) Σ ℓ_i(θ)] / [(1/n) Σ ℓ_i′(θ)].

By the CLT, −(1/√n) Σ ℓ_i(θ) →^d N(0, I(θ)); by the law of large numbers, (1/n) Σ ℓ_i′(θ) →^p −I(θ). Hence, by Slutsky,

√n(θ̂ − θ) →^d N(0, I(θ)/I(θ)²) = N(0, 1/I(θ)).
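For Bernoulli(θ), the MLE is X̄ and 1/I(θ) = θ(1 − θ); a simulation of √n(θ̂ − θ) matches this variance. (Illustrative Python, not from the notes:)

```python
import math
import random

def mc_mle_var(theta, n, reps=20000, seed=8):
    """Bernoulli MLE is Xbar; sample variance of sqrt(n)*(Xbar - theta)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        xbar = sum(1 for _ in range(n) if rng.random() < theta) / n
        vals.append(math.sqrt(n) * (xbar - theta))
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / reps

# theory: 1/I(theta) = theta*(1 - theta) = 0.21 for theta = 0.3
print(mc_mle_var(theta=0.3, n=50))
```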