Lecture 3: MLE
Mathematical statistics
Point estimation
This publication forms part of an Open University module. Details of this and other
Open University modules can be obtained from the Student Registration and Enquiry Service, The
Open University, PO Box 197, Milton Keynes MK7 6BJ, United Kingdom (tel. +44 (0)845 300 6090;
email general-enquiries@open.ac.uk).
Alternatively, you may visit the Open University website at www.open.ac.uk where you can learn
more about the wide range of modules and packs offered at all levels by The Open University.
To purchase a selection of Open University materials visit www.ouw.co.uk, or contact Open
University Worldwide, Walton Hall, Milton Keynes MK7 6AA, United Kingdom for a brochure
(tel. +44 (0)1908 858793; fax +44 (0)1908 858787; email ouw-customer-services@open.ac.uk).
Note to reader
Mathematical/statistical content at the Open University is usually provided to students in
printed books, with PDFs of the same online. This format ensures that mathematical notation
is presented accurately and clearly. The PDF of this extract thus shows the content exactly as
it would be seen by an Open University student. Please note that the PDF may contain
references to other parts of the module and/or to software or audio-visual components of the
module. Regrettably mathematical and statistical content in PDF files is unlikely to be
accessible using a screenreader, and some OpenLearn units may have PDF files that are not
searchable. You may need additional help to read these documents.
1.1 Background
You should know a little about the whys and wherefores of the likelihood
function from your previous studies and the review of likelihood in
Section 5 of Unit 4. A further brief review of the fundamentals is
nevertheless included here.
Write f (x|θ) to denote either the probability density function associated
with X if X is continuous, or the probability mass function if X is
discrete. The ‘likelihood function’ gives a measure of how likely a value of
θ is, given that it is known that the sample X1, X2, . . . , Xn has the values
x1, x2, . . . , xn; for a sample of independent observations it is

L(θ) = L(θ | x1, x2, . . . , xn) = f(x1|θ) f(x2|θ) · · · f(xn|θ).
L(θ) can be evaluated for any value of θ in the parameter space that you
care to try. An important point to remember, therefore, is that the
likelihood is thought of as a function of θ (not of x). See, for example,
Figure 4.6 in Subsection 5.2 of Unit 4, which shows a likelihood function
for a parameter λ.
It follows that the best estimator for the value of θ is the one that
maximises the likelihood function . . . the maximum likelihood
estimator of θ. In this sense, the maximum likelihood estimator is the
most likely value of θ given the data.
Exercise 5.1
Suppose that a coin is tossed 100 times with a view to estimating p, the
probability of obtaining a head, 0 < p < 1. The probability mass function
of the distribution of X, the random variable representing the number of
heads in 100 tosses of the coin, is that of the binomial distribution with
parameters n = 100 and p. Therefore,

f(x|p) = (100 choose x) p^x (1 − p)^(100−x).
Now suppose that the outcome of the experiment was 54 heads (and
100 − 54 = 46 tails).
What is the likelihood, L(p), for p given that X = x = 54?
Write C = (100 choose 54) and note that C is positive and does not depend on p.
Then, write the likelihood function that it is required to maximise as

L(p) = C p^54 (1 − p)^46.
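Before doing any calculus, it can help to see this function. If you have Python to hand, the following short sketch (not part of the module; the grid resolution is an arbitrary illustrative choice) evaluates L(p) on a fine grid of values of p and reports where the maximum falls.

```python
# A minimal sketch (not part of the module): evaluate the binomial
# likelihood L(p) = C * p**54 * (1 - p)**46 on a grid and locate its maximum.
import numpy as np
from math import comb

C = comb(100, 54)                      # the constant C = (100 choose 54)
p_grid = np.linspace(0.001, 0.999, 999)
L = C * p_grid**54 * (1 - p_grid)**46  # likelihood at each grid value of p

p_hat_grid = p_grid[np.argmax(L)]
print(f"grid maximum of L(p) is near p = {p_hat_grid:.3f}")  # close to 0.54
```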
Define the log-likelihood function, or log-likelihood for short, by

ℓ(θ) = log L(θ).

(A more precise notation would be ℓ(θ | x1, x2, . . . , xn). Remember that
log means log to the base e.)
The likelihood in the binomial case considered in Example 5.1 was shown
to be
L(p) = C p^54 (1 − p)^46.
The log-likelihood is therefore

ℓ(p) = log L(p)
     = log{C p^54 (1 − p)^46}
     = log C + log(p^54) + log{(1 − p)^46}
     = log C + 54 log p + 46 log(1 − p).
Now, differentiating with respect to p:

ℓ′(p) = d/dp ℓ(p)
      = d/dp (log C) + 54 d/dp (log p) + 46 d/dp {log(1 − p)}
      = 0 + 54/p − 46/(1 − p)
      = {54(1 − p) − 46p} / {p(1 − p)}
      = (54 − 100p) / {p(1 − p)}.
Setting ℓ′(p) = 0 means that the potential maximum value of p satisfies

(54 − 100p) / {p(1 − p)} = 0.
This equation can be multiplied through by p(1 − p) since p(1 − p) is
positive for 0 < p < 1. Therefore, p solves
54 − 100p = 0
or p = 54/100 = 0.54. You might or might not feel that this calculation is
a little easier than the one done in Example 5.1!
However, this time checking that this value of p corresponds to a
maximum is much more easily done:

d²/dp² ℓ(p) = ℓ″(p) = d/dp ℓ′(p) = d/dp (54/p) − d/dp {46/(1 − p)}
            = −54/p² − 46/(1 − p)².

At p = 54/100,

ℓ″(54/100) = −(54 × 100²)/54² − (46 × 100²)/46²
           = −10 000 × (1/54 + 1/46) < 0,

and so p̂ = 54/100 is indeed the maximum likelihood estimate of p. In this
case, ℓ″(p) happens to be negative for all 0 < p < 1, but this property is
not required to confirm that p = 54/100 corresponds to a maximum.
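If you would like an independent numerical check of this calculation, the sketch below (illustrative only, not part of the module) maximises ℓ(p) with a general-purpose optimiser; the constant log C is omitted because an additive constant does not affect where the maximum occurs.

```python
# A minimal sketch: maximise ell(p) = 54*log(p) + 46*log(1 - p) numerically
# (the additive constant log C is omitted since it does not move the maximum).
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(p):
    return -(54 * np.log(p) + 46 * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"numerical MLE: p = {res.x:.4f}")   # agrees with the calculus answer 0.54
```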
You might have expected the ‘natural’ estimate of the proportion of heads
based on this experiment to be the observed number of heads, 54, divided
by the total number of coin tosses, 100. All this effort has confirmed that
the general approach of maximum likelihood estimation agrees with
intuition in this case.
Also, the fact that p̂ = 0.54 maximises both the likelihood and
log-likelihood functions is confirmed pictorially in Figure 5.2.
Figure 5.2 (a) Plot of L(p) with maximum marked as p̂ = 0.54; (b) plot
of ℓ(p) with maximum marked as p̂ = 0.54
Exercise 5.2
Interactive content appears here. Please visit the website to use it.
If you would like to follow through the calculus argument for why you can
maximise the log-likelihood instead of the likelihood, refer to Section 1 of
the ‘Optional material for Unit 5’.
Exercise 5.3
The assumption made in Step 3 above is quite a big one. There are
alternatives. For example, more generally, the maximum of the
log-likelihood is actually either at a stationary point of the log-likelihood
function or at one of the endpoints of the parameter space. (One or both
of these endpoints might be infinite.) You could then check the values of
ℓ(θ) at the stationary points and the endpoints, and return the point from
among these that yields the largest value of the log-likelihood as the
maximum likelihood estimator. It is also possible that the log-likelihood
function is not differentiable everywhere in the interior of the parameter
space, in which case the calculus route to maximum likelihood estimation
fails. You will see an example of this in Subsection 1.5. Nonetheless, you
should use the approach of the above box throughout M347, unless
specifically told not to.
This might be a useful time to introduce a standard piece of terminology
to avoid the mouthfuls ‘maximum likelihood estimator’ and ‘maximum
likelihood estimate’; either is often referred to as the MLE.
On the other hand, mouthfuls are just what you want at the other
MLE: Major League Eating, ‘the world body that oversees all
professional eating contests’ and ‘includes the sport’s governing body,
the International Federation of Competitive Eating’.
Exercise 5.4
You might have noticed that, so far, all the MLEs that have been derived
arise from equating the sample mean with the population mean. (And this
will happen again!) However, it is certainly not the case that MLEs always
correspond to such a simple alternative approach.
Exercise 5.5
Exercise 5.6
where C1 = −n log(√(2π)). Notice that C1 differs from C in Exercise 5.5
because σ, which formed part of C, is now the parameter of interest.
(b) Find ℓ′(σ) and hence show that the candidate MLE is the σ̂ given
above. (Steps 2 and 3)
(c) Confirm that σ̂ is indeed the MLE of σ. Hint: You might find the
manipulations easier if you use the shorthand notation
S0 = (1/n) Σ_{i=1}^n (xi − µ0)² and note that S0, being a sum of squares,
is positive. (Step 4)
You were promised that in this unit you would have to deal with the
estimation of only one parameter at a time. Nonetheless it is opportune to
mention a result that involves estimating two parameters simultaneously,
the parameters of the normal model. (Now, both µ and σ are unknown.)
Given the result of Exercise 5.5, it is no surprise that µ̂ = X̄. It might also
be unsurprising that σ̂ has the same form when µ is unknown as it does
when µ is known, with µ0 in the latter replaced by its estimator X̄ in the
former. However, proof of the boxed result is not entirely straightforward
and is beyond the scope of this unit. Moreover, you might also notice
that σ̂ is not the usual sample standard deviation; more on that later in
the unit.
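If you use software to compute σ̂, note that many libraries offer both conventions. A minimal Python sketch (illustrative only; the data are simulated) contrasting the MLE of σ with the usual sample standard deviation follows.

```python
# A minimal sketch: the MLE of sigma divides the sum of squares by n
# (numpy's ddof=0), whereas the usual sample standard deviation divides
# by n - 1 (ddof=1).  The data here are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)       # illustrative sample

sigma_mle = np.sqrt(np.mean((x - x.mean())**2))   # = np.std(x, ddof=0)
s_sample  = np.std(x, ddof=1)                     # usual sample sd

print(f"MLE of sigma: {sigma_mle:.3f}, sample sd: {s_sample:.3f}")
```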
Exercise 5.7
being the boundary of the support of f(x|θ), and is common to such
‘irregular’ likelihood problems. (It is arguable how important such models
are, or should be, in statistics.)
Rest assured that calculus usually works in maximum likelihood contexts,
that it should remain your approach of choice to such problems in M347,
and that no attempt will be made to catch you out in this way. The real
world, however, won’t always be so kind!
then is the MLE of σ² equal to the square of the MLE of σ, i.e. is it true
that

\widehat{σ²} = (σ̂)² = (1/n) Σ_{i=1}^n (Xi − µ0)² ?

(Notice that the first, wide, hat covers all of σ², while the second, narrow,
hat covers just σ.)
This question is addressed in Subsection 1.6.1. Then, in Subsection 1.6.2,
the more general question of estimating a function of the parameter, h(θ),
will be explored.
Exercise 5.8
[Figure: (a) plot of ℓ(σ) with its maximum marked at σ̂ = 1.0399; (b) plot of ℓ(v) with its maximum marked at v̂ = 1.0814 = 1.0399².]
Exercise 5.9
The main result applies to virtually any function h, but it will be proved
only for increasing h. This means that h has an inverse function g, say, so
that since τ = h(θ), θ = g(τ). Notice that g is an increasing function of τ.
(g is usually written as h⁻¹, but it will be less confusing not to do so here.)
Now, by setting θ = g(τ) in the log-likelihood ℓ(θ) it is possible to write the
log-likelihood in terms of τ (as was done in a special case in Exercise 5.8).
Denoting the log-likelihood for τ as m(τ),

m(τ) = ℓ(g(τ)) = ℓ(θ).

Now, let θ̂ be the MLE of θ. Write τ̂ = h(θ̂), so that θ̂ = g(τ̂), and let τ0 be
some value of τ different from τ̂. Write θ0 = g(τ0). Because θ̂ is the MLE,
you know that

ℓ(θ̂) > ℓ(θ0).

Then, combining the equation and the inequality above,

m(τ̂) = ℓ(g(τ̂)) = ℓ(θ̂) > ℓ(θ0) = ℓ(g(τ0)) = m(τ0).

That is, the likelihood associated with τ̂ is greater than the likelihood
associated with any other value τ0, and hence τ̂ is indeed the MLE of τ.
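The following Python sketch (illustrative only; it uses simulated Poisson data and the function τ = e^(−µ) considered in Exercise 5.9) checks the invariance result numerically: maximising the log-likelihood directly over τ gives the same answer as transforming the MLE of µ.

```python
# A minimal sketch of invariance: for Poisson data, maximise the
# log-likelihood written in terms of tau = exp(-mu) and compare the
# result with exp(-xbar), i.e. h(mu_hat).  Simulated data, for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.poisson(lam=0.44, size=200)      # illustrative Poisson sample

def neg_loglik_tau(tau):
    mu = -np.log(tau)                    # mu = g(tau), the inverse of tau = exp(-mu)
    return -np.sum(x * np.log(mu) - mu)  # log-likelihood up to an additive constant

res = minimize_scalar(neg_loglik_tau, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"direct maximisation over tau : {res.x:.4f}")
print(f"h(mu_hat) = exp(-xbar)       : {np.exp(-x.mean()):.4f}")
```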
2 Properties of estimators
Define θ̃ to be any point estimator of a parameter θ. This section concerns
general properties of point estimators. The notation θ̃ (read ‘theta tilde’)
is used to distinguish a general estimator from θ̂, which in the remainder
of this unit will always refer to the MLE.
Interactive content appears here. Please visit the website to use it.
E(θ̃) = θ,

then the bias of the estimator is zero and θ̃ is said to be an unbiased
estimator of θ.
If θ̃ is an unbiased estimator there is no particular tendency for it to
under- or over-estimate θ. This is clearly a desirable attribute for a point
estimator to possess. For example, Figure 5.4 shows the sampling
distributions of two estimators, θ̃1 and θ̃2, of the same parameter θ. The
left-hand distribution, with pdf fs(θ̃1), has mean θ; the right-hand one,
with pdf fs(θ̃2), clearly does not. (Note the locations of E(θ̃1) = θ and
E(θ̃2), and the bias B(θ̃2) = E(θ̃2) − θ.) It would appear that θ̃1 is a
better estimator of θ than is θ̃2.
[Figure 5.4: the sampling distribution fs(θ̃1), centred at E(θ̃1) = θ, and the sampling distribution fs(θ̃2), centred at E(θ̃2) ≠ θ, with the bias B(θ̃2) marked.]
Bias (of Priene) was one of the Seven Sages of Ancient Greece.
Among his sayings is this one relevant to the OU student: ‘Choose
the course which you adopt with deliberation; but when you have
adopted it, then persevere in it with firmness.’
2.2.1 Examples
since E(Xi) = µ for each i. Thus the sample mean, µ̃ = X̄, is an unbiased
estimator of the population mean µ.
Notice that this result holds whatever the distribution of the Xi ’s,
provided only that it has finite mean µ.
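A quick simulation can make this concrete. The sketch below (illustrative only; the exponential distribution and sample size are arbitrary choices) averages a large number of simulated sample means from a skewed distribution and recovers the population mean.

```python
# A minimal sketch: the sample mean is unbiased whatever the parent
# distribution, provided the mean is finite.  Here the Xi are drawn from a
# skewed exponential distribution with mean 2; averaging many simulated
# sample means recovers that value.  Purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 2.0, 10, 100_000
sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print(f"average of {reps} sample means: {sample_means.mean():.3f}  (true mean {mu})")
```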
The result that you will obtain in the next exercise is both important in its
own right and as a prerequisite for the example concerning unbiasedness
which follows it.
Exercise 5.10
Exercise 5.11
Is \widehat{σ²} a biased estimator of σ²? If so, what is its bias? Hint: Relate
\widehat{σ²} to the sample variance S².
[Figure: the sampling distributions fs(θ̃1) and fs(θ̃3) of two unbiased estimators, both centred at θ = E(θ̃1) = E(θ̃3).]
Exercise 5.12
Here, ℓ(θ | X1, X2, . . . , Xn) is once more the log-likelihood function,

ℓ′(θ | X1, X2, . . . , Xn) = d/dθ {ℓ(θ | X1, X2, . . . , Xn)},

and E is the expectation over the distribution of X1, X2, . . . , Xn. (It is
useful here to use this more precise notation for the log-likelihood and its
derivative.)
where

Ui = d/dθ log f(Xi|θ),   i = 1, 2, . . . , n.
Notice that U1 , U2 , . . . , Un are independent random variables because
X1 , X2 , . . . , Xn are.
An important subsidiary result is that E(Ui ) = 0, i = 1, 2, . . . , n, as the
following shows.
E(Ui) = E{ d/dθ log f(Xi|θ) }

= ∫ [d/dθ {log f(x|θ)}] f(x|θ) dx

= ∫ [d/dθ {f(x|θ)}] {1/f(x|θ)} f(x|θ) dx
(using the chain rule)

= ∫ d/dθ {f(x|θ)} dx

= d/dθ ∫ f(x|θ) dx
(swapping the order of integration and differentiation)

= d/dθ (1)
(since f(x|θ) is a density)

= 0.
(The regularity conditions mentioned above ensure the validity of
swapping the order of integration and differentiation.)
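If you would like a numerical sanity check of this result, the sketch below (illustrative only) simulates many Poisson observations, forms U = X/µ − 1 (the derivative worked out in Example 5.7 below), and confirms that its average is essentially zero.

```python
# A minimal sketch: for the Poisson distribution, U = d/dmu log f(X|mu)
# works out as X/mu - 1, and its expectation is zero.  A large simulation
# (illustrative only) confirms this.
import numpy as np

rng = np.random.default_rng(3)
mu = 1.7
x = rng.poisson(lam=mu, size=1_000_000)
u = x / mu - 1                        # the score contribution U for each observation

print(f"simulated E(U) = {u.mean():.4f}  (theory: 0)")
```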
The following exercise will provide results that lead to the restatement of
the CRLB that follows it.
Exercise 5.13
Recall that

φ = ℓ′(θ | X1, X2, . . . , Xn) = Σ_{i=1}^n Ui

and that U1, U2, . . . , Un are independent with

Ui = d/dθ log f(Xi|θ).

(a) Show that E(φ) = 0.
(b) Write σ_U² = V(Ui), i = 1, 2, . . . , n. Show that V(φ) = nσ_U².
(c) Show that

σ_U² = E[ {d/dθ log f(X|θ)}² ],

where X is a random variable from the distribution with pdf f(x|θ).
2.4.2 Examples
Example 5.7 The CRLB for estimating the parameter of the Poisson
distribution
Suppose that X1 , X2 , . . . , Xn is a set of independent random variables each
arising from the same Poisson distribution with parameter µ.
To compute the CRLB, start from the pmf of the Poisson distribution,

f(x|µ) = µ^x e^(−µ) / x!.

Take logs:

log f(x|µ) = x log µ − µ − log(x!).

Differentiate with respect to the parameter µ:

d/dµ log f(x|µ) = x/µ − 1.

Square this:

{d/dµ log f(x|µ)}² = (x/µ − 1)² = (1/µ²)(x − µ)².

Finally, take the expectation:

E[ {d/dµ log f(X|µ)}² ] = (1/µ²) E{(X − µ)²} = (1/µ²) V(X) = (1/µ²) µ = 1/µ.

Here, E(X) = V(X) = µ for the Poisson distribution has been used. The
CRLB is the reciprocal of n times this value, i.e. 1/(n/µ) = µ/n.
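The sketch below (illustrative only; µ, n and the number of repetitions are arbitrary choices) simulates many Poisson samples and compares the variance of the sample mean with the bound µ/n just computed; for the Poisson distribution the sample mean is unbiased with variance exactly µ/n, so it attains the CRLB.

```python
# A minimal sketch: simulate many Poisson samples of size n and compare the
# variance of the sample mean with the CRLB mu/n derived in Example 5.7.
import numpy as np

rng = np.random.default_rng(4)
mu, n, reps = 3.0, 25, 100_000
sample_means = rng.poisson(lam=mu, size=(reps, n)).mean(axis=1)

print(f"simulated V(Xbar) = {sample_means.var():.4f}")
print(f"CRLB mu/n         = {mu / n:.4f}")
```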
Exercise 5.14
Exercise 5.15
f(x|p) = p^x (1 − p)^(1−x).
Calculate the CRLB for unbiased estimators of p.
Solutions
Solution 5.1
The likelihood, L(p), for p given that X = x = 54 is the pmf when x = 54:
L(p) = f(54|p) = (100 choose 54) p^54 (1 − p)^(100−54) = (100 choose 54) p^54 (1 − p)^46,
thought of as a function of p.
Solution 5.2
(a) L(p) = f(x|p) = C p^x (1 − p)^(n−x).
(b) ℓ(p) = log C + x log p + (n − x) log(1 − p).
(c) ℓ′(p) = x/p − (n − x)/(1 − p) = {x(1 − p) − (n − x)p} / {p(1 − p)} = (x − np) / {p(1 − p)},
which equals zero when x = np and hence p = x/n. The candidate
formula for the maximum likelihood estimator is therefore p̂ = X/n.
(d) ℓ″(p) = −x/p² − (n − x)/(1 − p)².
Noting that

1 − p̂ = 1 − x/n = (n − x)/n,

ℓ″(p̂) = ℓ″(x/n) = −x n²/x² − (n − x) n²/(n − x)² = −n²/x − n²/(n − x) < 0.

(Again, in this case, ℓ″(p) < 0 for all 0 < p < 1.) Therefore, p̂ = x/n
maximises the log-likelihood.
(e) The maximum likelihood estimator of p is p̂ = X/n. p̂ is the number
of heads divided by the total number of coin tosses, or p̂ is the
proportion of heads observed in the experiment.
Solution 5.3
ℓ(θ) = log L(θ) = log{ ∏_{i=1}^n f(xi|θ) } = Σ_{i=1}^n log f(xi|θ).
Solution 5.4
(a) log f(xi|µ) = log(e^(−µ)) + log(µ^xi) − log(xi!)
        = −µ + xi log µ − log(xi!),
so that

ℓ(µ) = Σ_{i=1}^n log f(xi|µ) = −nµ + Σ_{i=1}^n xi log µ − Σ_{i=1}^n log(xi!)
     = −nµ + nx̄ log µ − C,

as required.
(b) ℓ′(µ) = d/dµ (−nµ + nx̄ log µ − C) = −n + nx̄/µ = n(x̄/µ − 1).
Setting this equal to zero yields the requirement that x̄/µ = 1, which
is equivalent to µ = x̄. Hence the candidate MLE is µ̂ = X̄.
(c) ℓ″(µ) = d/dµ {n(x̄/µ − 1)} = −n x̄/µ²,
so that

ℓ″(µ̂) = −n x̄/x̄² = −n/x̄ < 0.

The negativity follows because each of x1, x2, . . . , xn is non-negative
and at least one is positive, so x̄ is positive. Therefore, µ̂ = X̄ is
indeed the MLE of µ.
(d) µ̂ = x̄ = (713 × 0 + 299 × 1 + 66 × 2 + 16 × 3 + 1 × 4) / 1095
        = (299 + 132 + 48 + 4) / 1095 = 483/1095 = 0.441

correct to three decimal places.
Thus, the estimated murder rate in London is 0.441 murders per day
or, more roughly, about 1 murder every two days.
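For what it is worth, the arithmetic in part (d) can be reproduced in a couple of lines of Python (a sketch only, not part of the module):

```python
# A minimal sketch reproducing the arithmetic in part (d): the MLE of the
# Poisson mean is the sample mean of the 1095 daily murder counts.
counts = {0: 713, 1: 299, 2: 66, 3: 16, 4: 1}    # value: number of days

total_days    = sum(counts.values())                   # 1095
total_murders = sum(k * v for k, v in counts.items())  # 483

mu_hat = total_murders / total_days
print(f"mu_hat = {total_murders}/{total_days} = {mu_hat:.3f}")   # 0.441
```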
Solution 5.5
(a) log f(xi|µ) = −log(√(2π) σ0) − (xi − µ)²/(2σ0²),
so that

ℓ(µ) = Σ_{i=1}^n log f(xi|µ)
     = −n log(√(2π) σ0) − Σ_{i=1}^n (xi − µ)²/(2σ0²)
     = C − (1/(2σ0²)) Σ_{i=1}^n (xi − µ)²,

as required.
(b) ℓ′(µ) = (2/(2σ0²)) Σ_{i=1}^n (xi − µ) = (1/σ0²)(nx̄ − nµ) = (n/σ0²)(x̄ − µ).
Solution 5.6
(a) As in Exercise 5.5(a), with relabelling of parameters,

ℓ(σ) = −n log(√(2π) σ) − Σ_{i=1}^n (xi − µ0)²/(2σ²),

which can be written as

−n log(√(2π)) − n log σ − (1/(2σ²)) Σ_{i=1}^n (xi − µ0)²
= C1 − n log σ − (1/(2σ²)) Σ_{i=1}^n (xi − µ0)²,

as required.
(b) Differentiating with respect to σ,

ℓ′(σ) = −n/σ + (1/σ³) Σ_{i=1}^n (xi − µ0)².
Setting ℓ′(σ) = 0 and rearranging then gives

σ² = (1/n) Σ_{i=1}^n (xi − µ0)².
Solution 5.7
(a) The pdf of the uniform distribution on (0, θ) is

f(x|θ) = 1/θ on 0 < x < θ.

The likelihood is

L(θ) = ∏_{i=1}^n f(xi|θ).
Now, if θ is less than or equal to any of the x’s, the corresponding pdf
is zero and so, therefore, is the likelihood. Only if θ is greater than all
of the x’s, and in particular greater than the maximum data value,
xmax , say, is each pdf in the likelihood equal to 1/θ and hence
L(θ) = 1/θ^n on θ > xmax.

(b) For θ > xmax,

L′(θ) = −n/θ^(n+1).

This is negative for all finite positive θ and, in particular, is never
zero. Also,

L″(θ) = n(n + 1)/θ^(n+2),

and its limit as θ → ∞ is zero.
[Figure: plot of L(θ) against θ, equal to zero up to xmax and decreasing as 1/θ^n beyond it.]
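A short numerical illustration of this behaviour (a sketch only, with simulated data; the sample size and the true θ are arbitrary choices): the likelihood is zero up to xmax and strictly decreasing beyond it, so its largest value on a grid occurs essentially at xmax.

```python
# A minimal sketch (illustrative data): the likelihood for Uniform(0, theta)
# is zero for theta <= x_max and equals 1/theta**n beyond it, so it is
# maximised at the boundary theta = x_max rather than at a stationary point.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 3.0, size=8)     # illustrative sample; true theta = 3
x_max = x.max()

theta_grid = np.linspace(0.01, 6.0, 600)
L = np.where(theta_grid > x_max, theta_grid**(-len(x)), 0.0)

print(f"x_max = {x_max:.3f}, grid maximum of L at theta = {theta_grid[np.argmax(L)]:.3f}")
```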
Solution 5.8
(a) Set σ = √v in the log-likelihood given in Exercise 5.6(a) and
remember that log √v = ½ log v. The desired formula then follows.
(b) ℓ′(v) = −n/(2v) + (1/(2v²)) Σ_{i=1}^n (xi − µ0)².
Setting this equal to zero gives an equation which is satisfied by

v = (1/n) Σ_{i=1}^n (xi − µ0)².

That is, the candidate MLE of v is

v̂ = (1/n) Σ_{i=1}^n (Xi − µ0)².

(c) ℓ″(v) = n/(2v²) − (1/v³) Σ_{i=1}^n (xi − µ0)² = n/(2v²) − (n/v³) S0,

where S0 = (1/n) Σ_{i=1}^n (xi − µ0)² (as in Exercise 5.6). Now, since v̂ = S0,

ℓ″(v̂) = n/(2S0²) − n/S0² = −n/(2S0²) < 0.

Therefore, v̂ is indeed the MLE of v.
(d) σ̂ = { (1/n) Σ_{i=1}^n (Xi − µ0)² }^(1/2),
so that

(σ̂)² = (1/n) Σ_{i=1}^n (Xi − µ0)².

But you have just shown that

v̂ = \widehat{σ²} = (1/n) Σ_{i=1}^n (Xi − µ0)².
Solution 5.9
(a) For the Poisson distribution,

θ = P(X = 0) = µ⁰ e^(−µ)/0! = e^(−µ).
(b) From Exercise 5.4, µ̂ = X̄. Therefore

θ̂ = e^(−µ̂) = e^(−X̄).
Solution 5.10
V(X̄) = V( (1/n) Σ_{i=1}^n Xi ) = (1/n²) Σ_{i=1}^n V(Xi) = (1/n²) × nσ² = σ²/n.
Solution 5.11
\widehat{σ²} = {(n − 1)/n} S²,
so

E(\widehat{σ²}) = {(n − 1)/n} E(S²) = {(n − 1)/n} σ².

Therefore, \widehat{σ²} is a biased estimator of σ². Its bias is

B(\widehat{σ²}) = E(\widehat{σ²}) − σ² = {(n − 1)/n} σ² − σ² = −(1/n) σ².
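A small simulation (illustrative only; the normal distribution, σ² = 4 and n = 5 are arbitrary choices) shows this bias directly by comparing the divide-by-n estimator with the divide-by-(n − 1) sample variance:

```python
# A minimal sketch: simulate the bias of the MLE-style variance estimator
# (divide by n) against the unbiased sample variance (divide by n - 1).
# Data and seed are illustrative only.
import numpy as np

rng = np.random.default_rng(6)
sigma2, n, reps = 4.0, 5, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

var_mle      = samples.var(axis=1, ddof=0).mean()   # estimates E(sigma2_hat)
var_unbiased = samples.var(axis=1, ddof=1).mean()   # estimates E(S^2)

print(f"E(sigma2_hat) ~ {var_mle:.3f}  (theory {(n - 1) / n * sigma2:.3f})")
print(f"E(S^2)        ~ {var_unbiased:.3f}  (theory {sigma2:.3f})")
```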
Solution 5.12
(a) E(X1) = µ, so X1 is an unbiased estimator of µ. Its variance is
V(X1) = σ².
(b) V(X̄n) = σ²/n < σ² = V(X1) for all n ≥ 2.
(c) V(X̄m) = σ²/m > σ²/n = V(X̄n), since m < n. The lazy statistician’s
estimator has greater variability than X̄n. He will have to admit
defeat and choose m = n.
Solution 5.13
(a) Using the result (just shown) that E(Ui) = 0,

E(φ) = E( Σ_{i=1}^n Ui ) = Σ_{i=1}^n E(Ui) = Σ_{i=1}^n 0 = 0.

(b) V(φ) = V( Σ_{i=1}^n Ui ) = Σ_{i=1}^n V(Ui) = Σ_{i=1}^n σ_U² = nσ_U²,
because U1, U2, . . . , Un are independent.
(c) σ_U² = V(Ui) = E(Ui²) − {E(Ui)}² = E[ {d/dθ log f(Xi|θ)}² ],
since E(Ui) = 0. This equals the required formula because each Xi
has the same distribution as X.
Solution 5.14
First, take logs:

log f(x|λ) = log λ − λx.

Then differentiate with respect to the parameter λ:

d/dλ log f(x|λ) = 1/λ − x.
Square this:

{d/dλ log f(x|λ)}² = (1/λ − x)².

Finally, take the expectation:

E[ {d/dλ log f(X|λ)}² ] = E{(1/λ − X)²} = E{(X − 1/λ)²} = V(X) = 1/λ²,

since E(X) = 1/λ and V(X) = 1/λ². The CRLB is the reciprocal of n
times this value, that is, 1/(n/λ²) = λ²/n.
Solution 5.15
First, take logs:

log f(x|p) = x log p + (1 − x) log(1 − p).

Then differentiate with respect to the parameter p:

d/dp log f(x|p) = x/p − (1 − x)/(1 − p)
               = {x(1 − p) − (1 − x)p} / {p(1 − p)}
               = (x − p) / {p(1 − p)}.

Square this:

{d/dp log f(x|p)}² = (x − p)² / {p²(1 − p)²}.

Finally, take the expectation:

E[ {d/dp log f(X|p)}² ] = E{(X − p)²} / {p²(1 − p)²}
                        = V(X) / {p²(1 − p)²}
                        = p(1 − p) / {p²(1 − p)²}
                        = 1 / {p(1 − p)},

since E(X) = p and V(X) = p(1 − p). So the CRLB is p(1 − p)/n.