Econ 512 Box Jenkins Slides
Eric Zivot
April 7, 2011
Box-Jenkins Modeling Strategy for Fitting ARMA(p, q) Models
Sample autocovariances/autocorrelations
$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y}), \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}$$
Sample autocorrelation function (SACF)/correlogram
plot $\hat{\rho}_j$ vs. $j$
Result: If $Y_t \sim WN(0, \sigma^2)$ then $\rho_j = 0$ for all $j \neq 0$ and
$$\sqrt{T}\,\hat{\rho}_j \xrightarrow{d} N(0, 1)$$
so that
$$\operatorname{avar}(\hat{\rho}_j) = \frac{1}{T}$$
Therefore, a simple t-statistic for $H_0 : \rho_j = 0$ is
$$\sqrt{T}\,\hat{\rho}_j$$
and we reject $H_0 : \rho_j = 0$ if
$$|\hat{\rho}_j| > 1.96/\sqrt{T}$$
Remark: $\hat{\rho}_1, \ldots, \hat{\rho}_k$ are asymptotically independent:
$$\sqrt{T}\,\hat{\boldsymbol{\rho}}_k \xrightarrow{d} N(0, I_k)$$
where $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}_1, \ldots, \hat{\rho}_k)'$.
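A minimal numerical sketch of the formulas above (plain NumPy; the function name sample_acf, the simulated series, and the lag count are illustrative, not from the slides):

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_k using the
    formulas above (divide by T and demean with the full-sample mean)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()
    gamma0 = np.sum((y - ybar) ** 2) / T
    rho = np.empty(k)
    for j in range(1, k + 1):
        gamma_j = np.sum((y[j:] - ybar) * (y[:-j] - ybar)) / T
        rho[j - 1] = gamma_j / gamma0
    return rho

# White noise example: each |rho_hat_j| should exceed 1.96/sqrt(T) only
# about 5% of the time under H0: rho_j = 0
rng = np.random.default_rng(0)
y = rng.normal(size=500)
rho = sample_acf(y, 10)
print(np.round(rho, 3))
print(np.abs(rho) > 1.96 / np.sqrt(len(y)))   # rejections of H0: rho_j = 0
```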
Box-Pierce and Box-Ljung Q-statistics
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \phi_2(Y_{t-2} - \mu) + \varepsilon_t$$
$$\phi_{11} \neq 0, \quad \phi_{22} = \phi_2, \quad \phi_{kk} = 0 \text{ for } k > 2$$
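For reference, one standard way to estimate the partial autocorrelation $\hat{\phi}_{kk}$ is to regress $y_t$ on a constant and its first $k$ lags and keep the coefficient on the $k$-th lag; a rough sketch (the function name sample_pacf is illustrative):

```python
import numpy as np

def sample_pacf(y, kmax):
    """Estimate phi_hat_kk, k = 1, ..., kmax, as the coefficient on the
    k-th lag in an OLS regression of y_t on a constant and k lags."""
    y = np.asarray(y, dtype=float)
    pacf = np.empty(kmax)
    for k in range(1, kmax + 1):
        Y = y[k:]                                      # y_t for t = k+1, ..., T
        lags = [y[k - j:-j] for j in range(1, k + 1)]  # y_{t-1}, ..., y_{t-k}
        X = np.column_stack([np.ones(len(Y))] + lags)
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        pacf[k - 1] = beta[-1]                         # coefficient on lag k
    return pacf
```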
Example: MA(1)
For iid data with marginal pdf $f(y_t; \theta)$, the joint pdf for a sample $\mathbf{y} = (y_1, \ldots, y_T)$ is
$$f(\mathbf{y}; \theta) = f(y_1, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$
The likelihood function is this joint density treated as a function of the parameters $\theta$ given the data $\mathbf{y}$:
$$L(\theta \mid \mathbf{y}) = L(\theta \mid y_1, \ldots, y_T) = \prod_{t=1}^{T} f(y_t; \theta)$$
The log-likelihood is
$$\ln L(\theta \mid \mathbf{y}) = \sum_{t=1}^{T} \ln f(y_t; \theta)$$
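For example, with iid $N(\mu, \sigma^2)$ data and $\theta = (\mu, \sigma^2)$, this becomes
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \mu)^2$$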
Problem: For a sample from a covariance stationary time series $\{y_t\}$, the construction of the log-likelihood given above does not work because the random variables in the sample $(y_1, \ldots, y_T)$ are not iid.
Intuition: Consider the joint density of two adjacent observations $f(y_2, y_1; \theta)$. The joint density can always be factored as the product of the conditional density of $y_2$ given $y_1$ and the marginal density of $y_1$:
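$$f(y_2, y_1; \theta) = f(y_2 \mid y_1; \theta)\, f(y_1; \theta)$$
Applying the same factorization recursively to the full sample gives
$$f(y_1, \ldots, y_T; \theta) = f(y_1; \theta)\prod_{t=2}^{T} f(y_t \mid y_{t-1}, \ldots, y_1; \theta)$$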
In finite samples, however, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.
Example: MLE for stationary AR(1)
$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad t = 2, \ldots, T$$
$$\varepsilon_t \sim \text{iid } N(0, \sigma^2)$$
It follows that
$$\hat{c}_{cmle} = \hat{c}_{ols}$$
$$\hat{\phi}_{cmle} = \hat{\phi}_{ols}$$
$$\hat{\sigma}^2_{cmle} = (T-1)^{-1}\sum_{t=2}^{T}(y_t - \hat{c}_{cmle} - \hat{\phi}_{cmle}\, y_{t-1})^2$$
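A minimal sketch of this conditional MLE in code (plain NumPy; the function name ar1_cmle is illustrative):

```python
import numpy as np

def ar1_cmle(y):
    """Conditional MLE of (c, phi, sigma^2) for an AR(1): OLS of y_t on
    (1, y_{t-1}) for t = 2, ..., T, then the sum of squared residuals
    divided by T - 1, matching the formula above."""
    y = np.asarray(y, dtype=float)
    Y = y[1:]
    X = np.column_stack([np.ones(len(Y)), y[:-1]])
    c_hat, phi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - c_hat - phi_hat * y[:-1]
    sigma2_hat = np.sum(resid ** 2) / (len(y) - 1)
    return c_hat, phi_hat, sigma2_hat
```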
The marginal log-likelihood for the initial value $y_1$ is
$$\ln f(y_1; \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\!\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2$$
The exact log-likelihood function is then
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln\!\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2 - \frac{T-1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$
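A direct transcription of this log-likelihood into code (a sketch assuming $|\phi| < 1$ so that the marginal density of $y_1$ exists; the function name ar1_exact_loglik is illustrative):

```python
import numpy as np

def ar1_exact_loglik(c, phi, sigma2, y):
    """Exact Gaussian log-likelihood of an AR(1): marginal term for y_1
    plus conditional terms for y_2, ..., y_T, as on the slide above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu1 = c / (1 - phi)                      # unconditional mean of y_1
    var1 = sigma2 / (1 - phi ** 2)           # unconditional variance of y_1
    ll = (-0.5 * np.log(2 * np.pi) - 0.5 * np.log(var1)
          - 0.5 * (y[0] - mu1) ** 2 / var1)
    resid = y[1:] - c - phi * y[:-1]
    ll += (-0.5 * (T - 1) * np.log(2 * np.pi * sigma2)
           - 0.5 * np.sum(resid ** 2) / sigma2)
    return ll
```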
Remarks
3. The estimates of the Hessian and Score may be computed numerically (using
numerical derivative routines) or they may be computed analytically (if analytic
derivatives are known).
Prediction Error Decomposition of Log-Likelihood
For the stationary AR(1) model, the conditional mean and variance of $y_t$ given the information set $I_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\}$ are
$$E[y_t \mid I_{t-1}] = c + \phi y_{t-1}$$
$$\operatorname{var}(y_t \mid I_{t-1}) = \sigma^2$$
The 1-step ahead prediction errors may then be defined as
$$v_t = y_t - E[y_t \mid I_{t-1}] = y_t - c - \phi y_{t-1}, \quad t = 2, \ldots, T$$
The variance of the prediction error at time $t$ is
$$f_t = \operatorname{var}(v_t) = \operatorname{var}(\varepsilon_t) = \sigma^2, \quad t = 2, \ldots, T$$
For the initial value, the first prediction error and its variance are
$$v_1 = y_1 - E[y_1] = y_1 - \frac{c}{1-\phi}$$
$$f_1 = \operatorname{var}(v_1) = \frac{\sigma^2}{1-\phi^2}$$
Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
which is the prediction error decomposition.
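The same log-likelihood computed through the prediction error decomposition (a sketch; for given parameter values it should agree numerically with the direct formula above):

```python
import numpy as np

def ar1_loglik_ped(c, phi, sigma2, y):
    """Exact AR(1) log-likelihood via the prediction error decomposition:
    v_1 = y_1 - c/(1-phi), f_1 = sigma^2/(1-phi^2);
    v_t = y_t - c - phi*y_{t-1}, f_t = sigma^2 for t >= 2."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    v = np.empty(T)
    f = np.full(T, sigma2)
    v[0] = y[0] - c / (1 - phi)
    f[0] = sigma2 / (1 - phi ** 2)
    v[1:] = y[1:] - c - phi * y[:-1]
    return (-0.5 * T * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(f))
            - 0.5 * np.sum(v ** 2 / f))
```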
Remarks
1. The prediction error variance may be factored as
$$\operatorname{var}(v_t) = \sigma^2 f_t = \begin{cases} \sigma^2 \cdot \dfrac{1}{1-\phi^2} & \text{for } t = 1 \\ \sigma^2 \cdot 1 & \text{for } t > 1 \end{cases}$$
That is, $f_t = 1/(1-\phi^2)$ for $t = 1$ and $f_t = 1$ for $t > 1$. Then the log-likelihood becomes
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
2. With the above simplification, $\sigma^2$ may be concentrated out of the log-likelihood. That is,
$$\frac{\partial \ln L(\theta \mid \mathbf{y})}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2_{mle}(c, \phi) = \frac{1}{T}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
Substituting $\hat{\sigma}^2_{mle}(c, \phi)$ back into $\ln L(\theta \mid \mathbf{y})$ gives the concentrated log-likelihood
$$\ln L^c(c, \phi \mid \mathbf{y}) = -\frac{T}{2}\left(\ln(2\pi) + 1\right) - \frac{T}{2}\ln \hat{\sigma}^2_{mle}(c, \phi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t$$
Maximizing $\ln L^c(c, \phi \mid \mathbf{y})$ gives the MLEs for $c$ and $\phi$. Maximizing $\ln L^c(c, \phi \mid \mathbf{y})$ is faster than maximizing $\ln L(\theta \mid \mathbf{y})$ and is more numerically stable.
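A sketch combining remarks 1 and 2: $f_t$ is redefined with $\sigma^2$ factored out, $\sigma^2$ is concentrated out, and $\ln L^c(c, \phi \mid \mathbf{y})$ is maximized numerically (assumes SciPy is available; the function names, starting values, and optimizer choice are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def ar1_concentrated_loglik(params, y):
    """Concentrated log-likelihood ln L^c(c, phi | y): f_t has sigma^2
    factored out, and sigma2_hat(c, phi) = (1/T) * sum(v_t^2 / f_t)."""
    c, phi = params
    if abs(phi) >= 1:
        return -np.inf                   # enforce stationarity
    y = np.asarray(y, dtype=float)
    T = len(y)
    f = np.ones(T)
    f[0] = 1.0 / (1 - phi ** 2)          # f_1 = 1/(1-phi^2), f_t = 1 otherwise
    v = np.empty(T)
    v[0] = y[0] - c / (1 - phi)
    v[1:] = y[1:] - c - phi * y[:-1]
    sigma2_hat = np.mean(v ** 2 / f)     # sigma^2 concentrated out
    return (-0.5 * T * (np.log(2 * np.pi) + 1)
            - 0.5 * T * np.log(sigma2_hat)
            - 0.5 * np.sum(np.log(f)))

def ar1_exact_mle(y):
    """Maximize the concentrated log-likelihood over (c, phi)."""
    obj = lambda p: -ar1_concentrated_loglik(p, y)
    res = minimize(obj, x0=[0.0, 0.5], method="Nelder-Mead")
    return res.x                         # (c_mle, phi_mle)
```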
3. For general time series models, the prediction error decomposition may be conveniently computed as a by-product of the Kalman filter algorithm if the time series model can be cast in state space form.
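For reference, one way to do this in practice, assuming the statsmodels package is available (its ARIMA estimator evaluates the exact Gaussian likelihood through a state space representation and the Kalman filter); the simulated series and parameter values are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate a stationary AR(1): y_t = 2 + 0.7*y_{t-1} + eps_t, eps_t ~ N(0, 1),
# with y_1 drawn from the stationary distribution
rng = np.random.default_rng(0)
T = 500
y = np.empty(T)
y[0] = 2.0 / (1 - 0.7) + rng.normal() / np.sqrt(1 - 0.7 ** 2)
for t in range(1, T):
    y[t] = 2.0 + 0.7 * y[t - 1] + rng.normal()

# Exact Gaussian MLE; the likelihood is computed via the prediction error
# decomposition delivered by the Kalman filter in state space form
fit = ARIMA(y, order=(1, 0, 0), trend="c").fit()
print(fit.summary())   # constant/mean term, AR(1) coefficient, sigma^2
```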