Econ 512 Box Jenkins Slides
Eric Zivot
April 7, 2011
Box-Jenkins Modeling Strategy for Fitting ARMA(p, q) Models
Sample autocovariances/autocorrelations
$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y}), \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}$$
Sample autocorrelation function (SACF)/correlogram
plot $\hat{\rho}_j$ vs. $j$
Result: If $Y_t \sim WN(0, \sigma^2)$ then $\rho_j = 0$ for all $j \neq 0$ and
$$\sqrt{T}\,\hat{\rho}_j \xrightarrow{d} N(0, 1)$$
so that
$$\operatorname{avar}(\hat{\rho}_j) = \frac{1}{T}$$
Therefore, a simple t-statistic for $H_0 : \rho_j = 0$ is
$$\sqrt{T}\,\hat{\rho}_j$$
and we reject $H_0 : \rho_j = 0$ if
$$|\hat{\rho}_j| > 1.96/\sqrt{T}$$
Remark: $\hat{\rho}_1, \ldots, \hat{\rho}_k$ are asymptotically independent:
$$\sqrt{T}\,\hat{\boldsymbol{\rho}}_k \xrightarrow{d} N(0, I_k)$$
where $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}_1, \ldots, \hat{\rho}_k)'$.
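A minimal numerical sketch of the formulas above (plain NumPy; the function name sample_acf, the simulated series, and the lag count are illustrative, not from the slides):

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_k using the
    formulas above (divide by T and demean with the full-sample mean)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()
    gamma0 = np.sum((y - ybar) ** 2) / T
    rho = np.empty(k)
    for j in range(1, k + 1):
        gamma_j = np.sum((y[j:] - ybar) * (y[:-j] - ybar)) / T
        rho[j - 1] = gamma_j / gamma0
    return rho

# White noise example: each |rho_hat_j| should exceed 1.96/sqrt(T) only
# about 5% of the time under H0: rho_j = 0
rng = np.random.default_rng(0)
y = rng.normal(size=500)
rho = sample_acf(y, 10)
print(np.round(rho, 3))
print(np.abs(rho) > 1.96 / np.sqrt(len(y)))   # rejections of H0: rho_j = 0
```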
Box-Pierce and Box-Ljung Q-statistics
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \phi_2(Y_{t-2} - \mu) + \varepsilon_t$$
$$\phi_{11} \neq 0, \quad \phi_{22} = \phi_2, \quad \phi_{kk} = 0 \text{ for } k > 2$$
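For reference, one standard way to estimate the partial autocorrelation $\hat{\phi}_{kk}$ is to regress $y_t$ on a constant and its first $k$ lags and keep the coefficient on the $k$-th lag; a rough sketch (the function name sample_pacf is illustrative):

```python
import numpy as np

def sample_pacf(y, kmax):
    """Estimate phi_hat_kk, k = 1, ..., kmax, as the coefficient on the
    k-th lag in an OLS regression of y_t on a constant and k lags."""
    y = np.asarray(y, dtype=float)
    pacf = np.empty(kmax)
    for k in range(1, kmax + 1):
        Y = y[k:]                                      # y_t for t = k+1, ..., T
        lags = [y[k - j:-j] for j in range(1, k + 1)]  # y_{t-1}, ..., y_{t-k}
        X = np.column_stack([np.ones(len(Y))] + lags)
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        pacf[k - 1] = beta[-1]                         # coefficient on lag k
    return pacf
```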
Example: MA(1)
For iid data with marginal pdf $f(y_t; \theta)$, the joint pdf for a sample $\mathbf{y} = (y_1, \ldots, y_T)$ is
$$f(\mathbf{y}; \theta) = f(y_1, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$
The likelihood function is this joint density treated as a function of the parameters $\theta$ given the data $\mathbf{y}$:
$$L(\theta \mid \mathbf{y}) = L(\theta \mid y_1, \ldots, y_T) = \prod_{t=1}^{T} f(y_t; \theta)$$
The log-likelihood is
$$\ln L(\theta \mid \mathbf{y}) = \sum_{t=1}^{T} \ln f(y_t; \theta)$$
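For example, with iid $N(\mu, \sigma^2)$ data and $\theta = (\mu, \sigma^2)$, this becomes
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \mu)^2$$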
Problem: For a sample from a covariance stationary time series $\{y_t\}$, the construction of the log-likelihood given above does not work because the random variables in the sample $(y_1, \ldots, y_T)$ are not iid.
Intuition: Consider the joint density of two adjacent observations $f(y_2, y_1; \theta)$. The joint density can always be factored as the product of the conditional density of $y_2$ given $y_1$ and the marginal density of $y_1$:
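$$f(y_2, y_1; \theta) = f(y_2 \mid y_1; \theta)\, f(y_1; \theta)$$
Applying the same factorization recursively to the full sample gives
$$f(y_1, \ldots, y_T; \theta) = f(y_1; \theta)\prod_{t=2}^{T} f(y_t \mid y_{t-1}, \ldots, y_1; \theta)$$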
In finite samples, however, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.
Example: MLE for stationary AR(1)
$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad t = 2, \ldots, T$$
$$\varepsilon_t \sim \text{iid } N(0, \sigma^2)$$
It follows that
$$\hat{c}_{cmle} = \hat{c}_{ols}$$
$$\hat{\phi}_{cmle} = \hat{\phi}_{ols}$$
$$\hat{\sigma}^2_{cmle} = (T-1)^{-1}\sum_{t=2}^{T}(y_t - \hat{c}_{cmle} - \hat{\phi}_{cmle}\, y_{t-1})^2$$
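A minimal sketch of this conditional MLE in code (plain NumPy; the function name ar1_cmle is illustrative):

```python
import numpy as np

def ar1_cmle(y):
    """Conditional MLE of (c, phi, sigma^2) for an AR(1): OLS of y_t on
    (1, y_{t-1}) for t = 2, ..., T, then the sum of squared residuals
    divided by T - 1, matching the formula above."""
    y = np.asarray(y, dtype=float)
    Y = y[1:]
    X = np.column_stack([np.ones(len(Y)), y[:-1]])
    c_hat, phi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - c_hat - phi_hat * y[:-1]
    sigma2_hat = np.sum(resid ** 2) / (len(y) - 1)
    return c_hat, phi_hat, sigma2_hat
```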
The marginal log-likelihood for the initial value $y_1$ is
$$\ln f(y_1; \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\!\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2$$
The exact log-likelihood function is then
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln\!\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2 - \frac{T-1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$
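A direct transcription of this log-likelihood into code (a sketch assuming $|\phi| < 1$ so that the marginal density of $y_1$ exists; the function name ar1_exact_loglik is illustrative):

```python
import numpy as np

def ar1_exact_loglik(c, phi, sigma2, y):
    """Exact Gaussian log-likelihood of an AR(1): marginal term for y_1
    plus conditional terms for y_2, ..., y_T, as on the slide above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu1 = c / (1 - phi)                      # unconditional mean of y_1
    var1 = sigma2 / (1 - phi ** 2)           # unconditional variance of y_1
    ll = (-0.5 * np.log(2 * np.pi) - 0.5 * np.log(var1)
          - 0.5 * (y[0] - mu1) ** 2 / var1)
    resid = y[1:] - c - phi * y[:-1]
    ll += (-0.5 * (T - 1) * np.log(2 * np.pi * sigma2)
           - 0.5 * np.sum(resid ** 2) / sigma2)
    return ll
```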
Remarks
3. The estimates of the Hessian and Score may be computed numerically (using
numerical derivative routines) or they may be computed analytically (if analytic
derivatives are known).
Prediction Error Decomposition of Log-Likelihood
For the stationary AR(1) model, the conditional mean and variance of $y_t$ given the information set $I_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\}$ are
$$E[y_t \mid I_{t-1}] = c + \phi y_{t-1}$$
$$\operatorname{var}(y_t \mid I_{t-1}) = \sigma^2$$
The 1-step ahead prediction errors may then be defined as
$$v_t = y_t - E[y_t \mid I_{t-1}] = y_t - c - \phi y_{t-1}, \quad t = 2, \ldots, T$$
The variance of the prediction error at time $t$ is
$$f_t = \operatorname{var}(v_t) = \operatorname{var}(\varepsilon_t) = \sigma^2, \quad t = 2, \ldots, T$$
For the initial value, the first prediction error and its variance are
$$v_1 = y_1 - E[y_1] = y_1 - \frac{c}{1-\phi}$$
$$f_1 = \operatorname{var}(v_1) = \frac{\sigma^2}{1-\phi^2}$$
Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
which is the prediction error decomposition.
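The same log-likelihood computed through the prediction error decomposition (a sketch; for given parameter values it should agree numerically with the direct formula above):

```python
import numpy as np

def ar1_loglik_ped(c, phi, sigma2, y):
    """Exact AR(1) log-likelihood via the prediction error decomposition:
    v_1 = y_1 - c/(1-phi), f_1 = sigma^2/(1-phi^2);
    v_t = y_t - c - phi*y_{t-1}, f_t = sigma^2 for t >= 2."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    v = np.empty(T)
    f = np.full(T, sigma2)
    v[0] = y[0] - c / (1 - phi)
    f[0] = sigma2 / (1 - phi ** 2)
    v[1:] = y[1:] - c - phi * y[:-1]
    return (-0.5 * T * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(f))
            - 0.5 * np.sum(v ** 2 / f))
```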
Remarks
1. The prediction error variance may be factored as
$$\operatorname{var}(v_t) = \sigma^2 f_t = \begin{cases} \sigma^2 \cdot \dfrac{1}{1-\phi^2} & \text{for } t = 1 \\ \sigma^2 \cdot 1 & \text{for } t > 1 \end{cases}$$
That is, $f_t = 1/(1-\phi^2)$ for $t = 1$ and $f_t = 1$ for $t > 1$. Then the log-likelihood becomes
$$\ln L(\theta \mid \mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
2. With the above simplification, $\sigma^2$ may be concentrated out of the log-likelihood. That is,
$$\frac{\partial \ln L(\theta \mid \mathbf{y})}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2_{mle}(c, \phi) = \frac{1}{T}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
Substituting $\hat{\sigma}^2_{mle}(c, \phi)$ back into $\ln L(\theta \mid \mathbf{y})$ gives the concentrated log-likelihood
$$\ln L^c(c, \phi \mid \mathbf{y}) = -\frac{T}{2}\left(\ln(2\pi) + 1\right) - \frac{T}{2}\ln \hat{\sigma}^2_{mle}(c, \phi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t$$
Maximizing $\ln L^c(c, \phi \mid \mathbf{y})$ gives the MLEs for $c$ and $\phi$. Maximizing $\ln L^c(c, \phi \mid \mathbf{y})$ is faster than maximizing $\ln L(\theta \mid \mathbf{y})$ and is more numerically stable.
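A sketch combining remarks 1 and 2: $f_t$ is redefined with $\sigma^2$ factored out, $\sigma^2$ is concentrated out, and $\ln L^c(c, \phi \mid \mathbf{y})$ is maximized numerically (assumes SciPy is available; the function names, starting values, and optimizer choice are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def ar1_concentrated_loglik(params, y):
    """Concentrated log-likelihood ln L^c(c, phi | y): f_t has sigma^2
    factored out, and sigma2_hat(c, phi) = (1/T) * sum(v_t^2 / f_t)."""
    c, phi = params
    if abs(phi) >= 1:
        return -np.inf                   # enforce stationarity
    y = np.asarray(y, dtype=float)
    T = len(y)
    f = np.ones(T)
    f[0] = 1.0 / (1 - phi ** 2)          # f_1 = 1/(1-phi^2), f_t = 1 otherwise
    v = np.empty(T)
    v[0] = y[0] - c / (1 - phi)
    v[1:] = y[1:] - c - phi * y[:-1]
    sigma2_hat = np.mean(v ** 2 / f)     # sigma^2 concentrated out
    return (-0.5 * T * (np.log(2 * np.pi) + 1)
            - 0.5 * T * np.log(sigma2_hat)
            - 0.5 * np.sum(np.log(f)))

def ar1_exact_mle(y):
    """Maximize the concentrated log-likelihood over (c, phi)."""
    obj = lambda p: -ar1_concentrated_loglik(p, y)
    res = minimize(obj, x0=[0.0, 0.5], method="Nelder-Mead")
    return res.x                         # (c_mle, phi_mle)
```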
3. For general time series models, the prediction error decomposition may be conveniently computed as a by-product of the Kalman filter algorithm if the time series model can be cast in state space form.
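For reference, one way to do this in practice, assuming the statsmodels package is available (its ARIMA estimator evaluates the exact Gaussian likelihood through a state space representation and the Kalman filter); the simulated series and parameter values are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate a stationary AR(1): y_t = 2 + 0.7*y_{t-1} + eps_t, eps_t ~ N(0, 1),
# with y_1 drawn from the stationary distribution
rng = np.random.default_rng(0)
T = 500
y = np.empty(T)
y[0] = 2.0 / (1 - 0.7) + rng.normal() / np.sqrt(1 - 0.7 ** 2)
for t in range(1, T):
    y[t] = 2.0 + 0.7 * y[t - 1] + rng.normal()

# Exact Gaussian MLE; the likelihood is computed via the prediction error
# decomposition delivered by the Kalman filter in state space form
fit = ARIMA(y, order=(1, 0, 0), trend="c").fit()
print(fit.summary())   # constant/mean term, AR(1) coefficient, sigma^2
```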