1. Transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one

2. Make an initial guess for the values of p and q

3. Estimate the parameters of the proposed ARMA(p, q) model

4. Perform diagnostic analysis to confirm that the proposed model adequately describes the data (e.g. examine residuals from fitted model)

Intuition: The mean, variance, and autocorrelations define the properties of an ARMA(p, q) model. A natural way to identify an ARMA model is to match the pattern of the observed (sample) autocorrelations with the patterns of the theoretical autocorrelations of a particular ARMA(p, q) model.

Sample autocovariances/autocorrelations

$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t - \hat{\mu})(y_{t-j} - \hat{\mu}), \qquad \hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} y_t$$

$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}$$

Sample autocorrelation function (SACF)/correlogram: plot $\hat{\rho}_j$ vs. $j$
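A minimal Python sketch of the sample autocorrelations, assuming the divisor-T definitions of the sample autocovariances and autocorrelations given above. The function name `sample_acf` and the toy data are illustrative only and do not come from the notes.

```python
def sample_acf(y, max_lag):
    """Return [rho_hat_0, ..., rho_hat_max_lag] using divisor-T autocovariances."""
    T = len(y)
    mu_hat = sum(y) / T
    d = [value - mu_hat for value in y]      # deviations from the sample mean

    def gamma_hat(j):
        # (1/T) * sum over t = j+1..T of (y_t - mu_hat) * (y_{t-j} - mu_hat)
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    gamma0 = gamma_hat(0)
    return [gamma_hat(j) / gamma0 for j in range(max_lag + 1)]


y = [2.0, 4.0, 6.0, 8.0]     # toy data (hypothetical)
acf = sample_acf(y, 2)       # [1.0, 0.25, -0.3] up to rounding
```

Plotting the returned values against the lag gives the correlogram.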
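The kth order partial autocorrelation of $X_t = Y_t - \mu$ is the partial regression coefficient (in the population) $\phi_{kk}$ in the kth order autoregression

$$X_t = \phi_{1k} X_{t-1} + \phi_{2k} X_{t-2} + \cdots + \phi_{kk} X_{t-k} + \text{error}_t$$

PACF: plot $\phi_{kk}$ vs. $k$

Sample PACF (SPACF): plot $\hat{\phi}_{kk}$ vs. $k$, where $\hat{\phi}_{kk}$ is estimated from an AR(k) model for $Y_t$.

Example: AR(2)

$$Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \varepsilon_t$$

$$\phi_{11} \neq 0, \quad \phi_{22} = \phi_2, \quad \phi_{kk} = 0 \text{ for } k > 2$$

Example: MA(1)

$$Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}, \quad |\theta| < 1 \text{ (invertible)}$$

Since $|\theta| < 1$, $Y_t$ has an AR(∞) representation

$$(Y_t - \mu) = \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \cdots + \varepsilon_t, \qquad \phi_j = -(-\theta)^j$$

$$\phi_{kk} \neq 0 \text{ for all } k, \text{ and } \phi_{kk} \to 0 \text{ as } k \to \infty$$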
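The PACF patterns for an AR(2) and an MA(1) can be checked numerically. The sketch below uses the Durbin-Levinson recursion, which solves the Yule-Walker equations for successive AR(k) models; this is a swapped-in technique, not the AR(k) regression described above. Applied to sample autocorrelations it would give a Yule-Walker version of the SPACF, which can differ slightly from regression-based estimates in finite samples. The function name `pacf_from_acf` and the parameter values are illustrative assumptions.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag, by Durbin-Levinson.

    rho[j] is the lag-j autocorrelation, with rho[0] == 1.
    """
    pacf = []
    prev = []                                   # prev[j-1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical autocorrelations of an AR(2) with hypothetical phi1 = 0.5, phi2 = 0.3
phi1, phi2 = 0.5, 0.3
rho_ar2 = [1.0, phi1 / (1 - phi2)]
for j in range(2, 6):
    rho_ar2.append(phi1 * rho_ar2[j - 1] + phi2 * rho_ar2[j - 2])
pacf_ar2 = pacf_from_acf(rho_ar2, 5)    # phi_22 = phi2, phi_kk = 0 for k > 2 (up to rounding)

# Theoretical autocorrelations of an MA(1) with hypothetical theta = 0.5
theta = 0.5
rho_ma1 = [1.0, theta / (1 + theta ** 2)] + [0.0] * 4
pacf_ma1 = pacf_from_acf(rho_ma1, 5)    # nonzero at every lag, shrinking toward 0
```

The AR(2) PACF cuts off after lag 2, while the MA(1) PACF alternates in sign and decays without ever reaching zero.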
$\ln(\hat{\sigma}^2)$ measures overall fit

$\frac{2(p+q)}{T}$, $\frac{\ln(T)(p+q)}{T}$: penalty terms for large models

Note: BIC penalizes larger models more than AIC whenever $\ln(T) > 2$, i.e. $T \geq 8$.

Result: If the true values of p and q satisfy p ≤ P and q ≤ Q then

1. BIC picks the true model with probability 1 as T → ∞
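A small sketch of how the two criteria can rank candidate models. It assumes the per-observation forms AIC = ln(σ̂²) + 2(p+q)/T and BIC = ln(σ̂²) + ln(T)(p+q)/T. The residual variances in `candidates` are made-up numbers for illustration, not estimates from any data set.

```python
import math


def aic(sigma2_hat, p, q, T):
    # fit term plus the AIC penalty 2(p+q)/T
    return math.log(sigma2_hat) + 2.0 * (p + q) / T


def bic(sigma2_hat, p, q, T):
    # fit term plus the BIC penalty ln(T)(p+q)/T
    return math.log(sigma2_hat) + math.log(T) * (p + q) / T


T = 100
# (p, q) -> hypothetical estimated innovation variance of the fitted ARMA(p, q)
candidates = {(1, 0): 1.10, (1, 1): 1.06, (2, 1): 1.03}

best_aic = min(candidates, key=lambda pq: aic(candidates[pq], pq[0], pq[1], T))
best_bic = min(candidates, key=lambda pq: bic(candidates[pq], pq[0], pq[1], T))
# With these numbers AIC selects (2, 1) and BIC selects (1, 0).
```

The heavier BIC penalty is what pushes the choice toward the smaller model here.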
The exact log-likelihood function may then be expressed as

$$\ln L(\theta|y) = \sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}, \theta) + \ln f(y_p, \ldots, y_1; \theta)$$

The conditional log-likelihood is

$$\ln L(\theta|y) = \sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}, \theta)$$

Maximizing the conditional log-likelihood gives the first type of estimates, the conditional mles $\hat{\theta}_{cmle}$.

The second type is based on maximizing the exact log-likelihood function. These estimates are called exact mles, and are defined by

$$\hat{\theta}_{mle} = \arg\max_{\theta} \sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}, \theta) + \ln f(y_p, \ldots, y_1; \theta)$$

Result: For stationary models, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are consistent and have the same limiting normal distribution. In finite samples, however, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.
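Example: MLE for stationary AR(1)

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2), \quad t = 1, \ldots, T$$

$$\theta = (c, \phi, \sigma^2)', \quad |\phi| < 1$$

Conditional on $I_{t-1}$

$$y_t | I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \quad t = 2, \ldots, T$$

which only depends on $y_{t-1}$. The conditional density $f(y_t|I_{t-1}, \theta)$ is then

$$f(y_t|y_{t-1}, \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y_t - c - \phi y_{t-1})^2\right), \quad t = 2, \ldots, T$$

The conditional log-likelihood function is

$$\sum_{t=2}^{T} \ln f(y_t|y_{t-1}, \theta) = -\frac{(T-1)}{2}\ln(2\pi) - \frac{(T-1)}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$

Notice that the conditional log-likelihood function has the form of the log-likelihood function for a linear regression model with normal errors

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad t = 2, \ldots, T, \qquad \varepsilon_t \sim \text{iid } N(0, \sigma^2)$$

It follows that

$$\hat{c}_{cmle} = \hat{c}_{ols}$$

$$\hat{\phi}_{cmle} = \hat{\phi}_{ols}$$

$$\hat{\sigma}^2_{cmle} = (T-1)^{-1}\sum_{t=2}^{T}(y_t - \hat{c}_{cmle} - \hat{\phi}_{cmle}\, y_{t-1})^2$$

To determine the marginal density for the initial value $y_1$, recall that for a stationary AR(1) process

$$E[y_1] = \mu = \frac{c}{1-\phi}$$

$$\text{var}(y_1) = \frac{\sigma^2}{1-\phi^2}$$

It follows that

$$y_1 \sim N\left(\frac{c}{1-\phi}, \frac{\sigma^2}{1-\phi^2}\right)$$

$$f(y_1; \theta) = \left(\frac{2\pi\sigma^2}{1-\phi^2}\right)^{-1/2} \exp\left(-\frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2\right)$$

The marginal log-likelihood for the initial value $y_1$ is

$$\ln f(y_1; \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2$$

The exact log-likelihood function is then

$$\ln L(\theta|y) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2 - \frac{(T-1)}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$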
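The AR(1) example can be sketched in Python as follows. The conditional mles are computed by OLS of y_t on a constant and y_{t-1}, and the exact log-likelihood adds the marginal log-density of y_1 to the conditional log-likelihood. The simulated data, the seed, the parameter values, and all function names are illustrative assumptions.

```python
import math
import random


def cmle_ar1(y):
    """Conditional mles for AR(1): OLS of y_t on (1, y_{t-1}), t = 2..T."""
    x, z = y[:-1], y[1:]                       # regressor y_{t-1}, response y_t
    n = len(z)                                 # n = T - 1
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    phi = sxz / sxx
    c = zbar - phi * xbar
    sigma2 = sum((b - c - phi * a) ** 2 for a, b in zip(x, z)) / n
    return c, phi, sigma2


def cond_loglik(c, phi, sigma2, y):
    """Conditional log-likelihood: sum over t = 2..T of ln f(y_t | y_{t-1})."""
    T = len(y)
    ssr = sum((y[t] - c - phi * y[t - 1]) ** 2 for t in range(1, T))
    return (-(T - 1) / 2 * math.log(2 * math.pi)
            - (T - 1) / 2 * math.log(sigma2) - ssr / (2 * sigma2))


def exact_loglik(c, phi, sigma2, y):
    """Exact log-likelihood: ln f(y_1) + conditional log-likelihood (needs |phi| < 1)."""
    var1 = sigma2 / (1 - phi ** 2)             # stationary variance of y_1
    mean1 = c / (1 - phi)                      # stationary mean of y_1
    ll1 = (-0.5 * math.log(2 * math.pi) - 0.5 * math.log(var1)
           - (y[0] - mean1) ** 2 / (2 * var1))
    return ll1 + cond_loglik(c, phi, sigma2, y)


# Simulate a stationary AR(1) with hypothetical parameters c = 1, phi = 0.5, sigma = 1
rng = random.Random(0)
T = 500
y = [1.0 / (1 - 0.5) + rng.gauss(0.0, 1.0) / math.sqrt(1 - 0.5 ** 2)]
for _ in range(T - 1):
    y.append(1.0 + 0.5 * y[-1] + rng.gauss(0.0, 1.0))

c_hat, phi_hat, s2_hat = cmle_ar1(y)
ll_cond = cond_loglik(c_hat, phi_hat, s2_hat, y)
ll_exact = exact_loglik(c_hat, phi_hat, s2_hat, y)   # exact log-likelihood at the cmle
```

The conditional mles are a convenient starting point for a numerical search for the exact mles, since the two log-likelihoods differ only by the single term for y_1.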
Remarks

1. The exact log-likelihood function is a non-linear function of the parameters θ, and so there is no closed form solution for the exact mles.

2. The exact mles must be determined by numerically maximizing the exact log-likelihood function. Usually, a Newton-Raphson type algorithm is used for the maximization, which leads to the iterative scheme

$$\hat{\theta}_{mle,n} = \hat{\theta}_{mle,n-1} - \hat{H}(\hat{\theta}_{mle,n-1})^{-1}\,\hat{s}(\hat{\theta}_{mle,n-1})$$

where $\hat{H}(\hat{\theta})$ is an estimate of the Hessian matrix (2nd derivative of the log-likelihood function), and $\hat{s}(\hat{\theta})$ is an estimate of the score vector (1st derivative of the log-likelihood function).

3. The estimates of the Hessian and score may be computed numerically (using numerical derivative routines) or they may be computed analytically (if analytic derivatives are known).

Prediction Error Decomposition of Log-Likelihood

To illustrate the prediction error decomposition, consider the simple AR(1) model. Recall,

$$y_t | I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \quad t = 2, \ldots, T$$

from which it follows that

$$E[y_t|I_{t-1}] = c + \phi y_{t-1}$$

$$\text{var}(y_t|I_{t-1}) = \sigma^2$$

The 1-step ahead prediction errors may then be defined as

$$v_t = y_t - E[y_t|I_{t-1}] = y_t - c - \phi y_{t-1}, \quad t = 2, \ldots, T$$

The variance of the prediction error at time t is

$$f_t = \text{var}(v_t) = \text{var}(\varepsilon_t) = \sigma^2, \quad t = 2, \ldots, T$$

For the initial value, the first prediction error and its variance are

$$v_1 = y_1 - E[y_1] = y_1 - \frac{c}{1-\phi}$$

$$f_1 = \text{var}(v_1) = \frac{\sigma^2}{1-\phi^2}$$
Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as

$$\ln L(\theta|y) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$

which is the prediction error decomposition.

Remarks

1. A further simplification may be achieved by writing

$$\text{var}(v_t) = \sigma^2 f_t^* = \begin{cases} \sigma^2 \cdot \dfrac{1}{1-\phi^2} & \text{for } t = 1 \\ \sigma^2 \cdot 1 & \text{for } t > 1 \end{cases}$$

That is, $f_t^* = 1/(1-\phi^2)$ for $t = 1$ and $f_t^* = 1$ for $t > 1$. Then the log-likelihood becomes

$$\ln L(\theta|y) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2}\sum_{t=1}^{T}\ln f_t^* - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t^*}$$

2. With the above simplification, $\sigma^2$ may be concentrated out of the log-likelihood. That is,

$$\frac{\partial \ln L(\theta|y)}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2_{mle}(c, \phi) = \frac{1}{T}\sum_{t=1}^{T}\frac{v_t^2}{f_t^*}$$

Substituting $\hat{\sigma}^2_{mle}(c, \phi)$ back into $\ln L(\theta|y)$ gives the concentrated log-likelihood

$$\ln L^c(c, \phi|y) = -\frac{T}{2}\left(\ln(2\pi) + 1\right) - \frac{T}{2}\ln\hat{\sigma}^2_{mle}(c, \phi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t^*$$

Maximizing $\ln L^c(c, \phi|y)$ gives the mles for c and φ. Maximizing $\ln L^c(c, \phi|y)$ is faster than maximizing $\ln L(\theta|y)$ and is more numerically stable.

3. For general time series models, the prediction error decomposition may be conveniently computed as a by-product of the Kalman filter algorithm if the time series model can be cast in state space form.
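As a rough illustration of the AR(1) case, the sketch below codes the prediction error decomposition and the concentrated log-likelihood, then maximizes the latter over (c, φ) with a Newton-Raphson iteration of the form described earlier, using central-difference estimates of the score and Hessian. The simulated data, the seed, the step size `h`, the starting values, the fixed count of 10 iterations, and all function names are assumptions made for the example, not part of the notes.

```python
import math
import random


def prediction_errors(c, phi, y):
    """Prediction errors v_t and scaled variances f*_t for a stationary AR(1)."""
    v = [y[0] - c / (1 - phi)]
    v += [y[t] - c - phi * y[t - 1] for t in range(1, len(y))]
    f_star = [1 / (1 - phi ** 2)] + [1.0] * (len(y) - 1)
    return v, f_star


def ped_loglik(c, phi, sigma2, y):
    """Exact log-likelihood via the decomposition, with f_t = sigma2 * f*_t."""
    v, f_star = prediction_errors(c, phi, y)
    return (-len(y) / 2 * math.log(2 * math.pi)
            - 0.5 * sum(math.log(sigma2 * f) for f in f_star)
            - 0.5 * sum(e * e / (sigma2 * f) for e, f in zip(v, f_star)))


def sigma2_mle(c, phi, y):
    """sigma^2 that maximizes the log-likelihood for given (c, phi)."""
    v, f_star = prediction_errors(c, phi, y)
    return sum(e * e / f for e, f in zip(v, f_star)) / len(y)


def conc_loglik(c, phi, y):
    """Concentrated log-likelihood in (c, phi) only."""
    _, f_star = prediction_errors(c, phi, y)
    T = len(y)
    return (-T / 2 * (math.log(2 * math.pi) + 1)
            - T / 2 * math.log(sigma2_mle(c, phi, y))
            - 0.5 * sum(math.log(f) for f in f_star))


def newton_step(fun, c, phi, h=1e-4):
    """One update theta - H^{-1} s with numerical score s and 2x2 Hessian H."""
    f0 = fun(c, phi)
    fc_p, fc_m = fun(c + h, phi), fun(c - h, phi)
    fp_p, fp_m = fun(c, phi + h), fun(c, phi - h)
    s = ((fc_p - fc_m) / (2 * h), (fp_p - fp_m) / (2 * h))
    h11 = (fc_p - 2 * f0 + fc_m) / h ** 2
    h22 = (fp_p - 2 * f0 + fp_m) / h ** 2
    h12 = (fun(c + h, phi + h) - fun(c + h, phi - h)
           - fun(c - h, phi + h) + fun(c - h, phi - h)) / (4 * h ** 2)
    det = h11 * h22 - h12 ** 2
    step_c = (h22 * s[0] - h12 * s[1]) / det      # first element of H^{-1} s
    step_phi = (h11 * s[1] - h12 * s[0]) / det    # second element of H^{-1} s
    return c - step_c, phi - step_phi, s


# Simulated stationary AR(1) with hypothetical c = 1, phi = 0.5, sigma = 1
rng = random.Random(1)
T = 300
y = [1.0 / (1 - 0.5) + rng.gauss(0.0, 1.0) / math.sqrt(1 - 0.5 ** 2)]
for _ in range(T - 1):
    y.append(1.0 + 0.5 * y[-1] + rng.gauss(0.0, 1.0))

# Starting values: lag-1 sample autocorrelation and the implied intercept
ybar = sum(y) / T
d = [a - ybar for a in y]
phi = sum(d[t] * d[t - 1] for t in range(1, T)) / sum(a * a for a in d)
c = ybar * (1 - phi)
start_value = conc_loglik(c, phi, y)

objective = lambda a, b: conc_loglik(a, b, y)
for _ in range(10):
    c, phi, score = newton_step(objective, c, phi)
sigma2 = sigma2_mle(c, phi, y)     # recover sigma^2 from the concentrated solution
```

Only two parameters are searched over, and σ² is recovered in closed form afterwards. Evaluating `ped_loglik(c, phi, sigma2, y)` at the solution should return the same value as `conc_loglik(c, phi, y)`.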
Diagnostics of Fitted ARMA Model