
Lecture Note 6 - Cointegration and Common Trends

This document discusses cointegration and common trends in econometrics. It defines cointegration as when a linear combination of non-stationary time series is stationary. This can occur when the time series share a common stochastic trend. The document provides examples to illustrate cointegration and how it allows variables to have an equilibrium relationship even if they are individually non-stationary. It also discusses how cointegration implies an error-correction mechanism that keeps deviations from the equilibrium bounded. Finally, it outlines some single-equation tools for analyzing cointegration such as the Engle-Granger two-step approach.

COINTEGRATION AND COMMON TRENDS


Econometrics C ♦ Lecture Note 6
Heino Bohn Nielsen
February 6, 2012

In this note we discuss some important issues in regression models for non-stationary time series. We show that linear combinations of non-stationary time series are in general non-stationary, and we define cointegration as the special case where a linear combination is stationary. We emphasize that relations between non-stationary variables can only be interpreted as defining an equilibrium if the variables cointegrate, and we discuss error correction as the force that sustains the equilibrium relation. We then present some single-equation tools for cointegration analysis, e.g. the so-called Engle-Granger two-step procedure and cointegration analysis based on unrestricted ADL models. We show how to estimate the cointegrating parameters and how to test the hypothesis of no cointegration. Towards the end of the note we discuss some limitations of the single-equation approach.

Outline
§1 Unit-Root Time-Series and Cointegration
§2 Estimation and Inference
§3 Testing for No-Cointegration
§4 Limitations of the Single-Equation Approach
§5 Concluding Remarks

1 Unit-Root Time-Series and Cointegration
In this section we look at linear combinations of unit-root non-stationary time series and define the concept of cointegration. To simplify the notation we consider the case of $p = 2$ variables in most of the presentation below, but the discussion is easily extended to more variables.
Let 1 and 2 be two time series that are integrated of first order, I(1). We can write
the two processes on the form

1 =  1 + stationary process + initial value (1)


2 =  2 + stationary process + initial value, (2)

where  1 and  2 are random walk components generated by unit roots. We often refer
to   as the stochastic trend of  .
Next define the linear combination $z_t := \beta' x_t$, where $x_t$ is a vector of variables and $\beta$ is a vector of weights in the linear combination, i.e.

$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad \text{and} \quad \beta = \begin{pmatrix} 1 \\ -\beta_2 \end{pmatrix}.$$

Inserting (1) and (2), we can write the linear combination as

$$z_t = \beta' x_t = \begin{pmatrix} 1 & -\beta_2 \end{pmatrix} \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} = x_{1t} - \beta_2 x_{2t} = \tau_{1t} - \beta_2 \tau_{2t} + \text{stationary process} + \text{initial value}. \tag{3}$$

We note that $z_t$ contains the random walk component $\tau_{1t} - \beta_2 \tau_{2t}$, and in most cases $z_t$ will also be I(1). The result that a combination of I(1) variables is in general I(1) extends easily to higher orders of integration: a combination of variables integrated of order 2 (say) will in general also be I(2).
An exception to this result occurs if there exists a vector $\beta$ such that $z_t$ defined in (3) is a stationary process. This property is called cointegration, and the vector $\beta$ is called a cointegration vector. For cointegration we need the stochastic trends $\tau_{it}$ to be common, that is, generated by the same underlying random walk, $\tau^*_t$, i.e.

$$\tau_{1t} = a_1 \tau^*_t \quad \text{and} \quad \tau_{2t} = a_2 \tau^*_t.$$

If we choose $\beta_2 = a_1 / a_2$ we have from (3) that

$$z_t = \underbrace{a_1 \tau^*_t - \frac{a_1}{a_2} a_2 \tau^*_t}_{=0} + \text{stationary process} + \text{initial value}.$$

The common stochastic trends cancel and $z_t$ is a stationary process.
Example 1 (cointegrated processes): As an example of a data generating process (DGP) that generates cointegrated variables, consider the following system:

$$\Delta x_{2t} = \varepsilon_{2t} \tag{4}$$

$$x_{1t} = \beta_2 x_{2t} + \varepsilon_{1t}, \tag{5}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are IID and uncorrelated error processes. We solve for the levels to find

$$x_{2t} = \sum_{i=1}^{t} \varepsilon_{2i} + \text{initial value} = \sum_{i=1}^{t} \varepsilon_{2i} + x_{20}$$

$$x_{1t} = \tau_{1t} + \text{initial value} + \text{stationary process} = \beta_2 \sum_{i=1}^{t} \varepsilon_{2i} + \beta_2 x_{20} + \varepsilon_{1t}.$$

Here $\tau_{1t} = \beta_2 \tau_{2t}$ is a common stochastic trend, and the processes cointegrate with cointegration vector $\beta = (1, -\beta_2)'$. In particular we find

$$z_t = \beta' x_t = \varepsilon_{1t},$$

which is stationary. An economic example could be that income ($x_{2t}$) develops as a random walk according to (4), while consumption ($x_{1t}$) according to (5) is a linear function of income plus a stationary noise term. The dynamics of both equations could of course be more complicated. ♦
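To make the cancellation concrete, here is a minimal NumPy simulation sketch of the DGP (4)-(5). The parameter values ($\beta_2 = 0.5$, $x_{20} = 0$, standard normal errors, $T = 500$) and the helper name `simulate_example1` are illustrative assumptions, not part of the note. The combination with $\beta = (1, -\beta_2)'$ should reproduce $\varepsilon_{1t}$ exactly. An arbitrary combination such as $(1, -2)'$ keeps a random walk component and therefore wanders much more widely.

```python
import numpy as np


def simulate_example1(T=500, beta2=0.5, x20=0.0, seed=0):
    """Simulate the DGP (4)-(5) with hypothetical parameter values."""
    rng = np.random.default_rng(seed)
    eps1 = rng.standard_normal(T)        # stationary noise in (5)
    eps2 = rng.standard_normal(T)        # shocks driving the random walk (4)
    x2 = x20 + np.cumsum(eps2)           # x_{2t} = sum_{i<=t} eps_{2i} + x_{20}
    x1 = beta2 * x2 + eps1               # equation (5)
    return x1, x2, eps1


beta2 = 0.5
x1, x2, eps1 = simulate_example1(beta2=beta2)

z_coint = x1 - beta2 * x2   # beta = (1, -beta2)': the common trend cancels
z_other = x1 - 2.0 * x2     # beta = (1, -2)': the random walk survives

print(np.allclose(z_coint, eps1))        # True
print(np.std(z_coint), np.std(z_other))  # the second is much larger
```

The same check works for any $\beta_2$. Only the cointegrating weight removes the stochastic trend, which is what distinguishes the special case from the general I(1) result in (3).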

If we consider a regression-type formulation,

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t, \tag{6}$$

where $\mu$ is the mean of $z_t = x_{1t} - \beta_2 x_{2t}$, then cointegration implies that the deviation, $u_t$, is a (mean zero) stationary process. It is important to realize, however, that if $z_t = \beta' x_t = x_{1t} - \beta_2 x_{2t}$ is a stationary process then so is $\tilde z_t = c z_t = c x_{1t} - c\beta_2 x_{2t}$. This means that for all values $c \neq 0$, both

$$\beta = \begin{pmatrix} 1 \\ -\beta_2 \end{pmatrix} \quad \text{and} \quad \tilde\beta = \begin{pmatrix} \tilde\beta_1 \\ -\tilde\beta_2 \end{pmatrix} = \begin{pmatrix} c \\ -c\beta_2 \end{pmatrix}$$

are cointegration vectors for the variables in $x_t$. In the first case, $\beta$, we have imposed a normalization on the first coefficient, $\beta_1 = 1$. This normalization is natural if we have a relation of the form (6) in mind, but we could equally well have chosen a different normalization. For example, $\tilde\beta = (-\beta_2^{-1}, 1)'$ corresponds to an equation with $x_{2t}$ on the left-hand side.

The definition of cointegration is easily extended to more variables. In particular, let $x_t = (x_{1t}, x_{2t}, x_{3t}, \ldots, x_{pt})'$ be a $p$-dimensional vector of variables. Then a vector $\beta = (1, -\beta_2, \ldots, -\beta_p)'$ is a cointegration vector if

$$z_t = \beta' x_t = x_{1t} - \beta_2 x_{2t} - \cdots - \beta_p x_{pt}$$

is a stationary process. Note that with $p$ I(1) variables there can be several (at most $p - 1$) different cointegration vectors. This is not a problem for the theory, but the single-equation tools presented in this note are only appropriate when there is a single cointegration vector. As the number of variables, $p$, increases, it becomes less and less likely that there is only one stationary combination.
1.1 Cointegration and Economic Equilibrium
Consider again a DGP as in Example 1:

$$\Delta x_{2t} = \varepsilon_{2t} \tag{7}$$

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t, \tag{8}$$

where $\varepsilon_{2t}$ is IID and $u_t$ is a stationary process uncorrelated with $\varepsilon_{2t}$. Notice that the individual variables, $x_{1t}$ and $x_{2t}$, are I(1) non-stationary, while $\beta' x_t$ is a stationary process. An implication is that the shock $\varepsilon_{2t}$ has permanent effects on the levels of both variables but only transitory effects on $\beta' x_t$.

That makes it natural to think of the cointegrating relation (8) as defining an economic equilibrium. The variables themselves wander arbitrarily far up and down due to the accumulation of the shocks $\varepsilon_{2t}$, but they never deviate too much from equilibrium. When the variables cointegrate, we can define $x^*_{1t} = \mu + \beta_2 x_{2t}$. We refer to $x^*_{1t}$ as the equilibrium value of $x_{1t}$, and $u_t = x_{1t} - x^*_{1t}$ is the deviation from equilibrium. The equilibrium value can be interpreted as the value at which there is no inherent tendency for $x_{1t}$ to move away. It is important to realize, however, that because the economy is continuously hit by shocks, the system will never settle down at $x^*_{1t}$, and $x_{1t}$ will not converge to $x^*_{1t}$ in any sense.

Example 2 (purchasing power parity): Let $x_{1t} = \log(E_t)$ denote the log of the bilateral exchange rate between Dollar and Euro (denominated as Dollar per Euro), and let $x_{2t} = \log(P^{US}_t) - \log(P^{EU}_t)$ denote the corresponding difference between the logs of the consumer prices. Then

$$z_t = x_{1t} - x_{2t} = \log(E_t) - \left(\log(P^{US}_t) - \log(P^{EU}_t)\right) = \log\left(\frac{E_t \cdot P^{EU}_t}{P^{US}_t}\right)$$

is the relative deviation from purchasing power parity (PPP) between the US and the Euro area. For most countries consumer prices and exchange rates appear non-stationary. If the deviation from PPP is stationary, we can think of PPP as a valid equilibrium relation between the US and the Euro area. In this case $\beta = (1, -1)'$ would be a cointegrating vector for $x_t = (x_{1t}, x_{2t})'$. If, on the other hand, the deviation, $z_t$, is non-stationary, the exchange rate can wander arbitrarily far from the value implied by the price differential, and there is no equilibrium interpretation of PPP. ♦

Example 3 (prices on the orange market): As an empirical example, Figure 1 (A) illustrates the price of organic and regular oranges, $p^{org}_t$ and $p^{reg}_t$, in pence per lb., while graph (B) illustrates the price differential, $p^{org}_t - p^{reg}_t$. The individual prices in graph (A) are obviously non-stationary, and a possible interpretation is that the non-stationarity is driven by stochastic trends. The prices show strong co-movements, however, and the price differential looks much more stable and could be a sample path from a stationary process. This suggests that the relation

$$p^{org}_t = \mu + p^{reg}_t + u_t$$

defines an equilibrium for the orange market, where $\mu$ is the additional price of organic oranges in equilibrium, and $u_t$ is the deviation from equilibrium in period $t$. Note again that $p^{org}_t - p^{reg}_t$ will not equal $\mu$ in any specific period, and $p^{org}_t - p^{reg}_t$ will not approach $\mu$ as $t \to \infty$. The equilibrium concept refers to the fact that fluctuations of $p^{org}_t - p^{reg}_t$ around $\mu$ will be stationary, as suggested by graph (B). ♦

Example 4 (private consumption): Similarly, Figure 1 (C) illustrates the log of real private consumption in Denmark, $c_t$, the log of real disposable income, $y_t$, and the log of real private wealth including the value of owner-occupied housing, $w_t$. In the graph we have subtracted 2 from $w_t$ to make the levels comparable. All three time series are clearly trending. The series for consumption and income have many similarities and co-move in some periods. Deviations from this pattern seem to occur primarily when there are large fluctuations in private wealth. People familiar with the Danish business cycle will recognize the peak in private wealth in 1986 as the result of a boom in the housing market, which apparently drove up the consumption-to-income ratio. The time series behavior, as well as simple economic theory, suggests that consumption depends on both income and wealth. Graph (D) depicts the deviation, $u_t = c_t - c^*_t$, from a simple consumption function

$$c_t = -0.404 + 0.364 \cdot y_t + 0.516 \cdot w_t + u_t.$$

We note that the deviation, $u_t$, looks much more stable than the variables themselves. This suggests that $\beta = (1, -0.364, -0.516)'$ may be a cointegrating vector for $x_t = (c_t, y_t, w_t)'$. Whether the deviation, $u_t$, actually corresponds to a stationary process is a testable hypothesis to which we return in §3. ♦

[Figure 1: Examples of some possibly cointegrated series. (A): Price of organic oranges, $p^{org}_t$, and regular oranges, $p^{reg}_t$, measured in pence per lb. (B): The price differential, $p^{org}_t - p^{reg}_t$. (C): Real aggregate consumption, $c_t$, disposable income, $y_t$, and private wealth, $w_t$ (subtracted 2), in logs. (D): The linear combination, $u_t = c_t - 0.364 \cdot y_t - 0.516 \cdot w_t + 0.404$.]

Example 5 (money demand): To estimate a long-run money demand relation we may consider the variables $x_t = (m_t, y_t, i^s_t, i^b_t)'$. Here $m_t$ is the real money stock (in logs), $y_t$ is real income (in logs), $i^s_t$ is the short interest rate as a measure of the yield of holding money, and $i^b_t$ is the bond rate measuring the yield on holdings alternative to money. Some theories suggest that in the long run the demand for money is given by

$$m_t = y_t - \lambda (i^b_t - i^s_t),$$

so that money demand increases with the amount of transactions, measured by $y_t$, and decreases with the opportunity cost of holding money, $i^b_t - i^s_t$. This suggests that $\beta = (1, -1, -\lambda, \lambda)'$ could be a cointegrating vector for the variables in $x_t$.

Alternatively, theories for the determination of interest rates would suggest that two interest rates with different maturities should be cointegrated. The velocity, $m_t - y_t$, may also be stationary. That suggests a different scenario with two cointegration relations:

$$\beta_1' x_t = \begin{pmatrix} 0 & 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} m_t \\ y_t \\ i^s_t \\ i^b_t \end{pmatrix} = i^b_t - i^s_t \quad \text{and} \quad \beta_2' x_t = \begin{pmatrix} 1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} m_t \\ y_t \\ i^s_t \\ i^b_t \end{pmatrix} = m_t - y_t.$$

It is an empirical question which of the scenarios (if any) characterizes a data set. However, the single-equation tools presented in this note are only appropriate in the scenario with one cointegrating relation. ♦

1.2 Deterministic Terms


In the definition of cointegration above we have assumed that the variables, $x_{1t}$ and $x_{2t}$, are I(1) and that $z_t = \beta' x_t$ is a stationary variable with mean $E[z_t] = \mu$. The theory of cointegration can easily be extended to other specifications of deterministic variables.

As an example, we might believe that $z_t$ is stationary around a deterministic linear trend. This would be the case if $x_{1t}$ and $x_{2t}$ contain both deterministic and stochastic trends, and the linear combination, $\beta' x_t$, cancels the stochastic trends but not the deterministic trends. To model this case we can extend (6) with a deterministic trend term, e.g.

$$x_{1t} = \mu + \mu_1 t + \beta_2 x_{2t} + u_t.$$

The interpretation is that $z_t = x_{1t} - \beta_2 x_{2t}$ is trend-stationary, i.e. stationary around the linear trend, $\mu + \mu_1 t$. The deviation, $u_t$, is a mean zero stationary process. Similarly, linear combinations could be stationary around other deterministic components, e.g. level shifts.
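Here is a hedged NumPy illustration of the trend-extended regression. It uses purely hypothetical values: $\mu = 1$, $\mu_1 = 0.05$, $\beta_2 = 0.8$, and an AR(1) deviation with coefficient 0.5. The sketch generates data containing both a stochastic and a deterministic trend. It then runs the static regression with a constant and $t$ as extra regressors, which should recover $\mu_1$ and $\beta_2$ closely.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400
mu, mu1, beta2 = 1.0, 0.05, 0.8            # hypothetical true values

x2 = np.cumsum(rng.standard_normal(T))     # I(1) regressor (no drift)
u = np.zeros(T)
for t in range(1, T):                      # mean-zero stationary AR(1) deviation
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
trend = np.arange(1, T + 1, dtype=float)
x1 = mu + mu1 * trend + beta2 * x2 + u

# Static regression of x1 on a constant, a linear trend and x2
X = np.column_stack([np.ones(T), trend, x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
mu_hat, mu1_hat, beta2_hat = coef
z_hat = x1 - beta2_hat * x2                # fluctuates around mu + mu1 * t
print(mu1_hat, beta2_hat)
```

Omitting `trend` from `X` would leave the deterministic trend in the error term. The trend regressor is what makes the residual a mean-zero stationary deviation in this setup.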

1.3 How is the Equilibrium Sustained?


In the previous section we defined cointegration of variables, $x_t = (x_{1t}, x_{2t})'$, as the existence of a vector $\beta$ such that the combination, $z_t = \beta' x_t$, is a stationary process, and we interpreted the relation as defining an equilibrium for the variables. Logically, an equilibrium requires the existence of some forces in the DGP which ensure that the non-stationary variables, $x_{1t}$ and $x_{2t}$, do not move too far away from equilibrium. In this section we present error correction as a way of describing these forces. We also discuss how cointegration and error correction are two complementary ways of characterizing the same phenomenon.

There exists a famous representation theorem, due to Engle and Granger (1987), stating that $x_{1t}$ and $x_{2t}$ cointegrate if and only if there exists an error-correction model for $x_{1t}$, $x_{2t}$, or both. To illustrate the link, let

1 =  +  2 2 + 

be an equilibrium relation between two I(1) variables. Since  is a stationary mean zero
variable, there exist a stationary ARMA model for  . Assume for simplicity that it is an
AR(2),
 = 1 −1 + 2 −2 +  

where (1) = 1 − 1 − 2  0 from stationarity. Inserting the definition of  , this is


equivalent to

(1 −  −  2 2 ) = 1 (1−1 −  −  2 2−1 ) + 2 (1−2 −  −  2 2−2 ) +  

or collecting terms:

1 = (1 − 1 − 2 ) + 1 1−1 + 2 1−2 +  2 2 − 1  2 2−1 − 2  2 2−2 +  

which is an autoregressive distributed lag model, ADL(2,2). This can also be written as
the error-correction model

∆1 =  2 ∆2 + 2  2 ∆2−1 − 2 ∆1−1 − (1 − 1 − 2 ) {1−1 −  −  2 2−1 } +   (9)

7
where the long-run solution is the lagged deviation from the cointegrating relation, −1 ,
and the error-correction parameter −(1 − 1 − 2 )  0 ensures that deviations from the
equilibrium are eliminated1 .
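The step from the AR(2) for $u_t$ to the error-correction form (9) is pure algebra, so it can be checked numerically. The NumPy sketch below simulates the setup with hypothetical values $\mu = 2$, $\beta_2 = 0.7$, $\theta_1 = 0.6$, $\theta_2 = 0.2$ (so $\theta(1) = 0.2 > 0$). It then confirms that both sides of (9) coincide for every observation with two lags available.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
mu, beta2, theta1, theta2 = 2.0, 0.7, 0.6, 0.2   # hypothetical; theta(1) = 0.2 > 0

eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(2, T):                             # AR(2) deviation u_t
    u[t] = theta1 * u[t - 1] + theta2 * u[t - 2] + eps[t]
x2 = np.cumsum(rng.standard_normal(T))            # I(1) variable
x1 = mu + beta2 * x2 + u                          # equilibrium relation

idx = np.arange(2, T)                             # periods with two lags available
lhs = x1[idx] - x1[idx - 1]                       # Delta x_{1t}
rhs = (beta2 * (x2[idx] - x2[idx - 1])                    # beta2 * Delta x_{2t}
       + theta2 * beta2 * (x2[idx - 1] - x2[idx - 2])     # theta2 beta2 Delta x_{2,t-1}
       - theta2 * (x1[idx - 1] - x1[idx - 2])             # -theta2 Delta x_{1,t-1}
       - (1 - theta1 - theta2) * (x1[idx - 1] - mu - beta2 * x2[idx - 1])
       + eps[idx])
print(np.allclose(lhs, rhs))   # True
```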
To understand the link between cointegration and error correction intuitively, notice that under the maintained assumptions, $\Delta x_{1t}$, $\Delta x_{1,t-1}$, $\Delta x_{2t}$, $\Delta x_{2,t-1}$, and $\varepsilon_t$ are all stationary terms. Since $x_{1t}$ and $x_{2t}$ are assumed to be I(1), equation (9) is only balanced in terms of the order of integration if the combination $x_{1,t-1} - \beta_2 x_{2,t-1}$ is stationary, i.e. if the variables cointegrate. If the variables do not cointegrate, the only way to balance the equation is to exclude the levels from the equation by setting $(1 - \theta_1 - \theta_2) = 0$.

The link between cointegration and error correction also emphasizes that cointegration is essentially a system property. From the representation theorem alone we do not know whether $x_{1t}$, $x_{2t}$, or both variables error correct. This suggests that a general formulation of the error-correction model consists of an equation for each variable, in our case

∆1 =  1 + Γ11 ∆1−1 + Γ12 ∆2−1 + 1 (1−1 −  2 2−1 ) + 1


∆2 =  2 + Γ21 ∆1−1 + Γ22 ∆2−1 + 2 (1−1 −  2 2−1 ) + 2 

where one lag of each first difference has been included. Stacking the equations we may
write the model as the so-called vector error correction model,
à ! à ! à !à ! à ! à !
∆1 1 Γ11 Γ12 ∆1−1 1 1
= + + (1−1 −  2 2−1 )+ 
∆2 2 Γ21 Γ22 ∆2−1 2 2
or
∆ =  + Γ∆−1 +  0 −1 +  

where we have used the definitions


à ! à ! à ! à !
1 Γ11 Γ12 1 1
=  Γ=  =  and  = 
2 Γ21 Γ22 2 − 2

We note that the lagged deviation from the cointegrating relation, $\beta' x_{t-1} = x_{1,t-1} - \beta_2 x_{2,t-1}$, appears as an explanatory variable in both equations. For $x_{1t}$ to error correct we need $\alpha_1 < 0$. To see this, imagine that $x_{1,t-1}$ is above equilibrium so that $x_{1,t-1} - \beta_2 x_{2,t-1}$ is positive. For $x_{1t}$ to move towards the equilibrium we need $\Delta x_{1t} < 0$, which requires $\alpha_1 < 0$. If $x_{1t}$ error corrects, the magnitude of $\alpha_1$ measures the proportion of the deviation that is corrected each period, and $\alpha_1$ is sometimes referred to as the speed of adjustment. As an example, a value of $\alpha_1 = -0.4$ would indicate that 40% of a deviation from equilibrium is removed each period. By the same line of argument, $\alpha_2 > 0$ is consistent with error correction of $x_{2t}$ (for $\beta_2 > 0$).
¹ The simple assumptions used in the present derivation impose a common-factor restriction on (9), but that restriction does not necessarily hold in practice.
To illustrate the graphical implications of cointegration and error correction we consider a simple model for two cointegrated variables,

$$\begin{pmatrix} \Delta x_{1t} \\ \Delta x_{2t} \end{pmatrix} = \begin{pmatrix} -0.2 \\ 0.1 \end{pmatrix} (x_{1,t-1} - x_{2,t-1}) + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}, \tag{10}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent standard normals, $N(0, 1)$. Here $\beta = (1, -1)'$ is a cointegrating vector and both variables error correct, with speeds of adjustment given by $\alpha = (-0.2, 0.1)'$. One realization of $x_{1t}$ and $x_{2t}$ ($t = 1, 2, \ldots, 100$) generated from the DGP in (10) is illustrated in Figure 2 (A). Notice the strong co-movement between the variables, which reflects that they have the same stochastic trend. Graph (B) depicts the deviation from the long-run relation,

$$z_t = \beta' x_t = x_{1t} - x_{2t}.$$

The series $z_t$ is relatively persistent and is often above or below equilibrium for longer periods of time. This illustrates the moderately slow error correction in (10).

In graph (C) we illustrate the speed of adjustment. We consider a large deviation, $z_t = \beta' x_t = 10$, in a particular period and show the adjustment towards equilibrium in a situation where no shocks hit the system. In the present case the deviation is visible for approximately 10 periods and the convergence is exponential: since $\beta'\alpha = -0.3$, the deviation follows $z_t = 0.7 z_{t-1}$ when no shocks occur. It is the equilibrating force in graph (C) that ensures that the levels in graph (A) do not move too far apart.

Graph (D) depicts a cross plot of $x_{1t}$ on $x_{2t}$. The variables are non-stationary and will wander arbitrarily on the real axis. Cointegration (i.e. the force implied by error correction) implies that the observations will never move too far from the equilibrium defined by the straight line. Finally, observe that the most recent observation is far from equilibrium, $\beta' x_{100} > 0$. If we were to make an out-of-sample forecast of the series, $x_{101}, x_{102}, \ldots$, we would conjecture that $x_t$ would be drawn towards equilibrium. That is, either $x_{1t}$ would decrease or $x_{2t}$ would increase to close the gap.

[Figure 2: Simulated series to illustrate cointegration and error correction. (A): Two cointegrated variables, $x_{1t}$ and $x_{2t}$. (B): Deviation from equilibrium, $\beta' x_t = x_{1t} - x_{2t}$. (C): Speed of adjustment. (D): Cross-plot of $x_{1t}$ against $x_{2t}$, with the last observation, $x_{100}$, marked.]

Example 6 (prices on the orange market, continued): For the case of the organic and regular oranges, estimation yields the two error-correction equations

$$\Delta p^{org}_t = \underset{(1.665)}{22.864} - \underset{(0.081)}{1.090} \cdot \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \hat\varepsilon^{org}_t$$

$$\Delta p^{reg}_t = \underset{(0.634)}{1.147} - \underset{(0.031)}{0.008} \cdot \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \hat\varepsilon^{reg}_t,$$

where the numbers in parentheses are standard errors of the estimated coefficients. We can write the system as a vector error correction model,

$$\begin{pmatrix} \Delta p^{org}_t \\ \Delta p^{reg}_t \end{pmatrix} = \begin{pmatrix} 22.864 \\ 1.147 \end{pmatrix} - \begin{pmatrix} 1.090 \\ 0.008 \end{pmatrix} \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \begin{pmatrix} \hat\varepsilon^{org}_t \\ \hat\varepsilon^{reg}_t \end{pmatrix},$$

where

$$\beta' x_{t-1} = \begin{pmatrix} 1 & -1 \end{pmatrix} \begin{pmatrix} p^{org}_{t-1} \\ p^{reg}_{t-1} \end{pmatrix} = p^{org}_{t-1} - p^{reg}_{t-1}$$

is the cointegrating relation, and $\alpha = (-1.090, -0.008)'$ characterizes the speed of adjustment towards equilibrium.

The organic orange price seems to error correct very strongly, removing the entire disequilibrium each month. The regular orange price, on the other hand, does not seem to error correct. Its coefficient is negative, indicating a movement away from equilibrium, but it is very small and not significantly different from zero. A simple interpretation of this result is that the orange price is essentially determined on the large market for regular oranges. The price of organic oranges has to follow the price of regular oranges, with an additional premium of approximately 23 pence per lb. Note that changes in the price of regular oranges (i.e. a shock to $p^{reg}_t$, ceteris paribus) will be fully transmitted to the price of organic oranges after one month. Changes to the price of organic oranges (i.e. a shock to $p^{org}_t$, ceteris paribus), by contrast, will not be transmitted to the market for regular oranges. ♦
shock to org

2 Estimation and Inference


Above, the concepts of cointegration and error correction were introduced. In this section we discuss how the parameters in the cointegrating vector, $\beta = (1, -\beta_2, \ldots, -\beta_p)'$, can be estimated and how inference on $\beta$ can be conducted.
2.1 The Engle-Granger Two-Step Approach
Recall that if a set of variables, $x_{1t}$ and $x_{2t}$, cointegrate, then there exist coefficients, $\mu$ and $\beta_2$, such that

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t \tag{11}$$

defines an equilibrium. It is natural to try to estimate $\beta_2$ in the static regression (11), and this is the approach suggested in the seminal paper of Engle and Granger (1987).

It can be shown that if $x_{1t}$ and $x_{2t}$ are I(1) and cointegrated, then the OLS estimator from (11), $\hat\beta_2$, is consistent for the true parameter, $\beta_2$. We do not postulate that the model in (11) is the DGP that generated the data. It turns out that consistency of $\hat\beta_2$ holds even if the estimation model is misspecified relative to the DGP, as long as the misspecification only relates to stationary terms. The reason is that the stochastic trends dominate asymptotically, so for $T \to \infty$ any misspecification of stationary terms will not affect the estimator. As an example, the static regression in (11) will produce consistent estimators even if the true DGP is dynamic. This is discussed in some detail in Box 1. This result is in contrast to the stationary case, where consistency is normally only obtained if the DGP is contained in the estimation model.
Consistency of the estimator tells you that $\hat\beta_2$ converges to $\beta_2$ as $T$ diverges. It turns out that the non-stationarity of the variables in $x_t$ affects the so-called rate of convergence, i.e. the speed at which the variance of $\hat\beta_2$ goes to zero. If $x_{1t}$ and $x_{2t}$ are stationary variables, we know that under usual conditions,

$$\sqrt{T}\left(\hat\beta_2 - \beta_2\right) \to N(0, \omega^2),$$

where $\omega^2$ is the asymptotic variance of $\sqrt{T}(\hat\beta_2 - \beta_2)$. The interpretation is that the variance of $\hat\beta_2$ is approximately $T^{-1}\omega^2$, which approaches zero at the rate $T^{-1}$. For cointegrated I(1) series, the variance of $\hat\beta_2$ approaches zero at the faster rate $T^{-2}$, a property known as super consistency of $\hat\beta_2$.

To illustrate this phenomenon, graphs (A) and (B) in Figure 3 show the distributions of the estimator $\hat\beta_2$ of a true value $\beta_2 = 1$ from the static regression (11). In graph (A), $x_{2t}$ is generated as an IID variable, while $x_{1t}$ is $x_{2t}$ plus an IID error term. In graph (B) we use the same setup, but now $x_{2t}$ is I(1), generated as a random walk. In both cases the estimators are consistent and the distributions collapse around the true value, $\beta_2 = 1$. In the cointegrated I(1) case, however, convergence is much faster and the distributions are much less dispersed.
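A small Monte Carlo sketch in NumPy loosely mimics the design of Figure 3 (A)-(B): true $\beta_2 = 1$, IID versus random-walk $x_{2t}$, and 2,000 replications rather than the note's 10,000. It compares how fast the standard deviation of $\hat\beta_2$ falls when $T$ quadruples from 100 to 400. Under the $T^{-1}$ variance rate the standard deviation should roughly halve, while under the $T^{-2}$ rate it should fall by roughly a factor of four. The design choices and helper names are assumptions for illustration.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from regressing each row of y on a constant and the same row of x."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)


def slope_sd(T, integrated, reps=2000, seed=4):
    """Monte Carlo standard deviation of beta2_hat when the true beta2 = 1."""
    rng = np.random.default_rng(seed)
    e2 = rng.standard_normal((reps, T))
    x2 = np.cumsum(e2, axis=1) if integrated else e2   # random walk or IID
    x1 = x2 + rng.standard_normal((reps, T))           # x_1t = x_2t + IID error
    return ols_slope(x1, x2).std()


ratios = {}
for integrated in (False, True):
    sd100, sd400 = slope_sd(100, integrated), slope_sd(400, integrated)
    ratios[integrated] = sd100 / sd400
    print("I(1)" if integrated else "IID", round(sd100, 4), round(sd400, 4))
# Theory suggests a ratio near 2 in the IID case and near 4 in the I(1) case.
print(ratios)
```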
Whereas the specification of stationary terms is not important asymptotically, it might nevertheless be important in finite samples. Some authors suggest that the super-consistent OLS estimator $\hat\beta_2$ can be severely biased in finite samples. We return to an alternative estimator and an illustration of the bias in §2.2.
Box 1: Static Regression when the DGP is Dynamic
In most cases we believe that the DGP generating the observed data in the economy is dynamic. In this case the static regression (11) is misspecified. The misspecification is related only to stationary terms, however, so the obtained estimator, $\hat\beta_2$, is still consistent.

As an example, consider a simple dynamic DGP given by

$$x_{1t} = \delta + \theta_1 x_{1,t-1} + \phi_0 x_{2t} + \phi_1 x_{2,t-1} + \varepsilon_{1t} \tag{B1-1}$$

$$x_{2t} = x_{2,t-1} + \varepsilon_{2t}, \tag{B1-2}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are IID error processes. Here, $x_{2t}$ is a random walk, while $x_{1t}$ is generated as an autoregressive distributed lag model, ADL(1,1). The equation in (B1-1) can be rewritten as

$$(1 - \theta_1) x_{1t} = \delta + (\phi_0 + \phi_1) x_{2t} - \theta_1 (x_{1t} - x_{1,t-1}) - \phi_1 (x_{2t} - x_{2,t-1}) + \varepsilon_{1t}$$

or

$$x_{1t} = \mu + \beta_2 x_{2t} + \gamma_1 \Delta x_{1t} + \gamma_2 \Delta x_{2t} + \tilde\varepsilon_{1t}, \tag{B1-3}$$

where we have defined

$$\mu = \frac{\delta}{1 - \theta_1}, \quad \beta_2 = \frac{\phi_0 + \phi_1}{1 - \theta_1}, \quad \gamma_1 = -\frac{\theta_1}{1 - \theta_1}, \quad \gamma_2 = -\frac{\phi_1}{1 - \theta_1} \quad \text{and} \quad \tilde\varepsilon_{1t} = \frac{\varepsilon_{1t}}{1 - \theta_1}.$$

Comparing the expressions in (11) and (B1-3), we note that the static regression is a simplified version of the DGP, obtained by excluding the stationary terms $\Delta x_{1t}$ and $\Delta x_{2t}$. Since the misspecification is related only to stationary terms, the estimator from the static regression is still consistent.

At first sight it seems natural to use the model (B1-3) for estimating the parameters. Note, however, that $\Delta x_{1t} = x_{1t} - x_{1,t-1}$ is correlated with $\tilde\varepsilon_{1t}$, so the OLS estimator of $\gamma_1$ is not consistent. Asymptotically this will not affect the estimator of $\beta_2$, and the OLS estimator of $\beta_2$ in (B1-3) is consistent. Alternatively, we can use $x_{1,t-1}$ as an instrument for $\Delta x_{1t}$ and estimate the parameters using instrumental variables (IV). This IV estimator is numerically equivalent to the estimator obtained from applying OLS to the ADL model (B1-1). The IV estimator may sound complicated compared to the ADL model. Depending on the software system you use, however, it is sometimes a convenient way to get the estimated standard errors for the cointegrating parameter, $\hat\beta_2$; see also §2.4.

In a cointegration analysis, the static regression (11) is sometimes referred to as the first step of an Engle-Granger two-step procedure, where the second step is a description of the dynamic adjustment towards equilibrium. Given the estimated cointegration parameters, we may define the so-called error correction term as the deviation from equilibrium,

$$\hat u_t = x_{1t} - \hat\mu - \hat\beta_2 x_{2t}.$$

Under cointegration, $\hat u_t$ is a stationary process. Since the estimators converge to the true values very fast, we can include $\hat u_{t-1}$ as a fixed regressor in a dynamic model. The second step of the Engle-Granger procedure is therefore to estimate an error-correction model given $\hat u_{t-1}$, e.g.

$$\Delta x_{1t} = \delta + \gamma_1 \Delta x_{1,t-1} + \omega_0 \Delta x_{2t} + \omega_1 \Delta x_{2,t-1} + \alpha \hat u_{t-1} + \varepsilon_t,$$

where we have assumed one lag in the first differences and have conditioned on the contemporaneous change $\Delta x_{2t}$. All terms in the error-correction model are stationary, and standard inference procedures apply to all parameters, in the sense that $t$-ratios will follow standard normal distributions, $N(0, 1)$, asymptotically.
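The two Engle-Granger steps can be sketched in NumPy on data simulated from (10), where the true values are $\beta_2 = 1$ and $\alpha_1 = -0.2$. The helper `ols` is a hypothetical convenience function computing conventional OLS standard errors, and the regressors follow the second-step equation above. This is a minimal illustration under those assumptions, not a full implementation. In particular, it does not test the hypothesis of no cointegration, which is the topic of §3.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, se


# Simulated data from the DGP (10): beta = (1, -1)', alpha = (-0.2, 0.1)'
rng = np.random.default_rng(5)
T = 1000
alpha_true = np.array([-0.2, 0.1])
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = x[t - 1] + alpha_true * (x[t - 1, 0] - x[t - 1, 1]) + rng.standard_normal(2)
x1, x2 = x[:, 0], x[:, 1]

# Step 1: static regression (11) of x_1t on a constant and x_2t
(mu_hat, beta2_hat), _ = ols(x1, np.column_stack([np.ones(T), x2]))
u_hat = x1 - mu_hat - beta2_hat * x2          # error-correction term

# Step 2: ECM for Delta x_1t given u_hat_{t-1}, for t = 2, ..., T-1
dx1, dx2 = np.diff(x1), np.diff(x2)           # dx1[k] = x1[k+1] - x1[k]
y = dx1[1:]                                   # Delta x_1t
X = np.column_stack([
    np.ones(T - 2),
    dx1[:-1],                                 # Delta x_{1,t-1}
    dx2[1:],                                  # Delta x_2t
    dx2[:-1],                                 # Delta x_{2,t-1}
    u_hat[1:-1],                              # u_hat_{t-1}
])
coef, se = ols(y, X)
alpha_hat, t_alpha = coef[4], coef[4] / se[4]
print(beta2_hat, alpha_hat, t_alpha)
```

Because step 2 treats $\hat u_{t-1}$ as a fixed regressor, the reported `t_alpha` is an ordinary $t$-ratio. This matches the statement above that second-step inference is standard.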

2.2 Dynamic Regression Models


An alternative to the estimator obtained by OLS in the static regression (11) is to construct a dynamic model that is believed to be a better approximation of the DGP, and to derive the estimator of the cointegrating coefficients from this model. One possibility is to construct the best possible description of the autocovariance structure of the data by estimating an appropriate autoregressive distributed lag (ADL) model. Estimators of the cointegrating parameters are then derived from its long-run solution. In particular, we could estimate by OLS the unrestricted ADL model, where the lag lengths are set to eliminate residual autocorrelation, e.g. an ADL(2,2) model,

$$x_{1t} = \delta + \theta_1 x_{1,t-1} + \theta_2 x_{1,t-2} + \phi_0 x_{2t} + \phi_1 x_{2,t-1} + \phi_2 x_{2,t-2} + \varepsilon_t. \tag{12}$$

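Recall that the unrestricted ADL model can be written as an error-correction model. In particular, we can use the reformulations

$$x_{1t} - \theta_1 x_{1,t-1} - \theta_2 x_{1,t-2} = \Delta x_{1t} + \theta_2 \Delta x_{1,t-1} - (\theta_1 + \theta_2 - 1) x_{1,t-1}$$

$$\phi_0 x_{2t} + \phi_1 x_{2,t-1} + \phi_2 x_{2,t-2} = \phi_0 \Delta x_{2t} - \phi_2 \Delta x_{2,t-1} + (\phi_0 + \phi_1 + \phi_2) x_{2,t-1}$$

to obtain the ECM form

$$\Delta x_{1t} = \delta + \gamma_1 \Delta x_{1,t-1} + \omega_0 \Delta x_{2t} + \omega_1 \Delta x_{2,t-1} + \pi_1 x_{1,t-1} + \pi_2 x_{2,t-1} + \varepsilon_t, \tag{13}$$

where $\gamma_1 = -\theta_2$, $\omega_0 = \phi_0$, $\omega_1 = -\phi_2$, $\pi_1 = (\theta_1 + \theta_2 - 1)$, and $\pi_2 = (\phi_0 + \phi_1 + \phi_2)$. For both (12) and (13) the estimator of the cointegrating coefficient is given by the long-run solution, i.e.

$$\hat\beta_2 = \frac{\hat\phi_0 + \hat\phi_1 + \hat\phi_2}{1 - \hat\theta_1 - \hat\theta_2} = -\frac{\hat\pi_2}{\hat\pi_1}. \tag{14}$$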
The model in (13) is often referred to as the unrestricted ECM form. Recall that we may also write the model with the long-run solution explicit as

$$\Delta x_{1t} = \gamma_1 \Delta x_{1,t-1} + \omega_0 \Delta x_{2t} + \omega_1 \Delta x_{2,t-1} + \pi_1 (x_{1,t-1} - \mu - \beta_2 x_{2,t-1}) + \varepsilon_t. \tag{15}$$

These formulations are equivalent, but (13) can be estimated with OLS, while (15) is non-linear in the parameters and requires a more elaborate estimation procedure (e.g. maximum likelihood).
Compared to the estimator from the static regression, the estimator derived from a dynamic model has the advantage of being based on a well-specified model. The main problem in empirical applications is that the DGP is not known, so the precise form of (12) has to be determined from the data. The usual approach is to start with a general ADL(p,q), where $p$ and $q$ are large enough to eliminate residual autocorrelation. From this model, insignificant lags can be removed.

[Figure 3 (six panels): (A) Distribution of estimators, stationary case, T = 50, 100, 500; (B) Distribution of estimators, cointegrated case, T = 50, 100, 500; (C) Distribution of estimators for T = 100 (static regression, ADL(1,1), ADL(2,2)); (D) Distribution of t-ratios (static regression and ADL(1,1) for T = 100 and T = 500, with the standard normal distribution); (E) Distribution of estimators, static regression, by sample length; (F) Distribution of estimators, ADL regressions (ADL(1,1), ADL(2,2)), by sample length.]

Figure 3: (A): Consistency of the estimated parameter in a static regression for stationary variables. (B): Superconsistency for cointegrated I(1) variables. (C): Distributions of the estimated cointegration parameter based on a static and a dynamic regression. (D): Distributions of the t-ratios under a true null hypothesis for T = 100 and T = 500 based on the static and dynamic regressions. (E)-(F): Mean and 95% confidence bands of the distributions of the estimated cointegration parameter for different sample lengths T = 20, 30, ..., 500. The Monte Carlo simulations are based on 10 000 replications.

Box 2: Inference on Coefficients in an ADL Model
Consider an ADL(2,2) model given by

1 =  + 1 1−1 + 2 1−2 + 0 2 + 1 2−1 + 2 2−2 +   (B2-1)

where  is an IID process. Given that the variables in (B2-1) are all I(1), it is interesting
to ask if any of the estimators obtained by applying OLS to equation (B2-1) follow standard
distributions, so that inference based on the standard normal distribution applies.
The answer to this question is given in Sims, Stock, and Watson (1990). They give the general result that an estimated parameter follows a normal distribution asymptotically if it can be written as the coefficient to a mean-zero stationary variable, possibly after a linear transformation of the model. This means that if the model can be reformulated so that a given parameter is the coefficient to a stationary variable with mean zero, then the estimator of that parameter is asymptotically normal.

Again we may rewrite the ADL model in ECM form as

$$\Delta x_{1t} = c_1 \Delta x_{1t-1} + d_0 \Delta x_{2t} + d_1 \Delta x_{2t-1} + \alpha \{x_{1t-1} - \mu - \beta_2 x_{2t-1}\} + \epsilon_t, \tag{B2-2}$$

where $c_1 = -\theta_2$, $d_0 = \phi_0$, $d_1 = -\phi_2$, $\alpha = (\theta_1 + \theta_2 - 1)$, $\mu = \delta/(1 - \theta_1 - \theta_2)$, and $\beta_2 = (\phi_0 + \phi_1 + \phi_2)/(1 - \theta_1 - \theta_2)$.
Note that $\Delta x_{1t-1}$, $\Delta x_{2t}$, and $\Delta x_{2t-1}$ are stationary variables with mean zero, so the estimators of the corresponding parameters, $c_1$, $d_0$, and $d_1$, will follow a normal distribution. Given cointegration, the term $x_{1t-1} - \mu - \beta_2 x_{2t-1}$ is also stationary with mean zero, so the estimator of $\alpha$ will also follow a normal distribution.
Unfortunately there is no way to rewrite the model so that $\beta_2$ is the coefficient to a stationary mean-zero term, so this argument cannot be used to show that the estimator of the cointegrating coefficient, $\beta_2$, has a normal distribution. If $x_{1t}$ is the only variable that error corrects, however, then all information on $\beta_2$ is present in equation (B2-1), and the single-equation OLS estimator is identical to the maximum likelihood estimator in the vector error-correction model. It follows that $\hat\beta_2$ is asymptotically efficient and asymptotically normal. This is a strong assumption to make, however, and inference on cointegrating parameters should always be done with some caution.

2.3 Comparison in a Monte Carlo Simulation


To compare the two approaches and illustrate the practical importance of the bias in a static regression, we set up a small Monte Carlo simulation. As the DGP we consider the specific model

$$x_{1t} = 0.30 \cdot x_{2t} + 0.20 \cdot x_{2t-1} + 0.50 \cdot x_{1t-1} + \epsilon_{1t}$$

$$x_{2t} = x_{2t-1} + \epsilon_{2t}$$

for $t = 1, 2, \dots, T$, where the innovations, $\epsilon_{1t}$ and $\epsilon_{2t}$, are assumed $N(0,1)$ and uncorrelated. The DGP implies a long-run solution with a cointegrating coefficient of $\beta_2 = (0.30 + 0.20)/(1 - 0.50) = 1$.
Based on 10 000 data sets from this DGP we look at the properties of the estimators
obtained from the static regression (11) and from the ADL(1,1) model. We note that the

15
used ADL model is identical to the DGP and we expect it to perform better than the
static regression. To illustrate the effect of choosing an estimation model which is more
general than the DGP we also consider the estimates obtained from an ADL(2,2), which
estimates a redundant lag for both variables. This setup amounts to using one regression
model that coincides with the DGP (an ADL(1,1)), one that is too general (an ADL(2,2)),
and one that is too restricted (a static regression).
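A small-scale version of this experiment can be sketched in Python as follows. This is an illustrative sketch rather than the code behind Figure 3: it assumes starting values $x_{10} = x_{20} = 0$, uses far fewer replications than the 10 000 in the text, and computes OLS directly from the normal equations so that only the standard library is needed. The function names are hypothetical. It compares the mean of the static-regression estimator of $\beta_2$ with the mean of the long-run solution $(\hat\phi_0 + \hat\phi_1)/(1 - \hat\theta_1)$ from an ADL(1,1).

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gaussian elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    b = [0.0] * k
    for i in reversed(range(k)):                           # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def simulate(T, rng):
    """One sample from the DGP above, started at x1_0 = x2_0 = 0 (an assumption)."""
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))                         # random walk
        x1.append(0.30 * x2[-1] + 0.20 * x2[-2] + 0.50 * x1[-1] + rng.gauss(0.0, 1.0))
    return x1[1:], x2[1:]


def monte_carlo(T=100, reps=500, seed=1):
    """Mean of the static-regression and ADL(1,1) estimates of beta2 (true value 1)."""
    rng = random.Random(seed)
    static, adl = [], []
    for _ in range(reps):
        x1, x2 = simulate(T, rng)
        static.append(ols([[1.0, v] for v in x2], x1)[1])            # static regression
        # ADL(1,1): x1_t on constant, x1_{t-1}, x2_t, x2_{t-1}
        b = ols([[1.0, x1[t - 1], x2[t], x2[t - 1]] for t in range(1, T)], x1[1:])
        adl.append((b[2] + b[3]) / (1.0 - b[1]))                       # long-run solution
    return sum(static) / reps, sum(adl) / reps


mean_static, mean_adl = monte_carlo()
print(f"static regression: {mean_static:.3f}, ADL(1,1): {mean_adl:.3f}")
```

With these settings one would expect the static-regression mean to lie noticeably below unity and the ADL mean to be close to 1; the exact numbers depend on the seed and the number of replications.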
Figure 3 (C) illustrates the distributions of the estimated parameters in the three cases for $T = 100$ observations. We note that the distributions for the ADL(1,1) and ADL(2,2) almost coincide and are symmetric and nicely centered around the true value. This indicates that estimating a redundant lag only marginally affects the estimators. The distribution of the estimates from the static regression is shifted to the left, reflecting the bias of the estimator. The mean of the estimates is 0.93, which is significantly smaller than unity. We also note that the distribution is skewed, with a long left tail. Graphs (E) and (F) illustrate the mean and the 5% and 95% quantiles of the distributions of the estimates for different sample lengths, $T = 20, 30, \dots, 500$. We see that the estimator from the static regression is consistent, but it is severely biased in small samples and the distribution is clearly asymmetric. The estimator from the dynamic regression has the correct expectation for all sample lengths considered. We also note that the cost of the two redundant regressors in the ADL(2,2) is only visible for very short sample lengths, and even for $T = 20$ the difference between the estimates from an ADL(1,1) and an ADL(2,2) is small.
The results for this specific DGP thus seem to suggest that estimators derived from a
dynamic regression model are clearly preferable to the two-step Engle-Granger estimators.
This seems to be confirmed for more general classes of DGPs in the literature.

2.4 Inference on Cointegrating Parameters


Besides getting estimates of the parameters, we are often interested in testing specific hypotheses on the cointegrating coefficients, which may link the statistical model to economic theory. This requires that we know the distribution of $\hat\beta_2$. Unfortunately, it turns out that the estimator obtained from the static regression (11) is not normally distributed, and in general its distribution depends on unknown parameters, which invalidates standard inference. As a consequence, we can only use the static regression to estimate the parameters, while the estimated standard errors cannot be used for inference in general.
In the dynamic regression, (12) or equivalently (13) or (15), the situation is more promising: given cointegration, t-ratios constructed from the estimated standard errors asymptotically follow standard normal distributions under the null. Assuming cointegration, this result implies that we can make inference on the cointegration coefficients derived as the long-run solution from an ADL or ECM model. As an example, we can test the hypothesis $\beta_2 = \beta_2^0$, for a hypothesized value $\beta_2^0$, using the standard t-ratio

$$\hat\tau_{\beta_2 = \beta_2^0} = \frac{\hat\beta_2 - \beta_2^0}{\mathrm{se}(\hat\beta_2)},$$

which will follow a standard normal distribution asymptotically. A more theoretical dis-
cussion of the inference on the parameters of the ADL model for I(1) variables is given in
Box 2.
The only complication is that $\hat\beta_2$ is a non-linear function of the estimated parameters, cf. (14), so the standard error of $\hat\beta_2$ is a complicated function of the covariance matrix of the estimated parameters in (12). The software package PcGive has a procedure to calculate the static long-run solution and supply the derived standard errors. In other software packages it is sometimes more convenient to use the alternative (but numerically equivalent) IV estimator mentioned in Box 1, or the non-linear estimation of (15), since they automatically produce standard errors for $\hat\beta_2$.
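One common way to approximate the standard error of the non-linear function (14) is the delta method: with $g$ the gradient of $\beta_2 = (\phi_0 + \phi_1 + \phi_2)/(1 - \theta_1 - \theta_2)$ with respect to the estimated parameters and $V$ their estimated covariance matrix, $\mathrm{se}(\hat\beta_2) \approx \sqrt{g' V g}$. The text does not state which method PcGive uses, so the Python sketch below only illustrates the delta method; the function names and the covariance matrix in the usage example are hypothetical.

```python
import math


def long_run_with_se(theta, phi, V):
    """beta2 = sum(phi) / (1 - sum(theta)) and its delta-method standard error.

    theta: estimated autoregressive coefficients (theta_1, ..., theta_k)
    phi:   estimated distributed-lag coefficients (phi_0, ..., phi_q)
    V:     estimated covariance matrix of (theta_1, ..., theta_k, phi_0, ..., phi_q),
           in that order, as a list of lists
    """
    D = 1.0 - sum(theta)
    S = sum(phi)
    beta2 = S / D
    # Gradient of beta2: d/d theta_i = S / D**2, d/d phi_j = 1 / D
    g = [S / D**2] * len(theta) + [1.0 / D] * len(phi)
    n = len(g)
    var = sum(g[i] * V[i][j] * g[j] for i in range(n) for j in range(n))
    return beta2, math.sqrt(var)


def t_ratio(estimate, se, null_value):
    """t-ratio for H0: beta2 = null_value (asymptotically N(0,1) given cointegration)."""
    return (estimate - null_value) / se


# Hypothetical covariance matrix, for illustration only
V = [[0.01 if i == j else 0.0 for j in range(5)] for i in range(5)]
b2, se = long_run_with_se([0.5, 0.0], [0.3, 0.2, 0.0], V)
print(b2, round(se, 4), t_ratio(b2, se, 1.0))
```

In an application, `V` would be the OLS covariance matrix of the ADL coefficients, ordered to match `theta` and `phi`.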
The distributions of the t-ratios in the Monte Carlo simulation are reported in Figure 3 (D). The t-ratios from a static regression have a distribution that is skewed to the left, and inference based on a standard normal would be very misleading. The t-ratios from the dynamic regression, on the other hand, seem to be close to a standard normal, making it possible to test hypotheses on the parameters.

Example 7 (private consumption, continued): To illustrate estimation and inference on cointegrating coefficients, consider again the Danish quarterly consumption data, $x_t = (c_t, y_t, w_t)'$. Applying OLS to a static regression model for the 122 observations, 1973:1 to 2003:2, yields

$$c_t = \underset{(0.129)}{-0.404} + \underset{(0.049)}{0.364} \cdot y_t + \underset{(0.044)}{0.516} \cdot w_t + \hat u_t, \tag{16}$$

where the numbers in parentheses are standard errors. The estimates seem consistent with a simple consumption function in which consumption depends positively on income and wealth. We may note that a one per cent increase in both income and wealth gives less than a one per cent increase in consumption, as 0.364 + 0.516 = 0.88. Consequently, the consumption-income ratio will not be constant in a steady state, which may be regarded as unsatisfactory from an economic point of view. Note that t-ratios constructed from the reported standard errors in (16) do not follow a standard normal distribution and are not suitable for testing. For example, we cannot test whether the sum of the coefficients, 0.88, is significantly different from unity.
Based on the estimates of the static regression (16) we may define the error-correction term

$$\hat u_t = c_t + 0.404 - 0.364 \cdot y_t - 0.516 \cdot w_t,$$

which is interpretable as the deviation from equilibrium. The term $\hat u_t$ may be used in the construction of error-correction models to characterize the dynamic properties of the data, as suggested by the Engle-Granger approach. In principle there may exist error-correction models for $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$. Starting with a model with two lags in the first differences and deleting insignificant lags produces the three equations:

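$$\widehat{\Delta c_t} = \underset{(0.002)}{0.001} - \underset{(0.077)}{0.195} \cdot \Delta c_{t-1} + \underset{(0.057)}{0.229} \cdot \Delta y_t + \underset{(0.117)}{0.426} \cdot \Delta w_t - \underset{(0.064)}{0.250} \cdot \hat u_{t-1}$$

$$\widehat{\Delta y_t} = \underset{(0.002)}{0.002} + \underset{(0.118)}{0.433} \cdot \Delta(\cdot)_t + \underset{(0.115)}{0.387} \cdot \Delta(\cdot)_{t-1} - \underset{(0.087)}{0.353} \cdot \Delta(\cdot)_{t-1} + \underset{(0.099)}{0.066} \cdot \hat u_{t-1}$$

$$\widehat{\Delta w_t} = \underset{(0.001)}{0.003} + \underset{(0.060)}{0.232} \cdot \Delta(\cdot)_t - \underset{(0.050)}{0.030} \cdot \hat u_{t-1}$$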

Note that only consumption corrects deviations from the long-run relation, with a speed of adjustment of −0.25, while $\Delta y_t$ and $\Delta w_t$ do not adjust significantly when the variables are out of equilibrium.
An alternative estimator of the cointegrating coefficients can be derived from a conditional ADL model for consumption. Allowing at most three lags and deleting insignificant terms leads to the preferred ADL model

$$c_t = \underset{(0.093)}{-0.080} + \underset{(0.092)}{0.544} \cdot c_{t-1} + \underset{(0.079)}{0.204} \cdot c_{t-2} + \underset{(0.060)}{0.240} \cdot y_t - \underset{(0.065)}{0.125} \cdot y_{t-1} + \underset{(0.124)}{0.401} \cdot w_t - \underset{(0.129)}{0.291} \cdot w_{t-1} + \hat\epsilon_t. \tag{17}$$

According to misspecification tests, the model seems relatively well-behaved. No-autocorrelation of order 1 to 5 is not rejected, with a p-value of 0.64, and no-ARCH of order 1 to 4 is not rejected, with a p-value of 0.21. Solving equation (17) for the static long-run solution yields

$$c_t = \underset{(0.357)}{-0.320} + \underset{(0.146)}{0.458} \cdot y_t + \underset{(0.130)}{0.436} \cdot w_t + \hat u_t, \tag{18}$$

where the long-run coefficients are derived as

$$\frac{0.240 - 0.125}{1 - 0.544 - 0.204} \approx 0.458 \quad \text{and} \quad \frac{0.401 - 0.291}{1 - 0.544 - 0.204} \approx 0.436,$$
and where the standard errors to the cointegrating coefficients are complicated functions
of the covariance matrix of the estimated parameters. Compared to the static regression,
the estimated coefficient to income is somewhat higher, whereas the coefficient to private
wealth is lower. We also note that the standard errors in (18), which can be used for
testing hypotheses on the cointegrating coefficients, are much larger than the standard
errors in (16).
Based on the dynamic model, the sum of the coefficients is still below unity, 0.458 + 0.436 = 0.894, but now we can test the hypothesis that it is actually unity. A Wald test for this hypothesis gives a test statistic of 5.26, corresponding to a p-value of 0.022 in a $\chi^2(1)$ distribution. We therefore reject the hypothesis and conclude that the sum of the coefficients seems to be significantly smaller than unity.
To illustrate the dynamic properties of the estimated cointegration model, Figure 4 shows the impulse-response functions for a permanent change in income and wealth, i.e. the cumulated values of $\partial c_{t+h}/\partial y_t$ and $\partial c_{t+h}/\partial w_t$ ($h = 0, 1, \dots, 40$). For disposable income the contemporaneous impact is 0.240, and there is a smooth convergence to the long-run impact of 0.458. A permanent change in private wealth has a contemporaneous effect on consumption of 0.401, which is not far from the long-run impact of 0.436. The convergence is not monotone, however: the large contemporaneous impact is followed by a decrease in the next period and then a gradual convergence.

[Figure 4: (A) Impulse response function for income; (B) Impulse response function for wealth; horizons 0 to 40.]

Figure 4: Impulse-response functions for a permanent change in income and wealth, i.e. the accumulated values of $\partial c_{t+h}/\partial y_t$ and $\partial c_{t+h}/\partial w_t$ ($h = 0, 1, \dots, 40$).
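The cumulated responses in Figure 4 can be computed recursively from the ADL model (17). The following minimal Python sketch assumes a permanent unit increase in the regressor from period 0 onwards and uses the rounded coefficients reported in (17); the function name `cumulated_response` is hypothetical. With these rounded coefficients the long-run limits are about 0.456 and 0.437, close to the reported 0.458 and 0.436; the small differences presumably reflect rounding.

```python
def cumulated_response(theta, phi, horizon=40):
    """Cumulated response of c_{t+h}, h = 0, ..., horizon, to a permanent unit increase in
    one regressor x from period 0, in c_t = const + sum_i theta_i c_{t-i} + sum_j phi_j x_{t-j} + e_t.

    theta: (theta_1, theta_2, ...) coefficients on lagged consumption
    phi:   (phi_0, phi_1, ...) coefficients on the regressor and its lags
    """
    r = []
    for h in range(horizon + 1):
        # After a permanent shift, x_{t+h-j} is raised for every lag j <= h
        val = sum(phi[j] for j in range(min(h, len(phi) - 1) + 1))
        # Feedback through the lagged dependent variable
        val += sum(theta[i - 1] * r[h - i] for i in range(1, len(theta) + 1) if h - i >= 0)
        r.append(val)
    return r


# Rounded coefficients from (17)
income = cumulated_response([0.544, 0.204], [0.240, -0.125])
wealth = cumulated_response([0.544, 0.204], [0.401, -0.291])
print("income:", [round(v, 3) for v in income[:4]], "h=40:", round(income[-1], 3))
print("wealth:", [round(v, 3) for v in wealth[:4]], "h=40:", round(wealth[-1], 3))
```

The recursion reproduces the qualitative pattern described above: a smooth rise for income, and for wealth a dip after the first period followed by gradual convergence.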
Notice that the results obtained in the estimation of (17) can also be obtained by estimating the equivalent unrestricted error-correction model, i.e.

$$\Delta c_t = \underset{(0.093)}{-0.080} - \underset{(0.079)}{0.204} \cdot \Delta c_{t-1} + \underset{(0.060)}{0.240} \cdot \Delta y_t + \underset{(0.124)}{0.401} \cdot \Delta w_t - \underset{(0.065)}{0.251} \cdot c_{t-1} + \underset{(0.044)}{0.115} \cdot y_{t-1} + \underset{(0.046)}{0.110} \cdot w_{t-1} + \hat\epsilon_t. \tag{19}$$

Here the cointegrating coefficients can be found as the long-run solution, $0.115/0.251 \approx 0.458$ and $0.110/0.251 \approx 0.436$, which agree with the results in (18) up to rounding of the reported coefficients.

2.5 What if Variables do Not Cointegrate?


Recall that cointegration is the special case where the stochastic trends in the individual variables cancel. From a logical point of view this is an exception, and it is interesting to ask about the properties of regression models with I(1) variables that do not cointegrate.
To discuss this case, assume that $x_{1t}$ and $x_{2t}$ are two unrelated I(1) variables. Both variables contain stochastic trends, but the trends are unrelated and the variables do not cointegrate. Ideally we would like the static regression

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t \tag{20}$$

to reveal that $\beta_2 = 0$, at least asymptotically. This turns out not to be the case, however, and the estimator $\hat\beta_2$ is not consistent. Moreover, as $T \to \infty$ the t-ratio, $\hat\tau_{\beta_2=0}$, will indicate a significant relation between $x_{1t}$ and $x_{2t}$. This is known as the spurious regression result. The problem is that when the variables do not cointegrate, $u_t$ is an I(1) process and standard results do not hold.

Example 8 (spurious regression): As an example of a spurious regression, consider two presumably unrelated I(1) variables, namely yearly data covering 1980 to 2000 for the log of real private consumption in Denmark, $cons_t$, and the log of the number of breeding cormorants in Denmark, $bird_t$. We estimate a static regression:

$$cons_t = \underset{(0.150)}{12.145} + \underset{(0.015)}{0.095} \cdot bird_t + \hat u_t$$

The t-ratio for the hypothesis that there is no relation, $\beta_2 = 0$, is given by

$$\hat\tau_{\beta_2=0} = \frac{0.095}{0.015} = 6.30,$$

which seems highly significant in a $N(0,1)$ distribution, apparently suggesting a clear positive relation between the number of birds and aggregate consumption! Furthermore, $R^2$ in the equation is 0.69, indicating that the number of breeding birds can account for a large proportion of the variation in consumption. These results are of course totally spurious, a simple consequence of the variables being I(1).

To illustrate the spurious regression problem we set up a simple Monte Carlo simulation. As a comparison, we first reproduce the standard results for a stationary regression. We generate data series as independent IID variables,

$$x_{1t} = \epsilon_{1t} \quad \text{and} \quad x_{2t} = \epsilon_{2t}, \qquad t = 1, 2, \dots, T,$$

where $\epsilon_{1t}$ and $\epsilon_{2t}$ are independent drawings from a $N(0,1)$. The results from the regression model (20) are reported in Figure 5 (A) and (B) for sample lengths $T = 50, 100, 500$. We note in graph (A) that the distributions of $\hat\beta_2$ are centered around the true value, and the consistency of the estimator $\hat\beta_2$ implies that the variance decreases as $T \to \infty$. In graph (B) we consider the distributions of the t-ratio, $\hat\tau_{\beta_2=0}$. The distribution is close to the asymptotic $N(0,1)$ for all sample lengths considered.
Next consider the spurious regression between I(1) variables. Here we generate data as independent random walks, i.e.

$$x_{1t} = x_{1t-1} + \epsilon_{1t} \quad \text{and} \quad x_{2t} = x_{2t-1} + \epsilon_{2t}, \qquad t = 1, 2, \dots, T.$$

The results are reported in graphs (C) and (D). In graph (C) we note that the distribution of $\hat\beta_2$ is centered around the true value, but it does not collapse as $T \to \infty$. This reflects that the estimator is unbiased but not consistent. In graph (D) we notice that the distributions of the t-ratios become increasingly dispersed as $T$ increases, and the distributions are far from a standard normal. As $T \to \infty$ this implies that, using the conventional critical values of ±1.96, we would reject the true hypothesis that $\beta_2 = 0$ with a probability approaching one.
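The comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind Figure 5: it uses the standard library only, far fewer replications, and hypothetical function names, and it reports the share of samples in which the conventional t-test rejects $\beta_2 = 0$ at the 5% level. For independent IID series the share should be close to 5%, while for independent random walks it should be much larger and should grow with $T$.

```python
import random
from itertools import accumulate


def slope_t_ratio(x, y):
    """Conventional t-ratio for b in the static regression y_t = a + b x_t + u_t."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    a = my - b * mx
    s2 = sum((w - a - b * v) ** 2 for v, w in zip(x, y)) / (n - 2)
    return b / (s2 / sxx) ** 0.5


def rejection_rate(T, random_walks, reps=300, seed=0):
    """Share of samples with |t| > 1.96 although x1 and x2 are independent."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        e1 = [rng.gauss(0.0, 1.0) for _ in range(T)]
        e2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
        if random_walks:                                  # x_it = x_it-1 + e_it
            x1, x2 = list(accumulate(e1)), list(accumulate(e2))
        else:                                             # x_it = e_it
            x1, x2 = e1, e2
        rejections += abs(slope_t_ratio(x2, x1)) > 1.96   # regress x1 on x2 as in (20)
    return rejections / reps


for T in (50, 100, 500):
    print(T, "IID:", rejection_rate(T, False), "random walks:", rejection_rate(T, True))
```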

2.5.1 Dynamic Regression Models

One way to explain the spurious regression result is to note that the regression model in (20) is logically inconsistent if the variables do not cointegrate. Since there is no genuine relation between $x_{1t}$ and $x_{2t}$, the true value of the parameter is zero, $\beta_2 = 0$, so a model with $\beta_2 \neq 0$ is necessarily false. Note, however, that the model with $\beta_2 = 0$ is also false: if $\beta_2 = 0$, then the only way to balance the equation is if $u_t$ is I(1), but that is not consistent with the assumptions of the regression model. One problem with the spurious regression is therefore that the actual DGP cannot be contained in the estimation model, and the test of $\beta_2 = 0$ against $\beta_2 \neq 0$ compares two false models.
Based on this insight it is easy to suggest a modification of the static regression that circumvents some of the problems of the spurious regression. As an example, consider the simple ADL(1,0) model, where the lagged value of $x_{1t}$ is included in the regression:

$$x_{1t} = \delta + \theta_1 x_{1t-1} + \phi_0 x_{2t} + \epsilon_t. \tag{21}$$

In this model the simple DGP is obtained if

$$\theta_1 = 1 \quad \text{and} \quad \phi_0 = \delta = 0,$$

which is consistent with the assumption of a stationary error term.


To analyze how the dynamic regression model behaves with unrelated I(1) variables, we redo the Monte Carlo simulation, now using the dynamic regression in (21). The distributions of the estimator, $\hat\phi_0$, and the t-ratio, $\hat\tau_{\phi_0=0}$, are reported in graphs (E) and (F), respectively. In graph (E) we note that the estimator is consistent, and comparing with graph (A) we also note that the rate of convergence is faster than in the stationary case. It is remarkable that simply augmenting the static regression with the lagged left-hand-side variable eliminates the inconsistency of the estimator. The variables, $x_{1t}$ and $x_{2t}$, are still I(1), however, and standard results for hypothesis testing do not automatically apply. In graph (F) we note that, as in the stationary case, the distributions do not change with the sample length, but the distribution is not $N(0,1)$. This fact can be explained using the argument in Box 2.
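The effect of adding the lagged dependent variable can be checked in the same way. The Python sketch below, again a standard-library illustration with hypothetical function names, estimates the ADL(1,0) model (21) on independent random walks and reports the mean absolute value of $\hat\phi_0$; in line with graph (E), it should shrink towards zero as $T$ grows.

```python
import random
from itertools import accumulate


def ols(X, y):
    """OLS coefficients from the normal equations (Gaussian elimination, partial pivoting)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def adl_phi0(T, rng):
    """phi_0-hat in (21) when x1 and x2 are independent random walks."""
    x1 = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(T + 1)))
    x2 = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(T + 1)))
    X = [[1.0, x1[t - 1], x2[t]] for t in range(1, T + 1)]   # constant, x1_{t-1}, x2_t
    return ols(X, x1[1:])[2]


def mean_abs_phi0(T, reps=200, seed=0):
    """Mean of |phi_0-hat| over simulated samples of length T."""
    rng = random.Random(seed)
    return sum(abs(adl_phi0(T, rng)) for _ in range(reps)) / reps


for T in (50, 100, 500):
    print(T, round(mean_abs_phi0(T), 4))
```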

3 Testing for No-Cointegration


The discussion of spurious regression makes it obvious that it is important to be able to test whether a set of $p$ variables, $x_{1t}, x_{2t}, \dots, x_{pt}$, are cointegrated or not. If the variables are cointegrated, we can use the methods suggested above for estimation and inference on the equilibrium relation. If the variables do not cointegrate, the regression model is useless and should be disregarded or changed to obtain cointegration.
In this section we discuss how the hypothesis that a set of variables are not cointegrated can be tested. We consider two different approaches. One approach is based on the deviation from a proposed cointegrating relation or on the residual from a static regression. This is the approach implemented in the Engle-Granger methodology. The second approach is based on the equivalence between cointegration and error correction, and it is actually a test for no-error-correction in an unrestricted dynamic model.

[Figure 5 (six panels): (A) Distribution of estimators, IID case; (B) Distribution of t-ratios, IID case; (C) Distribution of estimators, I(1) case; (D) Distribution of t-ratios, I(1) case; (E) ADL estimators, I(1) case; (F) ADL t-ratios, I(1) case. Each panel shows T = 50, 100, 500; panels (B) and (F) also show the N(0,1) density.]

Figure 5: Monte Carlo results for a static regression for stationary variables and for a spurious regression.

3.1 Residual-Based Tests


Following the definition, a set of variables $x_{1t}, x_{2t}, \dots, x_{pt}$ cointegrate with cointegration vector $\beta = (1, -\beta_2, -\beta_3, \dots, -\beta_p)'$ if the linear combination

$$z_t = \beta' x_t = x_{1t} - \beta_2 x_{2t} - \beta_3 x_{3t} - \dots - \beta_p x_{pt}$$

is stationary. It follows that the null hypothesis of no-cointegration can be translated into the hypothesis of a unit root in $z_t$. This hypothesis can be tested using a conventional augmented Dickey-Fuller (ADF) test. Allowing $z_t$ to have a mean different from zero but no deterministic linear trend, the hypothesis of no-cointegration can be tested as the hypothesis $H_0: \pi = 0$ in the ADF regression with a constant term,

$$\Delta z_t = \delta + \sum_{i=1}^{k-1} c_i \Delta z_{t-i} + \pi z_{t-1} + \epsilon_t, \tag{22}$$

where $\epsilon_t$ is an IID error term. The alternative to a unit root is stationarity, $H_A: -2 < \pi < 0$, and under the null of a unit root the ADF t-test statistic,

$$\hat\tau_c = \frac{\hat\pi}{\mathrm{se}(\hat\pi)},$$

follows a DF distribution. Critical values for the DF distribution are reproduced in part (A) of Table 1 in the row with zero estimated parameters.
If the relevant alternative to a unit root is trend-stationarity, the ADF regression (22) may be augmented with a linear trend term, and the test for no-cointegration is the ADF test with a linear trend.
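A minimal Python sketch of the ADF test in (22) with a constant follows, written with the standard library only; the function names are hypothetical, and the number of lagged differences (`lags`, corresponding to $k-1$) must be chosen by the user. The example applies the test to a simulated stationary AR(1) series, for which the statistic should fall well below the 5% critical value −2.86, and checks the rejection frequency on simulated random walks, which should be close to the nominal 5%.

```python
import random


def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors (Gauss-Jordan inversion of X'X)."""
    n, k = len(X), len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    inv = [row[k:] for row in A]                           # (X'X)^{-1}
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def adf_tau(z, lags=1):
    """t-ratio for pi in Delta z_t = delta + sum_{i=1}^{lags} c_i Delta z_{t-i} + pi z_{t-1} + e_t."""
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]      # dz[t-1] is Delta z_t
    X, y = [], []
    for t in range(lags + 1, len(z)):
        X.append([1.0] + [dz[t - 1 - i] for i in range(1, lags + 1)] + [z[t - 1]])
        y.append(dz[t - 1])
    b, se = ols_with_se(X, y)
    return b[-1] / se[-1]


def rejection_rate_unit_root(T=100, reps=200, seed=7, crit=-2.86):
    """Share of simulated random walks for which the ADF test rejects at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        z = [0.0]
        for _ in range(T):
            z.append(z[-1] + rng.gauss(0.0, 1.0))
        hits += adf_tau(z) < crit
    return hits / reps


rng = random.Random(42)
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))   # AR(1), no unit root
print(round(adf_tau(stationary), 2), rejection_rate_unit_root())
```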

Example 9 (prices on the orange market, continued): As an example of a unit root test where the potential cointegration vector is known, reconsider the prices from the orange market. The potential stationary variable is the price differential

$$z_t = p_t^{org} - p_t^{reg},$$

implying a cointegration vector $\beta = (1, -1)'$. To test the hypothesis of no-cointegration, we test for a unit root in $z_t$. Setting up an ADF regression with a constant term and 5 lags in $\Delta z_t$ and deleting insignificant lags leads to the simple DF regression

∆ = 21718 − 1082 −1 + b



(1534) (00750)

The Dickey-Fuller test is given by the t-ratio,

$$\hat\tau = \frac{-1.082}{0.075} = -14.43.$$

The 5% critical value for the case of a constant is −2.86, so we can easily reject the null of no-cointegration. Also recall from Figure 1 (B) that the price differential looks extremely stable.

3.1.1 Estimated Cointegration Vector

In many cases the cointegration vector, $\beta$, is unknown and the above approach is not feasible. The test procedure can easily be modified, however, to the case where $\beta$ is estimated as in the Engle-Granger procedure. Recall that if the cointegration coefficients, $\beta_2, \dots, \beta_p$, are unknown, they can be super-consistently estimated in the static regression

$$x_{1t} = \mu + \beta_2 x_{2t} + \dots + \beta_p x_{pt} + u_t \tag{23}$$

or from the long-run solution of a dynamic model. The regression (23) corresponds to a cointegrating relation if the deviation from the relation, $u_t$, is a stationary process. Having estimated the parameters, we can test for no-cointegration by testing whether the estimated residual, $\hat u_t$, contains a unit root. This test is translated into the hypothesis $H_0: \pi = 0$ in the ADF regression

$$\Delta \hat u_t = \sum_{i=1}^{k-1} c_i \Delta \hat u_{t-i} + \pi \hat u_{t-1} + \epsilon_t. \tag{24}$$

We note that since the estimated residual, $\hat u_t$, has a mean of zero, there is no constant term in the ADF regression (24). Nonetheless, the critical values for the ADF test depend on the deterministic specification of the static regression, e.g. whether (23) contains a constant or a linear trend.
The fact that the cointegrating vector $\hat\beta$ is estimated also changes the critical values for the ADF test, and the estimation uncertainty has to be taken into account. The intuition is that OLS applied to the static regression (23) will minimize the variance of $\hat u_t$, and graphically the estimated residuals will look as 'stationary as possible'. The more explanatory variables we include in (23), i.e. the more parameters we estimate, the smaller the variance of $\hat u_t$, and the more stationary it will look. The test procedure has to account for this, and the critical values depend on the number of I(1) regressors in (23). The asymptotic distributions of tests for no-cointegration are illustrated in Figure 6. As the number of regressors in the static regression increases, the distribution of the ADF test statistic moves to the left. This reflects that the OLS procedure makes the variance of the estimated residual smaller and smaller. The critical values of the residual-based test are reproduced in Table 1 (A).
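The two-step residual-based procedure can be sketched as follows: estimate the static regression (23) by OLS, then run the ADF regression (24) without a constant on the residuals. This standard-library Python sketch covers the bivariate case only, and the function names are hypothetical. Because one cointegration parameter is estimated and (23) contains a constant, the relevant 5% critical value from Table 1 (A) is −3.34 rather than −2.86. The example uses the cointegrated DGP of Section 2.3 and independent random walks.

```python
import random


def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors (Gauss-Jordan inversion of X'X)."""
    n, k = len(X), len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def engle_granger_tau(x1, x2, lags=1):
    """Step 1: static regression (23). Step 2: ADF regression (24), no constant, on the residual."""
    b, _ = ols_with_se([[1.0, v] for v in x2], x1)
    u = [a - b[0] - b[1] * v for a, v in zip(x1, x2)]           # estimated residual
    du = [u[t] - u[t - 1] for t in range(1, len(u))]            # du[t-1] is Delta u_t
    X = [[du[t - 1 - i] for i in range(1, lags + 1)] + [u[t - 1]]
         for t in range(lags + 1, len(u))]
    y = [du[t - 1] for t in range(lags + 1, len(u))]
    g, se = ols_with_se(X, y)
    return g[-1] / se[-1]


def cointegrated_sample(T, rng):
    """DGP of Section 2.3: x1 = 0.3 x2 + 0.2 x2(-1) + 0.5 x1(-1) + e1, x2 a random walk."""
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
        x1.append(0.30 * x2[-1] + 0.20 * x2[-2] + 0.50 * x1[-1] + rng.gauss(0.0, 1.0))
    return x1, x2


def independent_walks(T, rng):
    """Two unrelated random walks (no cointegration)."""
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x1.append(x1[-1] + rng.gauss(0.0, 1.0))
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
    return x1, x2


CRIT_5PCT = -3.34   # Table 1 (A): constant, one estimated parameter
rng = random.Random(3)
tau_coint = engle_granger_tau(*cointegrated_sample(300, rng))
size = sum(engle_granger_tau(*independent_walks(100, rng)) < CRIT_5PCT for _ in range(200)) / 200
print(round(tau_coint, 2), size)
```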

Example 10 (private consumption, continued): To test whether the static regression of a consumption function in (16) corresponds to a cointegrating relation, we construct the estimated residual

$$\hat u_t = c_t + 0.404 - 0.364 \cdot y_t - 0.516 \cdot w_t,$$

which is depicted in Figure 1 (D). To test for no-cointegration we use an ADF regression without deterministic terms. In the present case one lag is needed,

$$\Delta \hat u_t = -\underset{(0.089)}{0.223} \cdot \Delta \hat u_{t-1} - \underset{(0.068)}{0.221} \cdot \hat u_{t-1} + \hat\epsilon_t,$$

and the test statistic is given by

$$\hat\tau = \frac{-0.221}{0.068} = -3.27.$$

The 5% and 10% critical values for the case of a constant term and two estimated parameters in the static regression are −3.74 and −3.45, respectively, so we cannot reject the hypothesis of no-cointegration. This is reflected in Figure 1 (D), where the deviations from the relation are relatively persistent. The deviations seem to be related to the business cycle, suggesting that the consumption-income ratio is pro-cyclical besides the wealth effects. To obtain stronger evidence of cointegration, one possible solution is to augment the model with a measure of the business cycle, e.g. a variable measuring the effects of unemployment.

[Figure 6: densities of the DF statistic with a constant, $\tau_c$, for 1 to 7 estimated parameters in the static regression, together with the N(0,1) density.]

Figure 6: Asymptotic distributions of the residual-based test for no-cointegration.

3.2 Testing for No-Cointegration in the ECM


Due to the representation theorem discussed in §1.3, the null hypothesis of no-cointegration corresponds to the null of no-error-correction. This observation has been used to construct several tests for whether variables cointegrate. The most convenient is based on the unrestricted error-correction model, e.g.

$$\Delta x_{1t} = \delta + c_1 \Delta x_{1t-1} + d_0 \Delta x_{2t} + d_1 \Delta x_{2t-1} + \pi_1 x_{1t-1} + \pi_2 x_{2t-1} + \epsilon_t. \tag{25}$$

Here we can test the hypothesis that $x_{1t}$ does not error correct, i.e. $H_0: \pi_1 = 0$, against the cointegrating alternative, $H_A: \pi_1 < 0$. The test statistic is just the conventional t-ratio,

$$\hat\tau_{\pi_1=0} = \frac{\hat\pi_1}{\mathrm{se}(\hat\pi_1)}.$$

As for the residual-based test, the distribution of $\hat\tau_{\pi_1=0}$ depends on the deterministic terms in the regression (25) as well as on the number of I(1) variables in $x_t$. The asymptotic critical values are reproduced in part (B) of Table 1. This test appeared very early in the PcGive software package and is often referred to as the PcGive test for no-cointegration.

(A) Residual-based (ADF) test for no-cointegration

Number of estimated     Constant in (22)          Constant and trend in (22)
parameters              1%      5%      10%       1%      5%      10%
0                       −3.43   −2.86   −2.57     −3.96   −3.41   −3.13
1                       −3.90   −3.34   −3.04     −4.32   −3.78   −3.50
2                       −4.29   −3.74   −3.45     −4.66   −4.12   −3.84
3                       −4.64   −4.10   −3.81     −4.97   −4.43   −4.15
4                       −4.96   −4.42   −4.13     −5.25   −4.72   −4.43

(B) PcGive test for no-cointegration

Number of variables     Constant in (25)          Constant and trend in (25)
in x_t (p)              1%      5%      10%       1%      5%      10%
2                       −3.79   −3.21   −2.91     −4.25   −3.69   −3.39
3                       −4.09   −3.51   −3.19     −4.50   −3.93   −3.62
4                       −4.36   −3.76   −3.44     −4.72   −4.14   −3.83
5                       −4.59   −3.99   −3.66     −4.93   −4.34   −4.03

Table 1: Asymptotic critical values for tests of no-cointegration. Reproduced from Davidson and MacKinnon (1993).
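The PcGive test amounts to reading off one t-ratio in the unrestricted ECM (25). Below is a standard-library Python sketch for two variables with one lag of the differences, as in (25); the function names are hypothetical. With a constant and $p = 2$ variables, the 5% critical value in Table 1 (B) is −3.21. As above, the example uses the cointegrated DGP of Section 2.3 and independent random walks.

```python
import random


def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors (Gauss-Jordan inversion of X'X)."""
    n, k = len(X), len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def pcgive_tau(x1, x2):
    """t-ratio for H0: pi_1 = 0 in the unrestricted ECM (25)."""
    X, y = [], []
    for t in range(2, len(x1)):
        X.append([1.0,                        # constant
                  x1[t - 1] - x1[t - 2],      # Delta x1_{t-1}
                  x2[t] - x2[t - 1],          # Delta x2_t
                  x2[t - 1] - x2[t - 2],      # Delta x2_{t-1}
                  x1[t - 1],                  # x1_{t-1}, coefficient pi_1
                  x2[t - 1]])                 # x2_{t-1}, coefficient pi_2
        y.append(x1[t] - x1[t - 1])
    b, se = ols_with_se(X, y)
    return b[4] / se[4]


def cointegrated_sample(T, rng):
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
        x1.append(0.30 * x2[-1] + 0.20 * x2[-2] + 0.50 * x1[-1] + rng.gauss(0.0, 1.0))
    return x1, x2


def independent_walks(T, rng):
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x1.append(x1[-1] + rng.gauss(0.0, 1.0))
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
    return x1, x2


CRIT_5PCT = -3.21   # Table 1 (B): constant, p = 2
rng = random.Random(5)
tau_coint = pcgive_tau(*cointegrated_sample(300, rng))
size = sum(pcgive_tau(*independent_walks(100, rng)) < CRIT_5PCT for _ in range(200)) / 200
print(round(tau_coint, 2), size)
```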
Comparing the residual-based test for no-cointegration with the test for no-error-correction in the dynamic model, three things are worth noting. First, the test for no-error-correction is based on the assumption that $x_{1t}$ is the only variable that error corrects to the potential cointegrating relation. This implies that we should test for no-cointegration in the 'correct' error-correction model; in the present case that is the model for $\Delta x_{1t}$ and not the model for $\Delta x_{2t}$. In most cases, prior knowledge from economic theory suggests which equation to consider.
Secondly, the test for no-error-correction of $\Delta x_{1t}$ corresponds to a test for no-cointegration of a relation involving $x_{1t}$. Even if we cannot reject the hypothesis of no-error-correction of $\Delta x_{1t}$, the other right-hand-side variables in levels, $x_{2t}, \dots, x_{pt}$, may still cointegrate in a relation not involving $x_{1t}$.
Thirdly, a comparison of (25) with the ADF test (24) shows that the latter imposes a common factor restriction on the dynamics when the hypothesis of a unit root is tested. There is no a priori reason to believe that the data obey a common factor restriction, and the test may be adversely affected by imposing the restriction. The relation between (25) and (24) is explored in more detail in Box 3.

Box 3: ADF Tests and Common Factor Restrictions
Consider a potential cointegrating relation between two I(1) variables

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t.$$

To test for no-cointegration we use the residual, $u_t = x_{1t} - \mu - \beta_2 x_{2t}$, and consider an ADF regression. Assume for simplicity that only one lag of $\Delta u_t$ is needed, i.e.

$$\Delta u_t = c_1 \Delta u_{t-1} + \pi u_{t-1} + \epsilon_t.$$

Inserting the definition of $u_t$, so that $\Delta u_t = \Delta x_{1t} - \beta_2 \Delta x_{2t}$, and collecting terms yields the model

$$\Delta x_{1t} = -\pi\mu + c_1 \Delta x_{1t-1} + \beta_2 \Delta x_{2t} - c_1 \beta_2 \Delta x_{2t-1} + \pi x_{1t-1} - \pi \beta_2 x_{2t-1} + \epsilon_t.$$

This is an ECM, but subject to a number of common factor restrictions. We have 6 regressors on the right-hand side (including the constant) but only 4 parameters to be estimated, $\mu$, $\pi$, $c_1$, and $\beta_2$, and hence two restrictions.
The restrictions imply, e.g., that the contemporaneous impact of $x_{2t}$ on $x_{1t}$ is $\beta_2$, which is identical to the long-run impact. There is no compelling reason to believe that this is true in practice; see also Example 7.

Example 11 (private consumption, continued): To test whether the unrestricted error-correction model in (19) suggests cointegration, we test for no-error-correction using the t-ratio,

$$\hat\tau_{\pi_1=0} = \frac{-0.251}{0.065} = -3.86.$$

The 5% critical value is given in part (B) of Table 1 as −3.51, so we can reject no-cointegration at the 5% level, although not at the 1% level, where the critical value is −4.09.
The different conclusions from the residual-based test and the PcGive test for no-cointegration could be related to the fact that the common factor restrictions imposed by the ADF test are not in line with the data. The test statistic for the two common factor restrictions is 10.41, which is highly significant according to the asymptotic $\chi^2(2)$ distribution, so the common factors are easily rejected. Rejection of the common factor restrictions, together with our knowledge that consumption is the only error-correcting variable, suggests that the PcGive test for no-cointegration is probably preferable in the present case.

4 Limitations of the Single-Equation Approach


So far we have presented some single-equation tools for cointegration analysis. Although
these methods are powerful in many situations, it is important to know the assumptions
that are implicitly made in the analysis and the drawbacks and limitations of the methods.
Below we focus on the analysis based on an unrestricted ADL or error correction model
and outline some important limitations.

The starting point for the discussion is an ECM for three variables,

$$\Delta c_t = \delta + \kappa_0 \Delta y_t + \kappa_1 \Delta w_t + \gamma_1 c_{t-1} + \gamma_2 y_{t-1} + \gamma_3 w_{t-1} + \epsilon_t, \qquad (26)$$

and to make the discussion less abstract we think of a consumption function and use the notation $c_t$, $y_t$, and $w_t$ for consumption, income, and wealth. In the analysis of equation (26) we implicitly make three sets of assumptions:
(1) Cointegration is a system property, and in principle there exist error correction equations for all variables: $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$. We only consider the equation for $\Delta c_t$.
(2) We assume that there is only one cointegrating relation between the variables, given by the long-run solution

$$c = -\frac{\gamma_2}{\gamma_1}\, y - \frac{\gamma_3}{\gamma_1}\, w.$$

We mentioned in §1 that this is not necessarily true; and as the number of variables in the model increases, it actually becomes less and less likely.
(3) To condition equation (26) on the contemporaneous changes, $\Delta y_t$ and $\Delta w_t$, we assume that they are predetermined, i.e. there is no feedback from $\Delta c_t$ to $\Delta y_t$ and $\Delta w_t$. This is not necessarily true in practice, where several variables may be simultaneously determined.
Below we discuss these three issues in turn.
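As a point of reference for that discussion, the Python sketch below simulates a hypothetical system in which all three assumptions hold by construction: $y_t$ and $w_t$ are independent random walks with no feedback from $c_t$, and only $c_t$ error-corrects towards the single relation $c = 0.8y + 0.2w$. All numerical values, the sample size, and the function names are illustrative assumptions. Estimating (26) by OLS and solving for the long-run solution should then recover coefficients close to 0.8 and 0.2.

```python
import numpy as np

def simulate_consumption_system(T=5000, seed=0):
    """Hypothetical DGP: y and w are independent random walks (no feedback
    from c), and only c error-corrects towards c = 0.8*y + 0.2*w."""
    rng = np.random.default_rng(seed)
    dy = rng.standard_normal(T)
    dw = rng.standard_normal(T)
    y, w = np.cumsum(dy), np.cumsum(dw)
    c = np.zeros(T)
    for t in range(1, T):
        disequilibrium = c[t - 1] - 0.8 * y[t - 1] - 0.2 * w[t - 1]
        c[t] = (c[t - 1] + 0.5 * dy[t] + 0.3 * dw[t]
                - 0.25 * disequilibrium + rng.standard_normal())
    return c, y, w

def estimate_ecm(c, y, w):
    """OLS estimate of (26): regress dc_t on a constant, dy_t, dw_t,
    c_{t-1}, y_{t-1} and w_{t-1}."""
    dc, dy, dw = np.diff(c), np.diff(y), np.diff(w)
    X = np.column_stack([np.ones_like(dc), dy, dw, c[:-1], y[:-1], w[:-1]])
    coef, *_ = np.linalg.lstsq(X, dc, rcond=None)
    return coef

def long_run_solution(coef):
    """Coefficients on y and w in c = -(gamma2/gamma1) y - (gamma3/gamma1) w."""
    gamma1, gamma2, gamma3 = coef[3:6]
    return -gamma2 / gamma1, -gamma3 / gamma1

c, y, w = simulate_consumption_system()
coef = estimate_ecm(c, y, w)
print(coef[3:6], long_run_solution(coef))
```

In this set-up the true level coefficients are $\gamma_1 = -0.25$, $\gamma_2 = 0.2$, and $\gamma_3 = 0.05$, so the long-run solution is $c = 0.8y + 0.2w$. The sections below ask what goes wrong when the DGP is not this convenient.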

4.1 A Vector Error Correction Model


We consider the case where the variables, $X_t = (c_t, y_t, w_t)'$, are cointegrated with cointegration vector $\beta = (1, -\beta_1, -\beta_2)'$, so that $\beta' X_t$ is a stationary process. Assume that we are mainly interested in estimating the long-run parameters, $\beta_1$ and $\beta_2$.
We consider the three error correction models:

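$$\Delta c_t = \mu_1 + \alpha_1 (c_{t-1} - \beta_1 y_{t-1} - \beta_2 w_{t-1}) + \text{dynamics} + \epsilon_{1t}$$

$$\Delta y_t = \mu_2 + \alpha_2 (c_{t-1} - \beta_1 y_{t-1} - \beta_2 w_{t-1}) + \text{dynamics} + \epsilon_{2t}$$

$$\Delta w_t = \mu_3 + \alpha_3 (c_{t-1} - \beta_1 y_{t-1} - \beta_2 w_{t-1}) + \text{dynamics} + \epsilon_{3t}$$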

where ‘dynamics’ represents lagged values of first differences. Cointegration implies the existence of error correction, so at least one of the three coefficients, $\alpha_1$, $\alpha_2$, or $\alpha_3$, has to be different from zero. We note that the cointegrating parameters, $\beta_1$ and $\beta_2$, appear in all three equations, so if we want the best possible (or efficient) estimators, we have to use the information in all three equations and not just the equation for $\Delta c_t$. Remember that we can stack the three equations in a vector error correction model (VECM):
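
$$\begin{pmatrix} \Delta c_t \\ \Delta y_t \\ \Delta w_t \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \begin{pmatrix} 1 & -\beta_1 & -\beta_2 \end{pmatrix} \begin{pmatrix} c_{t-1} \\ y_{t-1} \\ w_{t-1} \end{pmatrix} + \cdots + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \end{pmatrix},$$
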

where we have left out the dynamics. The parameters of this model can be estimated using maximum likelihood, but that is beyond the scope of the present note.
In the special case where $\alpha_2 = \alpha_3 = 0$, it is sufficient to consider the first equation, for $\Delta c_t$, and the single-equation analysis will be efficient. This assumption is implicitly imposed by the single-equation model.
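The structure of the VECM can be illustrated numerically. In the hypothetical Python sketch below, $\beta = (1, -0.8, -0.2)'$ and $\alpha = (-0.2, 0.1, 0)'$ are made-up values, and the simulated model has no constant and no lagged differences. The matrix $\alpha\beta'$ then has rank one, and its $i$th row is $\alpha_i \beta'$, so $\beta$ enters every equation whose adjustment coefficient is nonzero. Premultiplying the model by $\beta'$ shows that $z_t = \beta' X_t$ follows $z_t = (1 + \beta'\alpha) z_{t-1} + \beta'\epsilon_t$; with these values $\beta'\alpha = -0.28$, so $z_t$ is a stationary AR(1) with coefficient 0.72 even though each element of $X_t$ is I(1).

```python
import numpy as np

# Hypothetical parameter values for X_t = (c_t, y_t, w_t)'
beta = np.array([1.0, -0.8, -0.2])   # cointegration vector (1, -beta_1, -beta_2)'
alpha = np.array([-0.2, 0.1, 0.0])   # adjustment coefficients (alpha_1, alpha_2, alpha_3)'
Pi = np.outer(alpha, beta)           # alpha * beta', a 3x3 matrix of rank one

def simulate_vecm(alpha, beta, T=2000, seed=1):
    """Simulate dX_t = alpha * beta' X_{t-1} + eps_t (no constant, no lagged
    differences) with independent standard normal errors."""
    rng = np.random.default_rng(seed)
    Pi = np.outer(alpha, beta)
    X = np.zeros((T, len(beta)))
    for t in range(1, T):
        X[t] = X[t - 1] + Pi @ X[t - 1] + rng.standard_normal(len(beta))
    return X

X = simulate_vecm(alpha, beta)
z = X @ beta   # the equilibrium error beta' X_t
# Sample variance of z should be near 1.68 / (1 - 0.72**2), roughly 3.5
print(np.linalg.matrix_rank(Pi), z.var(), X[:, 0].var())
```

Because $\alpha_3 = 0$ here, the third row of $\alpha\beta'$ is zero; setting $\alpha_2 = 0$ as well would leave error correction only in the first equation, which is the case where the single-equation analysis loses nothing.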

4.2 More Cointegrating Relations


Now assume that there actually exist two cointegrating relations between the variables in $X_t$, e.g.

$$c_t - \beta_1 y_t \sim I(0) \quad \text{and} \quad c_t - \beta_2 w_t \sim I(0).$$

The first one represents a consumption-income ratio if $\beta_1 = 1$, and the second one shows the propensity to consume out of private wealth. The long-run relations can be written as

$$\begin{pmatrix} c_t - \beta_1 y_t \\ c_t - \beta_2 w_t \end{pmatrix} = \begin{pmatrix} 1 & -\beta_1 & 0 \\ 1 & 0 & -\beta_2 \end{pmatrix} \begin{pmatrix} c_t \\ y_t \\ w_t \end{pmatrix} = \beta' X_t.$$


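We can write the VECM as

$$\begin{pmatrix} \Delta c_t \\ \Delta y_t \\ \Delta w_t \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} + \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{pmatrix} \begin{pmatrix} 1 & -\beta_1 & 0 \\ 1 & 0 & -\beta_2 \end{pmatrix} \begin{pmatrix} c_{t-1} \\ y_{t-1} \\ w_{t-1} \end{pmatrix} + \cdots + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \end{pmatrix}.$$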

Here $\alpha_{11}$ measures how $\Delta c_t$ is affected by deviations from the first long-run relation, $c_{t-1} - \beta_1 y_{t-1}$, while $\alpha_{12}$ measures how $\Delta c_t$ is affected by deviations from the second long-run relation, $c_{t-1} - \beta_2 w_{t-1}$. The second row of $\alpha$ measures how $\Delta y_t$ is affected by deviations from the two equilibria, and so on.
The parameters of this model can again be estimated using ML, but we will not discuss that here. Instead we just note that if we only consider the first equation, then we will estimate the first row of the model, i.e.

$$\Delta c_t = \mu_1 + (\alpha_{11} + \alpha_{12})\, c_{t-1} - \beta_1 \alpha_{11}\, y_{t-1} - \beta_2 \alpha_{12}\, w_{t-1} + \epsilon_{1t},$$

which contains a combination of the two stationary relations. Since a linear combination of stationary relations will also be stationary, the equation is still balanced in terms of the order of integration, but we will not be able to separately interpret the two equilibrium relations.
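A small numerical example shows why the single equation cannot separate the two relations. Take the hypothetical values $(\alpha_{11}, \alpha_{12}, \beta_1, \beta_2) = (-0.10, -0.10, 1, 0.5)$ and, alternatively, $(-0.15, -0.05, 2/3, 1)$. In both cases the coefficients on $(c_{t-1}, y_{t-1}, w_{t-1})$ in the $\Delta c_t$ equation are $(-0.2, 0.1, 0.05)$, so the equation alone cannot tell the two parameter sets apart, and its long-run solution, $c = 0.5y + 0.25w$, coincides with neither $c = \beta_1 y$ nor $c = \beta_2 w$. The Python sketch below checks this; the function name `first_row` is illustrative.

```python
import numpy as np

def first_row(a11, a12, beta1, beta2):
    """Coefficients on (c_{t-1}, y_{t-1}, w_{t-1}) in the dc_t equation,
    i.e. the first row of alpha * beta' for the two-relation VECM."""
    alpha_row = np.array([a11, a12])
    beta_t = np.array([[1.0, -beta1, 0.0],    # c - beta1 * y
                       [1.0, 0.0, -beta2]])   # c - beta2 * w
    return alpha_row @ beta_t

row_a = first_row(-0.10, -0.10, 1.0, 0.5)
row_b = first_row(-0.15, -0.05, 2 / 3, 1.0)
# Two different parameter sets give the same single-equation coefficients
print(np.allclose(row_a, row_b))
# Long-run solution of the single equation: c = -(row[1]/row[0]) y - (row[2]/row[0]) w
long_run = -row_a[1:] / row_a[0]
print(long_run)
```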

4.3 Exogeneity
To be able to condition on $\Delta y_t$ and $\Delta w_t$ in the single-equation cointegration analysis, we have to assume that they are predetermined, i.e.

$$E[\epsilon_{1t} \Delta y_t] = 0 \quad \text{and} \quad E[\epsilon_{1t} \Delta w_t] = 0.$$

This requirement states that there can be no feedback from $\Delta c_t$ to $\Delta y_t$ and $\Delta w_t$. In the present case, consumption is a main component of the gross domestic product, which is the (national accounts) basis for defining disposable income. This link may suggest that consumption and income are simultaneously determined in a given quarter.
If the regressor is not predetermined, we may exclude it, focussing on the reduced form with no contemporaneous effects. Alternatively, we may be able to find good instruments for $\Delta y_t$ and estimate the model using an instrumental variables estimator.
A third possibility is again to estimate the vector error correction model directly. In this setting the variables, $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$, are treated on an equal footing, without imposing a priori exogeneity restrictions.
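The simultaneity problem and the instrumental variables remedy can be illustrated with a stylized simulation that deliberately abstracts from the levels and lagged terms in (26). In this hypothetical set-up, $\Delta c_t = \kappa \Delta y_t + \epsilon_t$ while $\Delta y_t = z_t + \theta \Delta c_t + u_t$, so there is feedback from $\Delta c_t$ to $\Delta y_t$, and $z_t$ is an assumed exogenous variable that moves income but does not enter the consumption equation. The parameter names, their values ($\kappa = 0.5$, $\theta = 0.8$), and the instrument itself are illustrative assumptions. Solving the two equations gives $\Delta y_t = (z_t + \theta\epsilon_t + u_t)/(1 - \theta\kappa)$, which is correlated with $\epsilon_t$; with these values OLS converges to about 0.68 rather than 0.5, whereas the simple IV estimator $\sum z_t \Delta c_t / \sum z_t \Delta y_t$ is consistent.

```python
import numpy as np

def simulate_simultaneous(T=20_000, kappa=0.5, theta=0.8, seed=2):
    """Hypothetical DGP: dc_t = kappa*dy_t + eps_t and
    dy_t = z_t + theta*dc_t + u_t, with z_t an exogenous instrument."""
    rng = np.random.default_rng(seed)
    z, eps, u = rng.standard_normal((3, T))
    dy = (z + theta * eps + u) / (1 - theta * kappa)   # reduced form for dy_t
    dc = kappa * dy + eps
    return dc, dy, z

dc, dy, z = simulate_simultaneous()
kappa_ols = (dy @ dc) / (dy @ dy)   # inconsistent: dy_t is correlated with eps_t
kappa_iv = (z @ dc) / (z @ dy)      # just-identified IV estimator with instrument z_t
print(kappa_ols, kappa_iv)
```

The hard part in practice is not the estimator but finding an instrument that is both relevant for $\Delta y_t$ and uncorrelated with the consumption shock.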

5 Concluding Remarks
This note has illustrated that regression models for unit-root non-stationary time series
give unreliable results, and the usual tools will not be able to distinguish between genuine
relationships and spurious regressions. This suggests that for non-stationary time series
we should always think in terms of cointegration. Logically, a relationship can only be interpreted as defining an economic equilibrium if the variables cointegrate; if they do not, there is no interpretable relationship between the variables.
We have presented a number of single-equation tools for cointegration. The conceptually simplest approach is the Engle-Granger two-step estimation, but for practical purposes the cointegration analysis based on unrestricted ADL or ECM models is probably preferable. This also fits within the general-to-specific framework, in which we first find an appropriate statistical description of the data (the unrestricted ADL model), and afterwards test hypotheses to link the statistical model to economic theory (testing for cointegration and interpreting the long-run relationship).

5.1 Further Readings


The literature on cointegration analysis is huge, and most references are far more technical than the present note. An accessible introduction is Hendry and Juselius (2000). Alternative presentations of time series econometrics, including sections on single-equation cointegration analysis, are given in Patterson (2000) and Enders (2004). A classic reference on cointegration analysis based on the ADL model is the book by Banerjee, Dolado, Galbraith, and Hendry (1993). Maddala and Kim (1998) give a review of the literature on unit roots and cointegration. A specific reference for the test for no-error-correction (with references to the earlier literature) is Ericsson and MacKinnon (2002). The classic reference for time series analysis in general, which includes rather technical sections on cointegration models, is Hamilton (1994). An introduction to vector error correction models and the analysis of cointegration in a VAR model is given in Hendry and Juselius (2001) and Juselius (2007), while the (very technical) theory is given in Johansen (1996).

References
Banerjee, A., J. Dolado, J. W. Galbraith, and D. F. Hendry (1993): Co-Integration, Error-Correction, and the Econometric Analysis of Non-Stationary Data. Oxford University Press, Oxford.

Davidson, R., and J. G. MacKinnon (1993): Estimation and Inference in Econometrics. Oxford University Press, Oxford.

Enders, W. (2004): Applied Econometric Time Series. John Wiley & Sons, 2nd edn.

Engle, R., and C. Granger (1987): “Co-Integration and Error Correction: Representation, Estimation and Testing,” Econometrica, 55, 251–276.

Ericsson, N. R., and J. G. MacKinnon (2002): “Distributions of Error Correction Tests for Cointegration,” The Econometrics Journal, 5, 285–318.

Hamilton, J. D. (1994): Time Series Analysis. Princeton University Press, Princeton.

Hendry, D. F., and K. Juselius (2000): “Explaining Cointegration Analysis: Part I,” Energy Journal, 21(1), 1–42.

Hendry, D. F., and K. Juselius (2001): “Explaining Cointegration Analysis: Part II,” Energy Journal, 22(1), 75–120.

Johansen, S. (1996): Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, Oxford, 2nd edn.

Juselius, K. (2007): The Cointegrated VAR Model: Econometric Methodology and Macroeconomic Applications. Oxford University Press, Oxford.

Maddala, G. S., and I.-M. Kim (1998): Unit Roots, Cointegration, and Structural Change. Cambridge University Press, Cambridge.

Patterson, K. (2000): An Introduction to Applied Econometrics: A Time Series Approach. Palgrave Macmillan, New York.

Sims, C. A., J. H. Stock, and M. W. Watson (1990): “Inference in Linear Time Series Models with Some Unit Roots,” Econometrica, 58(1), 113–144.
