Lecture Note 6 - Cointegration and Common Trends
In this note we discuss some important issues in regression models for non-stationary time series. It is illustrated how linear combinations of non-stationary time series are non-stationary in general, and cointegration is defined as the special case where a linear combination is stationary. We emphasize that relations between non-stationary variables can only be interpreted as defining an equilibrium if the variables cointegrate, and we discuss error correction as the force that sustains the equilibrium relation. We then present some single-equation tools for cointegration analysis, e.g. the so-called Engle-Granger two-step procedure and cointegration analysis based on unrestricted ADL models. We show how to estimate the cointegrating parameters and how to test the hypothesis of no-cointegration. Towards the end of the note we discuss some limitations of the single-equation approach.
Outline
§1 Unit-Root Time-Series and Cointegration
§2 Estimation and Inference
§3 Testing for No-Cointegration
§4 Limitations of the Single-Equation Approach
§5 Concluding Remarks
1 Unit-Root Time-Series and Cointegration
In this section we look at linear combinations of unit-root non-stationary time series and define the concept of cointegration. To simplify the notation we consider the case of $p = 2$ variables in most of the presentation below, but the discussion is easily extended to more variables.
Let $x_{1t}$ and $x_{2t}$ be two time series that are integrated of first order, I(1). We can write the two processes in the form $x_{it} = z_{it} + (\text{stationary component})$, $i = 1, 2$, where $z_{1t}$ and $z_{2t}$ are random walk components generated by unit roots. We often refer to $z_{it}$ as the stochastic trend of $x_{it}$.
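Next define the linear combination $u_t := \beta' x_t$, where $x_t$ is a vector of variables and $\beta$ is a vector of weights in the linear combination, i.e.

$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad \text{and} \quad \beta = \begin{pmatrix} 1 \\ -\beta_2 \end{pmatrix}, \qquad u_t = \beta' x_t = x_{1t} - \beta_2 x_{2t}. \quad (3)$$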
We note that $u_t$ contains the random walk component $z_{1t} - \beta_2 z_{2t}$, and in most cases $u_t$ will also be I(1). The result that a combination of I(1) variables is in general I(1) can easily be extended to higher orders of integration: a combination of variables integrated of order 2 (say) will in general also be I(2).
An exception to this result arises if there exists a vector, $\beta$, such that $u_t$ defined in (3) is a stationary process. This property is denoted cointegration, and the vector $\beta$ is called a cointegration vector. For cointegration we need the stochastic trends to be common, that is, generated by the same underlying random walk, $z^*_t$, i.e.

$$z_{1t} = \gamma_1 z^*_t \quad \text{and} \quad z_{2t} = \gamma_2 z^*_t,$$

so that the random walk component of $u_t$, $(\gamma_1 - \beta_2 \gamma_2) z^*_t$, vanishes for $\beta_2 = \gamma_1 / \gamma_2$.
Example 1 (cointegrated processes): As an example of a data generating process (DGP) that generates cointegrated variables, consider the following system:

$$\Delta x_{2t} = \epsilon_{2t} \quad (4)$$
$$x_{1t} = \beta_2 x_{2t} + \epsilon_{1t} \quad (5)$$

where $\epsilon_{1t}$ and $\epsilon_{2t}$ are IID and uncorrelated error processes. We solve for the levels to find

$$x_{2t} = \sum_{i=1}^{t} \epsilon_{2i} + \text{initial value} = z_{2t} + x_{20}$$
$$x_{1t} = \beta_2 \sum_{i=1}^{t} \epsilon_{2i} + \text{initial value} + \text{stationary process} = \beta_2 z_{2t} + \beta_2 x_{20} + \epsilon_{1t}$$

Here $z_{1t} = \beta_2 z_{2t}$ is a common stochastic trend, and the processes cointegrate with cointegration vector $\beta = (1, -\beta_2)'$. In particular we find

$$u_t = \beta' x_t = \epsilon_{1t},$$

which is stationary. An economic example could be that income ($x_{2t}$) develops as a random walk process according to (4), while consumption ($x_{1t}$) according to (5) is a linear function of income plus a stationary noise term. The dynamics of both equations could of course be more complicated. ¨
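The claim that $\beta' x_t$ reproduces $\epsilon_{1t}$ exactly can be checked by simulating (4)-(5). The sketch below is a minimal illustration in Python (standard library only); the value $\beta_2 = 0.8$, the sample length, the seed and the function name are arbitrary assumptions, not values from the text.

```python
import random


def simulate_example1(T, beta2, x20=0.0, seed=1):
    """Simulate Delta x2_t = eps2_t and x1_t = beta2 * x2_t + eps1_t."""
    rng = random.Random(seed)
    x1, x2, eps1_path = [], [], []
    level2 = x20  # initial value x2_0
    for _ in range(T):
        eps1, eps2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        level2 += eps2  # random walk: x2_t = x2_{t-1} + eps2_t
        x2.append(level2)
        x1.append(beta2 * level2 + eps1)  # x1 inherits the stochastic trend of x2
        eps1_path.append(eps1)
    return x1, x2, eps1_path


beta2 = 0.8  # assumed value for illustration
x1, x2, eps1 = simulate_example1(T=200, beta2=beta2)

# u_t = beta' x_t with beta = (1, -beta2)': the common trend cancels
u = [a - beta2 * b for a, b in zip(x1, x2)]
max_gap = max(abs(ut - e) for ut, e in zip(u, eps1))
print(max_gap < 1e-9)  # True: u_t equals eps1_t up to floating-point rounding
```

Plotting `x2` next to `u` would show the contrast in Example 1: the level wanders, while the combination stays centred on zero.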
are cointegration vectors for the variables in $x_t$. In the first case, $\beta$, we have imposed a normalization on the first coefficient, $\beta_1 = 1$. This normalization is natural if we have a relation of the form (6) in mind, but we could equally well have chosen a different normalization, e.g. $\tilde\beta = (-\beta_2^{-1}, 1)'$, corresponding to an equation with $x_{2t}$ on the left hand side.
The definition of cointegration is easily extended to more variables. In particular, let $x_t = (x_{1t}, x_{2t}, x_{3t}, \ldots, x_{pt})'$ be a $p$-dimensional vector of variables. Then a vector $\beta = (1, -\beta_2, \ldots, -\beta_p)'$ is a cointegration vector if

$$u_t = \beta' x_t = x_{1t} - \beta_2 x_{2t} - \cdots - \beta_p x_{pt}$$

is a stationary process. Note that with $p$ I(1) variables there can be several (at most $p - 1$) different cointegration vectors. This is not a problem for the theory, but the single-equation tools presented in this note are only appropriate if there is a single cointegration vector. As the number of variables, $p$, increases, it becomes less and less likely that there is only one stationary combination.
1.1 Cointegration and Economic Equilibrium
Consider again a DGP as in Example 1:

$$\Delta x_{2t} = \epsilon_{2t}$$
$$x_{1t} = \mu + \beta_2 x_{2t} + u_t \quad (8)$$

where $\epsilon_{2t}$ is IID and $u_t$ is a stationary process uncorrelated with $\epsilon_{2t}$. Notice that the individual variables, $x_{1t}$ and $x_{2t}$, are I(1) non-stationary, while $\beta' x_t$ is a stationary process. An implication is that the shock $\epsilon_{2t}$ has permanent effects on the levels of both variables but only transitory effects on $\beta' x_t$.
That makes it natural to think of the cointegrating relation (8) as defining an economic equilibrium: the variables themselves wander arbitrarily far up and down due to the accumulation of the shocks $\epsilon_{2t}$, but they never deviate too much from equilibrium. When the variables cointegrate, we can define $x^*_{1t} = \mu + \beta_2 x_{2t}$; we will refer to $x^*_{1t}$ as the equilibrium value of $x_{1t}$, and $u_t = x_{1t} - x^*_{1t}$ is the deviation from equilibrium. The equilibrium value can be interpreted as the value at which there is no inherent tendency for $x_{1t}$ to move away, but it is important to realize that because the economy is continuously hit by shocks, the system will never settle down at $x^*_{1t}$, and $x_{1t}$ will not converge to $x^*_{1t}$ in any sense.
Example 2 (purchasing power parity): Let $x_{1t} = \log(E_t)$ denote the log of the bilateral exchange rate between Dollar and Euro (denominated as Dollar per Euro), and let $x_{2t} = \log(P^{US}_t) - \log(P^{EU}_t)$ denote the corresponding difference between the logs of the consumer prices. Then

$$u_t = x_{1t} - x_{2t} = \log(E_t) - \left(\log(P^{US}_t) - \log(P^{EU}_t)\right) = \log\left(\frac{E_t \cdot P^{EU}_t}{P^{US}_t}\right)$$

is the relative deviation from purchasing power parity (PPP) between the US and the Euro area. For most countries consumer prices and exchange rates appear non-stationary, and if the deviation from PPP is stationary we can think of PPP as a valid equilibrium relation between the US and the Euro area. In this case $\beta = (1, -1)'$ would be a cointegrating vector for $x_t = (x_{1t}, x_{2t})'$. If, on the other hand, the deviation, $u_t$, is non-stationary, it means that the price differential can wander arbitrarily far from the PPP value and there is no equilibrium interpretation of PPP. ¨
This suggests that the relation

$$p^{org}_t = \mu + p^{reg}_t + u_t$$

defines an equilibrium for the orange market, where $\mu$ is the additional price of organic oranges in equilibrium, and $u_t$ is the deviation from equilibrium in period $t$. Note again that $p^{org}_t - p^{reg}_t$ will not equal $\mu$ in any specific period, and $p^{org}_t - p^{reg}_t$ will not approach $\mu$ as $t \to \infty$. The equilibrium concept refers to the fact that fluctuations of $p^{org}_t - p^{reg}_t$
Example 4 (private consumption): Similarly, Figure 1 (C) illustrates the log of real private consumption in Denmark, $c_t$, the log of real disposable income, $y_t$, and the log of real private wealth including the value of owner-occupied housing, $w_t$ (we have subtracted 2 from $w_t$ in the graph to make the levels comparable). All three time series are clearly trending. The series for consumption and income have many similarities and co-move in some periods. Deviations from this pattern seem to occur primarily when there are large fluctuations in private wealth. People familiar with the Danish business cycle will recognize the peak in private wealth in 1986 as the result of a boom in the housing market, which apparently drove up the consumption-to-income ratio. The time series behavior, as well as simple economic theory, suggests that consumption depends on both income and wealth, and graph (D) depicts the deviation, $u_t = c_t - c^*_t$, from a simple consumption function

$$c_t = -0.404 + 0.364 \cdot y_t + 0.516 \cdot w_t + u_t.$$

We note that the deviation, $u_t$, looks much more stable than the variables themselves, suggesting that $\beta = (1, -0.364, -0.516)'$ may be a cointegrating vector for $x_t = (c_t, y_t, w_t)'$. Whether the deviation, $u_t$, actually corresponds to a stationary process is a testable hypothesis to which we return in §3. ¨
$$m_t = y_t - \lambda (i_{1t} - i_{2t}),$$

so that money demand increases with the amount of transactions, measured by $y_t$, and decreases with the opportunity cost of holding money, $i_{1t} - i_{2t}$. This suggests that $\beta = (1, -1, \lambda, -\lambda)'$ could be a cointegrating vector for the variables in $x_t$.
Alternatively, theories for the determination of interest rates would suggest that two interest rates with different maturities should be cointegrated, and also the velocity, $m_t - y_t$, may be stationary. That suggests a different scenario with two cointegration relations:

$$\beta_1' x_t = (0, 0, -1, 1) \begin{pmatrix} m_t \\ y_t \\ i_{1t} \\ i_{2t} \end{pmatrix} = i_{2t} - i_{1t} \quad \text{and} \quad \beta_2' x_t = (1, -1, 0, 0) \begin{pmatrix} m_t \\ y_t \\ i_{1t} \\ i_{2t} \end{pmatrix} = m_t - y_t.$$

It is an empirical question which of the scenarios (if any) characterizes a data set, but the single-equation tools presented in this note are only appropriate in the scenario with one cointegrating relation. ¨

Figure 1: Examples of some possibly cointegrated series. (A): Price of organic oranges, $p^{org}_t$, and regular oranges, $p^{reg}_t$, measured in pence per lb. (B): The price differential, $p^{org}_t - p^{reg}_t$. (C): Real aggregate consumption, $c_t$, disposable income, $y_t$, and private wealth, $w_t$, in logs. (D): The linear combination, $u_t = c_t - 0.364 \cdot y_t - 0.516 \cdot w_t + 0.404$.
trends, and that the linear combination, $\beta' x_t$, cancels the stochastic trends but not the deterministic trends. To model this case we can extend (6) with a deterministic trend term, e.g.

$$x_{1t} = \mu_0 + \mu_1 t + \beta_2 x_{2t} + u_t.$$

The interpretation is that $\beta' x_t = x_{1t} - \beta_2 x_{2t}$ is trend-stationary, i.e. stationary around the linear trend, $\mu_0 + \mu_1 t$. The deviation, $u_t$, is a mean zero stationary process.
Similarly, linear combinations could be stationary around other deterministic components, e.g. level shifts.
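$$x_{1t} = \mu + \beta_2 x_{2t} + u_t$$

be an equilibrium relation between two I(1) variables. Since $u_t$ is a stationary mean zero variable, there exists a stationary ARMA model for $u_t$. Assume for simplicity that it is an AR(2),

$$u_t = \theta_1 u_{t-1} + \theta_2 u_{t-2} + \epsilon_t,$$

or, inserting $u_t = x_{1t} - \mu - \beta_2 x_{2t}$ and collecting terms:

$$x_{1t} = (1 - \theta_1 - \theta_2)\mu + \theta_1 x_{1t-1} + \theta_2 x_{1t-2} + \beta_2 x_{2t} - \theta_1 \beta_2 x_{2t-1} - \theta_2 \beta_2 x_{2t-2} + \epsilon_t,$$

which is an autoregressive distributed lag model, ADL(2,2). This can also be written as the error-correction model

$$\Delta x_{1t} = -\theta_2 \Delta x_{1t-1} + \beta_2 \Delta x_{2t} + \theta_2 \beta_2 \Delta x_{2t-1} - (1 - \theta_1 - \theta_2)(x_{1t-1} - \mu - \beta_2 x_{2t-1}) + \epsilon_t, \quad (9)$$

where the long-run solution enters through the lagged deviation from the cointegrating relation, $u_{t-1}$, and the error-correction parameter $-(1 - \theta_1 - \theta_2) < 0$ ensures that deviations from the equilibrium are eliminated.¹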
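The step from the AR(2) for $u_t$ to the error-correction form can be written out explicitly. The derivation below assumes $u_t = x_{1t} - \mu - \beta_2 x_{2t}$ with AR(2) coefficients $\theta_1$ and $\theta_2$ (our labels), subtracts $u_{t-1}$ on both sides, and uses $u_{t-2} = u_{t-1} - \Delta u_{t-1}$ together with $\Delta u_t = \Delta x_{1t} - \beta_2 \Delta x_{2t}$.

```latex
\begin{aligned}
\Delta u_t &= (\theta_1 - 1)\, u_{t-1} + \theta_2\, u_{t-2} + \epsilon_t \\
           &= (\theta_1 - 1)\, u_{t-1} + \theta_2 \,(u_{t-1} - \Delta u_{t-1}) + \epsilon_t \\
           &= -(1 - \theta_1 - \theta_2)\, u_{t-1} - \theta_2\, \Delta u_{t-1} + \epsilon_t, \\
\Delta x_{1t} &= \beta_2 \Delta x_{2t} - \theta_2 \Delta x_{1t-1} + \theta_2 \beta_2 \Delta x_{2t-1}
               - (1 - \theta_1 - \theta_2)(x_{1t-1} - \mu - \beta_2 x_{2t-1}) + \epsilon_t .
\end{aligned}
```

The last line is the third line with the differences of $u_t$ expressed in terms of $x_{1t}$ and $x_{2t}$; the coefficients on $\Delta x_{2t}$ and $\Delta x_{2t-1}$ are tied to $\beta_2$, which is the common factor restriction referred to in the footnote.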
To intuitively understand the link between cointegration and error correction, notice that under the maintained assumptions, $\Delta x_{1t}$, $\Delta x_{1t-1}$, $\Delta x_{2t}$, $\Delta x_{2t-1}$, and $\epsilon_t$ are all stationary terms. Since $x_{1t}$ and $x_{2t}$ are assumed to be I(1), the equation in (9) is only balanced in terms of the order of integration if the combination $x_{1t-1} - \beta_2 x_{2t-1}$ is stationary, i.e. if the variables cointegrate. If the variables do not cointegrate, the only way to balance the equation is to exclude the levels from the equation by setting $(1 - \theta_1 - \theta_2) = 0$.
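The link between cointegration and error correction also emphasizes that cointegration is essentially a system property, and from the result of the representation theorem we do not know whether $x_{1t}$ or $x_{2t}$ or both variables error correct. This suggests that a general formulation of the error-correction model consists of an equation for each variable, in our case

$$\Delta x_{1t} = \delta_1 + \Gamma_{11} \Delta x_{1t-1} + \Gamma_{12} \Delta x_{2t-1} + \alpha_1 (x_{1t-1} - \beta_2 x_{2t-1}) + \epsilon_{1t}$$
$$\Delta x_{2t} = \delta_2 + \Gamma_{21} \Delta x_{1t-1} + \Gamma_{22} \Delta x_{2t-1} + \alpha_2 (x_{1t-1} - \beta_2 x_{2t-1}) + \epsilon_{2t}$$

where one lag of each first difference has been included. Stacking the equations we may write the model as the so-called vector error correction model,

$$\begin{pmatrix} \Delta x_{1t} \\ \Delta x_{2t} \end{pmatrix} = \begin{pmatrix} \delta_1 \\ \delta_2 \end{pmatrix} + \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix} \begin{pmatrix} \Delta x_{1t-1} \\ \Delta x_{2t-1} \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (x_{1t-1} - \beta_2 x_{2t-1}) + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}$$

or

$$\Delta x_t = \delta + \Gamma \Delta x_{t-1} + \alpha \beta' x_{t-1} + \epsilon_t.$$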
We note that the lagged deviation from the cointegrating relation, $\beta' x_{t-1} = x_{1t-1} - \beta_2 x_{2t-1}$, appears as an explanatory variable in both equations. For $x_{1t}$ to error correct we need $\alpha_1 < 0$. To see this, imagine that $x_{1t-1}$ is above equilibrium so that $x_{1t-1} - \beta_2 x_{2t-1}$ is positive. For $x_{1t}$ to move towards the equilibrium we need $\Delta x_{1t} < 0$, which requires $\alpha_1 < 0$. If $x_{1t}$ error corrects, the magnitude of $\alpha_1$ measures the proportion of the deviation that is corrected each period, and $\alpha_1$ is sometimes referred to as the speed of adjustment. As an example, a value of $\alpha_1 = -0.4$ would indicate that 40% of a deviation from equilibrium is removed each period. Using the same line of argument, $\alpha_2 > 0$ is consistent with error correction of $x_{2t}$.
¹ The simple assumptions used in the present derivation impose a common factor restriction on (9), but that restriction is not necessarily true in practice.
To illustrate the graphical implications of cointegration and error correction we consider a simple model for two cointegrated variables,

$$\begin{pmatrix} \Delta x_{1t} \\ \Delta x_{2t} \end{pmatrix} = \begin{pmatrix} -0.2 \\ 0.1 \end{pmatrix} (x_{1t-1} - x_{2t-1}) + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} \quad (10)$$

where $\epsilon_{1t}$ and $\epsilon_{2t}$ are independent standard normals, N(0, 1). Here $\beta = (1, -1)'$ is a cointegrating vector and both variables error correct, with speeds of adjustment given by $\alpha = (-0.2, 0.1)'$. One realization of $x_{1t}$ and $x_{2t}$ ($t = 1, 2, \ldots, 100$) generated from the DGP in (10) is illustrated in Figure 2 (A). Notice the strong co-movement between the variables, which reflects that they have the same stochastic trend. Graph (B) depicts the deviation from the long-run relation,

$$u_t = \beta' x_t = x_{1t} - x_{2t}.$$

The series is relatively persistent and is often above or below equilibrium for longer periods of time. This illustrates the moderately slow error correction in (10). In graph (C) we illustrate the speed of adjustment. We consider a large deviation $u_t = \beta' x_t = 10$ in a particular period and show the adjustment towards equilibrium in a situation where no shocks hit the system. Without shocks, (10) implies $u_t = (1 + \beta'\alpha) u_{t-1} = 0.7 u_{t-1}$, so the deviation from 0 is visible for approximately 10 periods and the convergence is exponential. It is the equilibrating force in graph (C) that ensures that the levels in graph (A) do not move too far apart. Graph (D) depicts a cross plot of $x_{1t}$ on $x_{2t}$. The variables are non-stationary and will wander arbitrarily on the real axis. Cointegration (i.e. the force implied by error correction) implies that the observations will never move too far from the equilibrium defined by the straight line. Finally, observe that the most recent observation is far from equilibrium, $\beta' x_{100} > 0$. If we were to make an out-of-sample forecast of the series, $x_{101}, x_{102}, \ldots$, then we would conjecture that $x_t$ would be drawn towards equilibrium, i.e. that either $x_{1t}$ would decrease or $x_{2t}$ would increase to close the gap.
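The exponential adjustment in graph (C) can be reproduced by iterating (10) with the shocks switched off. The Python sketch below (standard library only) is an illustration under the DGP in (10); the function name and the starting point $x_0 = (10, 0)'$, which gives an initial deviation of 10, are our own choices.

```python
import random

ALPHA = (-0.2, 0.1)  # speeds of adjustment in (10); beta = (1, -1)'


def simulate_vecm(T, x0=(0.0, 0.0), shocks=True, seed=0):
    """Iterate Delta x_t = alpha * (x1_{t-1} - x2_{t-1}) + eps_t for T periods."""
    rng = random.Random(seed)
    x1, x2 = x0
    path = []
    for _ in range(T):
        u_lag = x1 - x2  # beta' x_{t-1}, the lagged deviation from equilibrium
        e1 = rng.gauss(0.0, 1.0) if shocks else 0.0
        e2 = rng.gauss(0.0, 1.0) if shocks else 0.0
        x1 += ALPHA[0] * u_lag + e1
        x2 += ALPHA[1] * u_lag + e2
        path.append((x1, x2))
    return path


# Noise-free adjustment from an initial deviation of 10 (cf. graph (C)).
# Since beta'alpha = -0.2 - 0.1 = -0.3, the deviation follows u_t = 0.7 * u_{t-1}.
deviations = [a - b for a, b in simulate_vecm(10, x0=(10.0, 0.0), shocks=False)]
print([round(d, 2) for d in deviations[:3]])  # [7.0, 4.9, 3.43]

# A stochastic realization of length 100, comparable to graph (A)
sample = simulate_vecm(100, seed=1)
```

Note that both variables move in the noise-free run: $x_{1t}$ falls and $x_{2t}$ rises, which is exactly the sign pattern $\alpha_1 < 0$, $\alpha_2 > 0$ discussed above.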
Example 6 (prices on the orange market, continued): For the case of the organic and regular oranges, an estimation yields the two error-correction equations

$$\Delta p^{org}_t = \underset{(1.665)}{22.864} - \underset{(0.081)}{1.090} \cdot \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \hat\epsilon^{org}_t$$
$$\Delta p^{reg}_t = \underset{(0.634)}{1.147} - \underset{(0.031)}{0.008} \cdot \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \hat\epsilon^{reg}_t$$

where the numbers in parentheses are standard errors of the estimated coefficients. We can write the system as a vector error correction model,

$$\begin{pmatrix} \Delta p^{org}_t \\ \Delta p^{reg}_t \end{pmatrix} = \begin{pmatrix} 22.864 \\ 1.147 \end{pmatrix} - \begin{pmatrix} 1.090 \\ 0.008 \end{pmatrix} \left(p^{org}_{t-1} - p^{reg}_{t-1}\right) + \begin{pmatrix} \hat\epsilon^{org}_t \\ \hat\epsilon^{reg}_t \end{pmatrix},$$

where

$$\beta' x_{t-1} = (1, -1) \begin{pmatrix} p^{org}_{t-1} \\ p^{reg}_{t-1} \end{pmatrix} = p^{org}_{t-1} - p^{reg}_{t-1}$$

is the cointegrating relation, and $\alpha = (-1.090, -0.008)'$ characterizes the speed of adjustment towards equilibrium.

[Figure 2: (A) Two cointegrated variables, $x_{1t}$ and $x_{2t}$. (B) Deviation from equilibrium, $\beta' x_t = x_{1t} - x_{2t}$. (C)-(D): see text.]
The organic orange price seems to error correct very strongly, removing the entire disequilibrium each month. The regular orange price, on the other hand, does not seem to error correct. The coefficient is negative, indicating a movement away from equilibrium, but it is very small and not significantly different from zero. A simple interpretation of this result is that the orange price is essentially determined on the large market for regular oranges. The price of organic oranges has to follow the price of regular oranges, with an additional premium of approximately 23 pence per lb. Note that changes in the price of regular oranges (i.e. a shock to $\epsilon^{reg}_t$, ceteris paribus) will be fully transmitted to the price of organic oranges after one month, while changes to the price of organic oranges (i.e. a shock to $\epsilon^{org}_t$, ceteris paribus) will not be transmitted to the market for regular oranges. ¨
2.1 The Engle-Granger Two-Step Approach
Recall that if a set of variables, $x_{1t}$ and $x_{2t}$, cointegrate, then there exist coefficients, $\mu$ and $\beta_2$, so that

$$x_{1t} = \mu + \beta_2 x_{2t} + u_t \quad (11)$$

defines an equilibrium. It is natural to try to estimate $\beta_2$ in the static regression (11), and this is the approach suggested in the seminal paper of Engle and Granger (1987).
It can be shown that if $x_{1t}$ and $x_{2t}$ are I(1) and cointegrated, then the OLS estimator from (11), $\hat\beta_2$, is consistent for the true parameter, $\beta_2$. We do not postulate that the model in (11) is the DGP that generated the data, and it turns out that consistency of $\hat\beta_2$ holds even if the estimation model is misspecified relative to the DGP, as long as the misspecification only relates to stationary terms. The reason is that the stochastic trends will dominate asymptotically, so for $T \to \infty$ any misspecification of stationary terms will not affect the estimator. As an example, the static regression in (11) will produce consistent estimators even if the true DGP is dynamic. This is discussed in some detail in Box 1. This result is in contrast to the stationary case, where consistency is normally only obtained if the DGP is contained in the estimation model.
Consistency of the estimator tells you that $\hat\beta_2$ converges to $\beta_2$ as $T$ diverges. It turns out that the non-stationarity of the variables in $x_t$ affects the so-called rate of convergence, i.e. the speed at which the variance of $\hat\beta_2$ goes to zero. If $x_{1t}$ and $x_{2t}$ are stationary variables, we know that under the usual conditions,

$$\sqrt{T} \left( \hat\beta_2 - \beta_2 \right) \to N(0, \omega^2),$$

where $\omega^2$ denotes the asymptotic variance.
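The different rates of convergence can be illustrated with a small Monte Carlo experiment. The Python sketch below (standard library only) compares the interquartile range of $\hat\beta_2$ from the static regression at $T = 100$ and $T = 400$: with a $\sqrt{T}$ rate the spread should shrink by a factor of about $\sqrt{4} = 2$, while the faster rate $T$ under cointegration (superconsistency, cf. Figure 3 (B)) implies a factor of about 4. The two DGPs (an IID regressor versus a random-walk regressor, both with $\beta_2 = 1$ and IID N(0, 1) deviations), the replication count, the seed and the function names are assumptions chosen for illustration.

```python
import random
import statistics


def ols_slope(y, x):
    """OLS slope in the static regression y_t = mu + b * x_t + u_t."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def draw_estimate(T, rng, integrated, beta2=1.0):
    """One replication: x2 is a random walk (integrated=True) or IID noise."""
    x2, level = [], 0.0
    for _ in range(T):
        e2 = rng.gauss(0.0, 1.0)
        level = level + e2 if integrated else e2
        x2.append(level)
    x1 = [beta2 * v + rng.gauss(0.0, 1.0) for v in x2]  # stationary deviation u_t
    return ols_slope(x1, x2)


def iqr_of_estimates(T, integrated, reps=600, seed=42):
    """Interquartile range of beta2_hat across Monte Carlo replications."""
    rng = random.Random(seed)
    draws = [draw_estimate(T, rng, integrated) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(draws, n=4)
    return q3 - q1


ratio_stationary = iqr_of_estimates(100, False) / iqr_of_estimates(400, False)
ratio_cointegrated = iqr_of_estimates(100, True) / iqr_of_estimates(400, True)
print(round(ratio_stationary, 1), round(ratio_cointegrated, 1))
```

With these settings the first ratio should be in the vicinity of 2 and the second in the vicinity of 4; the exact values depend on the seed and the number of replications. The interquartile range is used rather than the standard deviation because the limiting distribution in the cointegrated case has heavier tails.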
Box 1: Static Regression when the DGP is Dynamic
In most cases we believe that the DGP generating the observed data in the economy is dynamic. In this case the static regression (11) is misspecified, but the misspecification relates only to stationary terms and the obtained estimator, $\hat\beta_2$, is still consistent.
As an example, consider a simple dynamic DGP given by
where $\epsilon_{1t}$ and $\epsilon_{2t}$ are IID error processes. Here, $x_{2t}$ is a random walk, while $x_{1t}$ is generated as an autoregressive distributed lag model, ADL(1,1). The equation in (B1-1) can be rewritten as
or

$$x_{1t} = \mu + \beta_2 x_{2t} + \gamma_1 \Delta x_{1t} + \gamma_2 \Delta x_{2t} + \tilde\epsilon_{1t} \quad (B1\text{-}3)$$
we may define the so-called error correction term as the deviation from equilibrium,

$$\hat u_t = x_{1t} - \hat\mu - \hat\beta_2 x_{2t}.$$

Under cointegration $\hat u_t$ is a stationary process, and since the estimators converge to the true values very fast, we can include $\hat u_{t-1}$ as a fixed regressor in a dynamic model. The second step of the Engle-Granger procedure is therefore to estimate an error-correction model given $\hat u_{t-1}$, e.g.
where we have assumed one lag in the first differences and have conditioned on the contemporaneous change $\Delta x_{2t}$. All terms in the error correction model are stationary, and standard inference procedures apply to all parameters, in the sense that $t$-ratios will follow standard normal distributions, N(0, 1), asymptotically.
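The two steps can be sketched in Python (standard library only) on simulated data. The DGP, with $\beta_2 = 1$, an error-correction coefficient of $-0.4$ and a contemporaneous effect of $0.5$ from $\Delta x_{2t}$, is an assumption made for illustration; `ols` is a small helper that solves the normal equations and is not a library function.

```python
import random


def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(T, seed=3):
    """Assumed DGP: Dx2 = e2, Dx1 = 0.5*Dx2 - 0.4*(x1_{t-1} - x2_{t-1}) + e1."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dx2 = e2
        dx1 = 0.5 * dx2 - 0.4 * (x1[-1] - x2[-1]) + e1
        x1.append(x1[-1] + dx1)
        x2.append(x2[-1] + dx2)
    return x1, x2


x1, x2 = simulate(T=500)

# Step 1: static regression x1_t = mu + beta2 * x2_t + u_t
mu_hat, beta2_hat = ols(x1, [[1.0, v] for v in x2])
u_hat = [a - mu_hat - beta2_hat * b for a, b in zip(x1, x2)]

# Step 2: ECM with the lagged residual u_hat_{t-1} treated as a fixed regressor
dx1 = [x1[t] - x1[t - 1] for t in range(1, len(x1))]  # dx1[j] is Delta x1 at time j+1
dx2 = [x2[t] - x2[t - 1] for t in range(1, len(x2))]
y = [dx1[j] for j in range(1, len(dx1))]
X = [[1.0, dx1[j - 1], dx2[j], dx2[j - 1], u_hat[j]] for j in range(1, len(dx1))]
const, g_dx1_lag, g_dx2, g_dx2_lag, alpha_hat = ols(y, X)
print(round(beta2_hat, 2), round(alpha_hat, 2))
```

Under the assumed DGP, $\hat\beta_2$ should be close to 1 and the coefficient on $\hat u_{t-1}$ close to $-0.4$. In an application the lag structure of the second step would be chosen from the data rather than fixed in advance.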
Recall that the unrestricted ADL model can be written as an error-correction model. In
particular we can use the reformulations
These formulations are equivalent but (13) can be estimated with OLS while (15) is
non-linear in the parameters and requires a more elaborate estimation procedure (e.g.
maximum likelihood).
Compared to the estimator from the static regression, the estimator derived from a dynamic model has the advantage of being based on a well-specified model. The main problem in empirical applications is that the DGP is not known, so the precise form of (12) has to be determined from the data. The usual approach is to start with a general ADL(p,q), where $p$ and $q$ are large enough to eliminate residual autocorrelation. From this model insignificant lags can be removed.
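As a sketch of how a long-run coefficient is obtained from an estimated ADL model, the Python code below (standard library only) fits a deliberately general ADL(2,2) by OLS and computes the static long-run solution $\hat\beta_2 = \sum_j \hat\phi_j / (1 - \sum_i \hat\theta_i)$, where $\theta_i$ denote the coefficients on lagged $x_{1t}$ and $\phi_j$ those on current and lagged $x_{2t}$ (our labels). The simulated ADL(1,1) DGP, $x_{1t} = 0.5 x_{1t-1} + 0.3 x_{2t} + 0.2 x_{2t-1} + \epsilon_{1t}$ with $x_{2t}$ a random walk, is an assumption chosen so that $\beta_2 = 1$.

```python
import random


def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def long_run(thetas, phis):
    """Static long-run solution of an ADL(p,q): sum(phis) / (1 - sum(thetas))."""
    return sum(phis) / (1.0 - sum(thetas))


def simulate_adl11(T, theta=0.5, phi0=0.3, phi1=0.2, seed=5):
    """Assumed DGP: x2 is a random walk and x1 follows an ADL(1,1)."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
        x1.append(theta * x1[-1] + phi0 * x2[-1] + phi1 * x2[-2] + rng.gauss(0.0, 1.0))
    return x1, x2


x1, x2 = simulate_adl11(T=500)
# General ADL(2,2): regress x1_t on 1, x1_{t-1}, x1_{t-2}, x2_t, x2_{t-1}, x2_{t-2}
y = x1[2:]
X = [[1.0, x1[t - 1], x1[t - 2], x2[t], x2[t - 1], x2[t - 2]] for t in range(2, len(x1))]
coef = ols(y, X)
beta2_hat = long_run(thetas=coef[1:3], phis=coef[3:6])
print(round(long_run([0.5], [0.3, 0.2]), 6))  # 1.0, the long-run value of the assumed DGP
print(round(beta2_hat, 3))
```

The individual coefficients on $x_{2t}$, $x_{2t-1}$ and $x_{2t-2}$ are imprecisely estimated because the levels are highly collinear, but their sum, and hence $\hat\beta_2$, is estimated precisely.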
Figure 3: (A): Consistency of the estimated parameter in a static regression for stationary variables. (B): Superconsistency for cointegrated I(1) variables. (C): Distributions of the estimated cointegration parameter based on a static and a dynamic regression. (D): Distributions of the $t$-ratios under a true null hypothesis for $T = 100$ and $T = 500$ based on the static and dynamic regressions. (E)-(F): Mean and 95% confidence bands of the distributions of the estimated cointegration parameter for different sample lengths $T = 20, 30, \ldots, 500$, for the static regression (E) and the ADL(1,1) and ADL(2,2) regressions (F). The Monte Carlo simulations are based on 10,000 replications.
Box 2: Inference on Coefficients in an ADL Model
Consider an ADL(2,2) model given by
where $\epsilon_t$ is an IID process. Given that the variables in (B2-1) are all I(1), it is interesting to ask whether any of the estimators obtained by applying OLS to equation (B2-1) follow standard distributions, so that inference based on the standard normal distribution applies.
The answer to this question is given in Sims, Stock, and Watson (1990). They give the general result that an estimated parameter follows a normal distribution asymptotically if it can be written as the coefficient to a mean zero stationary variable, possibly after a linear transformation of the model. This means that if the model can be reformulated so that a given parameter is the coefficient to a stationary variable with mean zero, then the OLS estimator of that parameter is asymptotically normal.
Again we may rewrite the ADL model in ECM form as
for $t = 1, 2, \ldots, T$, where the innovations, $\epsilon_{1t}$ and $\epsilon_{2t}$, are assumed N(0, 1) and uncorrelated. The DGP implies a long-run solution with a cointegrating coefficient of $\beta_2 = \frac{0.30 + 0.20}{1 - 0.50} = 1$.
Based on 10,000 data sets from this DGP we look at the properties of the estimators obtained from the static regression (11) and from the ADL(1,1) model. We note that the
used ADL model is identical to the DGP and we expect it to perform better than the
static regression. To illustrate the effect of choosing an estimation model which is more
general than the DGP we also consider the estimates obtained from an ADL(2,2), which
estimates a redundant lag for both variables. This setup amounts to using one regression
model that coincides with the DGP (an ADL(1,1)), one that is too general (an ADL(2,2)),
and one that is too restricted (a static regression).
Figure 3 (C) illustrates the distributions of the estimated parameters for the three cases for $T = 100$ observations. We note that the distributions for the ADL(1,1) and ADL(2,2) almost coincide and are symmetric and nicely centered around the true value. This indicates that estimating a redundant lag will only marginally affect the estimators. The distribution of the estimates from the static regression is shifted to the left, reflecting the bias of the estimator. The mean of the estimates is 0.93, which is significantly smaller than unity. We also note that the distribution is skewed, with a long left tail. Graphs (E) and (F) illustrate the mean and the 5% and 95% quantiles of the distributions of the estimates for different sample lengths $T = 20, 30, \ldots, 500$. We see that the estimator from the static regression is consistent, but it is severely biased in small samples and the distribution is clearly asymmetric. The estimator from the dynamic regression has the correct expectation for all considered sample lengths. We also note that the cost of the two redundant regressors in the ADL(2,2) is only visible for very short sample lengths, and even for $T = 20$ the difference between the estimates from an ADL(1,1) and an ADL(2,2) is small.
The results for this specific DGP thus seem to suggest that estimators derived from a
dynamic regression model are clearly preferable to the two-step Engle-Granger estimators.
This seems to be confirmed for more general classes of DGPs in the literature.
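The qualitative finding can be reproduced on a smaller scale. The Python sketch below (standard library only) draws 300 samples of length $T = 100$ from an ADL(1,1) DGP with $\theta = 0.5$, $\phi_0 = 0.3$ and $\phi_1 = 0.2$, where $x_{2t}$ is a random walk (which of 0.30 and 0.20 belongs to $x_{2t}$ and which to $x_{2t-1}$ is our assumption; only their sum enters $\beta_2 = 1$), and compares the average static-regression estimate with the average long-run estimate from an ADL(1,1). The replication count and seed are arbitrary, so the numbers will not match those reported for Figure 3.

```python
import random
import statistics


def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate_adl11(T, rng, theta=0.5, phi0=0.3, phi1=0.2):
    """Assumed DGP: x2 random walk; x1_t = theta*x1_{t-1} + phi0*x2_t + phi1*x2_{t-1} + e_t."""
    x1, x2 = [0.0], [0.0]
    for _ in range(T):
        x2.append(x2[-1] + rng.gauss(0.0, 1.0))
        x1.append(theta * x1[-1] + phi0 * x2[-1] + phi1 * x2[-2] + rng.gauss(0.0, 1.0))
    return x1, x2


def one_replication(T, rng):
    x1, x2 = simulate_adl11(T, rng)
    # Static regression (11) on t = 1..T
    _, b_static = ols(x1[1:], [[1.0, v] for v in x2[1:]])
    # ADL(1,1) and its static long-run solution
    X = [[1.0, x1[t - 1], x2[t], x2[t - 1]] for t in range(1, len(x1))]
    _, theta, phi0, phi1 = ols(x1[1:], X)
    return b_static, (phi0 + phi1) / (1.0 - theta)


rng = random.Random(2024)
draws = [one_replication(100, rng) for _ in range(300)]
static_mean = statistics.fmean([d[0] for d in draws])
adl_mean = statistics.fmean([d[1] for d in draws])
print(round(static_mean, 3), round(adl_mean, 3))
```

Under this DGP the average static estimate should fall below unity, while the average ADL long-run estimate should be close to 1, in line with the pattern in Figure 3 (C).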
on the cointegrating coefficient using the standard $t$-ratio

$$t_{\beta_2 = \beta_2^0} = \frac{\hat\beta_2 - \beta_2^0}{\mathrm{se}(\hat\beta_2)},$$

which will follow a standard normal distribution asymptotically. A more theoretical discussion of the inference on the parameters of the ADL model for I(1) variables is given in Box 2.
The only complication is that $\hat\beta_2$ is a non-linear function of the estimated parameters, cf. (14), and the standard error of $\hat\beta_2$ is a complicated function of the covariance matrix for the estimated parameters in (12). The software package PcGive has a procedure to calculate the static long-run solution and supply the derived standard errors. In other software packages it is sometimes more convenient to use the alternative (but numerically equivalent) IV estimator mentioned in Box 1 or the non-linear estimation of (15), since they automatically produce standard errors for $\hat\beta_2$.
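One common way to obtain such a derived standard error is the delta method. The sketch below (Python, standard library only) computes $\mathrm{se}(\hat\beta_2)$ for the ADL(1,1) long-run coefficient $\beta_2 = (\phi_0 + \phi_1)/(1 - \theta)$ from the estimates and their covariance matrix. The function name, the ordering $(\theta, \phi_0, \phi_1)$ and the numbers in the example are hypothetical, and we do not claim that this is how PcGive computes its standard errors.

```python
import math


def long_run_with_se(theta, phi0, phi1, cov):
    """Delta-method s.e. of beta2 = (phi0 + phi1) / (1 - theta).

    cov is the 3x3 covariance matrix of (theta_hat, phi0_hat, phi1_hat).
    """
    denom = 1.0 - theta
    beta2 = (phi0 + phi1) / denom
    # Gradient of beta2 with respect to (theta, phi0, phi1)
    grad = [(phi0 + phi1) / denom ** 2, 1.0 / denom, 1.0 / denom]
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3))
    return beta2, math.sqrt(var)


# Hypothetical estimates and covariance matrix, for illustration only
cov = [[0.0025, -0.0005, 0.0004],
       [-0.0005, 0.0040, -0.0030],
       [0.0004, -0.0030, 0.0045]]
beta2_hat, se_beta2 = long_run_with_se(0.5, 0.3, 0.2, cov)
print(round(beta2_hat, 3), round(se_beta2, 3))  # 1.0 0.139
```

The off-diagonal elements matter: the strong negative covariance between $\hat\phi_0$ and $\hat\phi_1$, which is typical when $x_{2t}$ and $x_{2t-1}$ are nearly collinear, reduces the standard error of their sum and hence of $\hat\beta_2$.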
The distributions of the $t$-ratios in the Monte Carlo simulation are reported in Figure 3 (D). The $t$-ratios from a static regression have a distribution which is skewed to the left, and inference based on a standard normal would be very misleading. The $t$-ratios from the dynamic regression, on the other hand, seem to be close to a standard normal, making it possible to test hypotheses on the parameters.
$$c_t = \underset{(0.129)}{-0.404} + \underset{(0.049)}{0.364} \cdot y_t + \underset{(0.044)}{0.516} \cdot w_t + \hat u_t \quad (16)$$

where the numbers in parentheses are standard errors. The estimates seem consistent with a simple consumption function in which consumption depends positively on income and wealth. We may note that a one per cent increase in both income and wealth gives less than a one per cent increase in consumption, as 0.364 + 0.516 = 0.88. Consequently, the
consumption-income ratio will not be constant in a steady state, which may be regarded as unsatisfactory from an economic point of view. Note that $t$-ratios constructed from the reported standard errors in (16) do not follow a standard normal distribution, and they are not suitable for testing. For example, we cannot use them to test whether the sum of coefficients, 0.88, is significantly different from one.
Based on the estimates of the static regression (16) we may define the error-correction term

$$\hat u_t = c_t + 0.404 - 0.364 \cdot y_t - 0.516 \cdot w_t,$$

which is interpretable as the deviation from equilibrium. The term $\hat u_t$ may be used in the construction of error-correcting models to characterize the dynamic properties of the data, as suggested by the Engle-Granger approach. In principle there may exist error correction models of $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$, and starting with a model with two lags in the first differences and deleting insignificant lags produces the three equations:
∆b b−1
= 0001 − 0195 · ∆−1 + 0229 · ∆ + 0426 · ∆ − 0250 ·
(0002) (0077) (0057) (0117) (0064)
∆b b−1
= 0002 + 0433 · ∆ + 0387 · ∆−1 − 0353 · ∆−1 + 0066 ·
(0002) (0118) (0115) (0087) (0099)
b = 0003 + 0232 · ∆ − 0030 ·
∆ b−1
(0001) (0060) (0050)
Note that only consumption corrects deviations from the long-run relation, with a speed of adjustment of -0.25, while $\Delta y_t$ and $\Delta w_t$ do not adjust significantly when the variables are out of equilibrium.
An alternative estimator of the cointegrating coefficients can be derived from a conditional ADL model for consumption. Assuming at most three lags and deleting insignificant terms leads to the preferred ADL model
Figure 4: Impulse-response functions for a permanent change in income (A) and wealth (B), i.e. the accumulated responses of consumption, $c_{t+h}$, for $h = 0, 1, \ldots, 40$.
the contemporaneous impact is 0.240, and there is a smooth convergence to the long-run impact of 0.458. A permanent change in private wealth has a contemporaneous effect on consumption of 0.401, which is not far from the long-run impact of 0.436. The convergence is not monotone, however, and the large contemporaneous impact is followed by a decrease in the next period and then a gradual convergence.
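The accumulated responses in Figure 4 follow from a simple recursion in the ADL coefficients. Since the estimated model (17) is not reproduced here, the Python sketch below (standard library only) uses hypothetical ADL(1,1) coefficients; `thetas` holds the coefficients on the lagged dependent variable and `phis` those on the current and lagged regressor (names are ours). For a permanent unit change in the regressor at $h = 0$, the path converges to the long-run coefficient $\sum_j \phi_j / (1 - \sum_i \theta_i)$ as long as the autoregressive part is stable.

```python
def accumulated_response(thetas, phis, horizon):
    """Response of y_{t+h} to a permanent unit step in x at h = 0 in the ADL
    y_t = sum_i thetas[i-1] * y_{t-i} + sum_j phis[j] * x_{t-j} + eps_t."""
    path = []
    for h in range(horizon + 1):
        # feedback from the model's own lags (pre-shock values are zero)
        own = sum(th * path[h - i] for i, th in enumerate(thetas, start=1) if h - i >= 0)
        # x_{t+h-j} equals 1 once the step has occurred, i.e. for j <= h
        step = sum(ph for j, ph in enumerate(phis) if j <= h)
        path.append(own + step)
    return path


# Hypothetical coefficients (not the estimates behind Figure 4)
irf = accumulated_response(thetas=[0.5], phis=[0.3, 0.2], horizon=40)
print([round(v, 3) for v in irf[:3]], round(irf[-1], 3))  # [0.3, 0.65, 0.825] 1.0
```

With a negative coefficient on the lagged regressor (for example `phis=[0.4, -0.1]`), the same function produces the kind of non-monotone path described for wealth: a large impact effect, a dip in the next period, and then gradual convergence.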
Notice that the results obtained in the estimation of (17) can also be obtained by estimating the equivalent unrestricted error-correction model, i.e.
Here the cointegrating coefficients can be found as the long-run solution, 0.115/0.251 = 0.458 and 0.110/0.251 = 0.436, which, up to rounding of the reported coefficients, are identical to the results in (18). ¨
to reveal that $\beta_2 = 0$, at least asymptotically. This turns out not to be the case, however, and the estimator $\hat\beta_2$ is not consistent. Moreover, as $T \to \infty$ the $t$-ratio, $t_{\beta_2 = 0}$, will indicate a significant relation between $x_{1t}$ and $x_{2t}$. This is known as the spurious regression result. The problem is that when the variables do not cointegrate, $u_t$ is an I(1) process and standard results do not hold.
Example 8 (spurious regression): As an example of a spurious regression, consider two presumably unrelated I(1) variables, namely yearly data covering 1980-2000 for the log of real private consumption in Denmark, $\text{cons}_t$, and the log of the number of breeding cormorants in Denmark, $\text{bird}_t$. We estimate a static regression:

$$\text{cons}_t = \underset{(0.150)}{12.145} + \underset{(0.015)}{0.095} \cdot \text{bird}_t + \hat u_t, \qquad t_{\beta_2 = 0} = \frac{0.095}{0.015} \approx 6.3,$$

which seems highly significant in a N(0, 1) distribution, apparently suggesting a clear positive relation between the number of birds and aggregate consumption! Furthermore, $R^2$ in the equation is 0.69, indicating that the number of breeding birds can account for a large proportion of the variation in consumption. These results are of course totally spurious: they are a simple consequence of the variables being I(1). ¨
To illustrate the spurious regression problem we set up a simple Monte Carlo simulation. As a comparison we first reproduce the standard results for a stationary regression. We generate data series as independent IID variables,
where $\epsilon_{1t}$ and $\epsilon_{2t}$ are independent drawings from N(0, 1). The results from the regression model (20) are reported in Figure 5 (A) and (B) for sample lengths $T = 50, 100, 500$. We note in graph (A) that the distributions of $\hat\beta_2$ are centered around the true value, and the convergence of the estimator $\hat\beta_2$ implies that the variance decreases as $T \to \infty$. In graph (B) we consider the distributions of the $t$-ratio, $t_{\beta_2 = 0}$. The distribution is close to the asymptotic N(0, 1) for all considered sample lengths.
Next consider the spurious regression between I(1) variables. Here we generate data as independent random walks, i.e.
The results are reported in graphs (C) and (D). In graph (C) we note that the distribution of $\hat\beta_2$ is centered around the true value, but it does not collapse as $T \to \infty$. This reflects that the estimator is unbiased but not consistent. In graph (D) we notice that the distributions of the $t$-ratios get increasingly dispersed as $T$ increases, and the distributions are far from a standard normal. As $T \to \infty$ this implies that, using the conventional critical values of ±1.96, we would reject the true hypothesis $\beta_2 = 0$ with a probability approaching one.
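The rejection frequencies behind graphs (B) and (D) can be approximated with a few lines of Python (standard library only). The sketch below regresses one series on another, both either IID N(0, 1) or independent random walks, and records how often $|t_{\beta_2 = 0}| > 1.96$. The replication count, seed and function names are assumptions for illustration.

```python
import math
import random


def slope_t_ratio(y, x):
    """t-ratio for H0: beta2 = 0 in the static regression y = mu + beta2*x + u."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    a = my - b * mx
    ssr = sum((w - a - b * v) ** 2 for v, w in zip(x, y))
    return b / math.sqrt(ssr / (n - 2) / sxx)


def rejection_rate(T, integrated, reps=300, seed=7):
    """Share of replications with |t| > 1.96 when x1 and x2 are independent."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1, x2, l1, l2 = [], [], 0.0, 0.0
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            l1 = l1 + e1 if integrated else e1  # random walk or IID
            l2 = l2 + e2 if integrated else e2
            x1.append(l1)
            x2.append(l2)
        if abs(slope_t_ratio(x1, x2)) > 1.96:
            rejections += 1
    return rejections / reps


for T in (50, 100, 500):
    print(T, rejection_rate(T, integrated=False), rejection_rate(T, integrated=True))
```

The first column should stay near the nominal 5%, while the second should be far above it and rise with $T$, even though the true $\beta_2$ is zero in both cases.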
One way to explain the spurious regression result is to note that the regression model in (20) is logically inconsistent if the variables do not cointegrate. Since there is no genuine relation between $x_{1t}$ and $x_{2t}$, the true value of the parameter is zero, $\beta_2 = 0$, so a model with $\beta_2 \neq 0$ is necessarily false. Note, however, that the model with $\beta_2 = 0$ is also false. If $\beta_2 = 0$, then the only way to balance the equation is if $u_t$ is I(1), but that is not consistent with the assumptions of the regression model. One problem with the spurious regression is therefore that the actual DGP cannot be contained in the estimation model, and the test of $\beta_2 = 0$ against $\beta_2 \neq 0$ compares two false models.
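Based on this insight it is easy to suggest a modification of the static regression that circumvents some of the problems of the spurious regression. As an example, consider the simple ADL(1,0) model, where the lagged value of $x_{1t}$ is included in the regression:
$$x_{1t} = \delta + \theta_1 x_{1,t-1} + \phi_0 x_{2t} + \epsilon_t.$$
Now the DGP of two independent random walks is contained in the model as the special case $\theta_1 = 1$ and $\phi_0 = \delta = 0$.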
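A similar sketch can be written for the ADL(1,0) regression of $x_{1t}$ on a constant, $x_{1,t-1}$ and $x_{2t}$, again with independent random walks as the DGP. The helper names and simulation settings are hypothetical. The only aim is to compare the rejection rate for the coefficient on $x_{2t}$ with the static case.

```python
import numpy as np


def adl_x2_t(x1, x2):
    """OLS of x1_t on (1, x1_{t-1}, x2_t); return the x2_t coefficient and its t-ratio."""
    y = x1[1:]
    X = np.column_stack([np.ones(len(y)), x1[:-1], x2[1:]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[2], coef[2] / np.sqrt(cov[2, 2])


def adl_rejection_rate(T, reps, rng):
    """Share of replications with |t| > 1.96 for the x2_t coefficient (true value 0)."""
    rejections = 0
    for _ in range(reps):
        x1 = np.cumsum(rng.standard_normal(T))   # independent random walks
        x2 = np.cumsum(rng.standard_normal(T))
        _, t = adl_x2_t(x1, x2)
        rejections += int(abs(t) > 1.96)
    return rejections / reps


rng = np.random.default_rng(7)
for T in (50, 100, 500):
    print(f"T={T}: rejection rate {adl_rejection_rate(T, 1000, rng):.3f}")
```

Including the lagged dependent variable should bring the rejection rate much closer to the nominal level than in the static regression, in the spirit of graphs (E) and (F).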
[Figure 5, panels: (A) Distribution of estimators, IID case; (B) Distribution of t-ratios, IID case; (C) Distribution of estimators, I(1) case; (D) Distribution of t-ratios, I(1) case; (E) ADL estimators, I(1) case; (F) ADL t-ratios, I(1) case. Each panel shows T = 50, 100, 500; panels (B) and (F) include the N(0,1) density for reference.]
Figure 5: Monte Carlo results for a static regression for stationary variables, for a spurious regression, and for an ADL model of I(1) variables.
is stationary. It follows that the null hypothesis of no-cointegration can be translated into the hypothesis of a unit root in $u_t$. This hypothesis can be tested using a conventional augmented Dickey-Fuller (ADF) test. Allowing $u_t$ to have a mean different from zero but no deterministic linear trend, the hypothesis of no-cointegration can be tested as the hypothesis $H_0: \pi = 0$ in the ADF regression with a constant term,
$$\Delta u_t = \delta + \sum_{i=1}^{k-1} c_i \Delta u_{t-i} + \pi u_{t-1} + \epsilon_t. \qquad (22)$$
Under the null hypothesis, the $t$-ratio
$$\hat\tau = \frac{\hat\pi}{\mathrm{se}(\hat\pi)}$$
follows a DF distribution. Critical values for the DF distribution are reproduced in part (A) of Table 1 in the row with zero estimated parameters.
If the relevant alternative to a unit root is trend-stationarity, the ADF regression (22) may be augmented with a linear trend term, and the test for no-cointegration is the ADF test with a linear trend. The corresponding critical values are given in the "Constant and trend" columns of Table 1 (A).
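The $t$-ratio on $u_{t-1}$ from an ADF regression like (22) can be computed by plain OLS, as the following Python/NumPy sketch shows. The function name `adf_tau` and its arguments are illustrative. `lags` plays the role of $k-1$ in (22), and `trend=True` adds a linear trend. The statistic must be compared with DF critical values, not with the normal distribution.

```python
import numpy as np


def adf_tau(u, lags=0, const=True, trend=False):
    """t-ratio for H0: pi = 0 in
    du_t = [delta] + [trend] + sum_{i=1}^{lags} c_i du_{t-i} + pi u_{t-1} + eps_t."""
    u = np.asarray(u, dtype=float)
    du = np.diff(u)                                  # du[j] is the change at time j + 1
    y = du[lags:]                                    # dependent variable du_t
    n = len(y)
    cols = [u[lags:-1]]                              # u_{t-1}: first column, coefficient pi
    cols += [du[lags - i:-i] for i in range(1, lags + 1)]  # lagged differences du_{t-i}
    if const:
        cols.append(np.ones(n))
    if trend:
        cols.append(np.arange(n, dtype=float))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0] / se


rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(300))           # a unit-root series
print(adf_tau(walk, lags=1))                         # compare with DF critical values
```

With a known cointegrating vector, `u` would be the deviation from the relation. According to Table 1 (A), the 5% critical value with a constant is −2.86.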
In many cases the cointegration vector, $\beta$, is unknown and the above approach is not feasible. The test procedure can easily be modified, however, to the case where $\beta$ is estimated as in the Engle-Granger procedure. Recall that if the cointegration coefficients, such as $\beta_2$, are unknown, they can be super-consistently estimated in the static regression
$$x_{1t} = \mu + \beta_2 x_{2t} + u_t, \qquad (23)$$
or from the long-run solution of a dynamic model. The regression (23) corresponds to a cointegrating relation if the deviation from the relation, $u_t$, is a stationary process. Having estimated the parameters, we can test for no-cointegration by testing whether the estimated residual, $\hat u_t$, contains a unit root. This test is translated into the hypothesis $H_0: \pi = 0$ in the ADF regression
$$\Delta \hat u_t = \sum_{i=1}^{k-1} c_i \Delta \hat u_{t-i} + \pi \hat u_{t-1} + \epsilon_t. \qquad (24)$$
We note that since the estimated residual, $\hat u_t$, has a mean of zero, there is no constant term in the ADF regression (24). Nonetheless, the critical values for the ADF test depend on the deterministic specification of the static regression, e.g. whether (23) contains a constant or a linear trend.
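The two steps of the Engle-Granger procedure can be sketched in Python/NumPy, assuming two variables and a constant in the static regression (23). The helper names are hypothetical, and the simulated DGP in the usage lines is an arbitrary cointegrated example with $\beta_2 = 0.5$.

```python
import numpy as np


def adf_tau_nodet(e, lags=1):
    """ADF t-ratio on e_{t-1} in de_t = sum_i c_i de_{t-i} + pi e_{t-1} + eps_t (no constant)."""
    de = np.diff(e)
    y = de[lags:]
    X = np.column_stack([e[lags:-1]] + [de[lags - i:-i] for i in range(1, lags + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    return coef[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])


def engle_granger(x1, x2, lags=1):
    """Step 1: static OLS x1_t = mu + beta2 x2_t + u_t.  Step 2: ADF test on the residuals."""
    X = np.column_stack([np.ones(len(x1)), x2])
    (mu, beta2), *_ = np.linalg.lstsq(X, x1, rcond=None)
    u_hat = x1 - mu - beta2 * x2          # mean zero by construction: no constant in step 2
    return beta2, adf_tau_nodet(u_hat, lags)


rng = np.random.default_rng(11)
T = 500
x2 = np.cumsum(rng.standard_normal(T))   # I(1) regressor
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]       # stationary AR(1) deviation from the relation
x1 = 1.0 + 0.5 * x2 + u                  # cointegrated with beta2 = 0.5
print(engle_granger(x1, x2, lags=1))
```

The statistic from step 2 is compared with the row for one estimated parameter in Table 1 (A), e.g. −3.34 at the 5% level with a constant, rather than with the DF value −2.86.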
The fact that the cointegrating vector $\hat\beta$ is estimated also changes the critical values for the ADF test, and the estimation uncertainty has to be taken into account. The intuition is that OLS applied to the static regression (23) will minimize the variance of $\hat u_t$, and graphically the estimated residuals will look as "stationary as possible". And the more explanatory variables we include in (23), i.e. the more parameters we estimate, the smaller the variance of $\hat u_t$ becomes, and the more stationary it will look. The test procedure has to account for that, and the critical values depend on the number of I(1) regressors in (23). The asymptotic distributions of tests for no-cointegration are illustrated in Figure 6 (A). As the number of regressors in the static regression increases, the distribution of the ADF test statistic moves to the left. This reflects that the OLS procedure makes the variance of the estimated residual smaller and smaller. The critical values of the residual-based test are reproduced in Table 1 (A).
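The leftward shift can be reproduced roughly by simulation. The sketch below generates $k+1$ independent random walks, so there is no cointegration. It runs the static regression of the first walk on a constant and the other $k$, and records the 5% quantile of the residual-based ADF statistic. The sample size and the number of replications are assumptions of the sketch. So is the omission of augmentation lags, which is valid here because the simulated errors are white noise. The simulated quantiles are only rough finite-sample counterparts of the asymptotic values −3.34, −3.74 and −4.10 in Table 1 (A).

```python
import numpy as np


def eg_tau(x1, X2):
    """Residual-based DF t-ratio (no lags, no constant) after OLS of x1 on (1, X2)."""
    X = np.column_stack([np.ones(len(x1)), X2])
    coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
    u = x1 - X @ coef
    du, ulag = np.diff(u), u[:-1]
    pi = (ulag @ du) / (ulag @ ulag)                # OLS coefficient on u_{t-1}
    s2 = np.sum((du - pi * ulag) ** 2) / (len(du) - 1)
    return pi / np.sqrt(s2 / (ulag @ ulag))


def fifth_percentile(k, T=200, reps=500, seed=0):
    """Simulated 5% quantile under no-cointegration with k independent I(1) regressors."""
    rng = np.random.default_rng(seed)
    taus = []
    for _ in range(reps):
        walks = np.cumsum(rng.standard_normal((T, k + 1)), axis=0)
        taus.append(eg_tau(walks[:, 0], walks[:, 1:]))
    return np.quantile(taus, 0.05)


for k in (1, 2, 3):
    print(k, round(fifth_percentile(k), 2))         # quantiles move left as k grows
```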
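which is depicted in Figure 1 (D). To test for no-cointegration we use an ADF regression without deterministic terms. In the present case one lag is needed (standard errors in parentheses):
$$\Delta \hat u_t = -\underset{(0.089)}{0.223}\, \Delta \hat u_{t-1} - \underset{(0.068)}{0.221}\, \hat u_{t-1} + \hat\epsilon_t.$$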
[Figure 6 (A): asymptotic distributions of the residual-based test for no-cointegration for 1 to 7 estimated parameters in the static regression, shown together with the DF distribution with a constant and the N(0,1) distribution.]
to the business cycle, suggesting that the consumption-income ratio is pro-cyclical in addition to the wealth effects. To obtain stronger evidence of cointegration, one possible solution is to augment the model with a measure of the business cycle, e.g. a variable measuring the effects of unemployment. □
Here we can test the hypothesis that $x_{1t}$ does not error correct, i.e. $H_0: \alpha_1 = 0$, against the cointegrating alternative, $H_A: \alpha_1 < 0$. The test statistic is just the conventional $t$-ratio, given by
$$t_{\alpha_1=0} = \frac{\hat\alpha_1}{\mathrm{se}(\hat\alpha_1)}.$$
As for the residual-based test, the distribution of $t_{\alpha_1=0}$ depends on the deterministic terms in the regression (25) as well as on the number of I(1) variables in the model. The asymptotic critical values are reproduced in part (B) of Table 1. This test appeared very early in the PcGive software package and is often referred to as the PcGive test for no-cointegration.

(A) Residual-based (ADF) test for no-cointegration

Number of        Constant in (22)            Constant and trend in (22)
estimated        1%      5%      10%         1%      5%      10%
parameters
0                −3.43   −2.86   −2.57       −3.96   −3.41   −3.13
1                −3.90   −3.34   −3.04       −4.32   −3.78   −3.50
2                −4.29   −3.74   −3.45       −4.66   −4.12   −3.84
3                −4.64   −4.10   −3.81       −4.97   −4.43   −4.15
4                −4.96   −4.42   −4.13       −5.25   −4.72   −4.43
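The following is a minimal Python/NumPy sketch of the PcGive test. It assumes two variables and an unrestricted ECM with a constant, the contemporaneous change $\Delta x_{2t}$, and the lagged levels, but no lagged differences, which happens to suffice for the simulated DGP below. This layout only stands in for (25), which may contain more dynamics, and the names are hypothetical. The resulting $t$-ratio on $x_{1,t-1}$ is to be compared with the critical values in part (B) of Table 1.

```python
import numpy as np


def pcgive_tau(x1, x2):
    """t-ratio on x1_{t-1} in the unrestricted ECM
    dx1_t = delta + phi0 dx2_t + alpha1 x1_{t-1} + alpha2 x2_{t-1} + eps_t."""
    dx1, dx2 = np.diff(x1), np.diff(x2)
    X = np.column_stack([np.ones(len(dx1)), dx2, x1[:-1], x2[:-1]])
    coef, *_ = np.linalg.lstsq(X, dx1, rcond=None)
    resid = dx1 - X @ coef
    s2 = resid @ resid / (len(dx1) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
    return coef[2] / se                     # column 2 holds x1_{t-1}


rng = np.random.default_rng(21)
T = 500
x2 = np.cumsum(rng.standard_normal(T))      # weakly exogenous I(1) variable
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]
x1 = 0.5 * x2 + u                           # x1 error corrects with alpha1 = -0.5
print(pcgive_tau(x1, x2))
```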
Comparing the residual-based test for no-cointegration with the test for no-error-correction in the dynamic model, three things are worth noting. First, the test for no-error-correction is based on the assumption that $x_{1t}$ is the only variable that error corrects to the potential cointegrating relation. This implies that we should test for no-cointegration in the "correct" error-correction model; in the present case that is the model for $\Delta x_{1t}$ and not the model for $\Delta x_{2t}$. In most cases, prior knowledge from economic theory suggests which equation to consider.
Secondly, the test for no-error-correction of $\Delta x_{1t}$ corresponds to a test for no-cointegration in a relation involving $x_{1t}$. Even if we cannot reject the hypothesis of no-error-correction of $\Delta x_{1t}$, the remaining right-hand-side variables in levels, $x_{2t}$, may still cointegrate among themselves in a relation not involving $x_{1t}$.
Thirdly, a comparison of (25) with the ADF test (24) shows that the latter imposes a common factor restriction on the dynamics when the hypothesis of a unit root is tested. There is no a priori reason to believe that the data obey a common factor restriction, and the properties of the test, in particular its power, may be negatively affected by imposing the restriction. The relation between (25) and (24) is explored in more detail in Box 3.
Box 3: ADF Tests and Common Factor Restrictions
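Consider a potential cointegrating relation between two I(1) variables
$$x_{1t} = \mu + \beta_2 x_{2t} + u_t.$$
To test for no-cointegration we use the residual, $u_t = x_{1t} - \mu - \beta_2 x_{2t}$, and consider an ADF regression. Assume for simplicity that only one lag of $\Delta u_t$ is needed, i.e.
$$\Delta u_t = c_1 \Delta u_{t-1} + \pi u_{t-1} + \epsilon_t.$$
Inserting the definition of $u_t$ and rearranging gives
$$\Delta x_{1t} = -\pi\mu + \beta_2 \Delta x_{2t} + c_1 \Delta x_{1,t-1} - c_1 \beta_2 \Delta x_{2,t-1} + \pi x_{1,t-1} - \pi \beta_2 x_{2,t-1} + \epsilon_t.$$
This is an ECM model, but subject to a number of common factor restrictions. We have 6 regressors on the right hand side (including the constant), but only 4 parameters to be estimated: $\mu$, $\pi$, $c_1$, and $\beta_2$, and hence two restrictions.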
The restrictions imply, e.g., that the contemporaneous impact of $x_{2t}$ on $x_{1t}$ is $\beta_2$, which is identical to the long-run impact. There is no compelling reason to believe that this is true in practice; see also Example 7.
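The mapping from the four ADF parameters to the six unrestricted ECM coefficients can be checked numerically. The sketch assumes the one-lag ADF regression $\Delta u_t = c_1 \Delta u_{t-1} + \pi u_{t-1} + \epsilon_t$ with $u_t = x_{1t} - \mu - \beta_2 x_{2t}$. It computes the implied coefficients for arbitrary parameter values and verifies the two common factor restrictions. The function name and dictionary keys are illustrative.

```python
def adf_implied_ecm(mu, beta2, c1, pi):
    """Unrestricted-ECM coefficients implied by the one-lag residual ADF regression.

    Keys name the regressors in
    dx1_t = a0 + a1 dx2_t + a2 dx1_{t-1} + a3 dx2_{t-1} + a4 x1_{t-1} + a5 x2_{t-1} + eps_t.
    """
    return {
        "const": -pi * mu,
        "dx2_t": beta2,
        "dx1_t-1": c1,
        "dx2_t-1": -c1 * beta2,
        "x1_t-1": pi,
        "x2_t-1": -pi * beta2,
    }


coefs = adf_implied_ecm(mu=1.0, beta2=0.8, c1=0.3, pi=-0.2)
# Restriction 1: contemporaneous impact equals the long-run impact -a5/a4.
long_run = -coefs["x2_t-1"] / coefs["x1_t-1"]
print(coefs["dx2_t"], long_run)        # equal up to floating-point rounding
# Restriction 2: the dx2_{t-1} coefficient is minus the product of two other coefficients.
print(coefs["dx2_t-1"], -coefs["dx1_t-1"] * coefs["dx2_t"])
```

An unrestricted ECM estimates the six coefficients freely, so neither equality needs to hold in the data.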
error correction model in (19) suggests cointegration, we test for no-error-correction using the $t$-ratio,
$$t_{\alpha_1=0} = \frac{-0.251}{0.065} = -3.86.$$
The 5% critical value is given in part (B) of Table 1 as $-3.51$, so we can reject no-cointegration at the 5% level, albeit only borderline.
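The arithmetic of the test decision fits in a few lines of Python, using the estimate, standard error and critical value quoted above.

```python
alpha1_hat, se_alpha1 = -0.251, 0.065   # estimate and standard error from the example
tau = alpha1_hat / se_alpha1             # t-ratio for H0: alpha1 = 0
critical_5pct = -3.51                    # 5% critical value quoted from Table 1 (B)
print(round(tau, 2), tau < critical_5pct)   # -3.86 True
```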
The different conclusions from the residual-based test and the PcGive test for no-cointegration could be related to the fact that the common factor restrictions imposed on the ADF test are not in line with the data. The test statistic for the two common factor restrictions is 10.41, which is highly significant according to the asymptotic $\chi^2(2)$ distribution, so the common factor restrictions are clearly rejected. Rejection of the common factor restrictions, together with our knowledge that consumption is the only error-correcting variable, suggests that the PcGive test for no-cointegration is probably preferable in the present case. □
4 Limitations of the Single-Equation Approach
The starting point for the discussion is an ECM model for three variables, and to make the discussion less abstract we think of a consumption function and use the notation $c_t$, $y_t$, and $w_t$ for consumption, income, and wealth. In the analysis of the equation (26) we implicitly make three sets of assumptions:
(1) Cointegration is a system property, and in principle there exist error correction equations for all variables: $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$. We only consider the equation for $\Delta c_t$.
(2) We assume that there is only one cointegrating relation between the variables, given by the long-run solution
$$c_t = -\frac{\gamma_2}{\gamma_1} \cdot y_t - \frac{\gamma_3}{\gamma_1} \cdot w_t.$$
We mentioned in §1 that this is not necessarily true; and as the number of variables in the model increases, it actually becomes less and less likely.
(3) To condition equation (26) on the contemporaneous changes, $\Delta y_t$ and $\Delta w_t$, we assume that they are predetermined, i.e. there is no feedback from $\Delta c_t$ to $\Delta y_t$ and $\Delta w_t$. This is not necessarily true in practice, where several variables may be simultaneously determined.
Below we discuss these three issues in turn.
where "dynamics" represents lagged values of first differences. Cointegration implies the existence of error correction, so at least one of the three coefficients, $\alpha_1$, $\alpha_2$, or $\alpha_3$, has to be different from zero. We note that the cointegrating parameters, $\beta_1$ and $\beta_2$, appear in all equations, so if we want the best possible (i.e. efficient) estimators, we have to use the information in all three equations and not just the equation for $\Delta c_t$. Remember that we can stack the three equations in a vector error correction model (VECM):
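$$\begin{pmatrix} \Delta c_t \\ \Delta y_t \\ \Delta w_t \end{pmatrix} = \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \begin{pmatrix} 1 & -\beta_1 & -\beta_2 \end{pmatrix} \begin{pmatrix} c_{t-1} \\ y_{t-1} \\ w_{t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \end{pmatrix},$$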
where we have left out the dynamics. The parameters of this model can be estimated
using maximum likelihood, but that is beyond the scope of the present note.
In the special case where $\alpha_2 = \alpha_3 = 0$, it is sufficient to consider the first equation, for $\Delta c_t$, and the single-equation analysis will be efficient. This assumption is implicitly imposed by the single-equation model.
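The levels term of the VECM is the product of the column of loadings and the row $(1, -\beta_1, -\beta_2)$, which gives a rank-one $3 \times 3$ matrix. The sketch below uses hypothetical parameter values. It shows that with $\alpha_2 = \alpha_3 = 0$ only the first equation contains the levels. With nonzero $\alpha_2$ and $\alpha_3$, by contrast, the cointegrating parameters enter every equation.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta = np.array([[1.0], [-0.9], [-0.05]])      # (1, -beta1, -beta2)'
alpha = np.array([[-0.2], [0.0], [0.0]])       # alpha2 = alpha3 = 0
Pi = alpha @ beta.T                            # 3x3 matrix multiplying (c, y, w)_{t-1}
print(Pi)                                      # only the first row is nonzero
print(np.linalg.matrix_rank(Pi))               # 1

alpha_full = np.array([[-0.2], [0.1], [0.05]]) # all three variables error correct
Pi_full = alpha_full @ beta.T
print(Pi_full / alpha_full)                    # every row equals (1, -beta1, -beta2)
```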
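The first one represents a consumption-income ratio if $\beta_1 = 1$, and the second one shows the propensity to consume out of private wealth. The long-run relations can be written as
$$\begin{pmatrix} c_t - \beta_1 y_t \\ c_t - \beta_2 w_t \end{pmatrix} = \begin{pmatrix} 1 & -\beta_1 & 0 \\ 1 & 0 & -\beta_2 \end{pmatrix} \begin{pmatrix} c_t \\ y_t \\ w_t \end{pmatrix} = \beta' x_t, \qquad x_t = (c_t, y_t, w_t)'.$$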
Here $\alpha_{11}$ measures how $\Delta c_t$ is affected by deviations from the first long-run relation, $c_{t-1} - \beta_1 y_{t-1}$, while $\alpha_{12}$ measures how $\Delta c_t$ is affected by deviations from the second long-run relation, $c_{t-1} - \beta_2 w_{t-1}$. The second row of $\alpha$ measures how $\Delta y_t$ is affected by deviations from equilibrium, etc.
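The parameters of this model can again be estimated using ML, but we will not discuss that here. Instead we just note that if we only consider the first equation, then we will estimate the first row of the model, i.e.
$$\Delta c_t = \delta_1 + \alpha_{11}(c_{t-1} - \beta_1 y_{t-1}) + \alpha_{12}(c_{t-1} - \beta_2 w_{t-1}) + \cdots + \epsilon_{1t}.$$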
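The following numerical sketch shows what the single equation estimates when there are two cointegrating relations. The levels coefficients in the $\Delta c_t$ equation are the first row of $\alpha\beta'$, which mixes the two relations. All parameter values are hypothetical.

```python
import numpy as np

# Hypothetical values, for illustration only.
beta1, beta2 = 1.0, 0.1
beta_t = np.array([[1.0, -beta1, 0.0],      # c - beta1*y
                   [1.0, 0.0, -beta2]])     # c - beta2*w
alpha_row1 = np.array([-0.2, -0.1])         # alpha11, alpha12 in the dc_t equation

pi_row1 = alpha_row1 @ beta_t               # levels coefficients on (c, y, w)_{t-1}
# Long-run solution of the single equation: c = -(pi_y/pi_c) y - (pi_w/pi_c) w
lr_y = -pi_row1[1] / pi_row1[0]
lr_w = -pi_row1[2] / pi_row1[0]
print(pi_row1, lr_y, lr_w)
```

The implied single-equation long-run coefficients are $\alpha_{11}\beta_1/(\alpha_{11}+\alpha_{12})$ on $y_t$ and $\alpha_{12}\beta_2/(\alpha_{11}+\alpha_{12})$ on $w_t$. Here they differ from both $\beta_1$ and $\beta_2$, so the long-run solution of the single equation need not recover either relation.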
4.3 Exogeneity
To be able to condition on $\Delta y_t$ and $\Delta w_t$ in the single-equation cointegration analysis, we have to assume that they are predetermined.
This requirement states that there can be no feedback from $\Delta c_t$ to $\Delta y_t$ and $\Delta w_t$. In the present case, consumption is a main component of gross domestic product, which is the (national accounts) basis for defining disposable income. This link suggests that consumption and income may be simultaneously determined within a given quarter.
If the regressor is not predetermined, we may exclude it, focusing on the reduced form with no contemporaneous effects. Alternatively, we may be able to find good instruments for $\Delta y_t$ and estimate the model using an instrumental variables estimator.
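The instrumental variables idea can be sketched with a just-identified IV estimator, $(Z'X)^{-1}Z'y$. The example below is a static simulation that abstracts from the dynamics and cointegration in (26). The instrument `z` is hypothetical and is assumed to be correlated with the regressor but not with the error. Finding such instruments for $\Delta y_t$ in practice is the hard part.

```python
import numpy as np


def iv_estimate(y, X, Z):
    """Just-identified IV estimator (Z'X)^{-1} Z'y; Z has as many columns as X."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(5)
n = 5000
z = rng.standard_normal(n)                      # hypothetical instrument
v = rng.standard_normal(n)
dy = z + v                                      # regressor, partly driven by v
e = 0.8 * v + rng.standard_normal(n)            # error correlated with dy through v
dc = 1.0 + 0.5 * dy + e                         # true coefficient on dy is 0.5

X = np.column_stack([np.ones(n), dy])
Z = np.column_stack([np.ones(n), z])
ols = np.linalg.lstsq(X, dc, rcond=None)[0]
iv = iv_estimate(dc, X, Z)
print(ols[1], iv[1])   # OLS is pulled away from 0.5; IV should be close to it
```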
A third possibility is again to estimate the vector error correction model directly. In this setting the variables $\Delta c_t$, $\Delta y_t$, and $\Delta w_t$ are treated on an equal footing, without imposing a priori exogeneity restrictions.
5 Concluding Remarks
This note has illustrated that regression models for unit-root non-stationary time series give unreliable results, and the usual tools will not be able to distinguish between genuine relationships and spurious regressions. This suggests that for non-stationary time series we should always think in terms of cointegration. Logically, a relationship can only be interpreted as defining an economic equilibrium if the variables cointegrate; if they do not, there is no interpretable relationship between the variables.
We have presented a number of single-equation tools for cointegration analysis. The conceptually simplest approach is the Engle-Granger two-step estimation, but for practical purposes cointegration analysis based on unrestricted ADL or ECM models is probably preferable. This also fits within the general-to-specific framework, in which we first find an appropriate statistical description of the data (the unrestricted ADL model), and afterwards test hypotheses to link the statistical model to economic theory (testing for cointegration and interpreting the long-run relationship).
References
Banerjee, A., J. Dolado, J. W. Galbraith, and D. F. Hendry (1993): Co-Integration, Error-Correction, and the Econometric Analysis of Non-Stationary Data. Oxford University Press, Oxford.
Enders, W. (2004): Applied Econometric Time Series. John Wiley & Sons, 2nd edn.
Engle, R., and C. Granger (1987): “Co-Integration and Error Correction: Representation, Estimation and Testing,” Econometrica, 55, 251–276.
Hendry, D. F., and K. Juselius (2000): “Explaining Cointegration Analysis: Part I,” Energy Journal, 21(1), 1–42.
Maddala, G. S., and I.-M. Kim (1998): Unit Roots, Cointegration, and Structural
Change. Cambridge University Press, Cambridge.