Econometery ch2

2.
2 Heteroscedasticity
2.2.1. Introduction
Heteroscedasticity, what happens if the Error Variance is not constant?
 One of the assumptions we have made until now is that the errors u1 , u 2 , u 3 ,...., u n  , in the regression
2
equation have a common variance  .This is known as the Homoscedasticity assumption. If the errors
do not have a constant variance, we say they are heteroskedastic.
 There are several questions we might want to ask if the errors do not have a constant variance.
These are
 What are the properties of the least squares estimators?
 What are the consequences on the estimated standard errors if we use OLS?
 What are the solutions to this problem?
 We seek answers to the following questions
o What is the nature of heteroscedasticity? o How does one detect it?
o What are its consequences? o What are the remedial measures?
 The term Heteroscedasticity refers to any case in which the variance of the probability distribution of
the disturbance term is not the same in all observations.
Homoscedasticity: the same for all observations
 2   u2
Heteroscedasticity: ui not the same for all observations
 Why does heteroscedasticity matter?
There are two reasons.
1. If there is no heteroscedasticity, the variances of the regression coefficients will be as small as
possible so that, in a probabilistic sense, you have maximum precision. Implies that the OLS
regression coefficients have the lowest variances of all the unbiased estimators that are linear
functions of the observations of Y.
If heteroscedasticity is present, the OLS estimators are inefficient because you could, at least in
principle, find other estimators that have smaller variances and are still unbiased.
2. The second reason is that the estimators of the standard errors of the regression coefficients will
be wrong. They are computed on the assumption that the distributions of the disturbance terms are
homoscedastic. If this is not the case, they are biased, and as a consequence the t tests, and also the
usual F tests, are invalid.
It is quite likely that the standard errors will be underestimated, so the t statistic will be overestimated
and you will have a misleading impression of the precision of your regression coefficients. You may
be led to believe that a coefficient is significantly different from 0, at a given significance level, when
in fact it is not.
Possible Causes of Heteroscedasticity

i. Heteroscedasticity is likely to be a problem when the values of the variables in the sample vary
substantially in different observations.
ii. If the true relationship is given by Y = β1 + β2X + u, it may be the case that the variations in the
omitted variables and the measurement errors that are jointly responsible for the disturbance
1
term will be relatively small when Y and X are small and large when they are large, economic
variables tending to move in size together.
Sources of Heteroscedasticity
1. Following the error-learning models, as people learn, their errors of behavior become smaller
over time. In this case, σ2i is expected to decrease. As an example, as the number of hours of
typing practice increases, the average number of typing errors as well as their variances
decreases.
2. As incomes grow, people have more discretionary income and hence more scope for choice
about the disposition of their income. Hence, σ 2iislikely to increase with income. Thus in the
regression of savings on income one is likely to find σ2iincreasing with income because people
have more choices about their savings behavior.
3. As data collecting techniques improve, σ2iis likely to decrease. Thus, banks that have
sophisticated data processing equipment are likely to commit fewer errors in the monthly or
quarterly statements of their customers than banks without such facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying observation,
or outlier, is an observation that is much different (either very small or very large) in relation to
the observations in thesample. More precisely, an outlier is an observation from a different
population to that generating the remaining sample observations.
5. Another source of heteroscedasticity arises from violating Assumption of CLRM, namely, that
the regression model is correctly specified. Very often what looks like heteroscedasticity may
be due to the fact that some important variables are omitted from the model.
6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors
included in the model. Examples are economic variables such as income, wealth, and
education. It is well known that the distribution of income and wealth in most societies is
uneven, with the bulk of the income and wealth being owned by a few at the top.
7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can also arise
because of (1) incorrect data transformation (e.g., ratio or first difference transformations) and
(2) incorrect functional form (e.g., linear versus log–linear models).
Note that the problem of heteroscedasticity is likely to be more common in cross-sectional than in time
series data.
Consequences of Hetroscedasticity
What happens to OLS estimators and their variances if we introduce heteroscedasticity?
Letting E ui    i but retain all other assumptions of the classical model?
2 2
Y = β1 + β2X + u,
Applying the usual formula, the OLS estimator of β2 is still BLUE
n X i Yi   X i  Yi
ˆ 2 
X 2  X 
2 ,∧E ( ^β 2 )=β 2
 i  i
2
Var( β̂ 2 ) 
X σ
i
2
i
i ( X )
2 2
But its variance is now given by the following expression. . This is obviously
different from the usual variance formula obtained under the assumption of Homoscedasticity, namely,
σ2
Var( β̂ 2 ) 
X 2
i
Note that ̂ 2 is best linear unbiased estimator (BLUE) if the assumptions of the classical model,
including Homoscedasticity and also if we replace it with the assumption of heteroscedasticity.
The consequence of hetroscedasticity is either over-estimates or under-estimates the variance.
Also the SE will be over-estimated or under-estimated
The test statistics is inconsistent, hypothesis and CI will be invalid.
Detection of Heteroscedasticity
How does one know that heteroscedasticity is present in a specific situation?
1. Informal Methods
Rules of thumb
i. Nature of the Problem.
The nature of the problem under consideration suggests whether heteroscedasticity is likely to be
encountered.
ii. Graphical Method
If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one
can do the regression analysis on the assumption that there is no heteroscedasticity and then do a
2
postmortem examination of the residual squared uî to see if they exhibit any systematic pattern. This
can be achieved by
2
 Plotting uˆ i against ŷ i , the estimated y i from the regression line, the idea being to find out
whether the estimated mean value of y is systematically related to the squared residual.
 One may also plot them against one of the explanatory variables.
As it were shown in the graph above:

 We see that there is no systematic pattern or no relation b/n the error term and the response
variable, suggesting that perhaps no heteroscedasticity is present in the data.
3
 Figure from b to e, however, exhibits definite patterns. For instance, Figure c suggests a
2
linear relationship, whereas Figure d and e indicates a quadratic relationship between u^ i and
Y^ i . using such knowledge, albeit informal, one may transform the data in such a manner
2
that the transformed data do not exhibit heteroscedasticity.

2. Formal Methods
 The most common methods for testing heteroscedasticity are:
i. Goldfield-quandt test iv. A likelihood ratio test
ii. The Breusch pagan/Godfrey test v. Park Test and Glejser Test
iii. The white general heteroscedasticity
test
a) Goldfeld-Quandt Test
2
This popular method is applicable if one assumes that the heteroscedastic variance,  i , is positively
related to one of the explanatory variables in the regression model.
Consider the model: Yi  α  β1 X i  u i

2 2 2 2 σ2
Suppose σ i is positively related to X i as σ i  σ X i .where is a constant.
2
This postulates that σ i is proportional to the square of the X variable.
2
If this is appropriate, it would mean σ i would be larger, the larger the values of X i .
If that turns out to be the case, heteroscedasticity is most likely to be present in the model.
To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: arrange observations based on increasing orderof the values of Xi,
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c)
observations into two groups each of (n − c)/2 observations.
Step 3: Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2
observations and obtain the respective residual sums of squares RSS 1 and RSS 2 .
RSS 1 representing the RSS from the regression corresponding to the smaller Xi values (the small
variance group) and RSS 2 that from the larger Xi values (the large variance group)
These RSS each have
n  c   k or
n  c  2k  df
2 2
Where k is the number of parameters to be estimated, including the intercept
Step 4:Compute the ratio
4
RSS2/df
λ
RSS1/df
If ui are assumed to be normally distributed (which we usually do), and if the assumption of
homoscedasticity is valid, then it can be shown that the above λ follows the F distribution with
numerator and denominator df each of (n− c − 2k)/2.
If in an application the computed λ (= F) is greater than the critical F at the chosen level of
significance, we can reject the hypothesis of homoscedasticity, that is, we can say that
heteroscedasticity is very likely.
Note that as a rule of thumb it is suggested that c is about 4 if the sample size is about 30, and it is
about 10 if the sample size is about 60.
Example (The Goldfeld–Quandt Test)
1. Data on consumption expenditure in relation to income for a cross section of 30 families and
suppose we postulate that consumption expenditure is linearly related to income but that
heteroscedasticity is present in the data. Test the presence of heteroscedasticity using Goldfeld and
Quandt test?
Unordered data Ordered data
Y X Y X X Y X Y
55 80 115 180 80 55 180 115
65 100 140 225 85 70 185 130
70 85 120 200 90 75 190 135
80 110 145 240 100 65 200 120
79 120 130 185 105 74 205 140
84 115 152 220 110 80 210 144
98 130 144 210 115 84 220 152
95 140 175 245 120 79 225 140
90 125 180 260 125 90 230 137
75 90 135 190 130 98 240 145
74 105 140 205 140 95 245 175
110 160 178 265 145 108 250 139
113 150 191 270 150 113 260 180
125 165 137 230 160 110 265 178
108 145 139 250 165 125 270 191
Solution
Regression based on the first 13 observations.
Yî = 3.4094 + 0.6968Xi
(8.7049) (0.0744) r 2 = 0.8887 RSS1= 377.17 df= 11

Regression based on the last 13 observations:
Yî = − 28.0272 + 0.7941Xi
(30.6421) (0.1319) r2 = 0.7681 RSS2= 1889.6 df= 11
5
From these results we obtain
RSS2/df 1889.6/11
λ λ  5.01
RSS1/df => 377.17/11
The critical F value for 11 numerators and 11 denominators df at the 5 percent level is 2.82. Since the
estimated F(=λ) value greater than the critical value, we may conclude that there is heteroscedasticity
in the error variance (since H0 is rejected).
Breusch–Pagan–Godfrey Test (BPG)
To illustrate this test, consider the k-variable linear regression model
Yi   0  β1 X1i  β 2 X 2i  ....  β k X ki  u i i  1,2,3,....., n
2
Assume that the error variance  i is described as
 i2  f (α   1 Z1i   2 Z 2i  ....   m Z mi )
2
That is,  i is some function of the non-stochastic variables Z’s, some or all of the X’s can serve as Z’s.
Specifically, assume that
 i2  α   1 Z1i   2 Z 2i  ....   m Z mi 2
That is,  i is a linear function of the Z’s.
2
If  2   3   4 ,....,   m  0 ,  i  1 , which is a constant.
2
Therefore, to test whether  i is-homoscedastic, one can test the hypothesis that  2   3   4 ,....,   m  0.
this is the basic idea behind the Breusch–Pagan test. The actual test procedure is as follows.
Step 1: Estimate the model by OLS and obtain the residuals uˆ1 , uˆ 2 , uˆ 3 ,......., uˆ n
~ 2 
 uˆ 2
i
2
Step 2: Obtain n , this is the maximum likelihood (ML) estimator of  . Note that, The OLS
~ 2 
 uˆ 2
i
estimator is n  k  .
uˆ 2
pi  ~i2
Step 3: Construct variables Pi defined as:  , which is simply each residual squared divided by
~ 2
.
Step 4: Regress pi thus constructed on the Z’s as: Pi  α  1Z1i   2 Z 2i  ....   m Z mi  υi Where vi is the
residual term of this regression.
Step 5:Obtain the ESS (explained sum of squares) from step 4 and define
1
 ESS 
2
Assuming ui are normally distributed, one can show that if there is homoscedasticity and if the sample
asym
2
size n increases indefinitely, then  ~  m1
That is,  follows the chi-square distribution with (m − 1) degrees of freedom.
6
2 2
Therefore, if in an application the computed  (   ) exceeds the critical  value at the chosen level of
significance, one can reject the hypothesis of homoscedasticity; otherwise one does not reject it.
Example (The Breusch–Pagan–Godfrey (BPG) Test)
As an example, let us revisit the data (previous example) that were used to illustrate the Goldfeld–
Quandt heteroscedasticity test. Regressing Y on X, we obtain the following:
Step 1: ŷ = 9.2903 + 0.6378Xi
2
Se = (5.2314) (0.0286) RSS = 2361.153 R = 0.9466.
~ 2 
 uˆ 2
i

2361.153
 78.7051
Step 2: n 30
Step 3: Divide the squared residuals ûi obtained from regression above by 78.7051 to construct the
variable pi.
Step 4: Assuming that pi are linearly related to X i  Z i  , we obtain the regression
P̂i = −0.7426 + 0.0101Xi
Se = (0.7529) (0.0041) ESS = 10.4280 R2 = 0.18

1
  ESS 
Step 5: 2 =5.2140
o Under the assumptions of the BPG test, asymptotically follows the chi-square distribution with 1
d.f. Note: There is only one regressor in above example.
o Now from the chi-square table we find that for 1 d.f, the 5 percent critical chi-square value is
3.8414 and the 1 percent critical χ2 value is 6.6349.
o Thus, the observed chi-square value of 5.2140 is significant at the 5 percent but not the 1 percent
level of significance. Therefore, we reach the same conclusion as the Goldfeld–Quandt test.
c) White’s General Heteroscedasticity Test
Unlike the Goldfeld–Quandt test, which requires reordering of the observations with respect to the X
variable that supposed that be the cause of heteroscedasticityor the BPG test, which is sensitive to the
normality assumption, the general test of heteroscedasticity proposed by White does not rely on the
normality assumption and is easy to implement.
As an illustration of the basic idea, consider the following three-variable regression model (the
generalization to the k-variable model is straightforward).
y i   0  1 x1i   2 x 2i  ui
The White test proceeds as follows.
Step 1:Given the data, estimate the residuals ûi and R2

Step 2:We then run the following (auxiliary) regression:
û i2  α 0  α1 x 1i  α 2 x 2i  α 3 x 1i2  α 4 x 22i  α 5 x 1i x 2i  ν i
7
That is, the squared residuals from the original regression are regressed on the original X variables or
regressors, their squared values, and the cross product(s) of the regressors.
Note that there is a constant term in this equation even though the original regression may or may not
2
contain it. Obtain R from this (auxiliary) regression.
Step 3:Under the null hypothesis that there is no heteroscedasticity, it can be shown that sample size
2
(n) times the R obtained from the auxiliary regression asymptotically follows the chi-square
distribution with df equal to the number of regressors (excluding the constant term) in the auxiliary
asy
regression. That is,

n.R 2 ~χ 2
df
Where d.f is as defined previously. In our example, there is 5 d.f since there are 5 regressors in the
auxiliary regression.
Step 4:If the chi-square value obtained exceeds the critical chi-square value at the chosen level of
significance, the conclusion is that heteroscedasticity. Else is no heteroscedasticity, which is to say that
in the auxiliary regression, α2= α3= α4= α5= α6= 0.
Example (White’s General Heteroscedasticity Test)
1. As an example, let us revisit the data (previous example) that were used to illustrate the Goldfeld–
Quandt heteroscedasticity test
yˆ  14.084  0.60 xi
Se = (5.498) (0.30) RSS=2608.143 r2=0.934
Solutions for Whites General Heteroscedasticity test
uˆ i2  145.649  2.024 xi  0.009 xi2
Se = (196.475) (2.427) (0.007)
t = (0.741) (-0.834) (1.271) R2=0.288
Using the R value and n = 30, we obtain nR2= 8.64, which, under the null hypothesis of no
2
heteroscedasticity, has a chi-square distribution with 2 df because there are two regressors. The critical
value of obtaining a chi-square value of as much as 8.64 or greater is about 5.99146. Since the
calculated chi-square value obtained exceeds the critical chi-square value at 5 % level of significance,
the conclusion is that there is heteroscedasticity.
Note that: at 1% level of significance the critical value of chi-squared (=9.21034) exceeds the
calculated value.
If u plots the residuals you may obtain
8
Park Test
Park formalizes the graphical method by suggesting that σ 2iis some function of the
explanatory variable Xi. The functional form suggested was:
2 2 β vi 2 2
σ i =σ X i e orln σ i =ln σ + βln X i +vi
Where vi is the stochastic disturbance term.
Sinceσ 2i is generally not known, Park suggests using u^ 2i as a proxy and running the following regression:
2 2
ln u^ i =ln σ + βln X i+ vi=α + βln X i+ vi
If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in the
data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park
test is thus a two stage procedure. In the first stage we run the OLS regression disregarding the
heteroscedasticity question. We obtain u^ i from this regression, and then in the second stage we run the
regression.
Although empirically appealing, the Park test has some problems. Goldfeld and Quandt have argued
that the error term vi may not satisfy the OLS assumptions and may itself be heteroscedastic.
Nonetheless, as a strictly exploratory method, one may use the Park test.
Example: relationship between compensation and productivity
Y i=β 1 + β 2 X i+u i
Where Y=average compensation in thousands of dollars, X=average productivity in thousands of
dollars, and i = ith employment size of the establishment. The results of the regression were as follows:
Y^ i =1992.3452+0.2329Xi
Se=(936.4791) (0.0998) t= (2.1275) (2.333) R2=0.4375
The results reveal that the estimated slope coefficient is significant at the 5 percent level on the basis of
a one-tail t test. The equation shows that as labor productivity increases by, say, a dollar, labor
compensation on the average increases by about 23 cents.
The residuals obtained from regression were regressed on Xi as suggested giving the following results:
2
ln u^ i =¿35.817− 2.8099 lnXi
se=(38.319) (4.216) t= (0.934) (−0.667) R2=0.0595
Obviously, there is no statistically significant relationship between the two variables. Following the
Park test, one may conclude that there is no heteroscedasticity in the error variance.
Glejser Test
9
The Glejser test is similar in spirit to the Park test. After obtaining the residuals u^ i from the OLS
regression, Glejser suggests regressing the absolute values of u^ i on the X variable that is thought to be
closely associated with σ2i. In his experiments, Glejser used the following functional forms:
|u^ i|=β 1+ β2 X i + vi |u^ i|=β 1+ β2

1
+v i
|u^ i|=β 1+ β2 √ X i +v i √Xi
1 |u^ i|=√ β1 + β 2 X i +v i
|u^ i|=β 1+ β2 X + v i
i |u^ i|=√ β1 + β 2 X 2i +v i
Where vi is the error term.
Again as an empirical or practical matter, one may use the Glejser approach. But Goldfeld and Quandt
point out that the error term vi has some problems in that its expected value is nonzero, it is serially
correlated, and ironically it is heteroscedastic.
An additional difficulty with the Glejser method is that models such as
|u^ i|=√ β1 + β 2 X i +v iand |u^ i|=√ β1 + β 2 X 2i +v i, are nonlinear in the parameters and therefore cannot be
estimated with the usual OLS procedure.
Glejser has found that for large samples the first four of the preceding models give generally
satisfactory results in detecting heteroscedasticity. As a practical matter, therefore, the Glejser
technique may be used for large samples and may be used in the small samples strictly as a qualitative
device to learn something about heteroscedasticity.
EXAMPLE 11.2
Relationship between compensation and productivity: The glejser test
Continuing with Example above, the absolute value of the residuals obtained from regression we
reregressed on average productivity (X), giving the following results:
|u^ i | =407.2783− 0.0203Xi
Se=(633.1621) (0.0675) R2=0.0127 t= (0.6432) (−0.3012)
As you can see from this regression, there is no relationship between the absolute value of the residuals
and the regressor, average productivity. This reinforces the conclusion based on the Park test.
There are several other tests of heteroscedasticity, each based on certain assumptions like: the
Koenker–Bassett (KB) test, Cochran's C test, Hartley's test, Cook–Weisberg test, Harrison–McCabe
test, Brown–Forsythe test and Levene's test
Remedial Measures
As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of
the OLS estimators, but they are no longer efficient, not even asymptotically (i.e., large sample size).
This lack of efficiency makes the usual hypothesis-testing procedure of dubious value. Therefore,
remedial measures may be called for. There are two approaches to remediation: when σ 2i is known and
when σ2iis not known.
10
When σ2i is known: The Method of Weighted Least Squares
If σ2i is known, the most straightforward method of correcting heteroscedasticity is by means of
weighted least squares, for the estimators thus obtained is BLUE.
When σ2i is Not Known
As noted earlier, if true σ2i are known, we can use the WLS method to obtain BLUE estimators. Since
the true σ2iare rarely known, is there a way of obtaining consistent (in the statistical sense) estimates of
the variances and covariances of OLS estimators even if there is heteroscedasticity? The answer is yes.
White’s Heteroscedasticity-Consistent Variances and Standard Errors, White has shown that this
estimate can be performed so that asymptotically valid (i.e., large-sample) statistical inferences can be
made about the true parameter values.
2.3 Autocorrelation
What happens if the Error terms are correlated?
The term autocorrelation may be defined as “correlation between members of series of observations
ordered in time [as in time series data] or space [as in cross-sectional data].
In the regression context, the classical linear regression model assumes that such autocorrelation does
not exist in the disturbances u i . Symbolically,
 
E uiu j  0 i j
Simply stating, the disturbance term relating to any observation is not influenced by the disturbance
term relating to any other observation.
In this portion, we take a critical look at this assumption with a view to answering the following
questions:
 What is the nature of autocorrelation?
 What are the theoretical and practical consequences of autocorrelation?
 How does one know that there is autocorrelation in any given situation?
 How does one remedy the problem of autocorrelation
a) Nature of Autocorrelation
In regression of family consumption expenditure on family income, the effect of an increase of one
family’s income on its consumption expenditure is not expected to affect the consumption expenditure
of another family. However, if there is such dependence, we have autocorrelation. Symbolically,
 
E uiu j  0 i j
Sources of Autocorrelation: There are several causes which gives rise to autocorrelation. Following
are;
11
1. Omitted explanatory variables: We know in economics one variable is affected by so many
variables. The investigator includes only important and directly fit variables because neither
he is competent nor the circumstances allow him to include all variables influencing that
phenomena. The error term represents the influence of omitted variables and because of that an error
term in one period many have a relation with the error terms in successive periods. Thus the
problem of autocorrelation arises.
2. Misspecification of the mathematical form of the model: If we have adopted a
mathematical form which differs from the true form of the relationship, then the disturbance
term is the must be show serial correlation.
3. Interpolation in the statistical observation: Most of the published time series data involve
some of the interpolation and smoothing process, which do average the true disturbance over
successive time periods. As a consequence the successive value is/are autocorrelated.
4. Misspecification of the true error u: Disturbance term may be autocorrelated because it contains
errors, of measurement. If the explanatory variable is measured wrongly the serial disturbance will
be auto correlated.
The consequences of autocorrelation for OLS are somewhat similar to those of heteroscedasticity.
 The regression coefficients remain unbiased, but OLS is inefficient because one can find an
alternative unbiased estimator with smaller variance.
 The other main consequence, which should not be mixed up with the first, is that the standard
errors are estimated wrongly, probably being biased downwards.
 Finally, although in general autocorrelation does not cause OLS estimates to be biased.
 Autocorrelation normally occurs only in regression analysis using time series data.
 OLS is no longer BLUE in the presence of serial correlation.
 AR (1) model for the error terms:
u t   u t 1  et t  1,2, . . ., n. |  |  1,
Why does serial correlation occur? There are several reasons,
 Specification Bias: Excluded Variables Case.
 Specification Bias: Incorrect Functional Form.
 Lags. Consumption expenditure of the current period depends, among other things, on the
consumption expenditure of the previous period.
 Inertia: A salient feature of most economic time series is inertia.
 Manipulation of Data.
 Data Transformation.
 Non-stationarity
b) OLS Estimation in the Presence of Autocorrelation
12
What happens to the OLS estimators and their variances if we introduce autocorrelation in the
disturbances? By assuming the,
Eu u
i t s  0 s0
We revert once again to the two-variable regression model to explain the basic ideas involved, namely,
y    x u
t 0 1 t t
The disturbance, or error, terms are generated by the following mechanism.
u t   ut 1   t  1   1
Where ρ (=rho) is known as the coefficient of auto covariance and where  t is the stochastic
disturbance term such that it satisfied the standard OLS assumptions, namely,
E  t   0
Var  t    2
Cov  t ,  t  s   0 s  0
 It states: the value of the disturbance term in period t is equal to rho times its value in the previous
period plus a purely random error term. It is known as Markov first-order autoregressive
scheme, or a first-order autoregressive scheme, usually denoted, AR (1).
u
 The name Autoregressive is appropriate because it can be interpreted as the regression of t on
itself lagged one period.
u
 It is first order because t and its immediate past value is involved, that is, the maximum lag is 1.
If the model were
ut   1 u t 1   2 u t 2   t
 It would be an AR (2), or second-order, autoregressive scheme, and so on.
1 Is the coefficient of
autocorrelation at lag 1.
E u t  Eu t   u t 1  Eu t 1    Eu t u t 1 
ρ
var u t  var u t 1 
= var u t 1 
 Given the AR (1), it can be shown that
 2
 
var u t   E u t2 
1  2
 2
Cov u t u t  s   E u t u t  s   
s
1  2
Corr u t u t  s    s
c) Tests for autocorrelation
i. Graphical method
uˆ t  y t  yˆ t
Plot the estimated residuals against time. If we see a clustering of neighboring residuals
ut  0
on one or the other side of the line , then such clustering is a sign that the errors are auto
correlated.
ii. Durbin -Watson test
13
 Before estimation of the regression coefficients a test on the assumption of the covariance structure
of error vectors must be done.
,
ut s

We test whether the assumption of uncorrelated is met or not.
Test problem.
H 0 :   0 vs H 1 : not H 0
The Durbin-Watson test statistics is
 uˆ  uˆ 
2
t t 1
d
 uˆ  2
t
Durbin and Watson derived the upper and lower confidence limits for testing the significance of d at
desired level of significance.
Decision rule for Durbin -Watson test:
d  d u ( )

Do not reject Ho if
 Indifference if d L ( )  d  d u ( )
 Reject Ho if d  d L ( )
Limitations of Durbin Watson tests
 The test is valid for AR (1) error scheme process.
 The test is valid only where there is an intercept in the model.
 There is certain region where the test is inconclusive.
 Test is invalid when lagged values of the dependent variable appear as a regressor.
iii. Breusch –God fray (BG) test
Assume that the error term follows the Autoregressive scheme of order p (AR (1)) given by:
ut  1 ut 1   2 u t 2 , ....,  p ut  p   t
t
Where fulfills all the assumption of the classical linear regression models. The null hypothesis to be
tested is:
H 0 :  1   2 , . . . .,   p  0
Steps:
1.
Step1. Estimate the modelUsing OLS and obtain the residuals.
y t   0   1 X t  u t T  1 , 2 ,3 ,..., t
û t X t uˆ t 1 , uˆ t 2 , uˆ t 3 ,......, uˆ t  p
2. Step 2:Regress
on and that is run the auxiliary regression
uˆ t   0   1 X t   1u t   2 uˆ t 2 ,....,  p uˆ t  p
3. Obtain the coefficient of determination R2 from the auxiliary regression.
 2p
4. If the sample size T is large, the Breusch and Godfrey have shown that (T-P)R2 follows the
Decision rule:
2
Reject the null hypothesis of no AC if (T-P) R2 exceeds the tabulated value from the p .
14
Advantages of BG test:
 The test is always conclusive.
 Test is valid when lagged values of the dependent variable appear as a regressor.
 The test is valid for higher order AR schemes (not just for AR(1) error scheme process
only).
Correcting for error AC (of AR (1) Scheme)
y t   0   1 X t  u t t  1, 2 , 3 ,..., T (*)
Consider the model
Where the errors are generated according to the AR (1) Scheme:
ut   ut 1   t t  1,2, ..., n. |  |  1,
u t   u t 1  et
12
t
Where fulfills all the assumptions of the classical linear regression model. Suppose by applying any
of the above tests you came to the conclusion that the errors are auto correlated.

Lagging equation (*) by one period and multiplying throughout by , we get
y     x  u
t 1 0 1 t 1 t 1
…...(**)
Subtracting equation (**) from (*), we get:

y t  y   0 (   1)  1 ( X t  x )  (u t  u )
t 1 t 1 t 1
y t*   0*  1 X t*   t
……(***)
 t  u t   u t 1
Where the above transformation is known as Cochrane-Orcutt Transformation. Since
t
fulfils all the assumptions of classical linear regression models we can apply the OLS to equation
(***) to get the estimators which are BLUE.

Problem: The above transformation requires knowledge of the value of . Thus we need to estimate
it.

Method of estimation of .

From residuals
û t uˆ t 1 û t ̂
Regress on without a constant term. Since are the OLS residuals to obtain . That is
uˆ t   uˆ t 1   t

Durbin’s method
15
Yt Yt 1 , X t and X t 1
Run the regression of on
Yt  Yt 1   1 1      2 ( X t  X t 1 )  u t
 Yt 1
An estimator of is the estimated coefficient of .
Example
The following data is on investment and value of outstanding shares.
Year Investment Share Year Investmen Share
t
317.6 3078.5 561.2 4840.9
1935 1945
391.8 4661.7 688.1 4900.9
1936 1946
410.6 5383.1 568.9 3526.5
1937 1947
257.7 2792.2 529.2 3254.7
1938 1948
330.8 4313.2 555.1 3700.2
1939 1949
461.2 4643.9 642.9 3755.6
1940 1950
512.0 4551.2 755.9 4833.0
1941 1951
448.0 3244.1 891.2 4924.9
1942 1952
499.6 4053.7 1304.4 6241.7
1943 1953
547.5 4379.3
1944
The estimated regression equation using OLS is
Model Summaryb
Adjusted R Std. Error of the

Model R R Square Square Estimate Durbin-Watson
1
.650a .423 .389 184.9873 .553
a. Predictors: (Constant), sha
b. Dependent Variable: inv
ANOVAb
16
Model Summaryb
Adjusted R Std. Error of the

Model R R Square Square Estimate Durbin-Watson
1
.650a .423 .389 184.9873 .553
Model Sum of Squares Df Mean Square F Sig.

1 Regression
426433.282 1 426433.282 12.461 .003a
Residual
581745.015 17 34220.295
Total
1008178.297 18
a. Predictors: (Constant), sha
b. Dependent Variable: inv
Coefficientsa
Standardized
Unstandardized Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) -186.792 216.259 -.864 .400
Sha .175 .050 .650 3.530 .003
a. Dependent Variable: inv
AC Diagnostics
Although the model passes the ANOVA test, we plot the estimated residuals again against time and
look for some model misspecifications. The graph (scatter plot) of the estimated residual is shown
below. We can see from the graph that the spread of estimated residual does not seen to have an
increasing or decreasing pattern with time (meaning the residuals are somewhat homoscedastic).
17
However, we see a clustering of neighboring residual on one or the other side of the line uˆ  0. This
might be a sign that the error are autocorrelated. However, we do not make a final judgment until we
apply formal tests of autocorrelation. The partial auto correlation function of the residual is
 The plot indicates that the residual follows the first order AR process:
uˆ t   uˆ t 1   t
ut
(Where fulfils all the assumptions of classical regression)
H o :   0 vs H o :   0 uˆ t on uˆ t 1
To test we regress (without a constant) and test if the coefficient
uˆ t 1
of is significant. This gives the following result.
Parameter Estimates
Estimates Std Error T Approx Sig
Non-Seasonal Lags AR1 .859 .181 4.737 .000
Melard's algorithm was used for estimation.

Thus, we reject the null hypothesis of no error AC at the 1% significance level and conclude that
the residuals are autocorrelated.
18

We then apply the following (Cochrane-Orcutt) transformation.
y t   0   1 X t  u t t  1, 2 , 3 ,..., T (*)
y     x  u
t 1 0 1 t 1 t 1
y t  y   0 (   1)  1 ( X t  x )  (u t  u )
t 1 t 1 t 1
y t*   0*   2 X t*   t
…………………(*)
 *   0 1   
Where, .

Note that equation (*) fulfills all basic assumption and thus ,we can estimate the parameters in this
equation by an OLS procedure.
  ̂ 
The problem here is that is not known. One possibility is to use 0.859 for the transformation.
̂  y t* X t* y t*
Using 0.859 we obtain and and estimate the regression of on
X t*
Model Summary(b)
Adjusted R Std. Error of Durbin-

Model R R Square Square the Estimate Watson
1 .805(a) .648 .626 80.87267 1.591
a Predictors: (Constant), invsha
b Dependent Variable: invinv
DW-Critical value
5% DL=1.180, DU=1.401
1% DL=0.928, DU=1.132
19

The partial autocorrelation function of the residual in the transformed model is shown below.

it can be seen that the function lies within the upper and lower confidence limits, indicating that the
autocorrelation structure has been properly dealt with.
The graph (scatter plot) of the estimated residual from the transformed model is shown below. From
U n s ta n d a r d iz e d R e s id u a l
the scatter plot of the new set of residuals shown below, we can see that the clustering of neighboring
residual has been significantly reduced.
100.00000
0.00000
-100.00000
-200.00000
1935 1940 1945 1950 1955

t
20
2.4 Multicollinearity
What happens if the regressors are correlated?
One of the Assumptions of the classical linear regression model (CLRM) is that there is no
multicollinearity among the regressors included in the regression model. We look at this assumption by
seeking answers to the following questions:
 What is the nature of multicollinearity?
 Is multicollinearity really a problem?
 What are its practical consequences?
 How does one detect it?
 What remedial measures can be taken to alleviate the problem of multicollinearity?
Some Illustrative Examples
Y  1 X 1   2 X 2  ut Y  1 X 1   2 2 X 1   ut
Consider the model . If X 2  2X 1 , we have t =
Y  ( 1  2  2 ) X 1  ut
. Thus only (  1  2  2 ) would be estimable. We cannot get estimates of 1 and
 2 separately. In this case we say that there is "perfect Multicollinearity," because X 1 and X 2 are
perfectly correlated.
As a numerical example, consider the following hypothetical data
X1 10 15 18 24 30
X2 50 75 90 120 150
X*3 52 75 97 129 152
X*3i= 5X2i. Therefore, there is perfect Collinearity between X2 and X3

Why does the classical linear regression model assume that there is no multicollinearity among the
X’s? The reasoning is this:
If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their
standard errors are infinite.
If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large
standard error, which means the coefficients cannot be estimated with great precision or accuracy.
a) The nature of multicollinearity
It is meant as the existence of a “perfect,” or exact, linear relationship among some or all explanatory
variables of a regression model.
21
Multicollinearity is not a statistical, problem rather it is a data problem.
There are several sources of multicollinearity. Multicollinearity may be due to the following factors
 The data collection method employed, for example, sampling over a limited range of the
values taken by the regressors in the population.
 Constraints on the model or in the population being sampled. For example, in the regression of
electricity consumption on income (X2) & house size (X3), there is physical constraint in the
population in the families with higher income generally have larger homes than families with
lower incomes.
 Model specifications for example, adding polynomial terms to a regression model,
especially when the range of the X variable is small.
 An over determined model. This happens when the model has more explanatory variables than
the number of observations. This could happen in medical research where there may be a
small number of patients about whom information is collected on a large number of
variables.
b) Estimation in the Presence of perfect Multicollinearity
The regression coefficients remain indeterminate and their standard errors are infinite. Consider the
three variable regression model deviation forms as
y  β x  β x u x2i  x1i
i 1 1i 2 2i i Assume that , where λ is a nonzero
constant. Then
 2   
( y x )  x   (  y x )  x x 
βˆ
 2i 
i 1i   i 2i  1i 2i  (  y x )
i 1i  

 λ 2  x 1i2   (λ  y i x 1i ) λ  x1i x 1i 

βˆ 
1  2  2   2
1   x 2  λ 2  x 2   λ 2  x x 2 0
  x   x     x x   1i  1i  1i 1i
 1i  2i   1i 2i  0
   =
ˆ
This is an indeterminate expression. The reader can verify that 2 is also indeterminate
c) Practical Consequences of Multicollinearity
In cases of near or high multicollinearity, one is likely to encounter the following consequences.
 Although BLUE, the OLS estimators have large variances and covariance, making precise
estimation difficult.
 Because of consequence 1, the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population coefficient is zero).
22
 Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically
insignificant.
 Although the t ratio of one or more coefficients is statistically insignificant, R2, the overall
measure of goodness of fit, can be very high.
 The OLS estimators and their standard errors can be sensitive to small changes in the data.
DETECTION OF MULTICOLLINEARITY
Having studied the nature and consequences of multicollinearity, the natural question is: How does one
know that collinearity is present in any given situation, especially in models involving more than two
explanatory variables?
 Multicollinearity is a question of degree and not of kind. The meaningful distinction is not
between the presence and the absence of multicollinearity, but between its various degrees.
 Since multicollinearity refers to the condition of the explanatory variables that are assumed to
be nonstochastic, it is a feature of the sample and not of the population.
Therefore, we do not “test for multicollinearity” but can, if we wish, measure its degree in any
particular sample. Since multicollinearity is essentially a sample phenomenon, arising out of the
largely non experimental data collected in most social sciences, we do not have one unique method of
detecting it or measuring its strength. What we have are some rules of thumb, some informal and some
formal, but rules of thumb all the same. We now consider some of these rules.
d) Measures of Multicollinearity
It is important to be familiar with two measures that are often suggested in the discussion of
Multicollinearity.
1.) The Variance-Inflation Factor (VIF)
The VIF is defined as
 
VIF ˆ j 
1
1  R 2j
2.)
R 2j Xj
Where is the coefficient of determination obtained from an auxiliary regression of on the
remaining explanatory variables.
x j  1 x1   2 x 2 ,....., j 1 x j 1   j 1 x j 1 ,...., x  ut
k k
23
When we interpret
  as the ratio of the actual variance of ˆ
VIF ˆ j j
to what the variance of
ˆ j
would
Xj ;
have been if were to be uncorrelated with the remaining X s .
Rule of thumb
 If
 
VIF ˆ j E 1,10 ˆ j
, then the estimated coefficient is OKAY!

If any of the VIF exceeds 10, then this is an indication that the associated regression
coefficients are poorly estimated due to MC.
2.) The Condition Number (CN)

Most statistical software’s uses eigenvalues and the condition index to diagnose multicollinearity. We
will not discuss eigenvalues here, for that would take us into topics in matrix algebra that are beyond
the scope of this course. From these eigenvalues, however, we can derive what is known as the
condition numberk defined as
Maximum eigenvalue
K
Minimum eigenvalue
The condition index (CI) defined as
Maximum eigenvalue
CI   K
Minimum eigenvalue
Rule of thumb
 If k is less than 100 there is no series problem of multicollinearity.
 If k is between 100 and 1000 there is moderate to strong multicollinearity and
 If it exceeds 1000 there is severe multicollinearity.
Alternatively, if the CI (= K ) is between 10 and 30, there is moderate to strong multicollinearity

and if it exceeds 30 there is severe multicollinearity.
Note that: There are also other methods of overcoming Multicolinirety such us:
 Ridge regression and
 Principal Component Analysis (PCA)
2.5 Errors in variables
2.5.1. Measurement Error in Y
 The type of problem that measurement error in Y presents depends on the nature of the error.
24
 Assume that
y i   0   1 xi   i   i
where

 i is the measurement error in yi with  i ~ N 0,   2 
'
If
 and  i are independent, then we may write the model as yi   0  1 xi   i where
'
 2 2
 '
 i ~ N 0,      with  i   i   i .
2
 The measurement error thus adds to the model error, thereby reducing the percentage fit ( R ) of
the regression model, and leaving more variability in Y unexplained by the model.
 The least squares estimators are still unbiased, If  i is markedly non normal and, in particular,  i ,
is not small relative to  i , then an alternative to least squares such as nonparametric regression or
robust regression may need to be used.
 If the
 i are correlated, we have a problem that is very similar to the problem of having the  i
correlated, with the magnitude of the problem depending on the magnitude of the
 i . We would
also have a major problem if the mean of each  i is not zero.

4.4.2. Measurement Error in X
 The effect caused by measurement error in X will depend on the nature of the error, and will
depend on whether X is a random variable.
 The estimated regression parameter will be unbiased if the measurement error,

 i , is NID (0,  i )
and X is fixed.
 The least squares estimators will be biased when X is random, however, and the magnitude of the
problem will depend on the spread of the true (i.e., correctly measured) X values relative to
i .
Consequently, it would be fortuitous to have considerable variability in X, relative to  i , when

measurement error is apt to be a problem.
 Regression analysis seeks to lend credence to a claim of a causal relationship among variables. For
a data point (Xi, Yi,) arising from a simple linear relationship, we view Yi as a random variable
with distribution depending upon the specific value of Xi.
 Assumption of the simple linear model presumes that each Xi is a fixed value, not the result of any
random process. On the other hand, Yi may have some error associated with it and therefore is a
random variable.
 The first step in the modeling process is to ask whether this view of the data is, indeed, appropriate.
25
y i   0  1 xi   i
 When deciding to use the simple linear model, to draw inferences from a data
set (Xi, Yi), first ask the following questions:
i. Does each Yi naturally depend on X?
ii. Is it reasonable to assume Xi is not a random variable?
iii. Is the error in measuring Xi insignificant compared to the spread of the Xi's?
 If the answer to any of these questions is “no”, then assumption of fixed explanatory variable may
fail, and there is danger of an erroneous conclusion.
4.4.3. Errors in dependent variables and stochastic regressors
 Regression analysis seeks to lend credence to a claim of a causal relationship among variables. For a
data point (Y, X) arising from a simple linear relationship, we view Y as a random variable with
distribution depending upon the specific value of X.
 In almost all regression models we assume that the response variable Y is subject to the error term 
and the regressors are deterministic not affected by the error.
 There are two variation of this situation. The first is the case where the response and the regressors are
jointly distributed random variables.
 These assumptions give rise to the correlation model. The second is the situation where there are
measurement errors in the response and the regressors.
4.4.4. Measurement error in response variable
 Now if the measurement error is present only in the response variable there is no new problem so
long as these errors are uncorrelated and have no bias.
 The type of problem that measurement error in Y presents depends on the nature of the error.
*
Consider the following model: Assume that yi   0  1 xi   i where y = permanent consumption
*

expenditure, xi = current income and  i = stochastic disturbance term.

*
y
 Since y is not directly measurable, we may use an observable expenditure variable i such that
*

yi  yi   i ,  i is the measurement error in y i with  i ~ N 0,  
2

'
 If  and  i are independent, then we may write the model as yi   0  1 xi   i where
'
 2 2
 '
 i ~ N 0,      with  i   i   i .
26
E  i   0, E  i  0 covxi ,  i   0
 For simplicity assume that that is, the errors of measurement
covxi ,  i   0 cov i ,  i   0
in yare uncorrelated with Xi, , i.e. the equation error and the
measurement error are uncorrelated.
2
 The measurement error thus adds to the model error, thereby reducing the percentage fit ( R ) of
the regression model, and leaving more variability in Y unexplained by the model.
 The least squares estimators are still unbiased. If

 i is markedly non normal and, in particular,  i
is not small relative to

 i , then an alternative to least squares such as nonparametric regression or
robust regression may need to be used.
 If  i are correlated, we have a problem that is very similar to the problem of having the  i
correlated, with the magnitude of the problem depending on the magnitude of the  i .
 Therefore, although the errors of measurement in the dependent variable still give unbiased
estimates of the parameters and their variances, the estimated variances are now larger than in the
case where there are no such errors of measurement.
4.4.5. Measurement error in explanatory variable
 One of the basic assumption of the general linear regression model is the X’s are measured without
error. However, different situation occurs when there are measurement errors in the X’s.
 Given the interdependence of the economic variables and the nature of statistical data, which are used
for the measurement of economic relationships, we can have serious doubt of this assumption.
 This assumption is never valid because all economic variables have some stochastic nature. So there
arise a situations where in predetermined variables are random variables.
 If the distributions of predetermined variables is independent of the distributions of the random
disturbance term, then the least squares estimators are consistent.
 The first step in the modeling process is to ask whether this view of the data is, indeed, appropriate.
 When deciding to use the simple linear model, yi   0  1 xi   i to draw inferences from a data set
(Y, X), first ask the following questions:
i. Does each Y naturally depend on X?
ii. Is it reasonable to assume X is not a random variable?
iii. Is the error in measuring X insignificant compared to the spread of the X's?
27
 If the answer to any of these questions is “no”, then assumption of fixed explanatory variable may
fail, and there is danger of an erroneous conclusion.
 The effect caused by measurement error in X will depend on the nature of the error, and will also
depend on whether X is a random variable.
 The estimated regression parameter will be unbiased if the measurement error in y,

 i , is NID(0,
 i ) and X is fixed.
 The least squares estimators will be biased when X is random, however, and the magnitude of the
problem will depend on the spread of the true (i.e., correctly measured) X values relative to  i .
Consequently, it would be fortuitous to have considerable variability in X, relative to

 i , when
measurement error is apt to be a problem.

 Suppose that we wish to fit simple linear regression model, but the regressor is measured with error, so
that the observed regressor is X i  xi  ai i  1, 2, ..., n where xi the true is value of the regressor,
is the measurement error with E ai   0 and varai    a .

2
Xi ai
is the observed value, and
 The response variable y i is subject to the usual error  i i  1, 2, ..., n , so that the regression model is
yi   0  1 xi   i . Since X i is observed value of the regressor, we have
yi   0  1 ( X i  ai )   i   0  1 X i   i  1ai  .
 yi   0  1 X i   i where i   i  1a is a compound of equation and measurement errors. Now
even if we assume that  has zero mean, is serially independent, and is uncorrelated with  i we
can no longer assume that the composite error term

 i is independent of the explanatory variable
Xi because assuming E  i  0 , cov i , X i   E i  E  i X i  E  X i    1 a

  2
 Thus, the explanatory variable and the error term are correlated, which violates the crucial
assumption of the classical linear regression model that the explanatory variable is uncorrelated
with the stochastic disturbance term.
 If this assumption is violated, it can be shown that the OLS estimators are not only biased but also
inconsistent, that is, they remain biased even if the sample size n increases indefinitely.
Measurement error is present in some extent in almost all practical regression situations.
28
4.4.6. Plausibility of the assumption of no measurement error
 No matter how much reliable the source of data is there will always be some error in the
observation used for estimation of the regression coefficients.
 In some cases the published data refer to some variable different in content from the variable
required by economic theory, which will be found to give rise to measurement error.
 Usually the variables included in the models are expressed in current values, while economic
theory requires quantity variable. In this ease it is common to use price indexes to deflate the value
figures.
 The deflation procedure may lead to errors of measurements.
 It is common to use dummy variables in the function as an approximation to some explanatory
variables. Dummy variables are by their nature subject to measurement error since they are proxies
for the variable which they represent.
Consequences of errors in measurement in X
 Least square estimators are biased
 Least square estimators are inconsistent.
2.6 Lagged variables
Lagged variables are variables with current and past values (lags) of variables.
Lagged values of the dependent variable appear because of the theoretical basis of the model rather
than as a computational means of removing autocorrelation.
There are several reasons lagged effects might appear in an empirical model:
I. In modeling the response of economic variables to policy stimuli, it is expected that there will be
possibly long lags between policy changes and their impacts.
II. Either the dependent variable or one of the independent variables is based on expectations.
Certain economic decisions are explicitly driven by a history of related activities
In regression analysis involving time series data, if the regression model includes not only the current
but also the lagged (past) values of the explanatory variables (the X’s), it is called a distributed-lag
model. If the model includes one or more lagged values of the dependent variable among its
explanatory variables, it is called an autoregressive model. Thus, Yt= α + β0Xt + β1Xt−1 + β2Xt−2 +
ut , represents a distributed-lag model, whereas Yt= α + βXt+ γYt−1 + utis an example of an
autoregressive model. The latter are also known as dynamic models since they show the time path of
29
the dependent variable in relation to its past value(s). Autoregressive and distributed-lag models are
used extensively in econometric analysis, and in this chapter we take a close look at such models with
a view to finding out the following:
1. What is the role of lags in economics?
2. What are the reasons for the lags?
3. Is there any theoretical justification for the commonly used lagged models in empirical
econometrics?
4. What is the relationship, if any, between autoregressive and distributed-lag models?
Can one be derived from the other?
THE ROLE OF “TIME,’’ OR “LAG,’’ IN ECONOMICS
In economics the dependence of a variable Y (the dependent variable) on another variable(s) X (the
explanatory variable) is rarely instantaneous. Very often, Y responds to X with a lapse of time. Such a
lapse of time is called a lag.
THE REASONS FOR LAGS: There are three main reasons:
1. Psychological reasons. As a result of the force of habit (inertia), people do not change their
consumption habits immediately following a price decrease or an income increase perhaps because
the process of change may involve some immediate disutility. Thus, those who become instant
millionaires by winning lotteries may not change the lifestyles to which they were accustomed for a
long time because they may not know how to react to such a windfall gain immediately. Of course,
given reasonable time, they may learn to live with their newly acquired fortune. Also, people may
not know whether a change is “permanent’’ or “transitory.’’ Thus, my reaction to an increase in the
income will depend on whether or not the increase is permanent.
2. Technological reasons. Suppose the price of capital relative to labor declines, making substitution
of capital for labor economically feasible. Of course, addition of capital takes time (the gestation
period). Moreover, if the drop in price is expected to be temporary, firms may not rush to substitute
capital for labor, especially if they expect that after the temporary drop the price of capital may
increase beyond its previous level. Sometimes, imperfect knowledge also accounts for lags. At
present the market for personal computers is glutted with all kinds of computers with varying
features and prices. Moreover, since their introduction in the late 1970s, the prices of most personal
computers have dropped dramatically. As a result, prospective consumers for the personal computer
may hesitate to buy until they have had time to look into the features and prices of all the competing
30
brands. Moreover, they may hesitate to buy in the expectation of further decline in price or
innovations.
3. Institutional reasons. These reasons also contribute to lags. For example, contractual obligations
may prevent firms from switching from one source of labor or raw material to another. As another
example, those who have placed funds in long-term savings accounts for fixed durations such as: 1
year, 3 years, or 7 years, are essentially “locked in’’ even though money market conditions may be
such that higher yields are available elsewhere. Similarly, employers often give their employees a
choice among several health insurance plans, but once a choice is made, an employee may not
switch to another plan for at least 1 year. Although this may be done for administrative
econvenience, the employee is locked in for 1 year. For the reasons just discussed; lag occupies a
central role in economics.
This is clearly reflected in the short-run–long-run methodology of economics. It is for this reason we
say that short-run price or income elasticities are generally smaller (in absolute value) than the
corresponding long-run elasticities or that short-run marginal propensity to consume is generally
smaller than long-run marginal propensity to consume
Estimation Methods of Lag Models
Granted that distributed-lag models play a highly useful role in economics, how does one estimate such
models? Specifically, suppose we have the following distributed-lag model in one explanatory
variable:
Yt= α + β0Xt + β1Xt−1 + β2Xt−2 +· · ·+ut(1)
Where we have not defined the length of the lag, that is, how far back into the past we want to go.
Such a model is called an infinite (lag) model, whereas a model is called a finite (lag) distributed-
lag model if the length of the lag k is specified. How do we estimate the α and β’s of (1)? We may
adopt the following approach:
Ad Hoc Estimation of Lag Models Approach
Since the explanatory variable Xtis assumed to be non stochastic (or at least uncorrelated with the
disturbance term ut), Xt−1, Xt−2, and so on, are non stochastic, Therefore, in principle, the ordinary
least squares (OLS) can be applied to (1).The suggestion that to estimate (1) one may proceed
sequentially; that is, first regress Yt on Xt, then regress Yt on Xt and Xt−1, then regress Yt on Xt, Xt−1,
and Xt−2, and so on. This sequential procedure stops when the regression coefficients of the lagged
31
variables start becoming statistically insignificant and/or the coefficient of at least one of the variables
changes signs from positive to negative or vice versa.
e.g consider that we regressed fuel oil consumption Y on new orders X. Based on the quarterly data for
the period 1930–1939, the results were as follows:
Yt= 8.37 + 0.171Xt
Yt= 8.27 + 0.111Xt + 0.064Xt−1
Yt= 8.27 + 0.109Xt + 0.071Xt−1 − 0.055Xt−2
Yt= 8.32 + 0.108Xt + 0.063Xt−1 + 0.022Xt−2 − 0.020Xt−3
The second regression as the “best’’ one because in the last two equations the sign of Xt−2 was not
stable and in the last equation the sign of Xt−3 was negative, which may be difficult to interpret
economically.
Although seemingly straightforward, Adhoc estimation suffers from many drawbacks, such as the
following:
1. There is no a priori guide as to what is the maximum length of the lag.
2. As one estimates successive lags, there are fewer degrees of freedom left, making statistical
inference somewhat shaky. Economists are not usually that lucky to have a long series of data so that
they can go on estimating numerous lags.
3. More importantly, in economic time series data, successive values (lags) tend to be highly
correlated; hence multicollinearity rears its ugly head. As noted in Chapter 2 multicollinearity leads to
imprecise estimation; that is, the standard errors tend to be large in relation to the estimated
coefficients. As a result, based on the routinely computed t ratios, we may tend to declare
(erroneously), that a lagged coefficient(s) is statistically insignificant.
4. The sequential search for the lag length opens the researcher to the charge of data mining. In view
of the preceding problems, the Adhoc estimation procedure has very little to recommend it. Clearly,
some prior or theoretical considerations must be brought to allow upon the various β’s if we are to
make evolution with the estimation problem. Therefore the following estimation method is minimizing
the Adhoc estimation method.
THE KOYCK’ APPROACH LAG MODELS
Koyck has proposed an ingenious method of estimating distributed-lag models. Suppose we start with
the infinite lag distributed-lag model (1). Assuming that the β’s are all of the same sign, Koyck
assumes that they decline geometrically as follows.
32
Yt= α + β0Xt + β1Xt−1 + β2Xt−2 +· · ·+ut (2)
where01, and β1= β0, β2= β1,. . .using this in (2) we get
Yt= α + β0Xt + β0λXt−1 + β0λ2Xt−2 + ·· ·+ut (3)
As it stands, the model is still not willing to easy estimation since a large (literally infinite) number of
parameters remain to be estimated and the parameter λ enters in a highly nonlinear form: Strictly
speaking, the method of linear (in the parameters) regression analysis cannot be applied to such a
model. But now Koyck suggests an ingenious way out. He lags (3) by one period to obtain Yt−1 = α +
β0Xt−1 + β0λXt−2 + β0λ2Xt−3 + ·· ·+ut−1 (3), He then multiplies (4) by λ to obtain
λYt−1 = λα + λβ0Xt−1 + β0λ2Xt−2 + β0λ3Xt−3 +· · ·+λut−1(4)
Subtracting (4) from (3), it becomes as:
Yt− λYt−1 = α(1 − λ) + β0Xt + (ut− λut−1) (5) and when we rearranging eq(5) we can get
Y= α(1 − λ) + β0Xt + λYt−1 + vt
Where vt= (ut− λut−1), a moving average of ut and ut−1. The procedure just described is known as
the Koyck transformation. Comparing (5) with (1), we see the remarkable simplification
accomplished by Koyck. Whereas before we had to estimate α and an infinite number of β’s, we now
have to estimate only three unknowns: α, β0, and λ. Now there is no reason to expect multicollinearity.
In a sense multicollinearity is resolved by replacing Xt−1, Xt−2, . . .,by a single variable, namely, Yt−1.
But note the following features of the Koyck transformation:
1. We started with a distributed-lag model but ended up with an autoregressive
model because Yt−1 appears as one of the explanatory variables. This transformation shows how one
can “convert’’ a distributed-lag model into an autoregressive model.
2. The appearance of Yt−1 is likely to create some statistical problems. Yt−1, like Yt, is stochastic,
which means that we have a stochastic explanatory variable in the model. Recall that the classical
least-squares theory is predicated on the assumption that the explanatory variables either are
nonstochastic or, if stochastic, are distributed independently of the stochastic disturbance term. Hence,
we must find out if Yt−1 satisfies this assumption.
3. In the original model (1) the disturbance term was ut, whereas in the transformed model it is
vt= (ut− ut−1). The statistical properties of vtdepend on what is assumed about the statistical properties
of ut, for, as shown later, if the original ut’s are serially uncorrelated, the vt’s are seriallycorrelated.
Therefore, we may have to face up to the serial correlation problem in addition to the stochastic
explanatory variable Yt−1.
33
Note:For psychological, technological, and institutional reasons, a regressand may respond to a
repressor (s) with a time lag. Regression models that take into account time lags are known as
dynamic or lagged regression models. There are two types of lagged models: distributed-lag and
autoregressive. In the former, the current and lagged values of regressors are explanatory variables. In
the latter, the lagged value(s) of the regressand appear as explanatory variables.A purely distributed-lag
model can be estimated by OLS, but in that case there is the problem of multicollinearity since
successive lagged values of a regressor tend to be correlated.As a result, some shortcut methods have
been applicable as Koyck estimation method,
2.6 Time-series methods
 The characteristic of time series data which distinguishes it from cross-sectional data is time
series data have always been used in the field of econometrics that a time series data set comes
with a temporal (time) ordering.
 For analyzing time series data we present the most important approaches that are autoregressive
(AR) processes and moving average (MA) processes, as well as a combination of both types,
the so-called ARMA processes.
Autoregressive: used the first order for modeling the residuals of a regression equation.
We will consider only the case of AR (1) autocorrelation. It has received the most attention in the
literature because it is intuitively plausible and there is seldom sufficient evidence to make it
worthwhile considering more complicated models. If the observations are taken quarterly or monthly,
however, other models may be more suitable, but we will not investigate them here. AR(1)
autocorrelation can be eliminated by a simple manipulation of the model. Suppose that the
Model is Yt= 1 + 2Xt + ut…(1),
With ut generated by the process,
ut= ut–1 + εt…(2)
If we lag equation (1) by one time period and multiply by , we have
Yt–1 = 1 + 2ρXt–1 + ut–1…(3)
Now subtract (3) from (1):
Yt– Yt–1 = 1(1–) + 2Xt – 2Xt–1 + ut– ut–1…(4)
Hence
Yt= 1(1–ρ) + Yt–1 + 2Xt – 2Xt–1 + εt…(5)
The model is now free from autocorrelation because the disturbance term has been reduced to the
34
Innovation εt.
In the case of the more general multiple regression model
Yt= 1 + 2X2t + ... + kXkt+ ut…(6)
With ut following an AR(1) process, we follow the same procedure. We lag the equation and multiply
it by  which becoming as
Yt–1 = 1ρ + 2X2,t–1 + ... + kXk,t–1 + ut–1, (7)
Subtracting (7) from (6) and rearranging, we again derive a model free from autocorrelation:
Yt= 1(1–) + ρYt–1 + (2X2t – 2ρX2,t–1)+ ... + (kXkt– kρXk,t–1) + εt, (8)
Note that the model incorporates the nonlinear restriction that the coefficient of the lagged value of
each X variable is equal to minus the product of the coefficients of its current value and Yt–1. This
means that you should not use OLS to fit it. If you did, there would be no guarantee that the
coefficients would conform to the theoretical restrictions. Thus one has to use some nonlinear
estimation procedure instead.
Consider the regression Model with Lagged Explanatory Variables,
Yt     0 X t   1 X t 1   2 X t  2 ,....,  q X t  q  et
Model:
q
Yt     i  X t  i  e t i  0,1,..., q
t 0
Multiple regression model with current and past values (lags) of X used as explanatory variables.
q = lag length = lag order
Xt
 is the value of the variable in period t.
 X t 1 is the value of the variable in period t-1 or “lagged one period” or “lagged X”.
  2 Measures the effect of the explanatory variable 2 periods ago on the dependent
variable, “ceteris paribus”.

 Each column will have T-1 observations.
 In general, when creating “X lagged q periods” you will have T-q observations.
Example: Lagged Variables
T = 10
Yt     0 X t   1 X t 1   2 X t  2   t X t  q  e t
Data format
Col.A Col.B Col.C Col.D Col.E
Y X X lagged 1 X lagged 2 X lagged 3
period period period
Raw 1 Y4 X4 X3 X2 X1
35
Raw 7 Y10 X10 X9 X8 X7
2.7 Further Model Peculiarities
a) Stochastic Regressors
In the basic least squares regression model, it is assumed that the explanatory variables are non-
stochastic. This means that they do not have random components and that their values in the sample
are fixed and unaffected by the way the sample is generated. This is typically an unrealistic
assumption.
The desirable properties of the OLS estimators remain unchanged even if the explanatory variables
have stochastic components, provided that stochastic components are distributed independently of the
disturbance term, and if they do not depend on the model parameters
Meaning of random regressors
Until now, we have assumed (against all reason) that the experimenter has controlled the values of x.
 Economists almost never actually control the regressors.
 We should usually think of them as random variables that are determined jointly with y and
e.
 With a small adaptation of our assumptions, OLS still has the desirable properties it had
before.
 OLS assumptions with random regressors
With fixed x with random x
1. Y  β 0  β1 X  e , withx fixed 1. Y  β 0  β1 X  e with x, y, e random
2. E e   0 2. (x, y) obtained from I.I.D sampling
3. var e    3. E e | x   0
2
cov ei , e j   0
4. 4. x takes on at least two values
5. var e | x   
2
5. xtakes on at least two values
6. eis normal 6. e is normal
Instead of assuming that x is a fixed value and e is random, we make the properties of e conditional on
the particular outcome of x.
OLS properties
 Small-sample properties, If assumption 1–6 hold, then,
 OLS is unbiased  OLS standard errors are unbiased.
 OLS is BLUE  OLS coefficient estimators (conditional on x) are normal
 Asymptotic properties, if We can replace assumption 3 by the weaker assumption 3*as
 E e   0 , cov x, e   0
36
 OLS is biased in small samples if assumption 3* is true but by assumption 3 is not
 Under assumption 1–5, replacing with 3*:
 OLS coefficient estimators are consistent.
 OLS coefficient estimators are asymptotically normal
If x is correlated with e and if 3* is violated, then OLS is biased and inconsistent.
d) Model misspecification
One of the Assumption regression model used in the analysis is “correctly” specified model, if the
model is not “correctly” specified, we encounter the problem of model specification error or model
specification bias.
Model Selection Criteria
A model chosen for empirical analysis should satisfy the following criteria.
1. Be data admissible; that is, predictions made from the model must be logically possible.
2. Be consistent with theory; that is, it must make good economic sense.
3. Have weakly exogenous regressors; that is, the explanatory variables, or regressors, must be
uncorrelated with the error term.
4. Exhibit parameter constancy; that is, the values of the parameters should be stable. Otherwise,
forecasting will be difficulty.
5. Exhibit data coherency; that is, the residuals estimated from the model must be purely random
(technically, white noise).
6. Be encompassing; that is, the model should encompass or include all the rival models in the sense
that it is capable of explaining their results.
In developing an empirical model, one is likely to commit one or more of the following Model
Misspecification:
1. Omission of a relevant variable(s)
2. Inclusion of an unnecessary variable(s)
3. Adopting the wrong functional form
4. Errors of measurement
5. Incorrect specification of the stochastic error term
e) Qualitative variables
In previous chapters, the dependent and independent variables in our multiple regression models have
had quantitative meaning. Examples,
 Hourly wage rate,  amount of air pollution,
 years of education,  level of firm sales, and
 college grade point average,  Number of arrests
In empirical work, we must also incorporate qualitative factors into regression models.
 The gender or race of an individual,
 The industry of a firm (manufacturing, retail, etc.), are all considered to be qualitative
factors.
Qualitative variables often come in the form of binary information:
 A person is female or male;
 a person does or does not own a personal computer;
37
 a firm offers a certain kind of employee pension plan or it does not;
 a state administers capital punishment or it does not.
In all of these examples, the relevant information can be captured by defining a binary variable or a
zero-one variable. In econometrics, binary variables are most commonly called dummy variables,
although this name is not especially descriptive.
In quantitative models, y, our objective is to estimate its expected, or mean, value given the values of
the regressors, where as:
in models where Y is qualitative, our objective is to find the probability of something happening, such
as voting for a Democratic candidate, or owning a house, or belonging to a union, or participating in a
sport etc. Hence, qualitative response regression models are often known as probability models.
There are three approaches to developing a probability model for a binary response variable:
1. The linear probability model
2. The logit model
3. The probit model
38

Econometery ch2

Uploaded by

Copyright:

Available Formats

Econometery ch2

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Econometery ch2

Uploaded by

Copyright:

Available Formats

2.

Possible Causes of Heteroscedasticity

As it were shown in the graph above:

that the transformed data do not exhibit heteroscedasticity.

Consider the model: Yi  α  β1 X i  u i

(8.7049) (0.0744) r 2 = 0.8887 RSS1= 377.17 df= 11

(30.6421) (0.1319) r2 = 0.7681 RSS2= 1889.6 df= 11

Se = (0.7529) (0.0041) ESS = 10.4280 R2 = 0.18

The White test proceeds as follows.

Step 1:Given the data, estimate the residuals ûi and R2

regression. That is,

|u^ i|=β 1+ β2 X i + vi |u^ i|=β 1+ β2

Subtracting equation (**) from (*), we get:

Adjusted R Std. Error of the

a. Predictors: (Constant), sha

b. Dependent Variable: inv

Adjusted R Std. Error of the

Model Sum of Squares Df Mean Square F Sig.

a. Predictors: (Constant), sha

b. Dependent Variable: inv

the residuals are autocorrelated.

equation by an OLS procedure.

Adjusted R Std. Error of Durbin-

autocorrelation structure has been properly dealt with.

1935 1940 1945 1950 1955

X*3i= 5X2i. Therefore, there is perfect Collinearity between X2 and X3

coefficients are poorly estimated due to MC.

2.) The Condition Number (CN)

Alternatively, if the CI (= K ) is between 10 and 30, there is moderate to strong multicollinearity

also have a major problem if the mean of each  i is not zero.

 The estimated regression parameter will be unbiased if the measurement error,

Consequently, it would be fortuitous to have considerable variability in X, relative to  i , when

expenditure, xi = current income and  i = stochastic disturbance term.

 The least squares estimators are still unbiased. If

is not small relative to

robust regression may need to be used.

 The estimated regression parameter will be unbiased if the measurement error in y,

Consequently, it would be fortuitous to have considerable variability in X, relative to

measurement error is apt to be a problem.

is the measurement error with E ai   0 and varai    a .

 yi   0  1 X i   i where i   i  1a is a compound of equation and measurement errors. Now

can no longer assume that the composite error term

Xi because assuming E  i  0 , cov i , X i   E i  E  i X i  E  X i    1 a

2.6 Lagged variables

THE KOYCK’ APPROACH LAG MODELS

variable, “ceteris paribus”.

You might also like