Assignment


Basic Concept of Econometrics

• Define econometrics. How does econometrics serve as a bridge among economics, mathematics, and statistics?
Ans:
Econometrics may be defined as the social science in which the tools of economic theory,
mathematics, and statistical inference are applied to analyze economic phenomena.
Econometrics serves as a crucial link between economics, mathematics, and statistics,
transforming theoretical economic models into actionable, data-driven conclusions. It begins
with economic theory, which provides hypotheses about the relationships among key
variables like income, consumption, investment, inflation, and unemployment. These theories
often remain abstract until they are mathematically formulated and empirically tested.
Mathematics is essential in economics, providing a precise way to express economic models
through equations that illustrate the relationships among variables. For example, it can show
how consumer spending changes with income or how interest rates affect inflation.
Econometric models often use linear and non-linear equations with unknown parameters that
economists estimate from data. Key mathematical disciplines like calculus, algebra, and
matrix theory are vital for developing these models and understanding complex multivariate
relationships.
After establishing the mathematical framework, statistical methods are used to validate it
with empirical data. Statistics provides techniques to estimate model parameters, analyze
relationships among variables, and assess model adequacy. Key methods include regression
analysis, which allows econometricians to measure the impact of one variable, like income,
on another, such as consumption, while considering other factors. Additionally, hypothesis
testing evaluates the significance of observed relationships, determining if they are
statistically meaningful or due to random variation.
Moreover, econometrics involves challenges like heteroscedasticity (unequal variances in
error terms), multicollinearity (strong correlations among independent variables), and
endogeneity (correlation between explanatory variables and the error term). Econometricians
address these issues using advanced statistical methods to improve the robustness and
reliability of their models.
• What are the types of econometrics? Discuss the methodology of econometrics.
Ans:
Types of econometrics:

Econometrics is broadly divided into two branches:
1. Theoretical econometrics, which may follow either the classical or the Bayesian approach.
2. Applied econometrics, which likewise may be classical or Bayesian.

Traditional econometric methodology follows these steps:


1. Statement of Theory or Hypothesis: This is the formulation of an economic theory
or hypothesis, such as the relationship between income and consumption, which will
be tested empirically.
2. Specification of the Mathematical Model of the Theory: The theory is expressed in
a formal mathematical equation that relates variables (e.g., consumption = f(income)).
3. Specification of the Statistical or Econometric Model: This step translates the
mathematical model into a statistical form, incorporating a term for random error (to
account for real-world variability), such as Y = β_0 + β_1 X + u (where u is the error term).
4. Obtaining the Data: The relevant real-world data for the variables in the model (e.g.,
income and consumption levels) are collected from surveys, reports, or other sources.
5. Estimation of the Parameters of the Econometric Model: Using statistical
techniques like regression analysis, the model’s parameters (e.g., α and β) are
estimated from the data.
6. Hypothesis Testing: Statistical tests are used to determine whether the estimated
parameters are consistent with the original economic theory and whether the
relationships are significant.
7. Forecasting or Prediction: The model is used to predict future values of the
dependent variable (e.g., future consumption based on income forecasts).
8. Using the Model for Control or Policy Purposes: The model helps policymakers
evaluate the potential impact of policy changes or interventions, aiding decision-
making based on predicted outcomes.
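As an illustration, these steps can be sketched in Python with simulated data (a hedged sketch: the income-consumption numbers are hypothetical and the statsmodels library is an assumed tool, not part of the methodology itself):

# A minimal sketch of the econometric methodology using simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Steps 1-3: theory says consumption rises with income; model: Y = b0 + b1*X + u
income = rng.uniform(1000, 5000, size=200)      # Step 4: "obtain" the data (simulated here)
u = rng.normal(0, 100, size=200)                # unobserved disturbances
consumption = 200 + 0.75 * income + u           # true relationship (unknown in practice)

# Step 5: estimate the parameters by OLS
X = sm.add_constant(income)
results = sm.OLS(consumption, X).fit()
print(results.params)                           # estimates of beta0 and beta1

# Step 6: hypothesis testing, e.g. H0: beta1 = 0
print(results.tvalues, results.pvalues)

# Step 7: forecasting consumption for a new income level
print(results.predict([1.0, 6000.0]))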
• Explain the concept of Population Regression Function (PRF) and Sample
Regression Function (SRF).
Ans:
Population Regression Function (PRF):
The PRF represents the true relationship between the dependent variable Y and the
independent variable X in the entire population. It is written as:
Y_i = β_0 + β_1 X_i + u_i
where:
o Y_i is the dependent variable,
o X_i is the independent variable,
o β_0 and β_1 are the true (but unknown) parameters of the population,
o u_i is the error term that captures the unobserved factors affecting Y_i.

The PRF is theoretical because we rarely have data for the entire population. It
represents the "true" underlying relationship that exists in the entire population.
Sample Regression Function (SRF):
The SRF is an estimate of the PRF, based on the data we have collected from a
sample of the population. It is written as:

Ŷ_i = β̂_0 + β̂_1 X_i
where:
o Ŷ_i is the estimated (or predicted) value of the dependent variable,
o β̂_0 and β̂_1 are the estimated parameters from the sample,
o X_i is the independent variable.

The SRF is used to approximate the PRF by applying statistical methods like
Ordinary Least Squares (OLS) to sample data. The goal is to obtain estimates β̂_0 and β̂_1
that are close to the true values β_0 and β_1 of the PRF.
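A small simulation makes the distinction concrete (a sketch only; the true β_0 and β_1 are chosen arbitrarily for illustration): the PRF parameters stay fixed, while the SRF estimates change from sample to sample.

# PRF vs SRF: the true parameters are fixed; sample estimates vary around them.
import numpy as np

beta0_true, beta1_true = 2.0, 0.5          # PRF parameters (unknown in real applications)
rng = np.random.default_rng(1)

for sample in range(3):                    # three different samples from the same population
    X = rng.uniform(0, 10, size=50)
    u = rng.normal(0, 1, size=50)          # disturbance term of the PRF
    Y = beta0_true + beta1_true * X + u
    # OLS estimates of the SRF: beta1_hat = sum(x*y)/sum(x^2), beta0_hat = Ybar - beta1_hat*Xbar
    x, y = X - X.mean(), Y - Y.mean()
    beta1_hat = (x * y).sum() / (x ** 2).sum()
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    print(f"sample {sample}: beta0_hat={beta0_hat:.3f}, beta1_hat={beta1_hat:.3f}")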

• State the significance of including a stochastic disturbance term in an econometric model.
Ans:
The disturbance term u_i is a surrogate for all those variables that are omitted from the model
but that collectively affect Y. There are several reasons why including a stochastic disturbance
term is significant:
1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often
is, incomplete. We might know for certain that weekly income X influences weekly
consumption expenditure Y, but we might be ignorant or unsure about the other variables
affecting Y. Therefore, ui may be used as a substitute for all the excluded or omitted variables
from the model.
2. Unavailability of data: Even when we identify omitted variables and use multiple
regression instead of simple regression, we often lack quantitative data on these variables.
This issue is common in empirical research, as ideal datasets are rarely available. For
example, while including family wealth alongside income would enhance understanding of
family consumption expenditure, such data is typically inaccessible, leading to its exclusion
despite its theoretical importance.
3. Core variables versus peripheral variables: Assume in our consumption-income
example that besides income X_1, the number of children per family X_2, sex X_3, religion X_4,
education X_5, and geographical region X_6 also affect consumption expenditure. But it is quite
possible that the joint influence of all or some of these variables may be so small and at best
nonsystematic or random that as a practical matter and for cost considerations it does not pay
to introduce them into the model explicitly. One hopes that their combined effect can be
treated as a random variable ui .
4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the
relevant variables into the model, there is bound to be some “intrinsic” randomness in
individual Y’s that cannot be explained no matter how hard we try. The disturbances, the u’s,
may very well reflect this intrinsic randomness.
5. Poor proxy variables: The classical regression model assumes precise measurement of
variables Y and X. However, real-world data often suffers from measurement errors. For
instance, Milton Friedman’s theory of the consumption function posits that permanent
consumption (Y_p) depends on permanent income (X_p). Since these variables are
unobservable, we use proxies like current consumption (Y) and current income (X), which
can lead to discrepancies and measurement errors. Consequently, the disturbance term u may
also include these errors.
6. Principle of parsimony: We aim to keep our regression model simple. If we can explain
Y's behavior with just two or three variables and our theory doesn't require more, there's no
need to complicate it. Let ui represent other variables, but we must avoid omitting significant
ones for simplicity.
7. Wrong functional form: While we may have valid variables and data, we often remain
uncertain about the functional relationship between dependent and independent variables. Is
consumption expenditure linear or nonlinear in relation to income? If linear, it’s expressed as
Y_i = β_1 + β_2 X_i + u_i; if nonlinear, as Y_i = β_1 + β_2 X_i + β_3 X_i² + u_i. Two-variable models can be
assessed with scatter plots, but multiple regression complicates this due to the difficulty of
visualizing higher dimensions.
For all these reasons, the stochastic disturbances ui assume an extremely critical role in
regression analysis.
• Discuss the standard assumptions of the ordinary least squares (OLS) method.
Ans:
Assumption 1: Linear regression model. The regression model is linear in the parameters, as in
Y_i = β_1 + β_2 X_i + u_i

Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be non-stochastic.
Assumption 3: Zero mean value of disturbance u_i. Given the value of X, the mean, or
expected, value of the random disturbance term u_i is zero. Technically, the conditional mean
value of u_i is zero. Symbolically, we have E(u_i | X_i) = 0.

Assumption 4: Homoscedasticity or equal variance of u_i. Given the value of X, the variance
of u_i is the same for all observations. That is, the conditional variances of u_i are identical.
Symbolically, we have
var(u_i | X_i) = E[u_i − E(u_i | X_i)]²
             = E(u_i² | X_i)     (because of Assumption 3)
             = σ²
Assumption 5: No autocorrelation between the disturbances. Given any two X values,
X_i and X_j (i ≠ j), the correlation between any two u_i and u_j (i ≠ j) is zero. Symbolically,
cov(u_i, u_j | X_i, X_j) = E{[u_i − E(u_i)] | X_i}{[u_j − E(u_j)] | X_j}
                        = E(u_i | X_i)(u_j | X_j)
                        = 0

Assumption 6: Zero covariance between u_i and X_i, or E(u_i X_i) = 0. Formally,
cov(u_i, X_i) = E[u_i − E(u_i)][X_i − E(X_i)]
             = E[u_i (X_i − E(X_i))]     since E(u_i) = 0
             = E(u_i X_i) − E(X_i) E(u_i)     since E(X_i) is nonstochastic
             = E(u_i X_i)     since E(u_i) = 0
             = 0
Assumption 7: The number of observations n must be greater than the number of parameters
to be estimated. Alternatively, the number of observations n must be greater than the number
of explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not all be the
same. Technically, var (X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there is no
specification bias or error in the model used in empirical analysis.
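The following short sketch (simulated data with hypothetical parameter values) shows a data-generating process that satisfies these classical assumptions, so OLS applies cleanly:

# Generating data that satisfies the classical OLS assumptions (illustrative values only).
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.linspace(1, 10, n)                    # fixed regressor values, not all equal (Assumptions 2 and 8)
u = rng.normal(loc=0.0, scale=2.0, size=n)   # zero mean, constant variance, independent draws,
                                             # generated independently of X (Assumptions 3, 4, 5, 6)
Y = 1.0 + 3.0 * X + u                        # model linear in the parameters (Assumption 1)

print(n > 2)          # more observations than parameters (Assumption 7)
print(X.var() > 0)    # variability in X (Assumption 8)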
• Show that the ordinary least squares (OLS) estimators are BLUE, or state and
prove the Gauss-Markov theorem.
Ans:
Statement: If a linear model satisfies the classical assumptions, then the ordinary least
squares (OLS) estimators are unbiased and have minimum variance.
Proof:
Let us take a two-variable linear model:
Y_i = α + βX_i + u_i
where:
− Y_i : dependent variable
− X_i : explanatory variable
− α : intercept parameter
− β : slope parameter
− u_i : disturbance term
The standard assumptions about the disturbances are:
E(u_i) = 0,  E(u_i²) = σ_u²,  and  E(u_i u_j) = 0  (i ≠ j).

The OLS estimators of β and α are
β̂ = Σx_i y_i / Σx_i²  and  α̂ = Ȳ − β̂X̄, respectively,
where y_i = Y_i − Ȳ and x_i = X_i − X̄.

Now,
β̂ = Σx_i y_i / Σx_i²
  = Σx_i (Y_i − Ȳ) / Σx_i²
  = (Σx_i Y_i − Ȳ Σx_i) / Σx_i²
  = Σx_i Y_i / Σx_i²          (since Σx_i = 0)
  = Σk_i Y_i,  where k_i = x_i / Σx_i².
∴ The OLS estimator β̂ is a linear estimator because it is a linear function of the observed variable Y_i.

Similarly, we can prove that α̂ is also a linear function of the observed variable Y_i.

Again,
β̂ = Σk_i Y_i = Σk_i (α + βX_i + u_i)
  = α Σk_i + β Σk_i X_i + Σk_i u_i
  = 0 + β + Σk_i u_i          (since Σk_i = 0 and Σk_i X_i = 1)
  = β + Σk_i u_i
Taking expectations,
E(β̂) = β + Σk_i E(u_i) = β + 0 = β.
∵ E(β̂) = β, our estimator β̂ is an unbiased estimator of β.

Similarly, we can prove that
E(α̂) = α

Now,
Var(β̂) = E[β̂ − E(β̂)]² = E(Σk_i u_i)²
       = Σk_i² E(u_i²) + 2 ΣΣ_(i≠j) k_i k_j E(u_i u_j)
       = σ_u² Σk_i² + 0
       = σ_u² / Σx_i²          [∵ Σk_i² = 1/Σx_i²]

Similarly, we find the variance of α̂ as
Var(α̂) = σ_u² ΣX_i² / (n Σx_i²)

Let β* be any arbitrary linear estimator of β such that
β* = Σω_i Y_i
   = α Σω_i + β Σω_i X_i + Σω_i u_i
Taking expectations,
E(β*) = α Σω_i + β Σω_i X_i + Σω_i E(u_i)
      = α Σω_i + β Σω_i X_i
∴ E(β*) = β if and only if Σω_i = 0 and Σω_i X_i = 1.

Now write ω_i = k_i + c_i, where Σc_i = 0 and Σc_i X_i = 0 so that β* remains unbiased. Then
Var(β*) = E[β* − E(β*)]²
        = E(Σω_i u_i)²          {∵ β* = β + Σω_i u_i}
        = Σω_i² E(u_i²) + 2 ΣΣ_(i≠j) ω_i ω_j E(u_i u_j)
        = σ_u² Σω_i²
        = σ_u² Σk_i² + σ_u² Σc_i² + 2σ_u² Σk_i c_i
        = σ_u² (1/Σx_i²) + σ_u² Σc_i²          (since Σk_i c_i = 0, which follows from Σω_i = 0 and Σω_i X_i = 1)
        = Var(β̂) + σ_u² Σc_i²

Since Var(β*) = Var(β̂) + σ_u² Σc_i² > Var(β̂) whenever any c_i ≠ 0, the OLS estimator has the smallest variance in the class of linear unbiased estimators.

Similarly, we can prove that Var(α*) > Var(α̂) if α* is another linear unbiased estimator of α.

Thus, the OLS estimators α̂ and β̂ have minimum variance.

Hence, the OLS estimators are BLUE.

[Showed]
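The Gauss-Markov result can also be illustrated numerically. The Monte Carlo sketch below (hypothetical true values and an arbitrarily constructed alternative weighting scheme) shows that both estimators center on β, but OLS has the smaller variance:

# Monte Carlo illustration of the Gauss-Markov theorem (hypothetical true values).
import numpy as np

rng = np.random.default_rng(3)
alpha_true, beta_true, n, reps = 1.0, 2.0, 30, 5000
X = rng.uniform(0, 10, size=n)          # X held fixed in repeated sampling
x = X - X.mean()

# An alternative linear unbiased estimator beta* = sum(w_i * Y_i):
# weights built from sign(x_i), then adjusted so that sum(w) = 0 and sum(w * X) = 1.
w = np.sign(x)
w = w - w.mean()
w = w / (w * X).sum()

ols_draws, alt_draws = [], []
for _ in range(reps):
    u = rng.normal(0, 1, size=n)
    Y = alpha_true + beta_true * X + u
    ols_draws.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())   # OLS slope (beta_hat)
    alt_draws.append((w * Y).sum())                                  # alternative estimator (beta*)

print("OLS:", np.mean(ols_draws), np.var(ols_draws))   # mean ~ 2, smaller variance
print("ALT:", np.mean(alt_draws), np.var(alt_draws))   # mean ~ 2, larger variance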

• Write down the difference between the error term and the residual term.

Ans:
The error term is a theoretical concept used in regression analysis. It represents the difference
between the true value of the dependent variable and the value predicted by the regression model.
It's often assumed to have certain properties, such as being normally distributed with a mean
of zero.

The residual term, on the other hand, is the actual difference between the observed variable
and the value predicted by the regression model for a specific data point. Residuals are
calculated after fitting the model to the data and are used to assess how well the model fits the
data.
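The distinction is easy to see in a simulation where the true errors are known by construction (a sketch with hypothetical values; in real data only the residuals are observable):

# Error term vs residual: u_i is unobservable; the residual e_i = Y_i - Yhat_i is its sample counterpart.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=50)
u = rng.normal(0, 1, size=50)            # true errors (known only because we simulated them)
Y = 1.0 + 2.0 * X + u

fit = sm.OLS(Y, sm.add_constant(X)).fit()
residuals = fit.resid                     # observable residuals from the fitted model

print(np.corrcoef(u, residuals)[0, 1])    # close to 1, but residuals are not the errors
print(residuals.sum())                    # OLS residuals sum to (numerically) zero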

Multicollinearity

Definition of the Nature of Multicollinearity:


The term multicollinearity means the existence of a 'perfect', or exact, linear relationship
among some or all explanatory variables of a regression model. For the k-variable regression
involving explanatory variables X_1, X_2, ⋯, X_k (where X_1 = 1 for all observations to allow for
the intercept term), an exact linear relationship is said to exist if the following condition is
satisfied:

λ_1 X_1 + λ_2 X_2 + ⋯ + λ_k X_k = 0 ⋯⋯⋯ (1)

where λ_1, λ_2, ⋯, λ_k are constants such that not all of them are zero simultaneously.

Today, however, the term multicollinearity is also used for the case where the X variables are
intercorrelated but not perfectly so, as follows:

λ_1 X_1 + λ_2 X_2 + ⋯ + λ_k X_k + v_i = 0 ⋯⋯⋯ (2)

where v_i is a stochastic error term.

Multicollinearity may also arise from nonlinear relationships among the X variables.
Consider the following regression model:

Y_i = β_0 + β_1 X_i + β_2 X_i² + β_3 X_i³ + u_i ⋯⋯⋯ (3)

where, say, Y is the total cost of production and X is the output. The variables X_i² and X_i³
are obviously functionally related to X_i, but the relationship is nonlinear.

Sources of Multicollinearity:

Multicollinearity may be due to the following factors:

a) There is a tendency for economic variables to move together over time.


b) The use of lagged values of some explanatory variables as separate independent
factors in the relationship.
c) The data collection method employed
For example, sampling over a limited range of the values taken by the regressors in
the population.
d) Constraints on the model or in the population being sampled

For example, in the regression of electricity consumption on income (X_2) and
house size (X_3), there is a physical constraint in the population in that families
with higher incomes generally have larger homes than families with lower
incomes.

e) Model specification
For example, adding polynomial terms to a regression model, especially when the
range of the X variable is small.

f) An over-determined model
This happens when the model has more explanatory variables than the number of
observations. This could happen in medical research where there may be a small
number of patients about whom information is collected on a large number of
variables.

Practical Consequences of Multicollinearity:

In cases of near or high multicollinearity, one is likely to encounter the following


consequences:

1. Although BLUE, the OLS estimators have large variances and co-variances.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to
the acceptance of the ‘zero null hypothesis’ (i.e., the true population coefficient is
zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be
statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R², the
overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the
data.

Detection of Multicollinearity:

There are different methods of detecting multicollinearity such as


a) High R² but few significant t ratios
b) High pair-wise correlations among regressors
c) Examination of partial correlations
d) Auxiliary regressions
e) Eigenvalues and condition index
f) Tolerance and variance inflation factor
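As a sketch of methods (b) and (f), the snippet below builds a deliberately collinear data set (hypothetical numbers) and computes the pairwise correlation and variance inflation factors with statsmodels:

# Detecting multicollinearity with pairwise correlations and variance inflation factors (VIF).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 200
income = rng.normal(50, 10, size=n)
wealth = 10 * income + rng.normal(0, 5, size=n)        # wealth nearly collinear with income (hypothetical)
consumption = 5 + 0.6 * income + 0.01 * wealth + rng.normal(0, 2, size=n)

exog = sm.add_constant(np.column_stack([income, wealth]))
print(np.corrcoef(income, wealth)[0, 1])               # high pairwise correlation

# VIF for each regressor (column 0 is the constant)
for idx, name in [(1, "income"), (2, "wealth")]:
    print(name, variance_inflation_factor(exog, idx))  # values well above 10 signal trouble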

Remedial Measures:

There are some rules for removing multicollinearity.

a) A priori information
b) Combining cross-sectional and time series data
c) Dropping a variable(s) and specification bias
d) Transformation of variables
e) Additional or new data
f) Reducing collinearity in polynomial regressions
g) Other methods of remedying multicollinearity

A Priori Information:

Suppose we consider the model

Y_i = β_1 + β_2 X_2i + β_3 X_3i + u_i

where Y = consumption, X_2 = income, and X_3 = wealth. But suppose a priori we believe
that β_3 = 0.10 β_2. We can then run the following regression:

Y_i = β_1 + β_2 X_2i + 0.10 β_2 X_3i + u_i
    = β_1 + β_2 X_i + u_i

where X_i = X_2i + 0.10 X_3i. Once we obtain β̂_2, we can estimate β̂_3 from the postulated
relationship between β_2 and β_3. We can get a priori information from previous empirical work
in which the collinearity problem happens to be less serious or from the relevant theory
underlying the field of study.
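A short sketch of this idea (simulated data; the 0.10 restriction is simply carried over from the example above): construct X_i = X_2i + 0.10 X_3i, regress Y on it, then recover β̂_3 from the restriction.

# Using a priori information (beta3 = 0.10 * beta2) to avoid collinear regressors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
income = rng.normal(50, 10, size=n)
wealth = 10 * income + rng.normal(0, 5, size=n)           # collinear with income (hypothetical)
consumption = 3 + 0.5 * income + 0.05 * wealth + rng.normal(0, 1, size=n)

X_combined = income + 0.10 * wealth                        # X_i = X_2i + 0.10 * X_3i
fit = sm.OLS(consumption, sm.add_constant(X_combined)).fit()

beta2_hat = fit.params[1]
beta3_hat = 0.10 * beta2_hat                               # recovered from the postulated restriction
print(beta2_hat, beta3_hat)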
Combining Cross-Sectional and Time Series Data:

The combination of cross-sectional and time-series data is known as pooling the data.
Suppose we want to study the demand for automobiles in the United States and assume we
have time series data on the number of cars sold, the average price of the car, and consumer
income. Suppose also that

ln Y_t = β_1 + β_2 ln P_t + β_3 ln I_t + u_t

where Y = number of cars sold, P = average price, I = income, and t = time. Our objective is
to estimate the price elasticity β_2 and the income elasticity β_3.

In time series data the price and income variables generally tend to be highly collinear.
Therefore, we cannot run the preceding regression. A way out of this is that, if we have cross-
sectional data, we can obtain a fairly reliable estimate of the income elasticity β_3, because in
such data, which are collected at a point in time, prices do not vary much. Let the cross-sectionally
estimated income elasticity be β̂_3. Using this estimate, we may write the time series regression as

Y*_t = β_1 + β_2 ln P_t + u_t

where Y*_t = ln Y_t − β̂_3 ln I_t, that is, Y* represents the value of Y after removing from it the
effect of income. We can now obtain an estimate of the price elasticity β_2 from the preceding
regression.

Dropping a Variable(s) and Specification Bias:


When faced with multicollinearity, one of the 'simplest' things to do is to drop one of the
collinear variables. But in dropping a variable from the model we may be committing a
specification bias or specification error. Specification bias arises from incorrect specification
of the model used in the analysis. So before dropping a variable we should bear in mind that
multicollinearity may prevent precise estimation of the parameters of the model, but omitting
a variable may seriously mislead us as to the true values of the parameters.

Transformation of Variables:
Suppose we have time series data on consumption expenditure, income and wealth. Income
and wealth are highly correlated. One way of minimizing this dependence is as follows. If the
relation

Y_t = β_1 + β_2 X_{2t} + β_3 X_{3t} + u_t ⋯⋯⋯ (1)

holds at time t , it must also hold at time t −1 because the origin of time is arbitrary anyway.

Therefore, we have

Y_{t−1} = β_1 + β_2 X_{2,t−1} + β_3 X_{3,t−1} + u_{t−1} ⋯⋯⋯ (2)

If we subtract (2) from (1), we obtain

Y_t − Y_{t−1} = β_2 (X_{2t} − X_{2,t−1}) + β_3 (X_{3t} − X_{3,t−1}) + v_t ⋯⋯⋯ (3)

where v_t = u_t − u_{t−1}. Equation (3) is known as the first difference form.

The first difference regression model often reduces multicollinearity because there is no
a priori reason to expect that the differences of the variables will also be highly correlated.
Another type of transformation is the ratio transformation.
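A brief sketch of the first-difference idea (simulated trending series; pandas is an assumed tool): the levels are highly correlated, but the first differences are much less so.

# First-difference transformation: trending levels are highly correlated, differences much less so.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
t = np.arange(100)
income = 100 + 2.0 * t + rng.normal(0, 3, size=100)    # trending series (hypothetical)
wealth = 500 + 9.0 * t + rng.normal(0, 8, size=100)    # also trending, hence collinear with income

levels = pd.DataFrame({"income": income, "wealth": wealth})
diffs = levels.diff().dropna()                         # first differences X_t - X_{t-1}

print(levels.corr().loc["income", "wealth"])           # close to 1 in levels
print(diffs.corr().loc["income", "wealth"])            # much lower after differencing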

Additional or New Data:

Since multicollinearity is a sample feature, it is possible that in another sample involving the
same variables collinearity may not be as serious as in the first sample. Sometimes simply
increasing the size of the sample may reduce the collinearity problem.

Reducing Collinearity in Polynomial Regressions:


In polynomial regressions the explanatory variable appears raised to several powers, so these
powers tend to be highly correlated with one another. If the explanatory variable(s) are
expressed in deviation (mean-corrected) form, multicollinearity is substantially reduced. But
even then the problem may persist, in which case one may want to consider techniques such
as orthogonal polynomials.
Other Methods of Remedying Multicollinearity:
Multivariate statistical techniques such as factor analysis and principal components or
techniques such as ridge regression are often employed to ‘solve’ the problem of
multicollinearity.

Heteroscedasticity
Definition:
One of the most important assumptions of the classical linear regression model is that the
variance of the disturbance term u_i is constant, i.e. E(u_i²) = σ², i = 1, 2, ⋯, n. This is the
assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal
variance.

But in some situations this assumption is not fulfilled, i.e. the variance of the disturbance
term is not constant: E(u_i²) = σ_i², i = 1, 2, ⋯, n. In that case heteroscedasticity arises.

Thus, the non-constant variance of the disturbance term is known as heteroscedasticity.

Reasons for Heteroscedasticity:
There are several reasons why the variance of u_i may be variable, some of which are as
follows:
i) Following the error-learning models, as people learn, their errors of behavior become
smaller over time. For example, consider the number of typing errors: as the number of hours
put into typing practice increases, the average number of typing errors as well as their
variance decreases.
ii) As income grows, people have more discretionary income and hence more scope for
choice about the disposition of their income. Hence σ_i² is likely to increase with income.
iii) As data collection techniques improve, σ_i² is likely to decrease.
iv) Heteroscedasticity arises in the presence of outliers.
v) It arises due to incorrect specification of the regression model.
vi) It also arises because of incorrect data transformation or an incorrect functional form.
vii) It arises due to skewness in the distribution of one or more regressors.

Consequences of Using OLS in the Presence of Heteroscedasticity:

The consequences of using OLS in the presence of heteroscedasticity are:

a) In the presence of heteroscedasticity, β̂ is still linear, unbiased, and consistent, but it is no
longer efficient.
b) The confidence intervals based on the OLS estimators will be unnecessarily large.
c) The variances of the OLS coefficients will be incorrect, i.e. var(β̂) = Σk_i² E(u_i²) = Σk_i² σ_i².
In this case we would have to estimate n variances from n observations, that is, one
observation for each variance, a situation in which estimation is obviously impossible because
we cannot estimate a variance from a single observation.
Detection of Heteroscedasticity:
There are two types of methods for detecting heteroscedasticity:
i) Informal method (graphical method)
ii) Formal methods (such as the Park test, Glejser test, White's test, etc.)
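As an illustration of a formal test, the sketch below applies White's test to simulated heteroscedastic data (the data and parameter values are hypothetical; statsmodels' het_white helper is the assumed implementation):

# Detecting heteroscedasticity with White's general test on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(8)
income = rng.uniform(10, 100, size=300)
u = rng.normal(0, 0.1 * income)                 # error spread grows with income (heteroscedastic)
consumption = 5 + 0.7 * income + u

exog = sm.add_constant(income)
fit = sm.OLS(consumption, exog).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, exog)
print(lm_pvalue)     # a small p-value leads to rejecting homoscedasticity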

Remedial Measures:
There are two approaches to remedial measures:

• When σ_i² is known: In this case, we use the method of weighted least squares (WLS). The
estimators thus obtained are BLUE.

• When σ_i² is unknown: In this case, we make plausible assumptions about the functional
form of σ_i² and transform the model accordingly.
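A brief sketch of the known-variance case (simulated data; the pattern σ_i² ∝ X_i² is assumed purely for illustration), using weighted least squares with weights 1/σ_i²:

# Weighted least squares when the error variance structure is known (sigma_i^2 proportional to X_i^2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = rng.uniform(1, 10, size=200)
u = rng.normal(0, 0.5 * X)               # standard deviation proportional to X -> variance proportional to X^2
Y = 2 + 3 * X + u

exog = sm.add_constant(X)
ols_fit = sm.OLS(Y, exog).fit()
wls_fit = sm.WLS(Y, exog, weights=1.0 / X ** 2).fit()   # weights = 1 / sigma_i^2 (up to a constant)

print(ols_fit.bse)   # OLS standard errors (inefficient under heteroscedasticity)
print(wls_fit.bse)   # WLS standard errors under the assumed variance structure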

Autocorrelation
Definition: The term autocorrelation may be defined as "correlation between members of
a series of observations ordered in time or space." Symbolically, E(u_i u_j) ≠ 0, i ≠ j.

Difference between Autocorrelation and Simple Correlation:


Simple Correlation vs. Autocorrelation:
1. Simple correlation refers to the relationship between two or more different variables;
autocorrelation refers to the relationship between successive values of the same variable.
2. Simple correlation is used with cross-sectional data or unrelated variables; autocorrelation
is used with time series data or data with a temporal order.
3. Simple correlation is applied to variables measured at the same time; autocorrelation is
applied to data measured at different time points.
4. Simple correlation is used to find out how two variables move together; autocorrelation is
used to identify patterns, trends, or dependencies within a single time series.
5. Example of simple correlation: height and weight of individuals. Example of
autocorrelation: stock prices on consecutive days.
Why Does Serial Correlation Occur?
There are several reasons why serial correlation occurs. Some of them are as follows:
a) Inertia
b) Specification bias
   i. Excluded variables case
   ii. Incorrect functional form
c) Cobweb phenomenon
d) Lags
e) Manipulation of data or data transformation
f) Non-stationarity
Sources of Autocorrelation:
Autocorrelated values of the disturbance term u may be observed for many reasons:
a) Omitted explanatory variables
It is known that most economic variables tend to be autocorrelated. If an
autocorrelated variable is excluded from the set of explanatory variables,
obviously its influence will be reflected in the random variable u, whose values
will be autocorrelated. This case may be called "Quasi autocorrelation" since it is
due to the autocorrelated pattern of omitted explanatory variables (X's) and not to
the behavioral pattern of the values of the true u.
b) Misspecification of the mathematical form of the model
If we have adopted a mathematical form that differs from the true form of the
relationship, the u's may show serial correlation.

c) Interpolation in the statistical observations.


Most of the published time series data involve some interpolation and smoothing
processes that average the true disturbances over successive time periods. As a
consequence, the successive values of the u are interrelated and exhibit
autocorrelation patterns.

d) Misspecification of the true random term u
It may well be expected in many cases that the successive values of the true u are
correlated. Thus, even purely random factors (wars, droughts, storms, strikes,
etc.) exert influences that are spread over more than one period of time.

What are the Consequences of Autocorrelation?


When the disturbance term exhibits serial correlation, the values as well as the
standard errors of the parameter estimates are affected. In particular:
1. When the residuals are serially correlated, the OLS parameter estimates are still
statistically unbiased.
2. With autocorrelated values of the disturbance term, the OLS variances of the
parameter estimates are likely to be larger than those of other econometric methods.
3. The variance of the random term may be seriously underestimated if the u's are
autocorrelated.
4. Finally, if the u's are autocorrelated, predictions based on the ordinary least squares
estimates will be inefficient.

What are the Methods of Detecting Autocorrelation?


Autocorrelation can be detected by the following methods:
1) Subjective or qualitative methods
a. Graphical method
2) Quantitative methods
a. The runs test
b. The Durbin-Watson d test
c. The Breusch-Godfrey (BG) test

Runs Test:

Suppose we have several residuals that are negative, then a series of positive residuals,
and finally several residuals that are again negative. This intuition can be checked by the
so-called runs test.

Here:
N = total number of observations (N = N1 + N2)
N1 = number of positive symbols (+ residuals)
N2 = number of negative symbols (− residuals)
R = number of runs, where a run is an uninterrupted sequence of one symbol and the
length of a run is the number of elements in it.

Then under the null hypothesis that the successive outcomes (here residuals) are independent
and assuming that N1> 10 and N2>10, the number of runs is (asymptotically) normally
distributed with
Mean: E(R) = 2N1N2/N + 1

Variance: σ_R² = 2N1N2(2N1N2 − N) / [N²(N − 1)]

If the null hypothesis of randomness is sustainable, following the properties of the normal
distribution, we should expect that

Pr[E(R) − 1.96σ_R ≤ R ≤ E(R) + 1.96σ_R] = 1 − α = 0.95 (for the 1.96 critical value)

Decision Rule:
a) Do not reject the null hypothesis of randomness if R, the number of runs, lies in the
preceding confidence interval.
b) Reject the null hypothesis if the estimated R lies outside these limits.
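The mean and variance formulas above translate directly into a short computation (a sketch; the pattern of residual signs below is purely illustrative):

# Runs test for randomness of residual signs, using the asymptotic mean and variance above.
import numpy as np

signs = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1,
                  1, 1, -1, -1, 1, 1, 1, 1, -1, -1])   # illustrative +/- residual pattern

N1 = int((signs > 0).sum())                  # number of positive residuals
N2 = int((signs < 0).sum())                  # number of negative residuals
N = N1 + N2
R = 1 + int((np.diff(signs) != 0).sum())     # number of runs

mean_R = 2 * N1 * N2 / N + 1
var_R = 2 * N1 * N2 * (2 * N1 * N2 - N) / (N ** 2 * (N - 1))
lower, upper = mean_R - 1.96 * var_R ** 0.5, mean_R + 1.96 * var_R ** 0.5

print(R, (lower, upper))   # reject randomness (independence) if R falls outside the interval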

The Durbin-Watson Test:


Durbin and Watson have suggested a test that is applicable to small samples. However,
the test is appropriate only for the first-order autoregressive scheme u_t = ρu_{t−1} + ε_t. The
test may be outlined as follows.
The null hypothesis is
H_0: ρ = 0
The alternative hypothesis is
H_1: ρ ≠ 0

To test the null hypothesis we use the Durbin-Watson d statistic, computed from the OLS
residuals û_t:

d = Σ_{t=2}^{n} (û_t − û_{t−1})² / Σ_{t=1}^{n} û_t²

Expanding the d statistic, we obtain

d = Σ_{t=2}^{n} (û_t² + û_{t−1}² − 2û_t û_{t−1}) / Σ_{t=1}^{n} û_t²

But for a large sample the terms Σ_{t=2}^{n} û_t², Σ_{t=2}^{n} û_{t−1}², and Σ_{t=1}^{n} û_t² are
approximately equal. Therefore we may write

d ≈ (2 Σ û_t² − 2 Σ û_t û_{t−1}) / Σ û_t²
  = 2(1 − ρ̂),  where ρ̂ = Σ û_t û_{t−1} / Σ û_t².

From this expression, it is obvious that the values of d lie between 0 and 4. When d = 2,
then ρ̂ = 0. Thus, testing H_0: ρ = 0 is equivalent to testing H_0: d = 2.
Firstly: if ρ̂ = 0 then d = 2, and there is no autocorrelation. Thus, if from the sample data we
find d ≈ 2, we accept that there is no autocorrelation in the function.
Secondly: if ρ̂ = +1 then d = 0, and there is perfect positive autocorrelation. Therefore, if
0 < d < 2, there is some degree of positive autocorrelation, which is stronger the closer d is to
zero.
Thirdly: if ρ̂ = −1 then d = 4, and there is perfect negative autocorrelation. Therefore, if
2 < d < 4, there is some degree of negative autocorrelation, which is stronger the closer d is to
4.
The problem with this test is that the exact distribution of d is not known. However, Durbin
and Watson have established lower and upper limits for the significance levels which are
appropriate for testing the hypothesis.
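As a closing illustration (simulated AR(1) disturbances with a hypothetical ρ = 0.8; statsmodels' durbin_watson helper is the assumed implementation), the d statistic falls well below 2 when positive autocorrelation is present:

# Durbin-Watson statistic on residuals from a model with AR(1) disturbances.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(10)
n, rho = 200, 0.8
X = rng.uniform(0, 10, size=n)

u = np.zeros(n)
for t in range(1, n):                       # u_t = rho * u_{t-1} + eps_t
    u[t] = rho * u[t - 1] + rng.normal(0, 1)
Y = 1 + 2 * X + u

fit = sm.OLS(Y, sm.add_constant(X)).fit()
d = durbin_watson(fit.resid)
print(d)    # roughly 2*(1 - rho_hat); well below 2 here, signalling positive autocorrelation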
