Assignment
Econometrics
Econometrics may be broadly divided into two categories: theoretical econometrics and applied econometrics.
Population Regression Function (PRF):
The PRF describes the relationship between the dependent variable and the explanatory variable in the entire population. It is written as:
Y_i = β_0 + β_1 X_i + u_i
where:
o Y_i is the dependent variable,
o X_i is the explanatory (independent) variable,
o β_0 and β_1 are the population parameters (intercept and slope),
o u_i is the error term that captures the unobserved factors affecting Y_i.
The PRF is theoretical because we rarely have data for the entire population. It
represents the "true" underlying relationship that exists in the entire population.
Sample Regression Function (SRF):
The SRF is an estimate of the PRF, based on the data we have collected from a
sample of the population. It is written as:
Ŷ_i = β̂_0 + β̂_1 X_i
where:
o Ŷ_i is the estimated (or predicted) value of the dependent variable,
o β̂_0 and β̂_1 are the estimators of β_0 and β_1 obtained from the sample data.
The SRF is used to approximate the PRF by applying statistical methods like
Ordinary Least Squares (OLS) to sample data. The goal is to get estimates β̂_0 and β̂_1
that are close to the true values β_0 and β_1 of the PRF.
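To make the PRF/SRF distinction concrete, here is a minimal sketch (using NumPy and a small hypothetical sample) that computes β̂_0 and β̂_1 by OLS; the true β_0 and β_1 of the PRF are never observed and are only approximated by these sample estimates.

```python
import numpy as np

# Hypothetical sample drawn from an (unknown) population regression function.
X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
Y = np.array([5.1, 8.9, 10.2, 14.8, 16.1, 20.3])

# OLS formulas for the two-variable model:
#   beta1_hat = sum(x_i * y_i) / sum(x_i^2), with x_i, y_i in deviation form
#   beta0_hat = Y_bar - beta1_hat * X_bar
x = X - X.mean()
y = Y - Y.mean()
beta1_hat = np.sum(x * y) / np.sum(x ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Fitted (predicted) values of the SRF: Y_hat_i = beta0_hat + beta1_hat * X_i
Y_hat = beta0_hat + beta1_hat * X
print(beta0_hat, beta1_hat)
```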
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be non-stochastic.
Assumption 3: Zero mean value of disturbance ui. Given the value of X, the mean, or
expected, value of the random disturbance term ui is zero. Technically, the conditional mean
value of u_i is zero. Symbolically, we have E(u_i | X_i) = 0.
Assumption 4: Homoscedasticity or equal variance of u_i. Given the value of X, the variance of u_i is the same for all observations: var(u_i | X_i) = σ².
Assumption 5: No autocorrelation between the disturbances. Given any two X values X_i and X_j (i ≠ j), the correlation between u_i and u_j is zero. Symbolically,
cov(u_i, u_j | X_i, X_j) = E[(u_i | X_i)(u_j | X_j)] = 0.
Assumption 6: Zero covariance between u_i and X_i. Symbolically,
cov(u_i, X_i) = E[u_i − E(u_i)][X_i − E(X_i)]
             = E(u_i X_i) − E(X_i) E(u_i)    since E(X_i) is nonstochastic
             = E(u_i X_i)                    since E(u_i) = 0
             = 0.
Assumption 7: The number of observations n must be greater than the number of parameters
to be estimated. Alternatively, the number of observations n must be greater than the number
of explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not all be the
same. Technically, var (X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there is no
specification bias or error in the model used in empirical analysis.
Show that the ordinary least square (OLS) estimates are BLUE or state and
prove the Gauss-Markov theorem.
Ans:
Statement: If a linear regression model satisfies the assumptions of the classical linear regression model, then the ordinary least squares (OLS) estimators are linear, unbiased, and have minimum variance in the class of all linear unbiased estimators; that is, they are BLUE (best linear unbiased estimators).
Proof:
Let us take the two-variable linear model
Y_i = α + β X_i + u_i
where Y_i is the dependent variable, X_i is the explanatory variable, u_i is the random disturbance term, and α and β are the parameters to be estimated. Let x_i = X_i − X̄ and y_i = Y_i − Ȳ denote deviations from the sample means.
Now,
β̂ = Σx_i y_i / Σx_i²
  = Σx_i (Y_i − Ȳ) / Σx_i²              [since y_i = Y_i − Ȳ]
  = (Σx_i Y_i − Ȳ Σx_i) / Σx_i²
  = Σx_i Y_i / Σx_i²                     [since Σx_i = 0]
  = Σk_i Y_i,   where k_i = x_i / Σx_i²
∴ The OLS estimator β̂ is a linear estimator because it is a linear function of the observed variable Y_i.
Similarly, we can prove that α̂ is also a linear function of the observed variable Y_i.
Again,
β̂ = Σk_i Y_i = Σk_i (α + β X_i + u_i)
  = α Σk_i + β Σk_i X_i + Σk_i u_i
  = 0 + β + Σk_i u_i                     [since Σk_i = 0 and Σk_i X_i = 1]
  = β + Σk_i u_i
Taking expectations,
E(β̂) = β + Σk_i E(u_i) = β + 0 = β
∴ β̂ is an unbiased estimator of β. Similarly,
E(α̂) = α
so α̂ is an unbiased estimator of α.
Now,
Var(β̂) = E[β̂ − E(β̂)]² = E(Σk_i u_i)²
       = Σk_i² E(u_i²) + 2 ΣΣ_{i≠j} k_i k_j E(u_i u_j)
       = σ_u² Σk_i² + 0
       = σ_u² Σk_i²
       = σ_u² / Σx_i²                     [∵ Σk_i² = 1 / Σx_i²]
Similarly, we find the variance of α̂ as
Var(α̂) = σ_u² ΣX_i² / (n Σx_i²)
Minimum variance: Let β* = Σω_i Y_i be any other linear estimator of β, where ω_i = k_i + c_i and the c_i are arbitrary constants. Then
β* = Σω_i Y_i = α Σω_i + β Σω_i X_i + Σω_i u_i
E(β*) = α Σω_i + β Σω_i X_i + Σω_i E(u_i)
For β* to be an unbiased estimator of β we must have Σω_i = 0 and Σω_i X_i = 1, so that β* = β + Σω_i u_i and E(β*) = β.
Now,
Var(β*) = E[β* − E(β*)]²
        = E(Σω_i u_i)²                            {∵ β* = β + Σω_i u_i}
        = Σω_i² E(u_i²) + 2 ΣΣ_{i≠j} ω_i ω_j E(u_i u_j)
        = σ_u² Σω_i²
        = σ_u² Σ(k_i + c_i)²
        = σ_u² Σk_i² + σ_u² Σc_i² + 2 σ_u² Σk_i c_i
        = σ_u² / Σx_i² + σ_u² Σc_i²               [since Σk_i c_i = 0]
        = Var(β̂) + σ_u² Σc_i²
Since Σc_i² ≥ 0, we have Var(β*) ≥ Var(β̂), with equality only when every c_i = 0, i.e., when β* = β̂. Hence β̂ has the minimum variance in the class of linear unbiased estimators.
Similarly, we can prove that Var(α*) > Var(α̂) if α* is any other linear unbiased estimator of α.
[Showed]
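The unbiasedness and minimum-variance claims above can also be illustrated numerically. The following sketch is a simulation under assumed values α = 2, β = 0.5 with homoscedastic, independent errors; it draws repeated samples with the X values held fixed and checks that the average of β̂ is close to β and that its sampling variance is close to σ_u²/Σx_i².

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta_true, sigma_u = 2.0, 0.5, 1.0
X = np.linspace(1, 20, 40)          # X values held fixed in repeated sampling (Assumption 2)
x = X - X.mean()

beta_hats = []
for _ in range(5000):               # repeated samples, same X values each time
    u = rng.normal(0.0, sigma_u, size=X.size)   # E(u)=0, constant variance, independent draws
    Y = alpha_true + beta_true * X + u
    beta_hats.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))

beta_hats = np.array(beta_hats)
print(beta_hats.mean())                                # close to beta_true: unbiasedness
print(beta_hats.var(), sigma_u ** 2 / np.sum(x ** 2))  # close to sigma_u^2 / sum(x_i^2)
```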
Write down the difference between the error term and the residual term.
Ans:
The error term is a theoretical concept used in regression analysis. It represents the difference between the observed value of the dependent variable and the value given by the true (population) regression function. It is unobservable and is often assumed to have certain properties, such as being normally distributed with a mean of zero.
The residual term, on the other hand, is the actual difference between the observed value of the dependent variable and the value predicted by the fitted regression model for a specific data point. Residuals are
calculated after fitting the model to the data and are used to assess how well the model fits the
data.
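As an illustration of this distinction, the sketch below simulates data from an assumed true model, so the (normally unobservable) errors are known and can be compared with the residuals obtained after fitting by OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)
u = rng.normal(0, 1, X.size)          # error term: unobservable in real data
Y = 1.0 + 2.0 * X + u                 # data generated by the assumed "true" model

# Fit the SRF by OLS and compute residuals.
x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
residuals = Y - (b0 + b1 * X)         # residual term: computed after fitting

print(residuals[:3], u[:3])           # close, but not identical
```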
Multicollinearity
The term multicollinearity originally meant the existence of a perfect, or exact, linear relationship among some or all of the explanatory variables of a regression model. For the k-variable regression involving the explanatory variables X_1, X_2, ⋯, X_k, an exact linear relationship is said to exist if the following condition is satisfied:
λ_1 X_1 + λ_2 X_2 + ⋯ + λ_k X_k = 0 ⋯ ⋯ ⋯ (1)
where λ_1, λ_2, ⋯, λ_k are constants such that not all of them are zero simultaneously.
Today, however, the term multicollinearity is also used in a broader sense to include the case where the X variables are intercorrelated but not perfectly so, as follows:
λ_1 X_1 + λ_2 X_2 + ⋯ + λ_k X_k + v_i = 0 ⋯ ⋯ ⋯ (2)
where v_i is a stochastic error term.
Multicollinearity may also arise from nonlinear relationships among the X variables. Consider the following regression model:
Y_i = β_0 + β_1 X_i + β_2 X_i² + β_3 X_i³ + u_i ⋯ ⋯ ⋯ (3)
where, say, Y is the total cost of production and X is the output. The variables X_i² and X_i³ are obviously functions of X_i and hence are highly correlated with it, although the relationship among them is not linear.
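A quick numerical check of this point, using a hypothetical output series over a narrow range, shows how strongly X, X², and X³ are correlated even though the relationship among them is nonlinear:

```python
import numpy as np

X = np.linspace(1.0, 3.0, 20)             # hypothetical output values over a narrow range
print(np.corrcoef([X, X ** 2, X ** 3]))   # pairwise correlations of X, X^2, X^3 are all near 1
```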
Sources of Multicollinearity:
e) Model specification
For example, adding polynomial terms to a regression model, especially when the
range of the X variable is small.
f) An over-determined model
This happens when the model has more explanatory variables than the number of
observations. This could happen in medical research where there may be a small
number of patients about whom information is collected on a large number of
variables.
Consequences of Multicollinearity:
1. Although BLUE, the OLS estimators have large variances and covariances.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to
the acceptance of the ‘zero null hypothesis’ (i.e., the true population coefficient is
zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be
statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R², the
overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the
data.
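These consequences can be illustrated with a small simulation. The sketch below (with assumed true coefficients all equal to 1) compares the sampling behaviour of β̂_2 when X_2 and X_3 are nearly uncorrelated and when they are highly collinear: the estimator remains unbiased in both cases, but its variance is much larger under collinearity.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 2000
results = {}

for label, corr in [("low collinearity", 0.0), ("high collinearity", 0.99)]:
    b2 = []
    for _ in range(reps):
        x2 = rng.normal(size=n)
        x3 = corr * x2 + np.sqrt(1.0 - corr ** 2) * rng.normal(size=n)
        y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)
        Xmat = np.column_stack([np.ones(n), x2, x3])
        beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
        b2.append(beta_hat[1])
    results[label] = np.array(b2)

for label, b2 in results.items():
    # The mean of beta2_hat stays near the true value 1 (unbiased), but its
    # variance is much larger when x2 and x3 are highly collinear.
    print(label, b2.mean(), b2.var())
```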
Detection of Multicollinearity:
Remedial Measures:
a) A priori information
b) Combining cross-sectional and time series data
c) Dropping a variable(s) and specification bias
d) Transformation of variables
e) Additional or new data
f) Reducing collinearity in polynomial regressions
g) Other methods of remedying multicollinearity
A Priori Information:
Suppose we consider the model
Y_i = β_1 + β_2 X_2i + β_3 X_3i + u_i
where X_2i and X_3i are collinear, and suppose a priori information tells us that β_3 = 0.10 β_2. We can then run
Y_i = β_1 + β_2 X_2i + 0.10 β_2 X_3i + u_i
    = β_1 + β_2 X_i + u_i
where X_i = X_2i + 0.10 X_3i. Once we obtain β̂_2, we can estimate β̂_3 from the postulated relationship between β_2 and β_3. We can get a priori information from previous empirical work
in which the collinearity problem happens to be less serious or from the relevant theory
underlying the field of study.
Combining Cross-Sectional and Time Series Data:
The combination of cross-sectional and time-series data is known as pooling the data.
Suppose we want to study the demand for automobiles in the United States and assume we
have time series data on the number of cars sold, the average price of the car, and consumer
income. Suppose also that
ln Y_t = β_1 + β_2 ln P_t + β_3 ln I_t + u_t
where Y = number of cars sold, P = average price, I = income, and t = time. Our objective is to estimate the price elasticity β_2 and the income elasticity β_3.
In time series data the price and income variables generally tend to be highly collinear.
Therefore, we cannot run the preceding regression on the time series data alone. A way out of this is as follows: if we also have cross-sectional data, we can obtain a fairly reliable estimate of the income elasticity β_3, because in such data, which are at a point in time, prices do not vary much. Let the cross-sectionally estimated income elasticity be β̂_3. Using this estimate, we may write the time series regression as
Y*_t = β_1 + β_2 ln P_t + u_t
where Y*_t = ln Y_t − β̂_3 ln I_t, that is, Y* represents the value of Y after removing from it the effect of income. We can now obtain an estimate of the price elasticity β_2 from the preceding regression.
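A minimal sketch of this two-step procedure is given below, assuming hypothetical time series arrays and a hypothetical cross-sectional estimate β̂_3 = 0.9; it purges the income effect and then estimates the price elasticity from the adjusted series.

```python
import numpy as np

# Hypothetical time series (in logs) and a hypothetical income elasticity
# estimated from a separate cross-sectional study.
lnY = np.array([4.2, 4.3, 4.5, 4.4, 4.6, 4.8, 4.9])   # log of cars sold
lnP = np.array([2.0, 2.1, 2.1, 2.2, 2.2, 2.3, 2.4])   # log of average price
lnI = np.array([3.0, 3.1, 3.2, 3.2, 3.3, 3.4, 3.5])   # log of income
beta3_hat = 0.9                                        # assumed cross-sectional income elasticity

# Step 1: remove the income effect, Y*_t = ln Y_t - beta3_hat * ln I_t.
Y_star = lnY - beta3_hat * lnI

# Step 2: regress Y*_t on ln P_t to estimate the price elasticity beta_2.
p = lnP - lnP.mean()
beta2_hat = np.sum(p * (Y_star - Y_star.mean())) / np.sum(p ** 2)
beta1_hat = Y_star.mean() - beta2_hat * lnP.mean()
print(beta1_hat, beta2_hat)
```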
Transformation of Variables:
Suppose we have time series data on consumption expenditure, income and wealth. Income
and wealth are highly correlated. One way of minimizing this dependence is as follows. If the
relation
Y t = β 1 + β 2 X 2t + β 3 X 3 t + ut ⋯ ⋯ ⋯ ( 1)
holds at time t , it must also hold at time t −1 because the origin of time is arbitrary anyway.
Therefore, we have
Y_{t−1} = β_1 + β_2 X_{2,t−1} + β_3 X_{3,t−1} + u_{t−1} ⋯ ⋯ ⋯ (2)
Subtracting (2) from (1), we obtain
Y_t − Y_{t−1} = β_2 (X_{2t} − X_{2,t−1}) + β_3 (X_{3t} − X_{3,t−1}) + v_t ⋯ ⋯ ⋯ (3)
where v_t = u_t − u_{t−1}. Equation (3) is known as the first difference form.
The first difference regression model often reduces the severity of multicollinearity because there is no a priori reason to believe that the differences of the variables will also be highly correlated. Another type of transformation is the ratio transformation.
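The following sketch illustrates the first difference idea on hypothetical level series that share a common trend: the levels of X_2 and X_3 are almost perfectly correlated, while their first differences are much less so.

```python
import numpy as np

# Hypothetical level series sharing a strong common trend.
X2 = np.array([5.0, 6.2, 6.9, 8.1, 9.0, 9.8, 11.1, 12.0])
X3 = np.array([20.0, 22.1, 24.3, 25.9, 28.2, 30.0, 31.8, 34.2])

print(np.corrcoef(X2, X3)[0, 1])            # levels: correlation close to 1

# First difference transformation: work with period-to-period changes.
dX2, dX3 = np.diff(X2), np.diff(X3)
print(np.corrcoef(dX2, dX3)[0, 1])          # differences: correlation is much lower
```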
Additional or New Data:
Since multicollinearity is a sample feature, it is possible that in another sample involving the same variables collinearity may not be as serious as in the first sample. Sometimes simply increasing the size of the sample may reduce the collinearity problem.
Heteroscedasticity
Definition:
One of the most important assumptions of the classical linear regression model is that the variance of the disturbance term u_i is constant, i.e. E(u_i²) = σ²; i = 1, 2, ⋯, n. This is the assumption of homoscedasticity.
But in some situations this assumption is not fulfilled, i.e. the variance of the disturbance term u_i is not constant, E(u_i²) = σ_i²; i = 1, 2, ⋯, n. In that case heteroscedasticity arises.
Reason of Heteroscedasticity:
There are several reasons why the variances of u_i may be variable, some of which are as
follows:
i) Following the error learning models, as people learn, their errors of behavior become
smaller over time. For example, the number of typing errors. As the number of hours put into
typing practice increases, the average number of typing errors as well as their variances
decreases.
ii) As income grows, people have more discretionary income and hence more scope for
choice about the disposition of their income. Hence σ_i² is likely to increase with income.
iii) As data collecting techniques improve, σ_i² is likely to decrease.
iv) Heteroscedasticity can arise in the presence of outliers.
v) It can arise from incorrect specification of the regression model.
vi) It can also arise because of an incorrect transformation of the data or an incorrect functional form.
vii) It can arise from skewness in the distribution of one or more regressors included in the model.
In the presence of heteroscedasticity we would have to estimate n variances from n observations, that is, one variance for each observation, a situation in which estimation is obviously impossible because we cannot estimate a variance from a single observation.
Detection of Heteroscedasticity:
There are two types of methods for detecting heteroscedasticity. They are:
i) Informal Method (graphical method)
ii) Formal Methods (such as Park test, Glejser test, White’s test, etc.)
Remedial Measures:
There are two approaches to remedial measures:
When σ_i² is known: In this case, we use weighted least squares. The estimators thus
obtained are BLUE.
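A minimal sketch of this weighted least squares approach is given below, assuming hypothetical data in which the error standard deviation is known to be proportional to X (so σ_i is known up to scale); each observation is divided by σ_i and OLS is run on the transformed data.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1.0, 10.0, 50)
sigma_i = 0.5 * X                               # assumed known: error s.d. proportional to X
Y = 1.0 + 2.0 * X + rng.normal(0.0, sigma_i)    # heteroscedastic disturbances

# Weighted least squares: divide every term of the model by sigma_i and run OLS on
# the transformed data; the transformed disturbance u_i / sigma_i has constant variance.
w = 1.0 / sigma_i
Xmat = np.column_stack([np.ones_like(X), X])    # regressors [1, X_i]
coef_wls, *_ = np.linalg.lstsq(Xmat * w[:, None], Y * w, rcond=None)
print(coef_wls)                                 # estimates of the intercept and slope
```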
When σ_i² is unknown: In this case, we make plausible assumptions about the functional form of σ_i² and apply weighted least squares based on them.
Autocorrelation
Definition: The term autocorrelation may be defined as “correlation between members of
a series of observations ordered in time or space.” Symbolically, E ( ui u j ) ≠ 0 ; i≠ j.
Correlation vs. autocorrelation:
Correlation is applied to two variables measured at the same time; for example, the height and weight of individuals. Autocorrelation is applied to data on a variable measured at different time points; for example, stock prices on consecutive days.
Why does Serial Correlation Occur?
There are several reasons why serial correlation occurs. Some of them are as
follows:
a) Inertia
b) Specification Bias
i. Excluded variables case
ii. Incorrect functional form
c) Cobweb Phenomenon
d) Lags
e) Manipulation of data or Data transformation
g) Non-stationarity
Sources of Autocorrelation:
Autocorrelated values of the disturbance term u may be observed for many reasons:
a) Omitted explanatory variables
It is known that most economic variables tend to be autocorrelated. If an
autocorrelated variable is excluded from the set of explanatory variables,
obviously its influence will be reflected in the random variable u, whose values
will be autocorrelated. This case may be called "Quasi autocorrelation" since it is
due to the autocorrelated pattern of omitted explanatory variables (X's) and not to
the behavioral pattern of the values of the true u.
b) Misspecification of the mathematical form of the model.
If we have adopted a mathematical form that differs from the true form of the
relationship, the u's may show serial correlation.
Runs Test:
Suppose that when the residuals are plotted we first have several residuals that are negative, then a series of positive residuals, and finally several residuals that are again negative. This intuition can be checked by the so-called runs test.
A run is defined as an uninterrupted sequence of one symbol or attribute, such as + or −, and the length of a run is the number of elements in it. Here,
N = total number of observations (= N_1 + N_2)
N_1 = number of + symbols (positive residuals)
N_2 = number of − symbols (negative residuals)
R = number of runs
Then under the null hypothesis that the successive outcomes (here residuals) are independent
and assuming that N1> 10 and N2>10, the number of runs is (asymptotically) normally
distributed with
Mean: E(R) = 2N_1N_2 / N + 1
Variance: σ_R² = [2N_1N_2 (2N_1N_2 − N)] / [N² (N − 1)]
If the null hypothesis of randomness is sustainable, then, following the properties of the normal distribution, we should expect that
Prob[E(R) − 1.96 σ_R ≤ R ≤ E(R) + 1.96 σ_R] = 0.95
that is, the 95% confidence interval for R is E(R) ± 1.96 σ_R.
Decision Rule:
a) Do not reject the null hypothesis of randomness if R, the number of runs, lies in the
preceding confidence interval.
b) Reject the null hypothesis if the estimated R lies outside these limits.
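A minimal sketch of the runs test based on the mean and variance formulas above is given below; the residual series is hypothetical, and in practice the normal approximation is only appropriate when N_1 and N_2 exceed 10.

```python
import numpy as np

def runs_test(residuals):
    """Runs test: number of runs R, its mean, variance, and normal z statistic."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]                    # ignore residuals that are exactly zero
    n1 = int(np.sum(signs > 0))                  # N1: number of + residuals
    n2 = int(np.sum(signs < 0))                  # N2: number of - residuals
    n = n1 + n2
    r = 1 + int(np.sum(signs[1:] != signs[:-1])) # R: number of runs
    mean_r = 2.0 * n1 * n2 / n + 1.0
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return r, mean_r, var_r, (r - mean_r) / np.sqrt(var_r)

# Hypothetical residuals: a negative stretch, a positive stretch, then negative again.
resid = np.array([-1.2, -0.8, -1.5, -0.3, 0.9, 1.4, 0.7, 1.1, 0.6, 1.3, -0.9, -1.1, -0.4])
R, ER, VR, z = runs_test(resid)
low, high = ER - 1.96 * np.sqrt(VR), ER + 1.96 * np.sqrt(VR)
print(R, (low, high))   # reject randomness if R falls outside the 95% interval
```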
To test the null hypothesis of no (first-order) autocorrelation we use the Durbin–Watson statistic
d = Σ_{t=2}^{n} (û_t − û_{t−1})² / Σ_{t=1}^{n} û_t²
where the û_t are the OLS residuals. Expanding the numerator,
d = Σ_{t=2}^{n} (û_t² + û_{t−1}² − 2 û_t û_{t−1}) / Σ_{t=1}^{n} û_t²
  = [ Σ_{t=2}^{n} û_t² + Σ_{t=2}^{n} û_{t−1}² − 2 Σ_{t=2}^{n} û_t û_{t−1} ] / Σ_{t=1}^{n} û_t²
But for a large sample the terms Σ_{t=2}^{n} û_t², Σ_{t=2}^{n} û_{t−1}², and Σ_{t=1}^{n} û_t² are approximately equal. Therefore we may write
d ≈ 2 Σ û_{t−1}² / Σ û_{t−1}² − 2 Σ û_t û_{t−1} / Σ û_{t−1}²
  = 2 (1 − ρ̂),   where ρ̂ = Σ û_t û_{t−1} / Σ û_{t−1}²
From this expression, it is obvious that the values of d lie between 0 and 4. When ρ̂ = 0, d = 2. Thus, testing H_0: ρ = 0 is equivalent to testing whether d = 2.
Firstly: if ρ̂ = 0 then d = 2 and there is no autocorrelation. Thus, if from the sample data we find d ≈ 2, we accept that there is no autocorrelation in the function.
Secondly: if ρ̂ = +1 then d = 0 and there is perfect positive autocorrelation. Therefore, if 0 < d < 2, there is some degree of positive autocorrelation, which is stronger the closer d is to zero.
Thirdly: if ρ̂ = −1 then d = 4 and there is perfect negative autocorrelation. Therefore, if 2 < d < 4, there is some degree of negative autocorrelation, which is stronger the closer d is to 4.
The problem with this test is that the exact distribution of d is not known. However, Durbin and Watson have established lower and upper limits for the significance levels of d which are appropriate for testing the hypothesis.
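As an illustration, the sketch below computes d for simulated residuals that follow an assumed AR(1) scheme with ρ = 0.8 and confirms the approximation d ≈ 2(1 − ρ̂).

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic d = sum((u_t - u_{t-1})^2) / sum(u_t^2)."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
rho, n = 0.8, 200
u = np.zeros(n)
for t in range(1, n):                 # residuals following an assumed AR(1) scheme
    u[t] = rho * u[t - 1] + rng.normal()

d = durbin_watson(u)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
print(d, 2 * (1 - rho_hat))           # d is close to 2(1 - rho_hat), well below 2 here
```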