Chapter 1
1.1. Introduction
Regression analysis is one of the most commonly used tools in econometric work.
Definition: Regression analysis is concerned with describing and evaluating the relationship
between a given variable (often called the dependent variable) and one or more variables
which are assumed to influence the given variable (often called independent or explanatory
variables).
The simplest economic relationship is represented through a two-variable model (also called
the simple linear regression model) which is given by:
Y = a + bX
where a and b are unknown parameters (also called regression coefficients) that we estimate
using sample data. Here Y is the dependent variable and X is the independent variable.
Example: Suppose the relationship between expenditure (Y) and income (X) of households is
expressed as:
Y = 0.6X + 120
Here, on the basis of income, we can predict expenditure. For instance, if the income of a
certain household is 1500 Birr, then the estimated expenditure will be:
expenditure = 0.6(1500) + 120 = 1020 Birr
Note that since expenditure is estimated on the basis of income, expenditure is the dependent
variable and income is the independent variable.
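The prediction above can be reproduced in a couple of lines of code; the short Python sketch below uses the 0.6 and 120 coefficients from the example, with a purely illustrative function name:

# Estimated expenditure function from the example: expenditure = 0.6*income + 120
def predicted_expenditure(income_birr: float) -> float:
    """Return the estimated household expenditure (in Birr) for a given income (in Birr)."""
    return 0.6 * income_birr + 120

print(predicted_expenditure(1500))  # 1020.0 Birr, as computed above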
In practice the relationship between Y and X is not exact, so a random disturbance (error) term εᵢ is added to the model. Writing the regression coefficients as α and β, the model becomes:

Yᵢ = α + βXᵢ + εᵢ , i = 1, 2, . . ., n

Thus, a full specification of a regression model should include a specification of the probability distribution of the disturbance (error) term. This information is given by what we call the basic assumptions, or assumptions of the classical linear regression model (CLRM).
Here the subscript i refers to the ith observation. In the CLRM, Yi and X i are observable while
εi is not. If i refers to some point or period of time, then we speak of time series data. On the
other hand, if i refers to the ith individual, object, geographical region, etc., then we speak of
cross-sectional data.
• Assumption (1) states that the relationship between Yi and X i is linear, and that the
deterministic component ( α + β X i ) and the stochastic component ( εi ) are additive.
• Assumption (2) tells us that the mean of the Yi is:
E(Yi ) = α + β X i
This simply means that the mean value of Yi is non-stochastic.
• Assumption (3) tells us that every disturbance has the same variance σ 2 whose value is
unknown, that is, regardless of whether the X i are large or small, the dispersion of the
disturbances is the same. For example, the variation in consumption level of low income
households is the same as that of high income households.
• Assumption (4) states that the disturbances are uncorrelated. For example, the fact that
output is higher than expected today should not lead to a higher (or lower) than expected
output tomorrow.
• Assumption (5) states that X i are not random variables, and that the probability
distribution of εi is in no way affected by the X i .
• We need assumption (6) for parameter estimation purposes and also to make inferences on
the basis of the normal (t and F) distribution.
The estimated regression equation is written as Ŷ = α̂ + β̂X, where α and β are estimated by α̂ and β̂, respectively, and Ŷ is the estimated value of Y.
The dominating and powerful estimation method of the parameters (or regression coefficients)
α and β is the method of least squares. The deviations between the observed and estimated
values of Y are called the residuals ε̂i , that is:
ε̂ᵢ = Yᵢ − Ŷᵢ , i = 1, 2, . . ., n
The magnitude of the residuals is the vertical distance between the actual observed points and
the estimating line (see the figure below).
The estimating line will have a 'good fit' if it minimizes the error between the estimated points on the line and the actual observed points that were used to draw it. Our aim is therefore to determine the equation of such an estimating line in such a way that the error in estimation is minimized, that is, to choose α̂ and β̂ so as to minimize the sum of squared errors (SSE):

SSE = ∑ε̂ᵢ² = ∑(Yᵢ − α̂ − β̂Xᵢ)²
By partial differentiation of the SSE with respect to α̂ and β̂ and equating the results to zero
we get:
∂SSE/∂α̂ = −2∑(Yᵢ − α̂ − β̂Xᵢ) = 0

∂SSE/∂β̂ = −2∑Xᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
Re-arranging the two equations, we get the so-called normal equations:
∑Yᵢ = nα̂ + β̂∑Xᵢ

∑XᵢYᵢ = α̂∑Xᵢ + β̂∑Xᵢ²
Thus, we have two equations with two unknowns α̂ and β̂ . Solving for α̂ and β̂ we get:
β̂ = [n∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)] / [n∑Xᵢ² − (∑Xᵢ)²] = [∑XᵢYᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²]

α̂ = Ȳ − β̂X̄
where X̄ and Ȳ are the mean values of the independent and dependent variables, respectively, that is, X̄ = (1/n)∑Xᵢ and Ȳ = (1/n)∑Yᵢ.
α̂ and β̂ are said to be the ordinary least-squares (OLS) estimators of α and β, respectively. The line Ŷ = α̂ + β̂X is called the least squares line or the estimated regression line of Y on X.
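As a quick illustration of how these formulas are applied, the following Python sketch (illustrative function and variable names, NumPy assumed) computes α̂ and β̂ from paired observations:

import numpy as np

def ols_simple(X: np.ndarray, Y: np.ndarray) -> tuple[float, float]:
    """OLS estimates (alpha_hat, beta_hat) for the two-variable model Y = alpha + beta*X + error."""
    n = len(X)
    beta_hat = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    return alpha_hat, beta_hat

# Example usage with arbitrary made-up data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(ols_simple(X, Y))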
The slope estimator can also be written in deviation form. Defining xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ, it becomes:

β̂ = ∑xᵢyᵢ / ∑xᵢ²
The Gauss-Markov theorem tells us that of all estimators of α and β which are linear and which are unbiased, the estimators resulting from OLS have the minimum variance, that is, α̂ and β̂ are the best (most efficient) linear unbiased estimators (BLUE) of α and β.
Note: If some of the assumptions stated above do not hold, then the OLS estimators are no longer BLUE!
Here we will prove that β̂ is the BLUE of β. The proof for α̂ can be done similarly.
a) To show that β̂ is a linear estimator of β

β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑aᵢyᵢ

where aᵢ = xᵢ / ∑xᵢ², xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ. Thus, we can see that β̂ is a linear estimator as it can be written as a weighted average of the individual observations on Y.
b) To show that β̂ is an unbiased estimator of β

β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑xᵢ(βxᵢ + εᵢ) / ∑xᵢ² = [β∑xᵢ² + ∑xᵢεᵢ] / ∑xᵢ² = β + ∑xᵢεᵢ / ∑xᵢ²   ……… (*)
Now we have:

E(β) = β (since β is a constant)

E(∑xᵢεᵢ) = ∑xᵢE(εᵢ) = ∑xᵢ(0) = 0 (since xᵢ is non-stochastic (assumption 5) and E(εᵢ) = 0 (assumption 2))

Thus:

E(β̂) = E(β) + E(∑xᵢεᵢ / ∑xᵢ²) = β + ∑xᵢE(εᵢ) / ∑xᵢ² = β + 0 = β
⇒ β̂ is an unbiased estimator of β.
c) To show that β̂ has the smallest variance out of all linear unbiased estimators of β
Note:
1. The OLS estimators α̂ and β̂ are calculated from a specific sample of observations of
the dependent and independent variables. If we consider a different sample of
observations for Y and X, we get different values for α̂ and β̂ . This means that the
values of α̂ and β̂ may vary from one sample to another, and hence, are random
variables.
2. (∑xᵢ)² = ∑xᵢ² + ∑_{i≠j} xᵢxⱼ

This is simply the sum of squares (xᵢ²) plus the sum of cross-product terms (xᵢxⱼ for i ≠ j).
From (*), β̂ − β = ∑xᵢεᵢ / ∑xᵢ². Hence:

Var(β̂) = E(β̂ − β)² = E[(∑xᵢεᵢ / ∑xᵢ²)²]

= [1 / (∑xᵢ²)²] E[∑xᵢ²εᵢ² + ∑_{i≠j} xᵢεᵢxⱼεⱼ]

= [1 / (∑xᵢ²)²] [∑xᵢ²E(εᵢ²) + ∑_{i≠j} xᵢxⱼE(εᵢεⱼ)]

= [1 / (∑xᵢ²)²] [∑xᵢ²(σ²) + ∑_{i≠j} xᵢxⱼ(0)]   …… (**)

= [σ² / (∑xᵢ²)²] ∑xᵢ² = σ² / ∑xᵢ²
Note that (**) follows from assumptions (3) and (4), that is, Var(εᵢ) = E(εᵢ²) = σ² for all i and Cov(εᵢ, εⱼ) = E(εᵢεⱼ) = 0 for i ≠ j.
Thus, Var(β̂) = σ² / ∑xᵢ².
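The two results E(β̂) = β and Var(β̂) = σ² / ∑xᵢ² can also be checked by simulation. The Python sketch below (illustrative parameter values, not taken from the notes) repeatedly generates samples that satisfy the CLRM assumptions and compares the empirical mean and variance of β̂ with the theoretical values:

import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma2 = 2.0, 0.5, 4.0          # arbitrary illustrative parameter values
X = np.linspace(1, 20, 25)                   # fixed (non-stochastic) regressor values
x = X - X.mean()

beta_hats = []
for _ in range(20_000):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=X.size)   # disturbances satisfying assumptions (2)-(4), (6)
    Y = alpha + beta * X + eps
    y = Y - Y.mean()
    beta_hats.append(np.sum(x * y) / np.sum(x**2))        # OLS slope in deviation form

beta_hats = np.array(beta_hats)
print("mean of beta_hat:", beta_hats.mean(), " theory:", beta)
print("variance of beta_hat:", beta_hats.var(), " theory:", sigma2 / np.sum(x**2))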
We have seen above (in proof (a)) that the OLS estimator of β can be expressed as:
β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑aᵢyᵢ

where aᵢ = xᵢ / ∑xᵢ². Now let β* be another linear unbiased estimator of β given by:

β* = ∑cᵢyᵢ , with cᵢ = xᵢ / ∑xᵢ² + dᵢ

where the dᵢ are arbitrary constants. Then:

β* = ∑cᵢyᵢ = ∑(xᵢ / ∑xᵢ² + dᵢ)(βxᵢ + εᵢ)   (since yᵢ = βxᵢ + εᵢ)

= β∑xᵢ² / ∑xᵢ² + β∑dᵢxᵢ + ∑xᵢεᵢ / ∑xᵢ² + ∑dᵢεᵢ

= β + β∑dᵢxᵢ + ∑xᵢεᵢ / ∑xᵢ² + ∑dᵢεᵢ

Thus, for β* to be unbiased (that is, for E(β*) = β to hold) we should have:

∑dᵢxᵢ = 0 ……… (***)
The variance of β* is given by:

Var(β*) = E(β* − β)² = E(∑cᵢεᵢ)²

= E[∑cᵢ²εᵢ² + ∑_{i≠j} cᵢcⱼεᵢεⱼ] = ∑cᵢ²E(εᵢ²) + ∑_{i≠j} cᵢcⱼE(εᵢεⱼ)

= ∑cᵢ²(σ²) + ∑_{i≠j} cᵢcⱼ(0) = σ²∑cᵢ²

= σ²∑(xᵢ / ∑xᵢ² + dᵢ)² = σ²∑[xᵢ² / (∑xᵢ²)² + 2xᵢdᵢ / ∑xᵢ² + dᵢ²]

= σ²∑xᵢ² / (∑xᵢ²)² + 2σ²∑xᵢdᵢ / ∑xᵢ² + σ²∑dᵢ²   (but ∑dᵢxᵢ = 0 from (***))

= σ² / ∑xᵢ² + σ²∑dᵢ² = Var(β̂) + σ²∑dᵢ²

Since ∑dᵢ² ≥ 0, it follows that Var(β*) ≥ Var(β̂), with equality only when every dᵢ = 0, that is, when β* coincides with β̂. Hence β̂ has the smallest variance among all linear unbiased estimators of β, which completes the proof.
To summarize, the OLS estimator of the slope can be written as a linear function of the disturbances:

β̂ = β + ∑xᵢεᵢ / ∑xᵢ² = β + ∑aᵢεᵢ

where aᵢ = xᵢ / ∑xᵢ² and xᵢ = Xᵢ − X̄.
Therefore, the probability distributions of the OLS estimators will depend upon the
assumptions made about the probability distribution of the error term. The nature of the
probability distribution of the error term is important for hypothesis testing (or for making
inferences about α and β) and also for estimation purposes.
In regression analysis, it is usually assumed that the error terms follow the normal distribution
with mean 0 and variance σ 2 .
That is, the density of each disturbance (and hence of each Yᵢ) is:

f(εᵢ) = [1 / √(2πσ²)] exp[−εᵢ² / (2σ²)] = [1 / √(2πσ²)] exp[−(1 / (2σ²))(Yᵢ − α − βXᵢ)²]
Consider the linear model: Yi = α + βX i + εi . Under the assumption that the error terms
εi follow the normal distribution with mean 0 and variance σ 2 , Yi is also normally
distributed with:
Mean = E(Yi ) = E(α + βX i + εi ) = α + βXi
Variance = Var(Yi ) = Var(α + β X i + εi ) = Var(εi ) = σ 2
The ML estimator of a parameter β is the value of β̂ which would most likely generate the
observed sample observations Y1 , Y2 , . . ., Yn . The ML estimator maximizes the likelihood
function L which is the product of the individual probabilities (since Y1 , Y2 , . . ., Yn are
randomly selected, implying independence) taken over all n observations, given by:

L = ∏ f(Yᵢ) = [1 / (2πσ²)^(n/2)] exp[−(1 / (2σ²)) ∑(Yᵢ − α − βXᵢ)²]
Our aim is to maximize this likelihood function L with respect to the parameters α, β and σ 2 .
To do this, it is more convenient to work with the natural logarithm of L (called the log-
likelihood function) given by:
log L = −(n/2) log(σ²) − (n/2) log(2π) − [1 / (2σ²)] ∑(Yᵢ − α − βXᵢ)²
Taking partial derivatives of log L with respect to α, β and σ² and equating them to zero, we get the ML estimators.
By partial differentiation of the log L with respect to α and β and equating the results to zero
we get:
∂logL/∂β = −[1 / (2σ²)] ∑ 2(Yᵢ − α − βXᵢ)(−Xᵢ) = 0

∂logL/∂α = −[1 / (2σ²)] ∑ 2(Yᵢ − α − βXᵢ)(−1) = 0
Simplifying, these give:

∑Yᵢ = nα̂ML + β̂ML∑Xᵢ

∑XᵢYᵢ = α̂ML∑Xᵢ + β̂ML∑Xᵢ²
Note that these equations are similar to the normal equations that we obtained earlier. Solving for β̂ML and α̂ML we get:

β̂ML = [n∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)] / [n∑Xᵢ² − (∑Xᵢ)²] = [∑XᵢYᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²] = β̂

α̂ML = Ȳ − β̂ML X̄ = α̂
By partial differentiation of log L with respect to σ² and equating to zero we get:

∂logL/∂σ² = −n / (2σ²) − (1/2) ∑(Yᵢ − α − βXᵢ)² (−1 / (σ²)²) = 0

Solving for σ² (with α and β replaced by their ML estimates) gives:

σ̂²ML = (1/n) ∑(Yᵢ − α̂ − β̂Xᵢ)² = (1/n) ∑ε̂ᵢ²
Note
1) The ML estimators α̂ ML and β̂ML are identical to the OLS estimators, and are thus best
linear unbiased estimators (BLUE) of α and β, respectively.
2) The ML estimator σ̂ 2ML of σ 2 is biased.
Proof

ε̂ᵢ = Yᵢ − Ŷᵢ = Yᵢ − (α̂ + β̂Xᵢ)

Yᵢ = α + βXᵢ + εᵢ
⇒ Ȳ = α + βX̄ + ε̄
⇒ α = Ȳ − βX̄ − ε̄ …………. (2)

Substituting Yᵢ = α + βXᵢ + εᵢ, α from (2), and α̂ = Ȳ − β̂X̄ into the expression for ε̂ᵢ gives ε̂ᵢ = (β − β̂)xᵢ + eᵢ, where xᵢ = Xᵢ − X̄ and eᵢ = εᵢ − ε̄. Squaring and summing over the n observations:

∑ε̂ᵢ² = (β − β̂)²∑xᵢ² + 2(β − β̂)∑xᵢeᵢ + ∑eᵢ² ………… (5)
Also, from (*),

β̂ − β = ∑xᵢεᵢ / ∑xᵢ² = ∑xᵢ(εᵢ − ε̄) / ∑xᵢ² = ∑xᵢeᵢ / ∑xᵢ²   (since ∑xᵢε̄ = ε̄∑xᵢ = ε̄(0) = 0)

⇒ ∑xᵢeᵢ = (β̂ − β)∑xᵢ² = −(β − β̂)∑xᵢ² ………. (6)
Substituting (6) into (5):

∑ε̂ᵢ² = (β − β̂)²∑xᵢ² − 2(β − β̂)²∑xᵢ² + ∑eᵢ² = −(β − β̂)²∑xᵢ² + ∑eᵢ²
Taking expectations of each term:

• E[(β − β̂)²∑xᵢ²] = ∑xᵢ² E(β̂ − β)² = ∑xᵢ² Var(β̂) = ∑xᵢ² (σ² / ∑xᵢ²) = σ²

• E[∑eᵢ²] = E[∑(εᵢ − ε̄)²] = E[∑εᵢ² − nε̄²] = ∑E(εᵢ²) − nE(ε̄²) = nσ² − n(σ²/n) = nσ² − σ² = (n − 1)σ²

Hence E[∑ε̂ᵢ²] = −σ² + (n − 1)σ² = (n − 2)σ². Thus,

E(σ̂²ML) = E[(1/n)∑ε̂ᵢ²] = (1/n)E[∑ε̂ᵢ²] = [(n − 2)/n] σ² ≠ σ²

so σ̂²ML is indeed biased. An unbiased estimator of σ² is obtained by dividing the residual sum of squares by (n − 2) instead of n:

σ̂² = ∑ε̂ᵢ² / (n − 2)
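Both notes can be illustrated numerically. The sketch below (arbitrary simulated data; SciPy's general-purpose optimizer assumed available) maximizes the log-likelihood above directly and shows that the resulting estimates of α and β agree with the OLS formulas, while the ML estimate of σ² equals ∑ε̂ᵢ²/n rather than the unbiased ∑ε̂ᵢ²/(n − 2):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)                        # arbitrary simulated data
Y = 1.5 + 0.8 * X + rng.normal(0, 2, X.size)

def neg_log_likelihood(params):
    a, b, log_s2 = params                         # work with log(sigma^2) so the variance stays positive
    s2 = np.exp(log_s2)
    resid = Y - a - b * X
    return 0.5 * X.size * np.log(2 * np.pi * s2) + np.sum(resid**2) / (2 * s2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0])
a_ml, b_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])

# OLS estimates and residuals for comparison
b_ols = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
a_ols = Y.mean() - b_ols * X.mean()
resid = Y - a_ols - b_ols * X
print(a_ml, b_ml, " vs OLS:", a_ols, b_ols)                       # essentially identical
print(s2_ml, " vs sum(resid^2)/n:", np.sum(resid**2) / X.size)    # the biased ML variance estimate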
The estimated variance of β̂ is then:

V̂(β̂) = σ̂² / ∑xᵢ² = ∑ε̂ᵢ² / [(n − 2)∑xᵢ²]

The square root of V̂(β̂) is called the standard error of β̂, that is,

s.e.(β̂) = √V̂(β̂) = √(σ̂² / ∑xᵢ²)
Tests of significance of regression coefficients
To test the null hypothesis H₀: β = 0 against the alternative H₁: β ≠ 0, we compute the test statistic:

t = (β̂ − β₀) / s.e.(β̂) = (β̂ − 0) / s.e.(β̂) = β̂ / s.e.(β̂)

and compare this figure with the value from the Student's t distribution with (n − 2) degrees of freedom for a given significance level α.
Decision rule: If | t | > t α / 2 (n − 2) then we reject the null hypothesis, and conclude that there
is a significant relationship between X and Y.
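A compact Python sketch of this test (SciPy assumed; the function and variable names are illustrative), computing σ̂², the standard error, the t statistic and the critical value:

import numpy as np
from scipy import stats

def slope_t_test(X: np.ndarray, Y: np.ndarray, alpha_level: float = 0.05):
    """Test H0: beta = 0 against H1: beta != 0 in the two-variable model."""
    n = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    resid = Y - alpha_hat - beta_hat * X
    sigma2_hat = np.sum(resid**2) / (n - 2)          # unbiased estimate of sigma^2
    se_beta = np.sqrt(sigma2_hat / np.sum(x**2))     # s.e.(beta_hat)
    t_stat = beta_hat / se_beta
    t_crit = stats.t.ppf(1 - alpha_level / 2, df=n - 2)
    return t_stat, t_crit, abs(t_stat) > t_crit      # last element True => reject H0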
Is the estimated equation a useful one? To answer this, an objective measure of some sort is
desirable.
Starting from the identity Yᵢ − Ȳ = (Yᵢ − Ŷᵢ) + (Ŷᵢ − Ȳ), squaring both sides and summing over all n observations gives:

∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)² + 2∑(Yᵢ − Ŷᵢ)(Ŷᵢ − Ȳ) …… (*)
Remark (1)
∑ε̂ᵢ = 0, where ε̂ᵢ = Yᵢ − Ŷᵢ = Yᵢ − α̂ − β̂Xᵢ

Proof

∑ε̂ᵢ = ∑(Yᵢ − α̂ − β̂Xᵢ) = ∑Yᵢ − nα̂ − β̂∑Xᵢ = nȲ − nα̂ − nβ̂X̄ = n[(Ȳ − β̂X̄) − α̂] = n[α̂ − α̂] = 0
Remark (2)
∑ε̂ᵢXᵢ = 0

Proof

∑ε̂ᵢXᵢ = ∑(Yᵢ − α̂ − β̂Xᵢ)Xᵢ = ∑YᵢXᵢ − α̂∑Xᵢ − β̂∑Xᵢ²

= ∑YᵢXᵢ − nα̂X̄ − β̂∑Xᵢ²

= ∑YᵢXᵢ − n(Ȳ − β̂X̄)X̄ − β̂∑Xᵢ²

= ∑YᵢXᵢ − nX̄Ȳ − β̂[∑Xᵢ² − nX̄²]

= ∑YᵢXᵢ − nX̄Ȳ − {[∑YᵢXᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²]}[∑Xᵢ² − nX̄²] = 0
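Both remarks are easy to verify numerically for any fitted line; a minimal Python sketch with arbitrary data (NumPy assumed):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # arbitrary illustrative data
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)
alpha_hat = Y.mean() - beta_hat * X.mean()
resid = Y - alpha_hat - beta_hat * X
print(np.isclose(resid.sum(), 0.0))         # Remark (1): the residuals sum to zero
print(np.isclose((resid * X).sum(), 0.0))   # Remark (2): the residuals are orthogonal to X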
Thus, the cross product term in equation (*) vanishes, and we are left with:
∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)²
In other words, the total sum of squares (TSS) is decomposed into regression (explained) sum
of squares (RSS) and error (residual or unexplained) sum of squares (ESS).
• The total sum of squares (TSS) is a measure of dispersion of the observed values of Y
about their mean. This is computed as:
TSS = ∑(Yᵢ − Ȳ)² = ∑yᵢ²
• The regression (explained) sum of squares (RSS) measures the amount of the total
variability in the observed values of Y that is accounted for by the linear relationship
between the observed values of X and Y. This is computed as:
RSS = ∑(Ŷᵢ − Ȳ)² = β̂²∑(Xᵢ − X̄)² = β̂²∑xᵢ²
• The error (residual or unexplained) sum of squares (ESS) is a measure of the dispersion of
the observed values of Y about the regression line. This is computed as:
ESS = ∑(Yᵢ − Ŷᵢ)² = TSS − RSS
If a regression equation does a good job of describing the relationship between two variables,
the explained sum of squares should constitute a large proportion of the total sum of squares.
Thus, it would be of interest to determine the magnitude of this proportion by computing the
ratio of the explained sum of squares to the total sum of squares. This proportion is called the
sample coefficient of determination, R 2 . That is:
Coefficient of determination = R² = RSS / TSS = 1 − ESS / TSS
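The decomposition and R² translate directly into code, as in the sketch below (an illustrative Python helper, assuming the OLS estimates have already been computed):

import numpy as np

def r_squared(X: np.ndarray, Y: np.ndarray, alpha_hat: float, beta_hat: float) -> float:
    """Coefficient of determination via the TSS = RSS + ESS decomposition."""
    Y_hat = alpha_hat + beta_hat * X
    tss = np.sum((Y - Y.mean())**2)       # total sum of squares
    rss = np.sum((Y_hat - Y.mean())**2)   # regression (explained) sum of squares
    ess = np.sum((Y - Y_hat)**2)          # error (residual) sum of squares
    assert np.isclose(tss, rss + ess)     # the cross-product term vanishes
    return rss / tss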
The largest value that R 2 can assume is 1 (in which case all observations fall on the regression
line), and the smallest it can assume is zero. A low value of R 2 is an indication that:
• X is a poor explanatory variable in the sense that variation in X leaves Y unaffected, or
• while X is a relevant variable, its influence on Y is weak as compared to some other
variables that are omitted from the regression equation, or
• the regression equation is misspecified (for example, an exponential relationship might be more appropriate).
To test for the significance of R², we compute the variance ratio F = [RSS/1] / [ESS/(n − 2)] and compare it with the critical value from the F distribution with 1 and (n − 2) degrees of freedom in the numerator and denominator, respectively, for a given significance level α.
Decision: If the calculated variance ratio exceeds the tabulated value, that is, if
Fcal > Fα (1, n − 2) , we then conclude that R 2 is significant (or that the linear regression
model is adequate).
Note: The F test is designed to test the significance of all variables or a set of variables in a
regression model. In the two-variable model, however, it is used to test the explanatory power
of a single variable (X), and at the same time, is equivalent to the test of significance of R 2 .
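A minimal Python sketch of this variance-ratio test (SciPy assumed; the function name and arguments are illustrative):

from scipy import stats

def f_test_simple_regression(tss: float, rss: float, n: int, alpha_level: float = 0.05):
    """Variance-ratio test of the regression: F = [RSS/1] / [ESS/(n-2)]."""
    ess = tss - rss
    f_calc = (rss / 1.0) / (ess / (n - 2))
    f_crit = stats.f.ppf(1 - alpha_level, dfn=1, dfd=n - 2)
    return f_calc, f_crit, f_calc > f_crit   # last element True => R^2 is significant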
Illustrative example
Consider the following data on the percentage rate of change in electricity consumption
(millions KWH) (Y) and the rate of change in the price of electricity (Birr/KWH) (X) for the
years 1979 – 1994.
year X Y year X Y
1979 -0.13 17.93 1987 2.57 52.17
1980 0.29 14.56 1988 0.89 39.66
1981 -0.12 32.22 1989 1.80 21.80
1982 0.42 22.20 1990 7.86 -49.51
1983 0.08 54.26 1991 6.59 -25.55
1984 0.80 58.61 1992 -0.37 6.43
1985 0.24 15.13 1993 0.16 15.27
1986 -1.09 39.25 1994 0.50 60.40
β̂ = ∑xy / ∑x² = −779.235 / 92.20109 = −8.45147

α̂ = Ȳ − β̂X̄ = 23.42688 − (−8.45147)(1.280625) = 34.25004

RSS = β̂²∑xᵢ² = (−8.45147)²(92.20109) = 6585.679

⇒ R² = RSS / TSS = 6585.679 / 13228.7 = 0.4978
ANOVA table

Source of variation    Sum of squares    df    Mean square    Variance ratio (F)
Regression                 6585.679       1      6585.679          13.88
Error (residual)           6643.021      14       474.502
Total                     13228.700      15

The critical value from the F distribution is F₀.₀₅(1, 14) = 4.60.
Decision: Since the calculated variance ratio exceeds the critical value, we reject the null
hypothesis of no linear relationship between price and consumption of electricity at the 5%
level of significance. Thus, we then conclude that R 2 is significant, that is, the linear
regression model is adequate and is useful for prediction purposes.
t = β̂ / s.e.(β̂) = −8.45147 / 2.268562 = −3.72548
For α = 0.05, the critical value from the Student's t distribution with (n − 2) degrees of freedom is t₀.₀₂₅(14) = 2.14479.
Decision: Since | t | > t α / 2 (n − 2) , we reject the null hypothesis, and conclude that β is
significantly different from zero. In other words, the price of electricity significantly and
negatively affects electricity consumption.
The interpretation of the estimated regression coefficient β̂ = – 8.45147 is that for a one
percent drop (increase) in the growth rate of price of electricity, there is an 8.45 percent
increase (decrease) in the growth rate of electricity consumption.
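For completeness, the whole worked example can be reproduced from the data table above with the script below (Python with NumPy and SciPy assumed); up to rounding it returns the same β̂, α̂, R², F and t values reported in the notes:

import numpy as np
from scipy import stats

X = np.array([-0.13, 0.29, -0.12, 0.42, 0.08, 0.80, 0.24, -1.09,
              2.57, 0.89, 1.80, 7.86, 6.59, -0.37, 0.16, 0.50])
Y = np.array([17.93, 14.56, 32.22, 22.20, 54.26, 58.61, 15.13, 39.25,
              52.17, 39.66, 21.80, -49.51, -25.55, 6.43, 15.27, 60.40])

n = len(X)
x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)             # about -8.45147
alpha_hat = Y.mean() - beta_hat * X.mean()          # about 34.25004

tss = np.sum(y**2)                                  # about 13228.7
rss = beta_hat**2 * np.sum(x**2)                    # about 6585.68
ess = tss - rss
r2 = rss / tss                                      # about 0.4978

f_calc = rss / (ess / (n - 2))                      # calculated variance ratio
f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)        # critical value F_0.05(1, 14)

se_beta = np.sqrt((ess / (n - 2)) / np.sum(x**2))   # about 2.26856
t_calc = beta_hat / se_beta                         # about -3.72548
t_crit = stats.t.ppf(0.975, df=n - 2)               # about 2.14479

print(beta_hat, alpha_hat, r2)
print(f_calc, f_crit, t_calc, t_crit)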