Chapter 1
1.1. Introduction
Regression analysis is one of the most commonly used tools in econometric work.
Definition: Regression analysis is concerned with describing and evaluating the relationship
between a given variable (often called the dependent variable) and one or more variables
which are assumed to influence the given variable (often called independent or explanatory
variables).
The simplest economic relationship is represented through a two-variable model (also called
the simple linear regression model) which is given by:
Y = a + bX
where a and b are unknown parameters (also called regression coefficients) that we estimate
using sample data. Here Y is the dependent variable and X is the independent variable.
Example: Suppose the relationship between expenditure (Y) and income (X) of households is
expressed as:
Y = 0.6X + 120
Here, on the basis of income, we can predict expenditure. For instance, if the income of a
certain household is 1500 Birr, then the estimated expenditure will be:
expenditure = 0.6(1500) + 120 = 1020 Birr
Note that since expenditure is estimated on the basis of income, expenditure is the dependent
variable and income is the independent variable.
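The prediction above can be reproduced in a couple of lines of code; the short Python sketch below uses the 0.6 and 120 coefficients from the example, with a purely illustrative function name:

# Estimated expenditure function from the example: expenditure = 0.6*income + 120
def predicted_expenditure(income_birr: float) -> float:
    """Return the estimated household expenditure (in Birr) for a given income (in Birr)."""
    return 0.6 * income_birr + 120

print(predicted_expenditure(1500))  # 1020.0 Birr, as computed above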
In practice the relationship between Y and X is not exact, so a random disturbance (error) term εᵢ is added to the model. Writing the regression coefficients as α and β, the model becomes:

Yᵢ = α + βXᵢ + εᵢ , i = 1, 2, . . ., n

Thus, a full specification of a regression model should include a specification of the probability distribution of the disturbance (error) term. This information is given by what we call the basic assumptions, or assumptions of the classical linear regression model (CLRM).
Here the subscript i refers to the ith observation. In the CLRM, Yi and X i are observable while
εi is not. If i refers to some point or period of time, then we speak of time series data. On the
other hand, if i refers to the ith individual, object, geographical region, etc., then we speak of
cross-sectional data.
• Assumption (1) states that the relationship between Yi and X i is linear, and that the
deterministic component ( α + β X i ) and the stochastic component ( εi ) are additive.
• Assumption (2) tells us that the mean of the Yi is:
E(Yi ) = α + β X i
This simply means that the mean value of Yi is non-stochastic.
• Assumption (3) tells us that every disturbance has the same variance σ 2 whose value is
unknown, that is, regardless of whether the X i are large or small, the dispersion of the
disturbances is the same. For example, the variation in consumption level of low income
households is the same as that of high income households.
• Assumption (4) states that the disturbances are uncorrelated. For example, the fact that
output is higher than expected today should not lead to a higher (or lower) than expected
output tomorrow.
• Assumption (5) states that X i are not random variables, and that the probability
distribution of εi is in no way affected by the X i .
• We need assumption (6) for parameter estimation purposes and also to make inferences on
the basis of the normal (t and F) distribution.
The estimated regression equation is written as Ŷ = α̂ + β̂X, where α and β are estimated by α̂ and β̂, respectively, and Ŷ is the estimated value of Y.
The dominating and powerful estimation method of the parameters (or regression coefficients)
α and β is the method of least squares. The deviations between the observed and estimated
values of Y are called the residuals ε̂i , that is:
ε̂ᵢ = Yᵢ − Ŷᵢ , i = 1, 2, . . ., n
The magnitude of the residuals is the vertical distance between the actual observed points and
the estimating line (see the figure below).
The estimating line will have a 'good fit' if it minimizes the error between the estimated points on the line and the actual observed points that were used to draw it. Our aim is therefore to determine the equation of such an estimating line in such a way that the error in estimation is minimized, that is, to choose α̂ and β̂ so as to minimize the sum of squared errors (SSE):

SSE = ∑ε̂ᵢ² = ∑(Yᵢ − α̂ − β̂Xᵢ)²
By partial differentiation of the SSE with respect to α̂ and β̂ and equating the results to zero
we get:
∂SSE/∂α̂ = −2∑(Yᵢ − α̂ − β̂Xᵢ) = 0

∂SSE/∂β̂ = −2∑Xᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
Re-arranging the two equations, we get the so-called normal equations:
∑Yᵢ = nα̂ + β̂∑Xᵢ

∑XᵢYᵢ = α̂∑Xᵢ + β̂∑Xᵢ²
Thus, we have two equations with two unknowns α̂ and β̂ . Solving for α̂ and β̂ we get:
β̂ = [n∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)] / [n∑Xᵢ² − (∑Xᵢ)²] = [∑XᵢYᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²]

α̂ = Ȳ − β̂X̄
where X̄ and Ȳ are the mean values of the independent and dependent variables, respectively, that is, X̄ = (1/n)∑Xᵢ and Ȳ = (1/n)∑Yᵢ.
α̂ and β̂ are said to be the ordinary least-squares (OLS) estimators of α and β, respectively. The line Ŷ = α̂ + β̂X is called the least squares line or the estimated regression line of Y on X.
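As a quick illustration of how these formulas are applied, the following Python sketch (illustrative function and variable names, NumPy assumed) computes α̂ and β̂ from paired observations:

import numpy as np

def ols_simple(X: np.ndarray, Y: np.ndarray) -> tuple[float, float]:
    """OLS estimates (alpha_hat, beta_hat) for the two-variable model Y = alpha + beta*X + error."""
    n = len(X)
    beta_hat = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    return alpha_hat, beta_hat

# Example usage with arbitrary made-up data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(ols_simple(X, Y))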
The slope estimator can also be written in deviation form. Defining xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ, it becomes:

β̂ = ∑xᵢyᵢ / ∑xᵢ²
The Gauss-Markov theorem tells us that of all estimators of α and β which are linear and which are unbiased, the estimators resulting from OLS have the minimum variance, that is, α̂ and β̂ are the best (most efficient) linear unbiased estimators (BLUE) of α and β.
Note: If some of the assumptions stated above do not hold, then the OLS estimators are no longer BLUE!
Here we will prove that β̂ is the BLUE of β. The proof for α̂ can be done similarly.
a) To show that β̂ is a linear estimator of β

β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑aᵢyᵢ

where aᵢ = xᵢ / ∑xᵢ², xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ. Thus, we can see that β̂ is a linear estimator as it can be written as a weighted average of the individual observations on Y.
b) To show that β̂ is an unbiased estimator of β

β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑xᵢ(βxᵢ + εᵢ) / ∑xᵢ² = [β∑xᵢ² + ∑xᵢεᵢ] / ∑xᵢ² = β + ∑xᵢεᵢ / ∑xᵢ²   ……… (*)
Now we have:

E(β) = β (since β is a constant)

E(∑xᵢεᵢ) = ∑xᵢE(εᵢ) = ∑xᵢ(0) = 0 (since xᵢ is non-stochastic (assumption 5) and E(εᵢ) = 0 (assumption 2))

Thus:

E(β̂) = E(β) + E(∑xᵢεᵢ / ∑xᵢ²) = β + ∑xᵢE(εᵢ) / ∑xᵢ² = β + 0 = β
⇒ β̂ is an unbiased estimator of β.
c) To show that β̂ has the smallest variance out of all linear unbiased estimators of β
Note:
1. The OLS estimators α̂ and β̂ are calculated from a specific sample of observations of
the dependent and independent variables. If we consider a different sample of
observations for Y and X, we get different values for α̂ and β̂ . This means that the
values of α̂ and β̂ may vary from one sample to another, and hence, are random
variables.
2. (∑xᵢ)² = ∑xᵢ² + ∑_{i≠j} xᵢxⱼ

This is simply the sum of squares (xᵢ²) plus the sum of cross-product terms (xᵢxⱼ for i ≠ j).
From (*), β̂ − β = ∑xᵢεᵢ / ∑xᵢ². Hence:

Var(β̂) = E(β̂ − β)² = E[(∑xᵢεᵢ / ∑xᵢ²)²]

= [1 / (∑xᵢ²)²] E[∑xᵢ²εᵢ² + ∑_{i≠j} xᵢεᵢxⱼεⱼ]

= [1 / (∑xᵢ²)²] [∑xᵢ²E(εᵢ²) + ∑_{i≠j} xᵢxⱼE(εᵢεⱼ)]

= [1 / (∑xᵢ²)²] [∑xᵢ²(σ²) + ∑_{i≠j} xᵢxⱼ(0)]   …… (**)

= [σ² / (∑xᵢ²)²] ∑xᵢ² = σ² / ∑xᵢ²
Note that (**) follows from assumptions (3) and (4), that is, Var(εᵢ) = E(εᵢ²) = σ² for all i and Cov(εᵢ, εⱼ) = E(εᵢεⱼ) = 0 for i ≠ j.
Thus, Var(β̂) = σ² / ∑xᵢ².
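The two results E(β̂) = β and Var(β̂) = σ² / ∑xᵢ² can also be checked by simulation. The Python sketch below (illustrative parameter values, not taken from the notes) repeatedly generates samples that satisfy the CLRM assumptions and compares the empirical mean and variance of β̂ with the theoretical values:

import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma2 = 2.0, 0.5, 4.0          # arbitrary illustrative parameter values
X = np.linspace(1, 20, 25)                   # fixed (non-stochastic) regressor values
x = X - X.mean()

beta_hats = []
for _ in range(20_000):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=X.size)   # disturbances satisfying assumptions (2)-(4), (6)
    Y = alpha + beta * X + eps
    y = Y - Y.mean()
    beta_hats.append(np.sum(x * y) / np.sum(x**2))        # OLS slope in deviation form

beta_hats = np.array(beta_hats)
print("mean of beta_hat:", beta_hats.mean(), " theory:", beta)
print("variance of beta_hat:", beta_hats.var(), " theory:", sigma2 / np.sum(x**2))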
We have seen above (in proof (a)) that the OLS estimator of β can be expressed as:
β̂ = ∑xᵢyᵢ / ∑xᵢ² = ∑aᵢyᵢ

where aᵢ = xᵢ / ∑xᵢ². Now let β* be another linear unbiased estimator of β given by:

β* = ∑cᵢyᵢ , with cᵢ = xᵢ / ∑xᵢ² + dᵢ

where the dᵢ are arbitrary constants. Then:

β* = ∑cᵢyᵢ = ∑(xᵢ / ∑xᵢ² + dᵢ)(βxᵢ + εᵢ)   (since yᵢ = βxᵢ + εᵢ)

= β∑xᵢ² / ∑xᵢ² + β∑dᵢxᵢ + ∑xᵢεᵢ / ∑xᵢ² + ∑dᵢεᵢ

= β + β∑dᵢxᵢ + ∑xᵢεᵢ / ∑xᵢ² + ∑dᵢεᵢ

Thus, for β* to be unbiased (that is, for E(β*) = β to hold) we should have:

∑dᵢxᵢ = 0 ……… (***)
The variance of β* is given by:

Var(β*) = E(β* − β)² = E(∑cᵢεᵢ)²

= E[∑cᵢ²εᵢ² + ∑_{i≠j} cᵢcⱼεᵢεⱼ] = ∑cᵢ²E(εᵢ²) + ∑_{i≠j} cᵢcⱼE(εᵢεⱼ)

= ∑cᵢ²(σ²) + ∑_{i≠j} cᵢcⱼ(0) = σ²∑cᵢ²

= σ²∑(xᵢ / ∑xᵢ² + dᵢ)² = σ²∑[xᵢ² / (∑xᵢ²)² + 2xᵢdᵢ / ∑xᵢ² + dᵢ²]

= σ²∑xᵢ² / (∑xᵢ²)² + 2σ²∑xᵢdᵢ / ∑xᵢ² + σ²∑dᵢ²   (but ∑dᵢxᵢ = 0 from (***))

= σ² / ∑xᵢ² + σ²∑dᵢ² = Var(β̂) + σ²∑dᵢ²

Since ∑dᵢ² ≥ 0, it follows that Var(β*) ≥ Var(β̂), with equality only when every dᵢ = 0, that is, when β* coincides with β̂. Hence β̂ has the smallest variance among all linear unbiased estimators of β, which completes the proof.
To summarize, the OLS estimator of the slope can be written as a linear function of the disturbances:

β̂ = β + ∑xᵢεᵢ / ∑xᵢ² = β + ∑aᵢεᵢ

where aᵢ = xᵢ / ∑xᵢ² and xᵢ = Xᵢ − X̄.
Therefore, the probability distributions of the OLS estimators will depend upon the
assumptions made about the probability distribution of the error term. The nature of the
probability distribution of the error term is important for hypothesis testing (or for making
inferences about α and β) and also for estimation purposes.
In regression analysis, it is usually assumed that the error terms follow the normal distribution
with mean 0 and variance σ 2 .
That is, the density of each disturbance (and hence of each Yᵢ) is:

f(εᵢ) = [1 / √(2πσ²)] exp[−εᵢ² / (2σ²)] = [1 / √(2πσ²)] exp[−(1 / (2σ²))(Yᵢ − α − βXᵢ)²]
Consider the linear model: Yi = α + βX i + εi . Under the assumption that the error terms
εi follow the normal distribution with mean 0 and variance σ 2 , Yi is also normally
distributed with:
Mean = E(Yi ) = E(α + βX i + εi ) = α + βXi
Variance = Var(Yi ) = Var(α + β X i + εi ) = Var(εi ) = σ 2
The ML estimator of a parameter β is the value of β̂ which would most likely generate the
observed sample observations Y1 , Y2 , . . ., Yn . The ML estimator maximizes the likelihood
function L which is the product of the individual probabilities (since Y1 , Y2 , . . ., Yn are
randomly selected, implying independence) taken over all n observations, given by:

L = ∏ f(Yᵢ) = [1 / (2πσ²)^(n/2)] exp[−(1 / (2σ²)) ∑(Yᵢ − α − βXᵢ)²]
Our aim is to maximize this likelihood function L with respect to the parameters α, β and σ 2 .
To do this, it is more convenient to work with the natural logarithm of L (called the log-
likelihood function) given by:
log L = −(n/2) log(σ²) − (n/2) log(2π) − [1 / (2σ²)] ∑(Yᵢ − α − βXᵢ)²
Taking partial derivatives of log L with respect to α, β and σ² and equating them to zero, we get the ML estimators.
By partial differentiation of the log L with respect to α and β and equating the results to zero
we get:
∂logL/∂β = −[1 / (2σ²)] ∑ 2(Yᵢ − α − βXᵢ)(−Xᵢ) = 0

∂logL/∂α = −[1 / (2σ²)] ∑ 2(Yᵢ − α − βXᵢ)(−1) = 0
Simplifying, these give:

∑Yᵢ = nα̂ML + β̂ML∑Xᵢ

∑XᵢYᵢ = α̂ML∑Xᵢ + β̂ML∑Xᵢ²
Note that these equations are similar to the normal equations that we obtained earlier. Solving for β̂ML and α̂ML we get:

β̂ML = [n∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)] / [n∑Xᵢ² − (∑Xᵢ)²] = [∑XᵢYᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²] = β̂

α̂ML = Ȳ − β̂ML X̄ = α̂
By partial differentiation of log L with respect to σ² and equating to zero we get:

∂logL/∂σ² = −n / (2σ²) − (1/2) ∑(Yᵢ − α − βXᵢ)² (−1 / (σ²)²) = 0

Solving for σ² (with α and β replaced by their ML estimates) gives:

σ̂²ML = (1/n) ∑(Yᵢ − α̂ − β̂Xᵢ)² = (1/n) ∑ε̂ᵢ²
Note
1) The ML estimators α̂ ML and β̂ML are identical to the OLS estimators, and are thus best
linear unbiased estimators (BLUE) of α and β, respectively.
2) The ML estimator σ̂ 2ML of σ 2 is biased.
Proof

ε̂ᵢ = Yᵢ − Ŷᵢ = Yᵢ − (α̂ + β̂Xᵢ)

Yᵢ = α + βXᵢ + εᵢ
⇒ Ȳ = α + βX̄ + ε̄
⇒ α = Ȳ − βX̄ − ε̄ …………. (2)

Substituting Yᵢ = α + βXᵢ + εᵢ, α from (2), and α̂ = Ȳ − β̂X̄ into the expression for ε̂ᵢ gives ε̂ᵢ = (β − β̂)xᵢ + eᵢ, where xᵢ = Xᵢ − X̄ and eᵢ = εᵢ − ε̄. Squaring and summing over the n observations:

∑ε̂ᵢ² = (β − β̂)²∑xᵢ² + 2(β − β̂)∑xᵢeᵢ + ∑eᵢ² ………… (5)
Also, from (*),

β̂ − β = ∑xᵢεᵢ / ∑xᵢ² = ∑xᵢ(εᵢ − ε̄) / ∑xᵢ² = ∑xᵢeᵢ / ∑xᵢ²   (since ∑xᵢε̄ = ε̄∑xᵢ = ε̄(0) = 0)

⇒ ∑xᵢeᵢ = (β̂ − β)∑xᵢ² = −(β − β̂)∑xᵢ² ………. (6)
Substituting (6) into (5):

∑ε̂ᵢ² = (β − β̂)²∑xᵢ² − 2(β − β̂)²∑xᵢ² + ∑eᵢ² = −(β − β̂)²∑xᵢ² + ∑eᵢ²
Taking expectations of each term:

• E[(β − β̂)²∑xᵢ²] = ∑xᵢ² E(β̂ − β)² = ∑xᵢ² Var(β̂) = ∑xᵢ² (σ² / ∑xᵢ²) = σ²

• E[∑eᵢ²] = E[∑(εᵢ − ε̄)²] = E[∑εᵢ² − nε̄²] = ∑E(εᵢ²) − nE(ε̄²) = nσ² − n(σ²/n) = nσ² − σ² = (n − 1)σ²

Hence E[∑ε̂ᵢ²] = −σ² + (n − 1)σ² = (n − 2)σ². Thus,

E(σ̂²ML) = E[(1/n)∑ε̂ᵢ²] = (1/n)E[∑ε̂ᵢ²] = [(n − 2)/n] σ² ≠ σ²

so σ̂²ML is indeed biased. An unbiased estimator of σ² is obtained by dividing the residual sum of squares by (n − 2) instead of n:

σ̂² = ∑ε̂ᵢ² / (n − 2)
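Both notes can be illustrated numerically. The sketch below (arbitrary simulated data; SciPy's general-purpose optimizer assumed available) maximizes the log-likelihood above directly and shows that the resulting estimates of α and β agree with the OLS formulas, while the ML estimate of σ² equals ∑ε̂ᵢ²/n rather than the unbiased ∑ε̂ᵢ²/(n − 2):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)                        # arbitrary simulated data
Y = 1.5 + 0.8 * X + rng.normal(0, 2, X.size)

def neg_log_likelihood(params):
    a, b, log_s2 = params                         # work with log(sigma^2) so the variance stays positive
    s2 = np.exp(log_s2)
    resid = Y - a - b * X
    return 0.5 * X.size * np.log(2 * np.pi * s2) + np.sum(resid**2) / (2 * s2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0])
a_ml, b_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])

# OLS estimates and residuals for comparison
b_ols = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
a_ols = Y.mean() - b_ols * X.mean()
resid = Y - a_ols - b_ols * X
print(a_ml, b_ml, " vs OLS:", a_ols, b_ols)                       # essentially identical
print(s2_ml, " vs sum(resid^2)/n:", np.sum(resid**2) / X.size)    # the biased ML variance estimate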
The estimated variance of β̂ is then:

V̂(β̂) = σ̂² / ∑xᵢ² = ∑ε̂ᵢ² / [(n − 2)∑xᵢ²]

The square root of V̂(β̂) is called the standard error of β̂, that is,

s.e.(β̂) = √V̂(β̂) = √(σ̂² / ∑xᵢ²)
Tests of significance of regression coefficients
To test the null hypothesis H₀: β = 0 against the alternative H₁: β ≠ 0, we compute the test statistic:

t = (β̂ − β₀) / s.e.(β̂) = (β̂ − 0) / s.e.(β̂) = β̂ / s.e.(β̂)

and compare this figure with the value from the Student's t distribution with (n − 2) degrees of freedom for a given significance level α.
Decision rule: If | t | > t α / 2 (n − 2) then we reject the null hypothesis, and conclude that there
is a significant relationship between X and Y.
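A compact Python sketch of this test (SciPy assumed; the function and variable names are illustrative), computing σ̂², the standard error, the t statistic and the critical value:

import numpy as np
from scipy import stats

def slope_t_test(X: np.ndarray, Y: np.ndarray, alpha_level: float = 0.05):
    """Test H0: beta = 0 against H1: beta != 0 in the two-variable model."""
    n = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    resid = Y - alpha_hat - beta_hat * X
    sigma2_hat = np.sum(resid**2) / (n - 2)          # unbiased estimate of sigma^2
    se_beta = np.sqrt(sigma2_hat / np.sum(x**2))     # s.e.(beta_hat)
    t_stat = beta_hat / se_beta
    t_crit = stats.t.ppf(1 - alpha_level / 2, df=n - 2)
    return t_stat, t_crit, abs(t_stat) > t_crit      # last element True => reject H0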
Is the estimated equation a useful one? To answer this, an objective measure of some sort is
desirable.
Starting from the identity Yᵢ − Ȳ = (Yᵢ − Ŷᵢ) + (Ŷᵢ − Ȳ), squaring both sides and summing over all n observations gives:

∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)² + 2∑(Yᵢ − Ŷᵢ)(Ŷᵢ − Ȳ) …… (*)
Remark (1)
∑ε̂ᵢ = 0, where ε̂ᵢ = Yᵢ − Ŷᵢ = Yᵢ − α̂ − β̂Xᵢ

Proof

∑ε̂ᵢ = ∑(Yᵢ − α̂ − β̂Xᵢ) = ∑Yᵢ − nα̂ − β̂∑Xᵢ = nȲ − nα̂ − nβ̂X̄ = n[(Ȳ − β̂X̄) − α̂] = n[α̂ − α̂] = 0
Remark (2)
∑ε̂ᵢXᵢ = 0

Proof

∑ε̂ᵢXᵢ = ∑(Yᵢ − α̂ − β̂Xᵢ)Xᵢ = ∑YᵢXᵢ − α̂∑Xᵢ − β̂∑Xᵢ²

= ∑YᵢXᵢ − nα̂X̄ − β̂∑Xᵢ²

= ∑YᵢXᵢ − n(Ȳ − β̂X̄)X̄ − β̂∑Xᵢ²

= ∑YᵢXᵢ − nX̄Ȳ − β̂[∑Xᵢ² − nX̄²]

= ∑YᵢXᵢ − nX̄Ȳ − {[∑YᵢXᵢ − nX̄Ȳ] / [∑Xᵢ² − nX̄²]}[∑Xᵢ² − nX̄²] = 0
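Both remarks are easy to verify numerically for any fitted line; a minimal Python sketch with arbitrary data (NumPy assumed):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # arbitrary illustrative data
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)
alpha_hat = Y.mean() - beta_hat * X.mean()
resid = Y - alpha_hat - beta_hat * X
print(np.isclose(resid.sum(), 0.0))         # Remark (1): the residuals sum to zero
print(np.isclose((resid * X).sum(), 0.0))   # Remark (2): the residuals are orthogonal to X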
Thus, the cross product term in equation (*) vanishes, and we are left with:
∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)²
In other words, the total sum of squares (TSS) is decomposed into regression (explained) sum
of squares (RSS) and error (residual or unexplained) sum of squares (ESS).
• The total sum of squares (TSS) is a measure of dispersion of the observed values of Y
about their mean. This is computed as:
TSS = ∑(Yᵢ − Ȳ)² = ∑yᵢ²
• The regression (explained) sum of squares (RSS) measures the amount of the total
variability in the observed values of Y that is accounted for by the linear relationship
between the observed values of X and Y. This is computed as:
RSS = ∑(Ŷᵢ − Ȳ)² = β̂²∑(Xᵢ − X̄)² = β̂²∑xᵢ²
• The error (residual or unexplained) sum of squares (ESS) is a measure of the dispersion of
the observed values of Y about the regression line. This is computed as:
ESS = ∑(Yᵢ − Ŷᵢ)² = TSS − RSS
If a regression equation does a good job of describing the relationship between two variables,
the explained sum of squares should constitute a large proportion of the total sum of squares.
Thus, it would be of interest to determine the magnitude of this proportion by computing the
ratio of the explained sum of squares to the total sum of squares. This proportion is called the
sample coefficient of determination, R 2 . That is:
Coefficient of determination = R² = RSS / TSS = 1 − ESS / TSS
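The decomposition and R² translate directly into code, as in the sketch below (an illustrative Python helper, assuming the OLS estimates have already been computed):

import numpy as np

def r_squared(X: np.ndarray, Y: np.ndarray, alpha_hat: float, beta_hat: float) -> float:
    """Coefficient of determination via the TSS = RSS + ESS decomposition."""
    Y_hat = alpha_hat + beta_hat * X
    tss = np.sum((Y - Y.mean())**2)       # total sum of squares
    rss = np.sum((Y_hat - Y.mean())**2)   # regression (explained) sum of squares
    ess = np.sum((Y - Y_hat)**2)          # error (residual) sum of squares
    assert np.isclose(tss, rss + ess)     # the cross-product term vanishes
    return rss / tss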
The largest value that R 2 can assume is 1 (in which case all observations fall on the regression
line), and the smallest it can assume is zero. A low value of R 2 is an indication that:
• X is a poor explanatory variable in the sense that variation in X leaves Y unaffected, or
• while X is a relevant variable, its influence on Y is weak as compared to some other
variables that are omitted from the regression equation, or
• the regression equation is misspecified (for example, an exponential relationship might be more appropriate).
To test for the significance of R², we compute the variance ratio F = [RSS/1] / [ESS/(n − 2)] and compare it with the critical value from the F distribution with 1 and (n − 2) degrees of freedom in the numerator and denominator, respectively, for a given significance level α.
Decision: If the calculated variance ratio exceeds the tabulated value, that is, if
Fcal > Fα (1, n − 2) , we then conclude that R 2 is significant (or that the linear regression
model is adequate).
Note: The F test is designed to test the significance of all variables or a set of variables in a
regression model. In the two-variable model, however, it is used to test the explanatory power
of a single variable (X), and at the same time, is equivalent to the test of significance of R 2 .
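A minimal Python sketch of this variance-ratio test (SciPy assumed; the function name and arguments are illustrative):

from scipy import stats

def f_test_simple_regression(tss: float, rss: float, n: int, alpha_level: float = 0.05):
    """Variance-ratio test of the regression: F = [RSS/1] / [ESS/(n-2)]."""
    ess = tss - rss
    f_calc = (rss / 1.0) / (ess / (n - 2))
    f_crit = stats.f.ppf(1 - alpha_level, dfn=1, dfd=n - 2)
    return f_calc, f_crit, f_calc > f_crit   # last element True => R^2 is significant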
Illustrative example
Consider the following data on the percentage rate of change in electricity consumption
(millions KWH) (Y) and the rate of change in the price of electricity (Birr/KWH) (X) for the
years 1979 – 1994.
year X Y year X Y
1979 -0.13 17.93 1987 2.57 52.17
1980 0.29 14.56 1988 0.89 39.66
1981 -0.12 32.22 1989 1.80 21.80
1982 0.42 22.20 1990 7.86 -49.51
1983 0.08 54.26 1991 6.59 -25.55
1984 0.80 58.61 1992 -0.37 6.43
1985 0.24 15.13 1993 0.16 15.27
1986 -1.09 39.25 1994 0.50 60.40
β̂ = ∑xy / ∑x² = −779.235 / 92.20109 = −8.45147

α̂ = Ȳ − β̂X̄ = 23.42688 − (−8.45147)(1.280625) = 34.25004

RSS = β̂²∑xᵢ² = (−8.45147)²(92.20109) = 6585.679

⇒ R² = RSS / TSS = 6585.679 / 13228.7 = 0.4978
ANOVA table

Source of variation    Sum of squares    df    Mean square    Variance ratio (F)
Regression                 6585.679       1      6585.679          13.88
Error (residual)           6643.021      14       474.502
Total                     13228.700      15

The critical value from the F distribution is F₀.₀₅(1, 14) = 4.60.
Decision: Since the calculated variance ratio exceeds the critical value, we reject the null
hypothesis of no linear relationship between price and consumption of electricity at the 5%
level of significance. Thus, we then conclude that R 2 is significant, that is, the linear
regression model is adequate and is useful for prediction purposes.
t = β̂ / s.e.(β̂) = −8.45147 / 2.268562 = −3.72548
For α = 0.05, the critical value from the Student's t distribution with (n − 2) degrees of freedom is t₀.₀₂₅(14) = 2.14479.
Decision: Since | t | > t α / 2 (n − 2) , we reject the null hypothesis, and conclude that β is
significantly different from zero. In other words, the price of electricity significantly and
negatively affects electricity consumption.
The interpretation of the estimated regression coefficient β̂ = – 8.45147 is that for a one
percent drop (increase) in the growth rate of price of electricity, there is an 8.45 percent
increase (decrease) in the growth rate of electricity consumption.
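For completeness, the whole worked example can be reproduced from the data table above with the script below (Python with NumPy and SciPy assumed); up to rounding it returns the same β̂, α̂, R², F and t values reported in the notes:

import numpy as np
from scipy import stats

X = np.array([-0.13, 0.29, -0.12, 0.42, 0.08, 0.80, 0.24, -1.09,
              2.57, 0.89, 1.80, 7.86, 6.59, -0.37, 0.16, 0.50])
Y = np.array([17.93, 14.56, 32.22, 22.20, 54.26, 58.61, 15.13, 39.25,
              52.17, 39.66, 21.80, -49.51, -25.55, 6.43, 15.27, 60.40])

n = len(X)
x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)             # about -8.45147
alpha_hat = Y.mean() - beta_hat * X.mean()          # about 34.25004

tss = np.sum(y**2)                                  # about 13228.7
rss = beta_hat**2 * np.sum(x**2)                    # about 6585.68
ess = tss - rss
r2 = rss / tss                                      # about 0.4978

f_calc = rss / (ess / (n - 2))                      # calculated variance ratio
f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)        # critical value F_0.05(1, 14)

se_beta = np.sqrt((ess / (n - 2)) / np.sum(x**2))   # about 2.26856
t_calc = beta_hat / se_beta                         # about -3.72548
t_crit = stats.t.ppf(0.975, df=n - 2)               # about 2.14479

print(beta_hat, alpha_hat, r2)
print(f_calc, f_crit, t_calc, t_crit)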