Chapter Three: The Multiple Linear Regression (MLR)
3.1 Introduction: The Multiple Linear Regression
Population regression function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$
Sample regression function:
$$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \cdots + \hat\beta_K X_{Ki} + e_i$$
Here Y is the dependent (response) variable, the X's are the independent (explanatory) variables, and $e_i$ is the residual (for the sample).
Additional Assumption:
8. No perfect multicollinearity: That is, no
exact linear relation exists between any
subset of explanatory variables.
In the presence of a perfect (deterministic) linear relationship between/among any set of the Xj's, the impact of a single variable (βj) cannot be identified.
More on multicollinearity in chapter 4!
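To see why this matters, here is a minimal Python/numpy sketch (an illustration added to these notes, with made-up numbers): when one regressor is an exact linear function of another, X'X is singular, so the inverse needed by the OLS formula derived in the next section does not exist and the separate effects cannot be identified.

```python
import numpy as np

# Hypothetical data: X2 is an exact linear function of X1 (X2 = 2*X1),
# i.e. the regressors are perfectly collinear.
x1 = np.array([4., 3., 6., 4., 8.])
x2 = 2.0 * x1                              # exact linear dependence
X = np.column_stack([np.ones(len(x1)), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))          # 2, not 3: X'X is singular
print(np.linalg.det(XtX))                  # (numerically) zero, so (X'X)^{-1} does not exist
```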
3.3 Estimation: The Method of OLS
The Case of K Explanatory Variables
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$
The number of parameters is K + 1.
Writing the sample regression for each of the n observations:
$$Y_1 = \hat\beta_0 + \hat\beta_1 X_{11} + \hat\beta_2 X_{21} + \cdots + \hat\beta_K X_{K1} + e_1$$
$$Y_2 = \hat\beta_0 + \hat\beta_1 X_{12} + \hat\beta_2 X_{22} + \cdots + \hat\beta_K X_{K2} + e_2$$
$$\vdots$$
In matrix form:
$$\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix}}_{n \times 1}
= \underbrace{\begin{bmatrix}
1 & X_{11} & X_{21} & X_{31} & \cdots & X_{K1} \\
1 & X_{12} & X_{22} & X_{32} & \cdots & X_{K2} \\
1 & X_{13} & X_{23} & X_{33} & \cdots & X_{K3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & X_{2n} & X_{3n} & \cdots & X_{Kn}
\end{bmatrix}}_{n \times (K+1)}
\cdot
\underbrace{\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_K \end{bmatrix}}_{(K+1) \times 1}
+ \underbrace{\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{bmatrix}}_{n \times 1}$$
That is, $Y = X\hat\beta + e$.
The residual vector is
$$\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix}
- \begin{bmatrix}
1 & X_{11} & X_{21} & X_{31} & \cdots & X_{K1} \\
1 & X_{12} & X_{22} & X_{32} & \cdots & X_{K2} \\
1 & X_{13} & X_{23} & X_{33} & \cdots & X_{K3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & X_{2n} & X_{3n} & \cdots & X_{Kn}
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_K \end{bmatrix}$$
That is, $e = Y - X\hat\beta$.
The residual sum of squares is
$$\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2 = \sum e_i^2
= \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
\;\Rightarrow\; \mathrm{RSS} = e'e$$
$$\mathrm{RSS} = (Y - X\hat\beta)'(Y - X\hat\beta) = Y'Y - Y'X\hat\beta - \hat\beta'X'Y + \hat\beta'X'X\hat\beta$$
Since $Y'X\hat\beta$ is a scalar (a 1×1 quantity), $Y'X\hat\beta = (Y'X\hat\beta)' = \hat\beta'X'Y$.
$$\Rightarrow \mathrm{RSS} = Y'Y - 2\hat\beta'X'Y + \hat\beta'(X'X)\hat\beta$$
$$\text{F.O.C.:}\quad \frac{\partial(\mathrm{RSS})}{\partial\hat\beta} = 0
\;\Rightarrow\; -2X'Y + 2X'X\hat\beta = 0
\;\Rightarrow\; -2X'(Y - X\hat\beta) = 0$$
$$\Rightarrow X'e = 0 \;\Rightarrow\;
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{K1} & X_{K2} & \cdots & X_{Kn}
\end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Hence: 1. $\sum e_i = 0$ and 2. $\sum e_i X_{ji} = 0$ for $j = 1, 2, \ldots, K$.
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_K \end{bmatrix},
\qquad
X'X = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
X_{11} & X_{12} & \cdots & X_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{K1} & X_{K2} & \cdots & X_{Kn}
\end{bmatrix}
\begin{bmatrix}
1 & X_{11} & \cdots & X_{K1} \\
1 & X_{12} & \cdots & X_{K2} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & \cdots & X_{Kn}
\end{bmatrix}$$
$$\Rightarrow X'X = \begin{bmatrix}
n & \sum X_1 & \cdots & \sum X_K \\
\sum X_1 & \sum X_1^2 & \cdots & \sum X_1 X_K \\
\vdots & \vdots & \ddots & \vdots \\
\sum X_K & \sum X_K X_1 & \cdots & \sum X_K^2
\end{bmatrix}$$
$$X'Y = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
X_{11} & X_{12} & \cdots & X_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{K1} & X_{K2} & \cdots & X_{Kn}
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\;\Rightarrow\;
X'Y = \begin{bmatrix} \sum Y \\ \sum YX_1 \\ \sum YX_2 \\ \vdots \\ \sum YX_K \end{bmatrix}$$
Numerical Example:
Y (Salary in '000 Dollars)   X1 (Years of post-high-school education)   X2 (Years of experience)
30                           4                                          10
20                           3                                           8
36                           6                                          11
24                           4                                           9
40                           8                                          12
ΣY = 150                     ΣX1 = 25                                   ΣX2 = 50
$$\hat\beta = (X'X)^{-1}X'Y$$
For the two-regressor case:
$$\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \begin{bmatrix}
n & \sum X_1 & \sum X_2 \\
\sum X_1 & \sum X_1^2 & \sum X_1 X_2 \\
\sum X_2 & \sum X_1 X_2 & \sum X_2^2
\end{bmatrix}^{-1}
\begin{bmatrix} \sum Y \\ \sum YX_1 \\ \sum YX_2 \end{bmatrix}$$
The constant term (-23.75) is the predicted salary (in '000 dollars) for someone with no post-high-school education and no experience.
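As a cross-check on the matrix formula above, the following Python/numpy sketch (a supplement, not part of the original slides) applies $\hat\beta = (X'X)^{-1}X'Y$ to the salary data and reproduces the estimates used in this example: $\hat\beta_0 = -23.75$, $\hat\beta_1 = -0.25$ and $\hat\beta_2 = 5.5$.

```python
import numpy as np

# Salary data from the numerical example (Y in '000 dollars).
Y  = np.array([30., 20., 36., 24., 40.])
X1 = np.array([ 4.,  3.,  6.,  4.,  8.])   # years of post-high-school education
X2 = np.array([10.,  8., 11.,  9., 12.])   # years of experience

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(len(Y)), X1, X2])

# OLS: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(beta_hat)                            # [-23.75, -0.25, 5.5]

# The residuals satisfy the normal equations X'e = 0.
e = Y - X @ beta_hat
print(X.T @ e)                             # all (numerically) zero
```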
3.4 Properties of OLS Estimators
Given the assumptions of the CLRM (in Section
3.2), the OLS estimators of the partial
regression coefficients are BLUE: linear,
unbiased & have minimum variance in the class
of all linear unbiased estimators – the Gauss-
Markov Theorem.
In cases where the desirable small-sample properties (BLUE) may not hold, we look for asymptotic (large-sample) properties like consistency, asymptotic efficiency & asymptotic normality (CLT).
The OLS estimators are consistent:
$$\operatorname{plim}_{n\to\infty}(\hat\beta - \beta) = 0 \quad \& \quad \operatorname{plim}_{n\to\infty}\operatorname{var}(\hat\beta) = 0$$
3.5 Partial Correlations & Coefficients of Determination
In the multiple regression equation with 2 regressors (X1 & X2), $Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + e_i$, we can talk of:
the joint effect of X1 and X2 on Y, and
the partial effect of X1 or X2 on Y.
The partial effect of X1 is measured by $\hat\beta_1$ & the partial effect of X2 is measured by $\hat\beta_2$.
Partial effect: holding the other variable
constant or after eliminating the effect of the
other variable.
Thus, $\hat\beta_1$ is interpreted as measuring the effect of X1 on Y after eliminating the effect of X2 from X1.
Suppose, for instance, that $r^2_{y2\cdot1} = 0.01$.
To explain Y, X2 alone can do a good job
(high simple correlation coefficient between
Y & X2).
But after X1 is already included, X2 does
not add much: X1 has done the job of X2
(very low partial correlation coefficient
between Y & X2).
The reduction in RSS (equivalently, the gain in ESS) from adding X2 to a regression that already contains X1 is $(R^2_{y\cdot12} - R^2_{y\cdot1})\sum y_i^2$.
If we now regress that part of Y freed from the effect of X1 (residualized Y) on the part of X2 freed from the effect of X1 (residualized X2), we will be able to explain the following proportion of RSS_SIMP (the RSS from the simple regression of Y on X1):
$$\frac{(R^2_{y\cdot12} - R^2_{y\cdot1})\sum y_i^2}{(1 - R^2_{y\cdot1})\sum y_i^2}
= \frac{R^2_{y\cdot12} - R^2_{y\cdot1}}{1 - R^2_{y\cdot1}} = r^2_{y2\cdot1}$$
This is the coefficient of partial determination (the square of the partial correlation coefficient).
We include X2 if the reduction in RSS (or the
increase in ESS) is significant.
But, when exactly? We will see later!
$$R^2 = \frac{\hat\beta \sum xy}{\sum y^2}
\quad \text{or} \quad
R^2 = \frac{\hat\beta^2 \sum x^2}{\sum y^2},
\qquad \text{where } \sum y^2 = \sum_{i=1}^{n} y_i^2.$$
Coefficients of Partial Determination:
$$r^2_{y2\cdot1} = \frac{R^2_{y\cdot12} - R^2_{y\cdot1}}{1 - R^2_{y\cdot1}}
\qquad \& \qquad
r^2_{y1\cdot2} = \frac{R^2_{y\cdot12} - R^2_{y\cdot2}}{1 - R^2_{y\cdot2}}$$
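The following sketch (added for illustration; it is not in the slides) computes $r^2_{y2\cdot1}$ for the salary example in the two equivalent ways described above: directly from the $R^2$ values, and by regressing residualized Y on residualized X2.

```python
import numpy as np

def fit(y, X):
    """OLS coefficients of y on X (X already contains a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def r_squared(y, X):
    """R^2 from an OLS regression of y on X."""
    e = y - X @ fit(y, X)
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

Y  = np.array([30., 20., 36., 24., 40.])
X1 = np.array([ 4.,  3.,  6.,  4.,  8.])
X2 = np.array([10.,  8., 11.,  9., 12.])
c  = np.ones(len(Y))

R2_y12 = r_squared(Y, np.column_stack([c, X1, X2]))   # Y on X1 and X2
R2_y1  = r_squared(Y, np.column_stack([c, X1]))       # Y on X1 only

# Coefficient of partial determination of X2, given X1 (formula above).
r2_y2_1 = (R2_y12 - R2_y1) / (1.0 - R2_y1)

# Same thing via residualizing: free Y and X2 from the effect of X1,
# then regress the Y-residuals on the X2-residuals.
Z  = np.column_stack([c, X1])
ey = Y  - Z @ fit(Y,  Z)
e2 = X2 - Z @ fit(X2, Z)
r2_check = r_squared(ey, np.column_stack([c, e2]))

print(r2_y2_1, r2_check)                   # the two values coincide
```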
The coefficient of multiple determination
(R2) measures the proportion of the variation
in the dependent variable explained by (the set
of all the regressors in) the model.
R2 can be used to compare goodness-of-fit of
alternative regression equations, but only if
the regression models satisfy two conditions.
1) The models must have the same dependent
variable.
Reason: TSS, ESS & RSS depend on the units
in which the regressand (Y) is measured.
For instance, the TSS for Y is not the same as
the TSS for ln(Y).
2) The models must have the same number of
regressors & parameters (same value of K+1).
Reason: Adding a variable to a model never
raises the RSS (or, never lowers ESS or R2)
even if the new variable is not very relevant.
The adjusted R-squared, $\bar R^2$, attaches a penalty to adding more variables.
It is modified to account for changes/differences in degrees of freedom (df): due to differences in the number of regressors (K) and/or sample size (n).
If adding a variable raises $\bar R^2$ for a regression, then this is a better indication that it has improved the model than if the addition merely raises $R^2$.
$$R^2 = \frac{\sum \hat y^2}{\sum y^2} = 1 - \frac{\sum e^2}{\sum y^2}$$
$$\bar R^2 = 1 - \frac{\sum e^2 / [\,n - (K+1)\,]}{\sum y^2 / (n - 1)}$$
(Dividing TSS and RSS by their df; K + 1 is the number of parameters to be estimated.)
$$\bar R^2 = 1 - \left[\frac{\sum e^2}{\sum y^2}\cdot\frac{n-1}{n-K-1}\right]$$
$$\bar R^2 = 1 - (1 - R^2)\cdot\frac{n-1}{n-K-1}
\qquad \text{or} \qquad
1 - \bar R^2 = (1 - R^2)\cdot\frac{n-1}{n-K-1}$$
As long as $K \ge 1$, $1 - \bar R^2 > 1 - R^2 \Rightarrow \bar R^2 < R^2$; in general, $\bar R^2 \le R^2$. As n grows larger (relative to K), $\bar R^2 \to R^2$.
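A small sketch (an addition to the notes; the randomly generated "irrelevant" regressor is the only made-up ingredient) illustrating the point above: $R^2$ cannot fall when a regressor is added, while $\bar R^2$ is penalized for the lost degree of freedom.

```python
import numpy as np

def r2_and_adj(y, X):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X (X includes a constant)."""
    n, p = X.shape                                     # p = K + 1 parameters
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    tss = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - (e @ e) / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)      # = 1 - (1-R^2)(n-1)/(n-K-1)
    return r2, r2_adj

Y  = np.array([30., 20., 36., 24., 40.])
X1 = np.array([ 4.,  3.,  6.,  4.,  8.])
X2 = np.array([10.,  8., 11.,  9., 12.])
c  = np.ones(len(Y))
junk = np.random.default_rng(0).normal(size=len(Y))   # an irrelevant regressor

print(r2_and_adj(Y, np.column_stack([c, X1, X2])))        # ~ (0.9945, 0.9890)
print(r2_and_adj(Y, np.column_stack([c, X1, X2, junk])))  # R^2 cannot fall; adj. R^2 is penalized
```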
1. While $R^2$ is always non-negative, $\bar R^2$ can be positive or negative.
2. $\bar R^2$ can be used to compare goodness-of-fit of two/more regression models only if the models have the same regressand.
3. Including more regressors reduces both RSS & df; $\bar R^2$ rises only if the first effect dominates.
4. $\bar R^2$ or $R^2$ should never be the sole criterion for choosing between/among models: in addition to $\bar R^2$, one should also consider expected signs & values of coefficients, and look for consistency with economic theory or reasoning (possible explanations).
$$\Rightarrow \mathrm{TSS} = 272$$
$$\mathrm{ESS} = \sum \hat y^2 = \sum(\hat\beta_1 x_1 + \hat\beta_2 x_2)^2
= \hat\beta_1^2 \sum x_1^2 + \hat\beta_2^2 \sum x_2^2 + 2\hat\beta_1\hat\beta_2 \sum x_1 x_2$$
$$\mathrm{ESS} = \hat\beta_1^2\left(\sum X_1^2 - n\bar X_1^2\right)
+ \hat\beta_2^2\left(\sum X_2^2 - n\bar X_2^2\right)
+ 2\hat\beta_1\hat\beta_2\left(\sum X_1 X_2 - n\bar X_1\bar X_2\right)$$
$$\mathrm{ESS} = (-0.25)^2\left[141 - 5(5)^2\right] + (5.5)^2\left[510 - 5(10)^2\right]
+ 2(-0.25)(5.5)\left[262 - 5(5)(10)\right]$$
$$\Rightarrow \mathrm{ESS} = 270.5$$
OR:
$$\mathrm{ESS} = \hat\beta_1\sum y x_1 + \hat\beta_2\sum y x_2
= \hat\beta_1\left(\sum YX_1 - n\bar X_1\bar Y\right) + \hat\beta_2\left(\sum YX_2 - n\bar X_2\bar Y\right)$$
$$\Rightarrow \mathrm{ESS} = -0.25(62) + 5.5(52) = 270.5$$
$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{270.5}{272} \approx 0.9945$$
Our model (education & experience together) explains about 99.45% of the wage differential.
$$\bar R^2 = 1 - \frac{\mathrm{RSS}/(n - K - 1)}{\mathrm{TSS}/(n - 1)} = 1 - \frac{1.5/2}{272/4}
\;\Rightarrow\; \bar R^2 = 0.9890$$
$$\hat\beta_{y1} = \frac{\sum y x_1}{\sum x_1^2}
= \frac{\sum YX_1 - n\bar X_1\bar Y}{\sum X_1^2 - n\bar X_1^2}
= \frac{62}{16} = 3.875$$
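The figures above (TSS = 272, ESS = 270.5, RSS = 1.5, $R^2 \approx 0.9945$, $\bar R^2 \approx 0.9890$) can be reproduced from the raw data with a short Python/numpy sketch (a supplement to the slides):

```python
import numpy as np

Y  = np.array([30., 20., 36., 24., 40.])
X1 = np.array([ 4.,  3.,  6.,  4.,  8.])
X2 = np.array([10.,  8., 11.,  9., 12.])
n, K = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.inv(X.T @ X) @ (X.T @ Y)     # [-23.75, -0.25, 5.5]

e = Y - X @ b                              # residuals
y = Y - Y.mean()                           # deviations from the mean

TSS = y @ y                                # 272.0
RSS = e @ e                                # 1.5
ESS = TSS - RSS                            # 270.5

R2     = ESS / TSS                                    # ~0.9945
R2_adj = 1 - (RSS / (n - K - 1)) / (TSS / (n - 1))    # ~0.9890
print(TSS, ESS, RSS, R2, R2_adj)
```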
3.6 Statistical Inferences in Multiple Linear Regression
$$\hat\beta_1 \sim N\!\left(\beta_1, \operatorname{var}(\hat\beta_1)\right);
\qquad
\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{\sum x_{1i}^2\,(1 - r_{12}^2)}$$
$$\hat\beta_2 \sim N\!\left(\beta_2, \operatorname{var}(\hat\beta_2)\right);
\qquad
\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{12}^2)}$$
$$\operatorname{cov}(\hat\beta_1, \hat\beta_2) = \frac{-\,r_{12}\,\sigma^2}{(1 - r_{12}^2)\sqrt{\sum x_{1i}^2}\sqrt{\sum x_{2i}^2}};
\qquad
r_{12}^2 = \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}$$
$\sum x_{1i}^2 (1 - r_{12}^2)$ is the RSS from regressing X1 on X2.
$\sum x_{2i}^2 (1 - r_{12}^2)$ is the RSS from regressing X2 on X1.
$$\hat\sigma^2 = \frac{\mathrm{RSS}}{n - 3}\ \text{ is an unbiased estimator of } \sigma^2.$$
$$\operatorname{var\text{-}cov}(\hat\beta) = \sigma^2 (X'X)^{-1}
= \sigma^2 \begin{bmatrix}
n & \sum X_1 & \cdots & \sum X_K \\
\sum X_1 & \sum X_1^2 & \cdots & \sum X_1 X_K \\
\vdots & \vdots & \ddots & \vdots \\
\sum X_K & \sum X_K X_1 & \cdots & \sum X_K^2
\end{bmatrix}^{-1}$$
$$\widehat{\operatorname{var\text{-}cov}}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}
= \hat\sigma^2 \begin{bmatrix}
n & \sum X_1 & \cdots & \sum X_K \\
\sum X_1 & \sum X_1^2 & \cdots & \sum X_1 X_K \\
\vdots & \vdots & \ddots & \vdots \\
\sum X_K & \sum X_K X_1 & \cdots & \sum X_K^2
\end{bmatrix}^{-1}$$
Note that:
(a) $(X'X)^{-1}$ is the same matrix we use to derive the OLS estimates, and
(b) $\hat\sigma^2 = \dfrac{\mathrm{RSS}}{n-3}$ in the case of two regressors. In the general case of K explanatory variables, $\hat\sigma^2 = \dfrac{\mathrm{RSS}}{n - K - 1}$ is an unbiased estimator of $\sigma^2$.
Ceteris paribus, the higher the correlation coefficient between X1 & X2 ($r_{12}$), the less precise the estimates $\hat\beta_1$ & $\hat\beta_2$ will be, i.e., the CIs for the parameters $\beta_1$ & $\beta_2$ will be wider.
Ceteris paribus, the higher the degree of variation of the Xj's, the more precise the estimates will be (narrow CIs for the parameters).
The above two points are contained in:
$$\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{(1 - r_{j\cdot123\ldots}^2)\sum x_{ji}^2}$$
where $r_{j\cdot123\ldots}^2$ is the $R^2$ from an auxiliary regression of Xj on all the other (K-1) X's & a constant.
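A short sketch (again a supplement, not from the slides) that computes $\hat\sigma^2$ and the estimated variance-covariance matrix $\hat\sigma^2(X'X)^{-1}$ for the salary example; its diagonal gives the variances 30.61875, 0.46875 and 0.75 used in the t-tests below.

```python
import numpy as np

Y  = np.array([30., 20., 36., 24., 40.])
X1 = np.array([ 4.,  3.,  6.,  4.,  8.])
X2 = np.array([10.,  8., 11.,  9., 12.])
n, K = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.inv(X.T @ X) @ (X.T @ Y)
e = Y - X @ b

sigma2_hat = (e @ e) / (n - K - 1)            # RSS/(n-K-1) = 0.75
vcov = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated var-cov(beta_hat)

print(np.diag(vcov))                          # var(b0), var(b1), var(b2) = 30.61875, 0.46875, 0.75
print(np.sqrt(np.diag(vcov)))                 # the standard errors
```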
$$\mathrm{URSS} = (1 - R^2)\sum y_i^2 = \sum y_i^2 - \sum_{j=1}^{K}\left\{\hat\beta_j \sum_{i=1}^{n} x_{ji}\, y_i\right\},
\qquad
\mathrm{RRSS} = \sum y_i^2.$$
$$\Rightarrow \frac{(\mathrm{RRSS} - \mathrm{URSS})/K}{\mathrm{URSS}/(n - K - 1)}
= \frac{R^2/K}{(1 - R^2)/(n - K - 1)} \sim F_{K,\,n-K-1}$$
a) $t_c = \dfrac{\hat\beta_1 - 0}{\widehat{se}(\hat\beta_1)} = \dfrac{-0.25}{\sqrt{0.46875}} \approx -0.37$; $\quad t_{tab} = t_{0.025}(2) \approx 4.30$.
$|t_{cal}| < t_{tab} \Rightarrow$ we do not reject the null.
b) $t_c = \dfrac{\hat\beta_2 - 0}{\widehat{se}(\hat\beta_2)} = \dfrac{5.5}{\sqrt{0.75}} \approx 6.35$.
$|t_{cal}| > t_{tab} \Rightarrow$ reject the null.
c) $t_c = \dfrac{\hat\beta_0 - 0}{\widehat{se}(\hat\beta_0)} = \dfrac{-23.75}{\sqrt{30.61875}} \approx -4.29$.
$|t_{cal}| \le t_{tab} \Rightarrow$ we do not reject the null!!!
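These t-statistics can be verified with a few lines of Python (a supplementary sketch; scipy is used only for the critical value):

```python
import numpy as np
from scipy import stats

n, K = 5, 2
df = n - K - 1                                   # 2 degrees of freedom

b     = np.array([-23.75, -0.25, 5.5])           # estimates from the example
vars_ = np.array([30.61875, 0.46875, 0.75])      # diagonal of var-cov(beta_hat)

t_calc = b / np.sqrt(vars_)                      # ~[-4.29, -0.37, 6.35]
t_crit = stats.t.ppf(0.975, df)                  # two-sided 5% critical value, ~4.303

# Reject H0: beta_j = 0 only where |t| exceeds the critical value (only beta_2 here).
print(t_calc, t_crit, np.abs(t_calc) > t_crit)
```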
d) $F_c = \dfrac{R^2/K}{(1 - R^2)/(n - K - 1)} = \dfrac{0.9945/2}{0.0055/2} \approx 180.82$
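And the overall F test, again as a supplementary sketch (the slides' 180.82 comes from rounding $R^2$ to 0.9945; the unrounded statistic is about 180.3, and the conclusion is the same):

```python
import numpy as np
from scipy import stats

n, K = 5, 2
R2 = 270.5 / 272.0                        # R^2 from the example above

F = (R2 / K) / ((1 - R2) / (n - K - 1))   # ~180.3 (~180.82 with R^2 rounded to 0.9945)
F_crit = stats.f.ppf(0.95, K, n - K - 1)  # 5% critical value, F(2, 2) = 19.0
print(F, F_crit, F > F_crit)              # F >> F_crit: reject H0 that beta1 = beta2 = 0
```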