
Lecture 6


Section 5

Linear Regression

Sourabh B Paul (IIT Delhi) Econometric Methods II Semester 2023-24 106


Regression and Causal Relationship

Determine whether a change in one variable (w) causes a change in another variable (y), holding all other relevant factors (c) fixed.
Focus on the average or expected response E(y | w, c) [the structural conditional expectation].
We would like to explicitly hold the set of control variables (c) fixed when studying the effect of w (Why?)
If w is a continuous variable, we want to estimate the partial effect,

∂E(y | w, c) / ∂w

If w is discrete, we are interested in E(y | w, c) at different values of w, with c fixed at the same level. (A small simulation sketch of holding c fixed follows.)
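A small simulation sketch (not from the slides; the data-generating process and the names w, ctrl, y are assumptions) of why the control must be held fixed:

set.seed(1)
n <- 10000
ctrl <- rnorm(n)                      # plays the role of the control c
w <- 0.5 * ctrl + rnorm(n)            # w is correlated with the control
y <- 1 + 2 * w + 3 * ctrl + rnorm(n)  # structural E(y | w, c) = 1 + 2w + 3c
coef(lm(y ~ w + ctrl))["w"]           # close to 2: the partial effect with the control held fixed
coef(lm(y ~ w))["w"]                  # far from 2: the omitted control contaminates the estimate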



What are the problems?
It is fairly straightforward to estimate the partial effect if we can collect data on y, w and c in a random sample from the underlying population.
We start with an assumption about the parametric form of the structural E(y | w, c).
Usually the structural equation and the estimable equation are not the same.
These are the challenges:
- Can we agree on the set of controls c?
- Can we observe all elements of c?
- Can we really hold them fixed as in a lab experiment?
- Can we measure y, w and c error-free?
- What if we only observe equilibrium values of y and w (simultaneously determined)?

We shall explore different ways of estimating conditional expectations and testing hypotheses related to them in various contexts.
Underlying assumptions

How to recover the original structural conditional expectation?


We require a few additional assumptions, generally called identification
assumptions, apart from the functional form assumption of E (y | w , c)
The details depend on the context. Throughout this course we shall
explore these situations
Let y be the explained (or outcome) variable and x = (x1, x2, . . . , xk) be the set of explanatory variables (or features). They have a joint distribution.
If E(|y|) < ∞, then there is a function, say µ : ℝ^k → ℝ, such that

E(y | x) = µ(x)



Error form model of conditional expectation

We can decompose a random variable y into two parts: a part


explained by observable variables x and an error u.

y = E (y | x) + u,
E (u | x) = 0.

The above decomposition is always true by definition.


Important implications: E(u | x) = 0 ⇒ (i) E(u) = 0 and (ii) u is uncorrelated with any function of x1, x2, . . . , xK.



Parametric model

Note that E (y | x) = µ(x) is a random variable (why?)


There are many possible candidates for µ(x):
- We specify a model for the conditional expectation that depends on a finite set of parameters
- Examples:

E(y | x1, x2) = β0 + β1 x1 + β2 x2
E(y | x1, x2) = β0 + β1 x1 + β2 x2²
E(y | x1, x2) = exp(β0 + β1 ln(x1) + β2 x2),   y ≥ 0, x1 > 0

- We restrict attention to models that are linear in the parameters (Examples 1 and 2 above), as in the R sketch below
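In R, such specifications are written directly in the model formula. A hedged sketch (y, x1, x2 and the data frame dat are hypothetical placeholders, not objects from these slides):

lm(y ~ x1 + x2, data = dat)        # Example 1: β0 + β1*x1 + β2*x2
lm(y ~ x1 + I(x2^2), data = dat)   # Example 2: β0 + β1*x1 + β2*x2^2

Both are linear in the parameters, which is why OLS applies directly; the exponential model in Example 3 is not.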



Simple Regression Model

Let’s start with a simple two-variable functional form.


Specify E(y | x) = β0 + β1 x1. Then

y = E(y | x) + u
  = β0 + β1 x1 + u

If we hope to estimate the partial effect of x1 on E(y | x) using the above parametric form, we want E(u | x) = 0.
In other words, if our linear specification is correct, then we must have

E(u | x) = 0

The question is whether we can be convinced that E(u | x) = 0.



Simple Regression Model

We typically refer to y as the Dependent Variable, or Left-Hand Side Variable, or Explained Variable, or Regressand.
We refer to x as the Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate.
A simpler assumption is E(u) = 0.
This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0.



Zero Conditional Mean

We need to make a crucial assumption about how u and x are related


We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,
- E(u | x) = E(u) = 0, which implies
- E(y | x) = β0 + β1 x

This makes E(y | x) a linear function of x, where for any value of x the distribution of y is centred about E(y | x) (see the sketch below).
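A small simulation sketch (assumed data, not from the slides) of this last point: with E(u | x) = 0, the sample average of y at each value of x sits close to β0 + β1x.

set.seed(123)
x <- rep(1:5, each = 2000)            # a few distinct x values
y <- 2 + 0.5 * x + rnorm(length(x))   # true E(y|x) = 2 + 0.5x, so E(u|x) = 0
tapply(y, x, mean)                    # sample means at each x: close to 2.5, 3.0, 3.5, 4.0, 4.5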



E (y |x ) as a linear function of x

[Figure: the density f(y) at two values x1 and x2, centred about the line E(y|x) = β0 + β1x]

Figure 8: Conditional mean



Ordinary Least Squares

The basic idea of regression is to estimate the population parameters from a sample.
Let {(xi, yi) : i = 1, . . . , n} denote a random sample of size n from the population.
For each observation in this sample, it will be the case that

yi = β0 + β1 xi + ui



Population regression line

[Figure: sample points (x1, y1), . . . , (x4, y4) scattered around the population regression line, with the errors u1, . . . , u4 shown as vertical deviations from the line]

Figure 9: Regression line



Deriving OLS Estimates

To derive the OLS estimates we need to realize that our main


assumption of E (u|x ) = E (u) = 0 also implies that

Cov (x , u) = E (xu) = 0

We can write our two restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1 x:

E(y − β0 − β1 x) = 0
E[x(y − β0 − β1 x)] = 0
These are called moment restrictions



Deriving OLS using M.O.M. I

The method of moments approach to estimation involves imposing the population moment restrictions on the sample moments.
What does this mean? Recall that for E (X ), the mean of a population
distribution, a sample estimator of E (X ) is simply the arithmetic mean
of the sample
We want to choose values of the parameters that will ensure that the
sample versions of our moment restrictions are true
Therefore, the estimators β̂0 and β̂1 must satisfy

n⁻¹ Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0

n⁻¹ Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi) = 0



Deriving OLS using M.O.M. II

Given the definition of a sample mean, and properties of summation,


we can rewrite the first condition as follows

ȳ = β̂0 + β̂1 x̄

or

β̂0 = ȳ − β̂1 x̄

β̂1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

provided that the denominator is strictly greater than zero.



Summary of OLS slope estimate

The slope estimate is the sample covariance between x and y divided


by the sample variance of x
If x and y are positively correlated, the slope will be positive
If x and y are negatively correlated, the slope will be negative
Only need x to vary in our sample



More OLS

Intuitively, OLS is fitting a line through the sample points such that
the sum of squared residuals is as small as possible, hence the term
least squares
The residual, û, is an estimate of the error term, u, and is the
difference between the fitted line (sample regression function) and the
sample point



Sample regression line

[Figure: sample points scattered around the fitted line ŷ = β̂0 + β̂1 x, with the residuals û1, . . . , û4 shown as vertical deviations from the line]

Figure 10: Regression line



Alternate approach to derivation

Given the intuitive idea of fitting a line, we can set up a formal


minimization problem
That is, we want to choose our parameters such that we minimize the
following:
Σᵢ₌₁ⁿ ûi² = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi)²

This is called the least squares approach.
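The same estimates can be obtained by handing this objective directly to a numerical optimizer. A minimal sketch with simulated data (the variable names and data-generating process are assumptions, not from the slides):

ssr <- function(b, y, x) sum((y - b[1] - b[2] * x)^2)  # objective: sum of squared residuals
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
optim(c(0, 0), ssr, y = y, x = x)$par   # numerical minimizer of the SSR
coef(lm(y ~ x))                         # closed-form OLS: essentially the same values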



Alternate approach, continued

If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first-order conditions, which are the same as those we obtained before, multiplied by n:
Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0

Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi) = 0



Algebraic Properties of OLS

The sum of the OLS residuals is zero


Thus, the sample average of the OLS residuals is zero as well
The sample covariance between the regressors and the OLS residuals is
zero
The OLS regression line always goes through the mean of the sample



Algebraic Properties of OLS

Σᵢ₌₁ⁿ ûi = 0 and thus n⁻¹ Σᵢ₌₁ⁿ ûi = 0
Σᵢ₌₁ⁿ xi ûi = 0
ȳ = β̂0 + β̂1 x̄



More terminology

We can think of each observation as being made up of an explained part and an unexplained part:

yi = ŷi + ûi

When we define the following:

Total Sum of Squares (SST): Σ (yi − ȳ)²
Explained Sum of Squares (SSE): Σ (ŷi − ȳ)²
Residual Sum of Squares (SSR): Σ ûi²

Then SST = SSE + SSR.
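A quick numerical check of this identity on simulated data (a sketch; the data are made up, not from the slides):

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)
SSE <- sum((fitted(fit) - mean(y))^2)
SSR <- sum(resid(fit)^2)
all.equal(SST, SSE + SSR)   # TRUE (the identity holds for OLS with an intercept)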



Goodness-of-Fit

How do we think about how well our sample regression line fits our
sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression:

R² = SSE/SST = 1 − SSR/SST



Goodness-of-Fit (continued)

We can think of R² as being equal to the squared correlation coefficient between the actual yi and the predicted values ŷi:

R² = [Σ (yi − ȳ)(ŷi − ŷ̄)]² / [Σ (yi − ȳ)² · Σ (ŷi − ŷ̄)²]

where ŷ̄ denotes the sample mean of the ŷi.



More about R-squared

R² can never decrease when another independent variable is added to a regression, and usually will increase.
Because R² will usually increase with the number of independent variables, it is not a good way to compare models.
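A small illustration with simulated data (hypothetical variables, not from the slides): adding an irrelevant regressor cannot lower R-squared.

set.seed(1)
x1 <- rnorm(100)
y <- 1 + 2 * x1 + rnorm(100)
junk <- rnorm(100)                      # unrelated to y by construction
summary(lm(y ~ x1))$r.squared
summary(lm(y ~ x1 + junk))$r.squared    # no smaller, typically slightly larger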



R example: CEO salary and returns on equity I
require(foreign)
# ceosal1 <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/ceosal1.dta")
ceosal1 <- read.dta(file.path(datafolder, "statafiles/CEOSAL1.dta"))  # datafolder: path to the course data, defined elsewhere
attach(ceosal1)

# ingredients to the OLS formulas


cov(roe, salary)

## [1] 1342.538
var(roe)

## [1] 72.56499
mean(salary)

## [1] 1281.12
mean(roe)

## [1] 17.18421



R example: CEO salary and returns on equity II

# manual calculation of OLS coefficients


(b1hat <- cov(roe, salary)/var(roe))

## [1] 18.50119
(b0hat <- mean(salary) - b1hat * mean(roe))

## [1] 963.1913
# detach the data frame
detach(ceosal1)
# A more convenient way ... OLS regression
lm(salary ~ roe, data = ceosal1)



R example: CEO salary and returns on equity III

##
## Call:
## lm(formula = salary ~ roe, data = ceosal1)
##
## Coefficients:
## (Intercept) roe
## 963.2 18.5



R example - storing result I

# OLS regression
CEOregres <- lm(salary ~ roe, data = ceosal1)
# Scatter plot (restrict y axis limits)
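# The plotting command itself is not reproduced on this slide; a sketch consistent
# with the figure on the next slide (the exact ylim value is an assumption) would be:
plot(ceosal1$roe, ceosal1$salary, ylim = c(0, 4000))
abline(CEOregres)   # optionally overlay the fitted regression line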



Plot
[Figure: scatter plot of ceosal1$salary (y axis, 0–4000) against ceosal1$roe (x axis, 0–50)]


Relation between wage and education

##
## Call:
## lm(formula = wage ~ educ, data = wage1)
##
## Coefficients:
## (Intercept) educ
## -0.9049 0.5414



Relation between voting outcome and campaign
expenditure

##
## Call:
## lm(formula = voteA ~ shareA, data = vote1)
##
## Coefficients:
## (Intercept) shareA
## 26.8122 0.4638



Plot I
[Figure: scatter plot of vote1$voteA (y axis, 20–80) against vote1$shareA (x axis, 0–100)]



Coefficients, fitted values, and residuals I

attach(ceosal1)

# Get the list in the object where you saved the


# regression model
names(CEOregres)

## [1] "coefficients" "residuals" "effects" "rank"


## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
CEOregres$coefficients

## (Intercept) roe
## 963.19134 18.50119
# Generic function to access some values
nobs(CEOregres)

## [1] 209



Coefficients, fitted values, and residuals II

coef(CEOregres)

## (Intercept) roe
## 963.19134 18.50119
# Fitted values
bhat <- coef(CEOregres)
yhat <- bhat["(Intercept)"] + bhat["roe"] * ceosal1$roe
uhat <- ceosal1$salary - yhat



A more generic way .. I

attach(ceosal1)
# extract variables as vectors:
sal <- ceosal1$salary
roe <- ceosal1$roe

# regression with vectors:


CEOregres <- lm(sal ~ roe)

# obtain predicted values and residuals


sal.hat <- fitted(CEOregres)
u.hat <- resid(CEOregres)

# Wooldridge, Table 2.2:


cbind(roe, sal, sal.hat, u.hat)[1:15, ]



A more generic way .. II

## roe sal sal.hat u.hat


## 1 14.1 1095 1224.058 -129.058071
## 2 10.9 1001 1164.854 -163.854261
## 3 23.5 1122 1397.969 -275.969216
## 4 5.9 578 1072.348 -494.348338
## 5 13.8 1368 1218.508 149.492288
## 6 20.0 1145 1333.215 -188.215063
## 7 16.4 1078 1266.611 -188.610785
## 8 16.3 1094 1264.761 -170.760660
## 9 10.5 1237 1157.454 79.546207
## 10 26.3 833 1449.773 -616.772523
## 11 25.9 567 1442.372 -875.372056
## 12 26.8 933 1459.023 -526.023116
## 13 14.8 1339 1237.009 101.991102
## 14 22.3 937 1375.768 -438.767778
## 15 56.3 2011 2004.808 6.191886



Algebraic Properties of OLS
wage1 <- read.dta(file.path(datafolder, "statafiles/WAGE1.dta"))
attach(wage1)

WAGEregres <- lm(wage ~ educ, data = wage1)

# obtain coefficients, predicted values and


# residuals
b.hat <- coef(WAGEregres)
wage.hat <- fitted(WAGEregres)
u.hat <- resid(WAGEregres)

# Confirm property (1):


mean(u.hat)

## [1] -2.243911e-16
# Confirm property (2):
cor(wage1$educ, u.hat)

## [1] 5.472256e-16
# Confirm property (3):
mean(wage1$wage)
R-squared
attach(ceosal1)
CEOregres <- lm(salary ~ roe, data = ceosal1)

# Calculate predicted values & residuals:


sal.hat <- fitted(CEOregres)
u.hat <- resid(CEOregres)

# Calculate R^2 in three different ways:


sal <- ceosal1$salary
var(sal.hat)/var(sal)

## [1] 0.01318862
1 - var(u.hat)/var(sal)

## [1] 0.01318862
cor(sal, sal.hat)^2

## [1] 0.01318862
# Summary of the regression results
summary(CEOregres)
