
Lecture 6


Section 5

Linear Regression

Sourabh B Paul (IIT Delhi) Econometric Methods II Semester 2023-24 106


Regression and Causal Relationship

Determine whether a change in one variable (w) causes a change in another variable (y), holding all other relevant factors (c) fixed.
Focus on the average or expected response E(y | w, c) [the structural conditional expectation].
We would like to explicitly hold the set of control variables (c) fixed when studying the effect of w (Why?)
If w is a continuous variable, we want to estimate the partial effect,

∂E(y | w, c) / ∂w

If w is discrete, we are interested in E(y | w, c) at different values of w, with c fixed at the same level. (A small simulation sketch of holding c fixed follows.)
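A small simulation sketch (not from the slides; the data-generating process and the names w, ctrl, y are assumptions) of why the control must be held fixed:

set.seed(1)
n <- 10000
ctrl <- rnorm(n)                      # plays the role of the control c
w <- 0.5 * ctrl + rnorm(n)            # w is correlated with the control
y <- 1 + 2 * w + 3 * ctrl + rnorm(n)  # structural E(y | w, c) = 1 + 2w + 3c
coef(lm(y ~ w + ctrl))["w"]           # close to 2: the partial effect with the control held fixed
coef(lm(y ~ w))["w"]                  # far from 2: the omitted control contaminates the estimate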



What are the problems?
It is fairly straightforward to estimate the partial effect if we can collect data on y, w and c in a random sample from the underlying population.
We start with an assumption about the parametric form of the structural E(y | w, c).
Usually the structural equation and the estimable equation are not the same.
These are the challenges:
- Can we agree on the set of controls c?
- Can we observe all elements of c?
- Can we really hold them fixed as in a lab experiment?
- Can we measure y, w and c error-free?
- What if we only observe equilibrium values of y and w (simultaneously determined)?

We shall explore different ways of estimating conditional expectations and testing hypotheses related to them in various contexts.
Underlying assumptions

How to recover the original structural conditional expectation?


We require a few additional assumptions, generally called identification
assumptions, apart from the functional form assumption of E (y | w , c)
The details depend on the context. Throughout this course we shall
explore these situations
Let y be the explained (or outcome) variable and x = (x1, x2, . . . , xk) be the set of explanatory variables (or features). They have a joint distribution.
If E(|y|) < ∞, then there is a function, say µ : ℝ^k → ℝ, such that

E(y | x) = µ(x)



Error form model of conditional expectation

We can decompose a random variable y into two parts: a part


explained by observable variables x and an error u.

y = E (y | x) + u,
E (u | x) = 0.

The above decomposition is always true by definition.


Important implications: E(u | x) = 0 ⇒ (i) E(u) = 0 and (ii) u is uncorrelated with any function of x1, x2, . . . , xK.



Parametric model

Note that E (y | x) = µ(x) is a random variable (why?)


There are many possible candidates for µ(x):
- We specify a model for the conditional expectation that depends on a finite set of parameters
- Examples:

E(y | x1, x2) = β0 + β1 x1 + β2 x2
E(y | x1, x2) = β0 + β1 x1 + β2 x2²
E(y | x1, x2) = exp(β0 + β1 ln(x1) + β2 x2),   y ≥ 0, x1 > 0

- We restrict attention to models that are linear in the parameters (Examples 1 and 2 above), as in the R sketch below
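In R, such specifications are written directly in the model formula. A hedged sketch (y, x1, x2 and the data frame dat are hypothetical placeholders, not objects from these slides):

lm(y ~ x1 + x2, data = dat)        # Example 1: β0 + β1*x1 + β2*x2
lm(y ~ x1 + I(x2^2), data = dat)   # Example 2: β0 + β1*x1 + β2*x2^2

Both are linear in the parameters, which is why OLS applies directly; the exponential model in Example 3 is not.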



Simple Regression Model

Let’s start with a simple two-variable functional form.


Specify E(y | x) = β0 + β1 x1. Then

y = E(y | x) + u
  = β0 + β1 x1 + u

If we hope to estimate the partial effect of x1 on E(y | x) using the above parametric form, we want E(u | x) = 0.
In other words, if our linear specification is correct, then we must have

E(u | x) = 0

The question is whether we can be convinced that E(u | x) = 0.



Simple Regression Model

We typically refer to y as the Dependent Variable, or Left-Hand Side Variable, or Explained Variable, or Regressand.
We refer to x as the Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate.
A simpler assumption is E(u) = 0.
This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0.



Zero Conditional Mean

We need to make a crucial assumption about how u and x are related


We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,
- E(u | x) = E(u) = 0, which implies
- E(y | x) = β0 + β1 x

This makes E(y | x) a linear function of x, where for any value of x the distribution of y is centred about E(y | x) (see the sketch below).
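A small simulation sketch (assumed data, not from the slides) of this last point: with E(u | x) = 0, the sample average of y at each value of x sits close to β0 + β1x.

set.seed(123)
x <- rep(1:5, each = 2000)            # a few distinct x values
y <- 2 + 0.5 * x + rnorm(length(x))   # true E(y|x) = 2 + 0.5x, so E(u|x) = 0
tapply(y, x, mean)                    # sample means at each x: close to 2.5, 3.0, 3.5, 4.0, 4.5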



E (y |x ) as a linear function of x

[Figure: the density f(y) at two values x1 and x2, centred about the line E(y|x) = β0 + β1x]

Figure 8: Conditional mean



Ordinary Least Squares

The basic idea of regression is to estimate the population parameters from a sample.
Let {(xi, yi) : i = 1, . . . , n} denote a random sample of size n from the population.
For each observation in this sample, it will be the case that

yi = β0 + β1 xi + ui



Population regression line

[Figure: sample points (x1, y1), . . . , (x4, y4) scattered around the population regression line, with the errors u1, . . . , u4 shown as vertical deviations from the line]

Figure 9: Regression line



Deriving OLS Estimates

To derive the OLS estimates we need to realize that our main


assumption of E (u|x ) = E (u) = 0 also implies that

Cov (x , u) = E (xu) = 0

We can write our two restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1 x:

E(y − β0 − β1 x) = 0
E[x(y − β0 − β1 x)] = 0
These are called moment restrictions



Deriving OLS using M.O.M. I

The method of moments approach to estimation involves imposing the population moment restrictions on the sample moments.
What does this mean? Recall that for E (X ), the mean of a population
distribution, a sample estimator of E (X ) is simply the arithmetic mean
of the sample
We want to choose values of the parameters that will ensure that the
sample versions of our moment restrictions are true
Therefore, the estimators β̂0 and β̂1 must satisfy

n⁻¹ Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0

n⁻¹ Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi) = 0



Deriving OLS using M.O.M. II

Given the definition of a sample mean, and properties of summation,


we can rewrite the first condition as follows

ȳ = β̂0 + β̂1 x̄

or

β̂0 = ȳ − β̂1 x̄

β̂1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

provided that the denominator is strictly greater than zero.



Summary of OLS slope estimate

The slope estimate is the sample covariance between x and y divided


by the sample variance of x
If x and y are positively correlated, the slope will be positive
If x and y are negatively correlated, the slope will be negative
Only need x to vary in our sample



More OLS

Intuitively, OLS is fitting a line through the sample points such that
the sum of squared residuals is as small as possible, hence the term
least squares
The residual, û, is an estimate of the error term, u, and is the
difference between the fitted line (sample regression function) and the
sample point



Sample regression line

[Figure: sample points scattered around the fitted line ŷ = β̂0 + β̂1 x, with the residuals û1, . . . , û4 shown as vertical deviations from the line]

Figure 10: Regression line



Alternate approach to derivation

Given the intuitive idea of fitting a line, we can set up a formal


minimization problem
That is, we want to choose our parameters such that we minimize the
following:
Σᵢ₌₁ⁿ ûi² = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi)²

This is called the least squares approach.
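The same estimates can be obtained by handing this objective directly to a numerical optimizer. A minimal sketch with simulated data (the variable names and data-generating process are assumptions, not from the slides):

ssr <- function(b, y, x) sum((y - b[1] - b[2] * x)^2)  # objective: sum of squared residuals
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
optim(c(0, 0), ssr, y = y, x = x)$par   # numerical minimizer of the SSR
coef(lm(y ~ x))                         # closed-form OLS: essentially the same values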



Alternate approach, continued

If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first-order conditions, which are the same as those we obtained before, multiplied by n:
Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0

Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi) = 0



Algebraic Properties of OLS

The sum of the OLS residuals is zero


Thus, the sample average of the OLS residuals is zero as well
The sample covariance between the regressors and the OLS residuals is
zero
The OLS regression line always goes through the mean of the sample



Algebraic Properties of OLS

Σᵢ₌₁ⁿ ûi = 0 and thus n⁻¹ Σᵢ₌₁ⁿ ûi = 0
Σᵢ₌₁ⁿ xi ûi = 0
ȳ = β̂0 + β̂1 x̄



More terminology

We can think of each observation as being made up of an explained part and an unexplained part:

yi = ŷi + ûi

When we define the following:

Total Sum of Squares (SST): Σ (yi − ȳ)²
Explained Sum of Squares (SSE): Σ (ŷi − ȳ)²
Residual Sum of Squares (SSR): Σ ûi²

Then SST = SSE + SSR.
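A quick numerical check of this identity on simulated data (a sketch; the data are made up, not from the slides):

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)
SSE <- sum((fitted(fit) - mean(y))^2)
SSR <- sum(resid(fit)^2)
all.equal(SST, SSE + SSR)   # TRUE (the identity holds for OLS with an intercept)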



Goodness-of-Fit

How do we think about how well our sample regression line fits our
sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression:

R² = SSE/SST = 1 − SSR/SST



Goodness-of-Fit (continued)

We can think of R² as being equal to the squared correlation coefficient between the actual yi and the predicted values ŷi:

R² = [Σ (yi − ȳ)(ŷi − ŷ̄)]² / [Σ (yi − ȳ)² · Σ (ŷi − ŷ̄)²]

where ŷ̄ denotes the sample mean of the ŷi.



More about R-squared

R² can never decrease when another independent variable is added to a regression, and usually will increase.
Because R² will usually increase with the number of independent variables, it is not a good way to compare models.
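A small illustration with simulated data (hypothetical variables, not from the slides): adding an irrelevant regressor cannot lower R-squared.

set.seed(1)
x1 <- rnorm(100)
y <- 1 + 2 * x1 + rnorm(100)
junk <- rnorm(100)                      # unrelated to y by construction
summary(lm(y ~ x1))$r.squared
summary(lm(y ~ x1 + junk))$r.squared    # no smaller, typically slightly larger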



R example: CEO salary and returns on equity I
require(foreign)
# ceosal1 <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/ceosal1.dta")
ceosal1 <- read.dta(file.path(datafolder, "statafiles/CEOSAL1.dta"))  # datafolder: path to the course data, defined elsewhere
attach(ceosal1)

# ingredients to the OLS formulas


cov(roe, salary)

## [1] 1342.538
var(roe)

## [1] 72.56499
mean(salary)

## [1] 1281.12
mean(roe)

## [1] 17.18421



R example: CEO salary and returns on equity II

# manual calculation of OLS coefficients


(b1hat <- cov(roe, salary)/var(roe))

## [1] 18.50119
(b0hat <- mean(salary) - b1hat * mean(roe))

## [1] 963.1913
# detach the data frame
detach(ceosal1)
# A more convenient way ... OLS regression
lm(salary ~ roe, data = ceosal1)



R example: CEO salary and returns on equity III

##
## Call:
## lm(formula = salary ~ roe, data = ceosal1)
##
## Coefficients:
## (Intercept) roe
## 963.2 18.5



R example - storing result I

# OLS regression
CEOregres <- lm(salary ~ roe, data = ceosal1)
# Scatter plot (restrict y axis limits)
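# The plotting command itself is not reproduced on this slide; a sketch consistent
# with the figure on the next slide (the exact ylim value is an assumption) would be:
plot(ceosal1$roe, ceosal1$salary, ylim = c(0, 4000))
abline(CEOregres)   # optionally overlay the fitted regression line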



Plot
[Figure: scatter plot of ceosal1$salary (y axis, 0–4000) against ceosal1$roe (x axis, 0–50)]


Relation between wage and education

##
## Call:
## lm(formula = wage ~ educ, data = wage1)
##
## Coefficients:
## (Intercept) educ
## -0.9049 0.5414



Relation between voting outcome and campaign
expenditure

##
## Call:
## lm(formula = voteA ~ shareA, data = vote1)
##
## Coefficients:
## (Intercept) shareA
## 26.8122 0.4638



Plot I
[Figure: scatter plot of vote1$voteA (y axis, 20–80) against vote1$shareA (x axis, 0–100)]



Coefficients, fitted values, and residuals I

attach(ceosal1)

# Get the list in the object where you saved the


# regression model
names(CEOregres)

## [1] "coefficients" "residuals" "effects" "rank"


## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
CEOregres$coefficients

## (Intercept) roe
## 963.19134 18.50119
# Generic function to access some values
nobs(CEOregres)

## [1] 209



Coefficients, fitted values, and residuals II

coef(CEOregres)

## (Intercept) roe
## 963.19134 18.50119
# Fitted values
bhat <- coef(CEOregres)
yhat <- bhat["(Intercept)"] + bhat["roe"] * ceosal1$roe
uhat <- ceosal1$salary - yhat



A more generic way .. I

attach(ceosal1)
# extract variables as vectors:
sal <- ceosal1$salary
roe <- ceosal1$roe

# regression with vectors:


CEOregres <- lm(sal ~ roe)

# obtain predicted values and residuals


sal.hat <- fitted(CEOregres)
u.hat <- resid(CEOregres)

# Wooldridge, Table 2.2:


cbind(roe, sal, sal.hat, u.hat)[1:15, ]



A more generic way .. II

## roe sal sal.hat u.hat


## 1 14.1 1095 1224.058 -129.058071
## 2 10.9 1001 1164.854 -163.854261
## 3 23.5 1122 1397.969 -275.969216
## 4 5.9 578 1072.348 -494.348338
## 5 13.8 1368 1218.508 149.492288
## 6 20.0 1145 1333.215 -188.215063
## 7 16.4 1078 1266.611 -188.610785
## 8 16.3 1094 1264.761 -170.760660
## 9 10.5 1237 1157.454 79.546207
## 10 26.3 833 1449.773 -616.772523
## 11 25.9 567 1442.372 -875.372056
## 12 26.8 933 1459.023 -526.023116
## 13 14.8 1339 1237.009 101.991102
## 14 22.3 937 1375.768 -438.767778
## 15 56.3 2011 2004.808 6.191886



Algebraic Properties of OLS
wage1 <- read.dta(file.path(datafolder, "statafiles/WAGE1.dta"))
attach(wage1)

WAGEregres <- lm(wage ~ educ, data = wage1)

# obtain coefficients, predicted values and


# residuals
b.hat <- coef(WAGEregres)
wage.hat <- fitted(WAGEregres)
u.hat <- resid(WAGEregres)

# Confirm property (1):


mean(u.hat)

## [1] -2.243911e-16
# Confirm property (2):
cor(wage1$educ, u.hat)

## [1] 5.472256e-16
# Confirm property (3):
mean(wage1$wage)
R-squared
attach(ceosal1)
CEOregres <- lm(salary ~ roe, data = ceosal1)

# Calculate predicted values & residuals:


sal.hat <- fitted(CEOregres)
u.hat <- resid(CEOregres)

# Calculate R^2 in three different ways:


sal <- ceosal1$salary
var(sal.hat)/var(sal)

## [1] 0.01318862
1 - var(u.hat)/var(sal)

## [1] 0.01318862
cor(sal, sal.hat)^2

## [1] 0.01318862
# Summary of the regression results
summary(CEOregres)
