Lecture 6
Linear Regression
If w is continuous, we are interested in the partial effect ∂E(y | w, c)/∂w, holding c fixed.
If w is discrete, we are interested in E(y | w, c) at different values of w, with c fixed at the same level.
E (y | x) = µ(x)
y = E (y | x) + u,
E (u | x) = 0.
E(y | x1, x2) = β0 + β1 x1 + β2 x2
E(y | x1, x2) = β0 + β1 x1 + β2 x2²
E(y | x1, x2) = exp(β0 + β1 ln(x1) + β2 x2),  y ≥ 0, x1 > 0
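Each of these conditional-mean specifications maps onto an lm() formula in R. A minimal sketch, assuming a hypothetical data frame dat with columns y, x1 and x2 (the third model is commonly estimated after taking logs):

lm(y ~ x1 + x2, data = dat)            # linear in x1 and x2
lm(y ~ x1 + I(x2^2), data = dat)       # quadratic in x2; I() protects the ^ inside the formula
lm(log(y) ~ log(x1) + x2, data = dat)  # constant-elasticity form in logs (requires y > 0, x1 > 0)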
y = E(y | x) + u
  = β0 + β1 x1 + u
[Figure: conditional distributions f(y) at x1 and x2, centered on the population regression line E(y | x) = β0 + β1 x]
yi = β0 + β1 xi + ui
[Figure: sample points (x1, y1), ..., (x4, y4) with errors u1, ..., u4 measured from the population regression line]
Cov (x , u) = E (xu) = 0
n⁻¹ Σ_{i=1}^{n} xi (yi − β̂0 − β̂1 xi) = 0
ȳ = β̂0 + β̂1 x̄
or
β̂0 = ȳ − β̂1 x̄
β̂1 = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{n} (xi − x̄)²
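These formulas can be applied directly to data; a minimal sketch, assuming hypothetical numeric vectors x and y of equal length:

# Slope: sum of cross-deviations over the sum of squared deviations of x
b1hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the OLS line passes through the point of sample means
b0hat <- mean(y) - b1hat * mean(x)

Equivalently, b1hat equals cov(x, y)/var(x), which is the form used in the ceosal1 example below.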
Intuitively, OLS is fitting a line through the sample points such that
the sum of squared residuals is as small as possible, hence the term
least squares
The residual, û, is an estimate of the error term, u, and is the
difference between the fitted line (sample regression function) and the
sample point
[Figure: fitted regression line ŷ = β̂0 + β̂1 x with sample points and residuals û1, ..., û4]
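Since OLS is defined as the minimizer of the sum of squared residuals, the same estimates can be recovered by numerical minimization; a sketch, again assuming the hypothetical x, y, b0hat and b1hat from the earlier sketch:

# Sum of squared residuals as a function of the parameter vector b = (b0, b1)
ssr <- function(b) sum((y - b[1] - b[2] * x)^2)
# Numerical minimization; should agree with c(b0hat, b1hat) up to tolerance
optim(c(0, 0), ssr)$par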
If one uses calculus to solve the minimization problem for the two parameters, the following first-order conditions are obtained, which are the same as those derived before, multiplied by n:
Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi) = 0
Σ_{i=1}^{n} xi (yi − β̂0 − β̂1 xi) = 0
Σ_{i=1}^{n} ûi = 0 and thus, n⁻¹ Σ_{i=1}^{n} ûi = 0
Σ_{i=1}^{n} xi ûi = 0
ȳ = β̂0 + β̂1 x̄
yi = ŷi + ûi
How do we think about how well our sample regression line fits our
sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression, where SSE is the explained sum of squares and SSR is the residual sum of squares.
R² = SSE/SST = 1 − SSR/SST
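In R these sums of squares can be computed from the fitted values and residuals; a minimal sketch, assuming a hypothetical fitted model fit <- lm(y ~ x, data = dat):

y.bar <- mean(dat$y)
SST <- sum((dat$y - y.bar)^2)        # total sum of squares
SSE <- sum((fitted(fit) - y.bar)^2)  # explained sum of squares
SSR <- sum(resid(fit)^2)             # residual sum of squares
SSE/SST        # R-squared
1 - SSR/SST    # the same value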
cov(roe, salary)
## [1] 1342.538
var(roe)
## [1] 72.56499
mean(salary)
## [1] 1281.12
mean(roe)
## [1] 17.18421
(b1hat <- cov(roe, salary) / var(roe))
## [1] 18.50119
(b0hat <- mean(salary) - b1hat * mean(roe))
## [1] 963.1913
# detach the data frame
detach(ceosal1)
# A more convenient way ... OLS regression
lm(salary ~ roe, data = ceosal1)
##
## Call:
## lm(formula = salary ~ roe, data = ceosal1)
##
## Coefficients:
## (Intercept) roe
## 963.2 18.5
# OLS regression
CEOregres <- lm(salary ~ roe, data = ceosal1)
# Scatter plot (restrict y axis limits)
[Figure: scatter plot of ceosal1$salary against ceosal1$roe with the fitted regression line; y axis restricted to roughly 0–2000]
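The figure can be reproduced with base R graphics; a sketch, assuming CEOregres from the chunk above (the ylim value is an assumption based on the visible axis):

plot(ceosal1$roe, ceosal1$salary, ylim = c(0, 2000))  # scatter plot, restricted y axis
abline(CEOregres)                                     # add the fitted OLS line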
##
## Call:
## lm(formula = wage ~ educ, data = wage1)
##
## Coefficients:
## (Intercept) educ
## -0.9049 0.5414
##
## Call:
## lm(formula = voteA ~ shareA, data = vote1)
##
## Coefficients:
## (Intercept) shareA
## 26.8122 0.4638
[Figure: scatter plot of vote1$voteA against vote1$shareA with the fitted regression line]
attach(ceosal1)
## (Intercept) roe
## 963.19134 18.50119
# Generic function to access some values
nobs(CEOregres)
## [1] 209
coef(CEOregres)
## (Intercept) roe
## 963.19134 18.50119
# Fitted values
bhat <- coef(CEOregres)
yhat <- bhat["(Intercept)"] + bhat["roe"] * ceosal1$roe
uhat <- ceosal1$salary - yhat
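Equivalently (a sketch, not the slide's own code), R's extractor functions return the same fitted values and residuals without computing them by hand:

yhat2 <- fitted(CEOregres)   # same as the manually computed yhat
uhat2 <- resid(CEOregres)    # same as the manually computed uhat
all.equal(unname(yhat2), unname(yhat))   # should be TRUE up to numerical tolerance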
attach(ceosal1)
# extract variables as vectors:
sal <- ceosal1$salary
roe <- ceosal1$roe
# Confirm property (1):
mean(u.hat)
## [1] -2.243911e-16
# Confirm property (2):
cor(wage1$educ, u.hat)
## [1] 5.472256e-16
# Confirm property (3):
mean(wage1$wage)
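The wage-regression residuals u.hat used above are not defined in the displayed snippet. A self-contained sketch, assuming the wage1 data frame comes from the wooldridge package, that reproduces all three checks:

library(wooldridge)                       # provides wage1 (assumption about the data source)
wagereg <- lm(wage ~ educ, data = wage1)  # simple wage regression
u.hat <- resid(wagereg)
b.hat <- coef(wagereg)
mean(u.hat)                               # property (1): residuals average to zero
cor(wage1$educ, u.hat)                    # property (2): regressor and residuals are uncorrelated
mean(wage1$wage)                          # property (3): mean of wage ...
b.hat["(Intercept)"] + b.hat["educ"] * mean(wage1$educ)  # ... equals the fitted value at mean educ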
R-squared
attach(ceosal1)
CEOregres <- lm(salary ~ roe, data = ceosal1)
# Fitted values and residuals
sal.hat <- fitted(CEOregres)
u.hat <- resid(CEOregres)
# Calculate R-squared in three different ways
var(sal.hat) / var(sal)
## [1] 0.01318862
1 - var(u.hat) / var(sal)
## [1] 0.01318862
cor(sal, sal.hat)^2
## [1] 0.01318862
# Summary of the regression results
summary(CEOregres)