Block 1
y = Xβ + ε
LRM assumptions (for OLS estimation):
(Notation follows Greene, Econometric Analysis, 7th ed.)
y = Xβ + ε
LRM assumptions (continued):
A4 Homoscedastic & nonautocorrelated disturbances:
E[εε′|X] = σ² Iₙ
Homoscedasticity: var[εi|X] = σ², ∀ i = 1, …, n.
Nonautocorrelated (uncorrelated) disturbances: cov[εt, εs|X] = 0, ∀ t ≠ s.
GARCH models [e.g. ARCH(1): var[εt|εt−1] = σ² + α(εt−1)²]
do not violate the conditional variance assumption
var[εi|X] = σ². However, var[εt|εt−1] ≠ var[εt]; the
conditioning on X is omitted from the notation but left
implicit.
A5 DGP of X: Variables in X may be fixed or random.
A6 Normal distribution of disturbances:
ε|X ∼ N[0, σ² Iₙ].
Ordinary least squares (OLS)
y = Xβ + ε
The least squares estimator is unbiased (given A1 – A3):
β̂ = b = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε;
taking expectations:
E[b|X] = β + E[(X′X)⁻¹X′ε|X] = β (the second term is zero by A3).
Variance of the least squares estimator (A1 – A4):
var[b|X] = var[(X′X)⁻¹X′ε|X],
because β is a constant, so var(β|X) = 0. Using A3 & A4:
var[b|X] = A (σ² Iₙ) A′, where A = (X′X)⁻¹X′
(the matrix analogue of var(cZ) = c² var(Z), i.e. var(AZ|X) = A var(Z|X) A′),
= σ²(X′X)⁻¹,
using (AB)′ = B′A′ for dimensionally compatible matrices A, B.
Normal distribution of the least squares estimator (A1 – A6):
b|X ∼ N[β, σ²(X′X)⁻¹].
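A minimal Monte Carlo sketch of these finite-sample results (Python/NumPy; the design matrix, β, σ² and seed below are illustrative choices, not from the text): with X held fixed across replications, the OLS estimates should average to β and their sample covariance should approach σ²(X′X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix X (held constant across replications), illustrative beta and sigma^2
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])
sigma2 = 4.0

R = 5000                       # number of Monte Carlo replications
b_draws = np.empty((R, k))
XtX_inv = np.linalg.inv(X.T @ X)
for r in range(R):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)    # disturbances satisfying A4 and A6
    y = X @ beta + eps
    b_draws[r] = XtX_inv @ X.T @ y                     # b = (X'X)^{-1} X'y

print("mean of b:", b_draws.mean(axis=0))              # ~ beta (unbiasedness, A1-A3)
print("MC var(b):\n", np.cov(b_draws, rowvar=False))   # ~ sigma^2 (X'X)^{-1} (A1-A4)
print("theoretical var(b):\n", sigma2 * XtX_inv)
```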
General properties of estimators
Consistency: plim(θ̂) = θ.
A sufficient condition: θ̂ is unbiased (at least asymptotically) and
var(θ̂) → 0 as n → ∞.
Consistent estimators: unbiased (or at least asymptotically
unbiased), with variance shrinking to zero as the sample size
grows (in the limit, the entire population is used).
Consistency is a minimal requirement for any estimator used in statistics or
econometrics.
If an estimator is not consistent, it does not provide
relevant estimates of the population values θ even with
unlimited data, i.e. as n → ∞.
Unbiased estimators are not necessarily consistent.
Properties of estimators - classification:
Application example 1
Sample covariance is a consistent estimator of population
covariance.
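A quick illustration of this consistency (Python/NumPy sketch with a hypothetical bivariate normal population; the covariance value 1.5 and sample sizes are illustrative): as n grows, the sample covariance settles at the population value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Population: bivariate normal with cov(x, y) = 1.5
cov_true = 1.5
Sigma = np.array([[2.0, cov_true],
                  [cov_true, 3.0]])

for n in [50, 500, 5_000, 500_000]:
    draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=n)
    s_xy = np.cov(draws[:, 0], draws[:, 1])[0, 1]      # sample covariance
    print(f"n = {n:>7}: sample cov = {s_xy:.4f} (population cov = {cov_true})")
```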
Application example 2
OLS estimators we have used for parameters in the CLRM can
be derived by the method of moments.
Method of moments
...
(1/n) ∑_{i=1}^{n} xiK (yi − β̂1 − β̂2 xi2 − ⋯ − β̂K xiK) = 0
Removing the factor 1/n from the equations does not affect the solution.
This is a system of K equations with K unknown parameters βj .
The set of moment equations is equivalent to 1st order conditions for
the OLS estimator:
min_{β̂} ∑_{i=1}^{n} (yi − β̂1 − β̂2 xi2 − ⋯ − β̂K xiK)²
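A minimal sketch of this equivalence (Python/NumPy/SciPy with simulated data; all names and parameter values are illustrative): solving the K sample moment equations (1/n) X′(y − Xβ̂) = 0 numerically gives the same estimate as minimising the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

def moment_conditions(beta_hat):
    # K sample moment equations: (1/n) * X'(y - X beta_hat) = 0
    return X.T @ (y - X @ beta_hat) / n

beta_mom = fsolve(moment_conditions, x0=np.zeros(X.shape[1]))
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # minimises the sum of squared residuals

print("method of moments:", beta_mom)
print("OLS              :", beta_ols)              # identical up to numerical tolerance
```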
Generalized method of moments
...
(1/n) ∑_{i=1}^{n} ziL (yi − β̂1 − β̂2 xi2 − ⋯ − β̂K xiK) = 0
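In the just-identified case (L = K instruments) these moment conditions can be solved exactly, giving the IV estimator β̂ = (Z′X)⁻¹Z′y. A sketch with simulated data (Python/NumPy; the endogenous regressor, instrument and coefficient values are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Endogenous regressor: x is correlated with the disturbance eps, the instrument z is not
z = rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.8 * z + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps                     # true beta = (1, 2)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])        # L = K = 2: just identified

# Solve (1/n) Z'(y - X beta) = 0  ->  beta_IV = (Z'X)^{-1} Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("IV / MoM estimate:", beta_iv)        # close to (1, 2)
print("OLS estimate     :", beta_ols)       # biased because cov(x, eps) != 0
```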
θ = (θ1, …, θm)′
LL(β, σ²|y, X) = ∑_{i=1}^{n} [−(1/2) log(2πσ²) − (yi − xi′β)²/(2σ²)]
In matrix form, the log likelihood function is:
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
Recall that:
(y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ
and
∂[(y − Xβ)′(y − Xβ)]/∂β = −2X′y + 2X′Xβ.
Maximum likelihood estimator
MLE – Normal distribution (continued)
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
∂LL/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (y − Xβ)′(y − Xβ) = 0
is solved by:
σ̂² = (y − Xβ̂)′(y − Xβ̂)/n = û′û/n = SSR/n.
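A sketch of the same result obtained numerically (Python/SciPy with simulated data; coefficient values and the log-σ² parametrisation are illustrative choices): maximising LL(β, σ²) reproduces the closed-form solutions, with β̂_ML equal to OLS and σ̂² = SSR/n.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -1.5]) + rng.normal(scale=2.0, size=n)

def neg_loglik(params):
    # params = (beta_1, ..., beta_K, log sigma^2); log-parametrisation keeps sigma^2 > 0
    beta, log_s2 = params[:-1], params[-1]
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(s2) + resid @ resid / s2)

res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
beta_ml, s2_ml = res.x[:-1], np.exp(res.x[-1])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
ssr = np.sum((y - X @ b_ols) ** 2)

print("beta_ML  :", beta_ml, " vs OLS  :", b_ols)
print("sigma2_ML:", s2_ml, " vs SSR/n:", ssr / n)
```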
MLE properties
Asymptotic normality of θ̂
W = [h(θ̂) − q]′ {Asy.Var[h(θ̂) − q]}⁻¹ [h(θ̂) − q] ∼ χ²(r) under H0,
LM = (∂ log L(θ̂R)/∂θ̂R)′ I[θ̂R]⁻¹ (∂ log L(θ̂R)/∂θ̂R) ∼ χ²(r) under H0,
W ≥ LR ≥ LM.
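For the normal LRM, plugging σ̂² = SSR/n back into LL gives the concentrated log-likelihood LL = −(n/2)[log(2π) + log(SSR/n) + 1], so the LR statistic for linear restrictions reduces to LR = 2(LL_U − LL_R) = n·log(SSR_R/SSR_U). A sketch of this computation (Python/NumPy/SciPy; simulated data where the true β₃ = 0, so H0 should usually not be rejected; only LR is computed here):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 500
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)           # true beta_3 = 0

X_u = np.column_stack([np.ones(n), x2, x3])       # unrestricted model
X_r = np.column_stack([np.ones(n), x2])           # restricted model (H0: beta_3 = 0)

ssr_u = np.sum((y - X_u @ np.linalg.lstsq(X_u, y, rcond=None)[0]) ** 2)
ssr_r = np.sum((y - X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]) ** 2)

LR = n * np.log(ssr_r / ssr_u)                    # LR = 2(LL_U - LL_R)
print("LR statistic:", LR)
print("5% critical value, chi2(1):", chi2.ppf(0.95, df=1))
```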
MLE – summary
Quantile regression
Non-linear regression models
yi = h(xi , β) + εi
Example – disturbances violating non-autocorrelation (AR(1) errors):
yt = xt′β + ut,  ut = ρut−1 + εt,
yt = xt′β + ρut−1 + εt;  note: ut−1 = yt−1 − xt−1′β,
hence:
yt = ρyt−1 + xt′β − ρ(xt−1′β) + εt,
which is non-linear in the parameters (the product ρβ appears).
consi = β1 + β2 · inci^β3 + εi
special case: the model is linear for β3 = 1
(this assumption can be tested; see the sketch below).
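A sketch of estimating this consumption function by nonlinear least squares (Python with scipy.optimize.curve_fit; the income data and true parameter values are simulated, illustrative assumptions only):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(6)
inc = rng.uniform(10, 100, size=500)                    # simulated income
beta_true = (2.0, 0.7, 1.1)                             # illustrative (beta1, beta2, beta3)
cons = beta_true[0] + beta_true[1] * inc ** beta_true[2] + rng.normal(scale=1.0, size=500)

def cons_fn(inc, b1, b2, b3):
    # cons_i = beta1 + beta2 * inc_i ** beta3
    return b1 + b2 * inc ** b3

beta_hat, cov_hat = curve_fit(cons_fn, inc, cons, p0=[1.0, 1.0, 1.0])
se = np.sqrt(np.diag(cov_hat))

print("NLS estimates  :", beta_hat)
print("standard errors:", se)
# A t-type test of beta3 = 1 checks whether the linear special case is adequate:
print("t-stat for H0: beta3 = 1:", (beta_hat[2] - 1.0) / se[2])
```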
Nonlinear regression: examples
(1) wagei = β1 + ui
(2) wagei = β1 + β2 femalei + ui
(3) wagei = β1 + β2 femalei + β3 experi + ui
[Figure: estimation results plotted for AGE and ADEPCNT]
y = β1 + β2 x2 + β3 x3 + · · · + βK xK + u
ŷ = β̂1 + β̂2 x2 + β̂3 x3 + · · · + β̂K xK
ŷp estimates E(y|x1 = 1, x2 = c2, …, xK = cK):
ŷp = β̂1 + β̂2 c2 + β̂3 c3 + · · · + β̂K cK
Reparametrized CLRM (replace each regressor xj by xj − cj, so that the intercept equals the prediction at the chosen values):
ŷp = β̂1*
s.e.(ŷp) = s.e.(β̂1*), i.e.
var(ŷp) = var(β̂1*)
Predictions - basics
Prediction error: êp = yp − ŷp, with var(êp) = var(ŷp) + σ²,
because var(β1 + β2 c2 + β3 c3 + ⋯ + βK cK) = 0 (the population regression function at the chosen values is a constant).
Predictions - basics
log(y) = β1 + β2 x2 + · · · + βK xK + u
\widehat{log(y)} = β̂1 + β̂2 x2 + ⋯ + β̂K xK
ŷ = exp(\widehat{log(y)}) systematically underestimates y,
we can use a correction: ŷ = α̂0 · exp(\widehat{log(y)}),
where α̂0 = n⁻¹ ∑_{i=1}^{n} exp(ûi)
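A sketch of this retransformation step (Python/NumPy with simulated log-linear data; the coefficients and error variance are illustrative): fit the log model by OLS, compute α̂0 = n⁻¹ ∑ exp(ûi), and compare the naive and corrected predictions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x = rng.normal(size=n)
log_y = 0.5 + 0.8 * x + rng.normal(scale=0.6, size=n)    # log-linear DGP
y = np.exp(log_y)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
u_hat = np.log(y) - X @ b

alpha0_hat = np.mean(np.exp(u_hat))           # correction factor, >= 1 by Jensen's inequality
y_hat_naive = np.exp(X @ b)                   # systematically too small
y_hat_corr = alpha0_hat * np.exp(X @ b)       # corrected prediction

print("alpha0_hat           :", alpha0_hat)
print("mean of y            :", y.mean())
print("mean naive prediction:", y_hat_naive.mean())
print("mean corrected pred. :", y_hat_corr.mean())
```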
ŷp = xp′β̂
s.e.(êp) = σ̂ · √(1 + xp′(X′X)⁻¹xp),
which relates to the individual prediction error.
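A sketch of computing ŷp and this standard error for a chosen xp (Python/NumPy/SciPy with simulated data; here σ̂ is the usual degrees-of-freedom-corrected regression standard error, and the 95% interval uses the t distribution, both illustrative choices):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(8)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma_hat = np.sqrt(resid @ resid / (n - k))           # s^2 = SSR / (n - K)

x_p = np.array([1.0, 0.5])                             # point at which we predict
y_hat_p = x_p @ b
se_pred = sigma_hat * np.sqrt(1 + x_p @ np.linalg.inv(X.T @ X) @ x_p)

crit = t.ppf(0.975, df=n - k)
print("y_hat_p:", y_hat_p)
print("95% prediction interval:", (y_hat_p - crit * se_pred, y_hat_p + crit * se_pred))
```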
Reliability of predictions:
MSE_Te = (1/m) ∑_{i∈Te} [yi − f̂(xi)]²
Variance vs. Bias trade-off
CV(k) = (1/k) ∑_{s=1}^{k} MSEs,
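A sketch of the CV(k) computation for an OLS fit (Python/NumPy; k = 5 folds and the simulated data are illustrative choices): split the sample into k folds, fit on the remaining folds, and average the test-fold MSEs.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k_folds = 500, 5
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

idx = rng.permutation(n)
folds = np.array_split(idx, k_folds)

mse = []
for s in range(k_folds):
    test = folds[s]
    train = np.concatenate([folds[j] for j in range(k_folds) if j != s])
    b = np.linalg.lstsq(X[train], y[train], rcond=None)[0]    # fit on training folds
    mse.append(np.mean((y[test] - X[test] @ b) ** 2))          # test-fold MSE_s

cv_k = np.mean(mse)                                            # CV(k) = (1/k) sum of MSE_s
print("fold MSEs:", np.round(mse, 3))
print("CV(k)    :", cv_k)
```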