Applied Econometrics 2024
University of Tübingen
1
Fact sheet Applied Econometrics 2024
Joachim Grammig (Lecturer) joachim.grammig@uni-tuebingen.de
Sylvia Bürger (Sek./Admin) sylvia.buerger@uni-tuebingen.de
ILIAS Password: AE24
● Lecture with embedded practical part and presentation of selected problems (from q4r): Mondays 8-10h and Thursdays 10-12h, H23 Kupferbau
● Tutorial videos on technical aspects, e.g. proofs, detailed derivations
● Practical sessions using R will be via Zoom (to make up for Thursday
public holidays)
● Questions for Review (q4r) updated each week for revision of lecture (in
pairs, group or solo)
● Practical parts using statistical software R: implement/code key concepts
● Course material will be made available on ILIAS
● ... a weekly updated time table
● ... slides (may be updated or extended)
● ... a forum for discussion
● ... tutorial videos
● ... R code from practical part
● ... pdfs of books (parts) and papers
2
Fact sheet Applied Econometrics 2024
● Many copies of the recommended textbooks are available in the university library
● Please check ILIAS, discussion forum, and your student email address
regularly
● Recommendation: form study groups or pairs and work regularly through
q4r and practical parts
● Work continuously, do not procrastinate
● Take intensive notes during the lecture and while working through the tutorial videos
● Help each other out (in case a friend missed a lecture)
● You may bring excerpts of your handwritten lecture notes to the 90 min exam (cheat sheets): five handwritten DIN A4 (or Letter) sized pages, which may be written on front and back, i.e. ten sides in total
● TIMMS videos of the lectures given during the pandemic (summer 2020) remain available, but should be used mainly for revision (lecture content/focus changes).
● My regular office hours: Wednesdays 13-14h, please contact Ms. Bürger
3
What is econometrics?
4
Econometrics-Econ-Nobels: Gen X and Y
5
Econometrics-Econ-Nobels: Gen Z and Alpha
6
Recommended texts
7
Revision and to-dos
8
Table of Contents (may be modified)
1 Six Justifications for Linear Regression
2 Parameter Estimation
3 Finite Sample Properties of OLS
4 Hypothesis Testing under Normality
5 Goodness-of-fit Measures
6 Large Sample Theory and OLS
9
Table of Contents (may be modified)
1 Time Series Basics (Stationarity and Ergodicity)
2 Generalized Least Squares
3 Multicollinearity
4 Endogeneity
5 Instrumental Variables
10
1. Six Justifications for Linear Regression
11
Six justifications for linear regression
12
Justification A: structural economic model
Regression equation derived from economic/finance theory
13
Example 1: Supply and demand functions
14
Example 2: Glosten-Harris model
● How do financial asset prices evolve?
(Journal of Financial Economics, 1988, 21 (1), pp.123-142)
● Importance of public and private information on price formation
Ingredients and notation:
● market maker (MM): sets bid (buy) and ask (sell) quotes
● traders: buy from/sell to MM at prevailing quotes
● trade (transaction) events indexed by i = 1, . . .
● Efficient price: mi , incorporates all public and private info
● Transaction price: Pi , per share, of ith trade
● Pia (Pib ) prevailing ask (bid) quote before (!) ith trade
● Indicator of transaction type: Qi = +1 for a buyer-initiated trade, Qi = −1 for a seller-initiated trade
15
Glosten-Harris model (2)
● Efficient price:
mi = µ + mi−1 + εi + Qi zi , where zi = z0 + z1 vi   (vi : volume of the ith trade)
● Drift parameter: µ
● new public information accumulated since (i − 1)th trade: εi
● Private information conveyed through trade: Qi zi
● MM sets bid and ask quotes anticipating price impact on m:
MM’s sell price (ask): Pia = µ + mi−1 + εi + zi + c
MM’s buy price (bid): Pib = µ + mi−1 + εi − zi − c
● (Opportunity) costs of MM: c (per share)
⇒ Transaction price change
∆Pi = µ + z0 Qi + z1 vi Qi + c∆Qi + εi
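As a first practical-part style illustration, a minimal R sketch that simulates trades from the ∆Pi equation above and recovers its parameters with lm(); all parameter values and distributions are made up for illustration.

  # Simulate the Glosten-Harris price-change equation and estimate it by OLS
  set.seed(1)
  n   <- 5000
  mu  <- 0.001; z0 <- 0.01; z1 <- 0.05; cc <- 0.02  # illustrative parameter values
  Q   <- sample(c(-1, 1), n, replace = TRUE)        # trade direction indicator Q_i
  v   <- rexp(n)                                    # trade volume v_i (arbitrary distribution)
  eps <- rnorm(n, sd = 0.05)                        # new public information eps_i
  dQ  <- c(0, diff(Q))                              # Delta Q_i (first trade set to 0)
  dP  <- mu + z0 * Q + z1 * v * Q + cc * dQ + eps   # transaction price change
  summary(lm(dP ~ Q + I(v * Q) + dQ))               # coefficients ~ (mu, z0, z1, c)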
16
Example 3: Mincer equation
ln Wi = β1 + β2 Si + β3 EXPi + β4 EXP²i + εi   (Wi : wage, Si : years of schooling, EXPi : experience)
⇒ β2 : return to schooling
17
Example 4: Linear factor asset pricing models
R^j_{t+1} = x^j_{t+1} / p^j_t = (p^j_{t+1} + d^j_{t+1}) / p^j_t   (return)
18
Example 4: Linear factor asset pricing models
β^j = Cov(R^{ej}_t , ft ) / Var(ft )   (single factor)
β^j = E(ft f′t )−1 E(R^{ej}_t ft )   (vector of factors): population regression coefficients (see below)
19
CAPM and Fama-French model
CAPM: f = R^{em} = R^m − R^f   (“excess return of the market portfolio”)
Fama-French model: ft = (R^{em}_t , HMLt , SMBt )′
20
“Compatible regression” (for Fama-French model)
[α^j  β^j_1  β^j_2  β^j_3] × (1, E(R^{em}_t), E(HMLt ), E(SMBt ))′ + εt
21
Justification B: Population regression
fYX = fYi Xi ∀i   (the joint distribution of (Yi , Xi ) is the same for all i)
22
Justification B: Population regression
β̆ = argmin_{β̃} E[ (Yi − Xi′ β̃)² ]
PRC β̆ solves the F.O.C.  E[ Xi (Yi − Xi′ β̆) ] = 0 , where Yi − Xi′ β̆ = ε̆i
Yi = Xi′ β̆ + ε̆i
ε̆i : ● population regression residual
● constructed as ε̆i = Yi − Xi′ β̆ , “no life of its own”
Interpretation of β̆?
(Angrist/Pischke notation: β̃ = b, β̆ = β, ε̆i = ei )
23
For one constant and single regressor
Xi = (1, Xi2 )′ ,  β̆ = (β̆1 , β̆2 )′
Population regression ≙ linear projection
PRC ≙ projection coefficients
24
“Regression anatomy” formula (Frisch-Waugh)
β̆k = Cov(Yi , X̆ik ) / Var(X̆ik )   (bivariate regression)
Xik = γ̆′.k Xi.k + X̆ik   with   γ̆.k = E(Xi.k X′i.k )−1 E(Xi.k Xik )
(X̆ik : residual from the population regression of Xik on the other regressors Xi.k )
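A short R check of the regression anatomy formula on simulated data (the variable names and data-generating process are made up): the coefficient on x2 from the full regression equals Cov(Yi , X̆i2)/Var(X̆i2), where X̆i2 is the residual of x2 after partialling out the other regressor.

  # Regression anatomy (Frisch-Waugh): partial out the other regressors
  set.seed(2)
  n  <- 1000
  x3 <- rnorm(n)
  x2 <- 0.6 * x3 + rnorm(n)            # x2 is correlated with x3
  y  <- 1 + 2 * x2 - x3 + rnorm(n)
  full  <- lm(y ~ x2 + x3)
  x2res <- resid(lm(x2 ~ x3))          # X-breve: residual of x2 on a constant and x3
  c(full = unname(coef(full)["x2"]),
    anatomy = cov(y, x2res) / var(x2res))   # identical numbers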
25
Important laws
● Law of Total Expectation (LTE):  EX [EY ∣X (Y ∣X )] = EY (Y )
● Generalized DET:  EZ ∣X [EY ∣X ,Z (Y ∣X , Z )∣X ] = EY ∣X (Y ∣X )
26
Justification C: linear cond. expectation function
Marginal effect: ∂E(Yi ∣Xi = xi ) / ∂xik , which can be nonlinear in xik
(Angrist/Pischke notation: ε∗i = εi )
27
Justification C: linear cond. expectation function
28
Justification C: linear cond. expectation function
Yi = Xi′ β ∗ + ε∗i
= Xi′ β̆ + ε̆i
29
Justification D: best approximation to nonlin. CEF
30
Justification D: best approximation to nonlin. CEF
31
Justification D: best approximation to nonlin. CEF
Again:
Yi = Xi′ β̆ + ε̆i (population regression)
32
Justification E: Optimal prediction
argmin_{m(Xi )} EXY [ (Yi − m(Xi ))² ]
(m(Xi ) : function used to forecast Yi ;  Yi − m(Xi ) : forecast error)
33
Justification E: Optimal prediction
If only linear functions m(Xi ) = Xi′ β̃ are used:
β̆ = argmin_{β̃} EXY [ (Yi − Xi′ β̃)² ]   (ε̃ = Yi − Xi′ β̃ : forecast error)
⇒ Yi = Xi′ β̆ + ε̆i
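A small R illustration (simulated data, made-up nonlinear CEF): the OLS slope is the sample analogue of the PRC Cov(Yi , Xi)/Var(Xi), i.e. the best linear predictor, even though E(Y ∣X) is nonlinear.

  # Best linear predictor when the CEF E(Y|X) = exp(X) is nonlinear
  set.seed(3)
  n <- 100000
  x <- rnorm(n)
  y <- exp(x) + rnorm(n)                      # nonlinear CEF plus noise
  c(ols_slope = unname(coef(lm(y ~ x))[2]),
    cov_over_var = cov(x, y) / var(x))        # identical: both are the sample PRC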
34
Justification F: (Rubin’s) causal model
35
Justification F: (Rubin’s) causal model
36
Conditional independence assumption (CIA)
Y0i , Y1i ⊥⊥ Ci ∣ Zi
37
Conditional independence assumption (CIA)
38
CIA can remove selection bias: matching estimator
39
CIA can remove selection bias: matching estimator
40
Causal regression
41
Causal regression
Yi = α + ρSi + ηi
Problem if ηi and Si correlated
Population regression:
β̆2 = Cov(Yi , Si ) / Var(Si ) ≠ ρ
(ε̆i and Si are uncorrelated by construction, i.e. E(Si ε̆i ) = 0)
42
Causal regression
Ysi = fi (s) = α + ρ s + A′i γ̆ + νi
Yi = α + ρ Si + A′i γ̆ + νi
e.g. Si : years of schooling, Ai : ability variables
⇒ the population regression of Yi on (Si , A′i )′ (long regression) has a causal interpretation.
43
Long and short regression and OVB
⇒ ρs = ρ + γ̆ ′ δAS
δAS = ( Cov(Ai1 , Si )/Var(Si ), Cov(Ai2 , Si )/Var(Si ), . . . , Cov(AiM , Si )/Var(Si ) )′
● δAS : vector of slope coefficients from the regressions (including a constant) of each element of Ai on Si .
● A variable that affects Ysi (via ηi ) and that is correlated with Si should
be included in Ai . Else: Omitted Variable Bias (OVB)!
● Short population regression coefficient ρs = ρ only if γ̆ = 0 (control
variables don’t affect outcome) or δAS = 0 (control variables and Si
uncorrelated).
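A short R simulation of the OVB formula with made-up parameter values (ρ = 0.5, γ̆ = 1.5): the short-regression coefficient is approximately ρ + γ̆ δAS , while the long regression recovers ρ.

  # Omitted variable bias: short vs. long regression
  set.seed(4)
  n <- 100000
  A <- rnorm(n)                            # ability (the omitted control)
  S <- 1 + 0.8 * A + rnorm(n)              # schooling, correlated with ability
  y <- 2 + 0.5 * S + 1.5 * A + rnorm(n)    # rho = 0.5, gamma = 1.5
  delta_AS <- coef(lm(A ~ S))["S"]         # slope of the regression of A on S
  c(long  = unname(coef(lm(y ~ S + A))["S"]),   # ~ rho
    short = unname(coef(lm(y ~ S))["S"]),       # ~ rho + gamma * delta_AS
    ovb_formula = 0.5 + 1.5 * unname(delta_AS))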
44
Long and short regression and OVB
45
Epistemological problems
● non-experimental data
● unobservable variables
● endogeneity
● causality
● simultaneity
46
Notation of the different justifications
1 population parameters
A structural model:  β, γ, ε (Angrist/Pischke);  β, δ, ε (Script)
B population regression:  β, e (Angrist/Pischke);  β̆, ε̆ (Script);  β̆, ε̆ (QM)
C linear CEF:  β∗, ε∗ (Angrist/Pischke, Script, QM)
D best approx. to nonlinear CEF:  β, e + (Angrist/Pischke);  β̆, ε̆ + (Script)
E optimal prediction:  β + (Angrist/Pischke);  β̆, ε̆ +, ++ (Script)
47
Notation of the different justifications
2 objective function (Angrist/Pischke notation b vs. script notation β̃)
B population regression:  argmin_{b} E[ (Y − X′b)² ]  and  argmin_{β̃} E[ (Y − X′β̃)² ]
D best approx. to nonlinear CEF:  argmin_{b} E[ (E(Y ∣X ) − X′b)² ]  and  argmin_{β̃} EX [ (E(Y ∣X ) − X′β̃)² ] ,  with E(Yi ∣Xi ) ≈ X′i β and E(Yi ∣Xi ) ≈ X′i β̆
E optimal prediction:  argmin_{m(Xi )} E[ (Yi − m(Xi ))² ]  and  argmin_{m(Xi )} EYX [ (Yi − m(Xi ))² ] ,  with m(Xi ) = X′i β and m(Xi ) = X′i β̆
48
2. Parameter Estimation
Hayashi p. 3-18
49
Change in Notation
● random variable Zi
● parameter estimate
50
Linear regression model (CLRM) à la Hayashi
51
Introduction of matrix notation
y = X β + ε ,  with y (n × 1), X (n × K ), β (K × 1), ε (n × 1)
52
System of linear equations
written extensively:
y1 = β1 + β2 x12 + . . . + βK x1K + ε1
y2 = β1 + β2 x22 + . . . + βK x2K + ε2
⋮
yn = β1 + β2 xn2 + . . . + βK xnK + εn
53
Four classical assumptions
54
(Somewhat sloppy) interpretations of parameters
55
Estimation via minimization of SSR
We estimate the linear model and choose b such that SSR is minimized
Obtain an estimate b of β by minimizing the SSR (sum of squared residuals):
argmin_{β̃} SSR(β̃) = argmin_{β̃} ∑_{i=1}^n (yi − x′i β̃)²
∂SSR(β̃)/∂β̃1 = 0  ⇒  ∑ (yi − x′i b) = 0 ,  i.e.  (1/n) ∑ ei = 0
⋮
∂SSR(β̃)/∂β̃K = 0  ⇒  ∑ (yi − x′i b) xiK = 0 ,  i.e.  (1/n) ∑ ei xiK = 0
⇒ the FOCs can be conveniently written in matrix notation:  (1/n) X′e = 0
56
Estimation via minimization of SSR
The system of K equations is solved by matrix algebra
X′ e = X′ (y − Xb) = X′ y − X′ Xb = 0
(X′ X)−1 X′ y − IK b = 0
OLS-estimator:
b = (X′ X)−1 X′ y
Alternatively:
b = ( (1/n) X′X )−1 (1/n) X′y = ( (1/n) ∑_{i=1}^n xi x′i )−1 (1/n) ∑_{i=1}^n xi yi
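A minimal R sketch in the spirit of the practical part (simulated data, illustrative names): compute b = (X′X)−1 X′y by matrix algebra and check it against lm().

  # OLS by matrix algebra vs. lm()
  set.seed(5)
  n  <- 200
  x2 <- rnorm(n); x3 <- runif(n)
  y  <- 1 + 2 * x2 - 0.5 * x3 + rnorm(n)
  X  <- cbind(1, x2, x3)                    # n x K regressor matrix, first column = constant
  b  <- solve(t(X) %*% X) %*% t(X) %*% y    # b = (X'X)^{-1} X'y
  round(cbind(matrix_algebra = as.vector(b), lm = coef(lm(y ~ x2 + x3))), 4)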
57
Zooming in
b = ( (1/n) X′X )−1 (1/n) X′y = ( (1/n) ∑_{i=1}^n xi x′i )−1 (1/n) ∑_{i=1}^n xi yi
(1/n) ∑_{i=1}^n xi x′i is the K × K matrix of sample cross products: its first row is
( (1/n) ∑ x²i1 , (1/n) ∑ xi1 xi2 , . . . , (1/n) ∑ xi1 xiK ), and so on, down to the last row
( (1/n) ∑ xi1 xiK , (1/n) ∑ xi2 xiK , . . . , (1/n) ∑ x²iK ).
(1/n) ∑_{i=1}^n xi yi is the K × 1 vector
( (1/n) ∑ xi1 yi , (1/n) ∑ xi2 yi , (1/n) ∑ xi3 yi , . . . , (1/n) ∑ xiK yi )′.
58
3. Finite sample properties of OLS
Hayashi p. 27-31
59
Finite sample properties of b = (X′ X)−1 X′ y
3 Var(β̂∣X) ≥ Var(b∣X)
● β̂ is any other linear unbiased estimate of β
● Holds under assumptions 1.1 - 1.4
60
An important result from mathematical statistics
For v = A z, where A is a nonrandom matrix and z a random vector:
E(v) = ( E(v1 ), E(v2 ), . . . , E(vm ) )′ = A E(z)   (m × 1)
Var(v) = A Var(z) A′   (m × m)
61
Unbiasedness of OLS
E(b) = β  ⇔  E(b − β) = 0   (b − β : sampling error)
E(b∣X) = E(β∣X) = β
62
We show that Var(b∣X) = σ 2 (X′ X)−1
Note:
● β non-random
● b − β sampling error
● A = (X′ X)−1 X′
● Var(ε∣X) = σ 2 In
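A small Monte Carlo sketch in R (fixed design X, made-up β and σ) illustrating E(b∣X) = β and Var(b∣X) = σ²(X′X)−1.

  # Monte Carlo: unbiasedness and conditional variance of OLS (X held fixed)
  set.seed(6)
  n <- 50; sigma <- 2; beta <- c(1, 0.5)
  X <- cbind(1, rnorm(n))                          # fixed design matrix
  XtXinv <- solve(t(X) %*% X)
  B <- replicate(20000, {
    y <- X %*% beta + rnorm(n, sd = sigma)
    as.vector(XtXinv %*% t(X) %*% y)               # OLS estimate for this draw of eps
  })
  rowMeans(B)                                      # ~ beta (unbiasedness)
  round(var(t(B)), 4)                              # ~ sigma^2 (X'X)^{-1}
  round(sigma^2 * XtXinv, 4)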
63
Sketch proof of the Gauss Markov theorem
where
● C is a function of X
● β̂ = Cy
● D=C−A
● A ≡ (X′ X)−1 X′
64
OLS is BLUE
● OLS is linear
⇒ Holds under assumption 1.1
● OLS is unbiased
⇒ Holds under assumption 1.1 - 1.3
65
4. Hypothesis Testing under Normality
Hayashi p. 33-45
66
Hypothesis testing
Distributional assumption:
67
Important facts from multivariate statistics
Expectation vector:
Variance-covariance matrix:
68
Apply facts from mult. statistics and A1.1 - A1.5
Assuming ε∣X ∼ N (0, σ² In ):
⇒ b − β ∣X ∼ N( (X′X)−1 X′ E(ε∣X) , (X′X)−1 X′ σ² In X (X′X)−1 ) = N( 0 , σ² (X′X)−1 ),  since E(ε∣X) = 0
69
Testing individual parameters (t-Test)
tk = (bk − β̄k ) / √( σ² [(X′X)−1 ]kk )  ∼  N (0, 1)
[(X′X)−1 ]kk : k-th row, k-th column element of (X′X)−1
70
Nuisance parameter σ 2
σ̂² = (1/n) ∑_{i=1}^n ( ei − (1/n) ∑_{i=1}^n ei )² = (1/n) ∑_{i=1}^n e²i = (1/n) e′e
σ̂² is a biased estimate:
E(σ̂² ∣X) = ((n − K)/n) σ²
71
An unbiased estimate of σ 2
For s² = (1/(n − K)) ∑_{i=1}^n e²i = (1/(n − K)) e′e we get an unbiased estimate:
E(s² ∣X) = (1/(n − K)) E(e′e ∣X) = σ²
⇒ t-statistic under H0 :
tk = (bk − β̄k ) / √( [V̂ar(b∣X)]kk ) = (bk − β̄k ) / s.e.(bk ) = (bk − β̄k ) / √( V̂ar(bk ∣X) )  ∼  t(n − K)
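A short R check on simulated data that s², the standard errors √(s²[(X′X)−1]kk) and the t-statistics for H0 : βk = 0 reproduce the summary(lm()) output.

  # s^2, standard errors and t-statistics computed by hand
  set.seed(7)
  n <- 100; K <- 3
  x2 <- rnorm(n); x3 <- runif(n)
  X  <- cbind(1, x2, x3)
  y  <- drop(X %*% c(1, 2, -1) + rnorm(n))
  b  <- drop(solve(t(X) %*% X, t(X) %*% y))
  e  <- y - drop(X %*% b)
  s2 <- sum(e^2) / (n - K)                          # unbiased estimate of sigma^2
  se <- sqrt(s2 * diag(solve(t(X) %*% X)))          # s.e.(b_k)
  cbind(b = b, se = se, t = b / se)                 # t-statistics for H0: beta_k = 0
  summary(lm(y ~ x2 + x3))$coefficients             # same numbers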
72
Decision rule for the t-test
Remark:
√( σ² [(X′X)−1 ]kk ) : standard deviation of bk ∣X
√( s² [(X′X)−1 ]kk ) : standard error of bk ∣X
73
Duality of t-test and confidence interval
Under H0 : βk = β̄k :
tk = (bk − β̄k ) / s.e.(bk )  ∼  t(n − K)
P( −tα/2 (n − K) ≤ tk ≤ tα/2 (n − K) ) = 1 − α
74
1 − α confidence interval for βk
75
Testing joint hypotheses (F -test/Wald test)
H0 : R β = r ,  with R (#r × K ), β (K × 1), r (#r × 1)
Under H0 , Rb − r should be close to 0.
76
Wald/F -test statistic
Distributional properties of R b:
77
Distributional properties
⇒ F-ratio:
F = (Rb − r)′ [ R V̂ar(b∣X) R′ ]−1 (Rb − r) / #r  ∼  F (#r, n − K )
78
Decision rule of the F -test
2 Calculate F -statistic.
79
Alternative representation of the F -statistic
Unrestricted model  →  SSRU = ∑_{i=1}^n (yi − x′i b)²
Restricted model (H0 imposed)  →  SSRR = ∑_{i=1}^n (yi − x′i bR )²
F-ratio:
F = ( (SSRR − SSRU ) / #r ) / ( SSRU / (n − K ) )
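A brief R sketch (simulated data) of the SSR-based F-ratio for H0 : β2 = β3 = 0, compared with the built-in anova() comparison of the restricted and unrestricted models.

  # F-test via restricted and unrestricted SSR
  set.seed(8)
  n  <- 120
  x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 1 + 0.4 * x2 + rnorm(n)            # x3 is irrelevant in the DGP
  unres <- lm(y ~ x2 + x3)                 # K = 3
  restr <- lm(y ~ 1)                       # H0: beta2 = beta3 = 0 imposed, #r = 2
  SSRU <- sum(resid(unres)^2); SSRR <- sum(resid(restr)^2)
  Fstat <- ((SSRR - SSRU) / 2) / (SSRU / (n - 3))
  c(F = Fstat, p = pf(Fstat, 2, n - 3, lower.tail = FALSE))
  anova(restr, unres)                      # same F statistic and p-value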
80
5. Goodness-of-fit measures
Hayashi p. 38/20
81
Coefficient of determination: uncentered R 2
Orthogonal decomposition of y = ŷ + e:
⇒ R²uc ≡ 1 − e′e / y′y
A good model explains much and therefore the residual variation is very small
compared to the explained variation.
82
Coefficient of determination: centered R 2
∑_{i=1}^n (yi − ȳ)² = ∑_{i=1}^n (ŷi − ȳ)² + ∑_{i=1}^n e²i
⇒ R²c ≡ 1 − ∑_{i=1}^n e²i / ∑_{i=1}^n (yi − ȳ)² ≡ 1 − SSR/SST
Note that R²uc and R²c both lie in the interval [0, 1] but describe different models; they are not comparable!
83
Model selection criteria
adjusted R² :
R²adj = 1 − ( SSR/(n − K) ) / ( SST/(n − 1) ) = 1 − ((n − 1)/(n − K)) (SSR/SST)
AIC = log(SSR/n) + 2K/n
SBC = log(SSR/n) + log(n) K/n
Note:
● penalty term for heavy parametrization
● select the model with the smallest AIC/SBC and the highest R²adj
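A small R sketch computing R²adj and the SBC exactly as defined above (with the AIC in the analogous log(SSR/n) + 2K/n form assumed here) for two nested models on simulated data.

  # Adjusted R^2, AIC and SBC for two nested models
  crit <- function(fit, y) {
    n <- length(y); K <- length(coef(fit))
    SSR <- sum(resid(fit)^2); SST <- sum((y - mean(y))^2)
    c(R2adj = 1 - (SSR / (n - K)) / (SST / (n - 1)),
      AIC   = log(SSR / n) + 2 * K / n,
      SBC   = log(SSR / n) + log(n) * K / n)
  }
  set.seed(9)
  n  <- 200
  x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 1 + 0.5 * x2 + rnorm(n)                       # x3 is irrelevant
  rbind(small = crit(lm(y ~ x2), y), large = crit(lm(y ~ x2 + x3), y))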
84
6. Large Sample Theory and OLS
Hayashi p. 88-97/109-133
85
Basic concepts of large sample theory
Using large sample theory we can dispense with basic assumptions from finite
sample theory:
● 1.2 E(εi ∣X) = 0:
strict exogeneity
● 1.4 Var(ε∣X) = σ 2 I:
homoscedasticity
● 1.5 ε∣X ∼ N (0, σ 2 In ):
conditional normality
86
Modes of convergence
Modes of convergence:
● Convergence in probability: →p
● Convergence in distribution: →d
87
Convergence in probability
If lim_{n→∞} P( ∣zn − α∣ > ε ) = 0 for every ε > 0,
then zn →p α (element-wise convergence).
88
Convergence almost surely
P( lim_{n→∞} zn = α ) = 1
Shorthand: zn →a.s. α
89
Convergence in mean square and distribution
Convergence in distribution:
zn →d z
90
Khinchin’s Weak Law of Large Numbers (WLLN)
If {zi } is i.i.d. with E(zi ) = µ, then for z̄n = (1/n) ∑_{i=1}^n zi it holds that
z̄n →p µ ,  or  plim z̄n = µ
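A two-line R illustration of the WLLN with i.i.d. exponential(1) draws (so µ = 1): the sample mean settles down at µ as n grows.

  # WLLN: the sample mean converges in probability to mu = 1
  set.seed(10)
  z <- rexp(100000, rate = 1)                              # E(z_i) = 1
  sapply(c(10, 100, 1000, 100000), function(n) mean(z[1:n]))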
91
WLLN: extensions
92
Central Limit Theorems (Lindeberg-Levy)
If {zi } i.i.d. with E(zi ) = µ and Var(zi ) = σ², and z̄n = (1/n) ∑_{i=1}^n zi →p µ, then
√n (z̄n − µ) →d y ∼ N (0, σ²)
or z̄n − µ ∼a N (0, σ²/n)
or z̄n ∼a N (µ, σ²/n)
Remark: read ∼a as ‘approximately distributed as’.
The CLT also holds for the multivariate extension: a sequence of random vectors {zi }.
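A short R simulation of the Lindeberg-Levy CLT for the same exponential(1) example (µ = σ² = 1): the standardized sample mean √n(z̄n − µ) is approximately N(0, σ²) even though the zi are skewed.

  # CLT: sqrt(n) * (zbar_n - mu) is approximately N(0, sigma^2)
  set.seed(11)
  n <- 500; mu <- 1; sigma2 <- 1
  zstat <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 1)) - mu))
  c(mean = mean(zstat), var = var(zstat))                  # ~ 0 and ~ sigma^2
  hist(zstat, breaks = 50, freq = FALSE)
  curve(dnorm(x, 0, sqrt(sigma2)), add = TRUE)             # normal density overlay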
93
Useful lemmas: Continuous Mapping Theorem
a(⋅) : R^K → R^M
If zn →p α and a(⋅) is a continuous function that does not depend on n, then:
a(zn ) →p a(α) ,  or  plim a(zn ) = a(plim zn ) = a(α)
Examples:
● xn →p α ⇒ ln(xn ) →p ln(α)
● xn →p β and yn →p γ ⇒ xn + yn →p β + γ
● Yn →p Γ ⇒ Yn−1 →p Γ−1
94
Useful lemmas: Continuous Mapping Theorem
If zn →d z, then:
a(zn ) →d a(z)
Example:
● zn →d z ∼ N (0, 1) ⇒ zn² →d z² ∼ χ²(1)
95
Useful lemmas: Slutzky Theorem
If xn →d x and yn →p α, then:
xn + yn →d x + α
Examples:
● xn →d N (0, 1), yn →p α ⇒ xn + yn →d N (α, 1)
● xn →d x, yn →p 0 ⇒ xn + yn →d x
If xn →d x and yn →p 0, then:
xn ⋅ yn →p 0
96
Useful lemmas: Slutzky Theorem
If xn →d x and An →p A, then:
An xn →d A x
Example:
● xn →d N (0, Σ), An →p A ⇒ An xn →d N (0, A Σ A′)
If xn →d x and An →p A, then:
x′n An−1 xn →d x′ A−1 x
97
Large sample assumptions for OLS
98
Large sample distribution of OLS estimator
bn = [ (1/n) ∑_{i=1}^n xi x′i ]−1 (1/n) ∑_{i=1}^n xi yi   (the subscript n indicates the dependence on the sample size)
● bn →p β
● √n (bn − β) →d N (0, AVar(b))   or   b ∼a N (β, AVar(b)/n)
99
bn = (X′ X)−1 X′ y is consistent
bn = [ (1/n) ∑_{i=1}^n xi x′i ]−1 (1/n) ∑_{i=1}^n xi yi
⇒ bn − β = [ (1/n) ∑ xi x′i ]−1 (1/n) ∑ xi εi   (sampling error)
We show: bn →p β, using
⇒ (1/n) ∑_{i=1}^n xi x′i →p E(xi x′i )
⇒ (1/n) ∑_{i=1}^n xi εi →p E(xi εi ) = 0
100
bn = (X′ X)−1 X′ y is consistent
Lemma 1 (CMT) implies:
[ (1/n) ∑_{i=1}^n xi x′i ]−1 →p [ E(xi x′i ) ]−1
bn − β = [ (1/n) ∑ xi x′i ]−1 (1/n) ∑ xi εi
→p E(xi x′i )−1 E(xi εi ) = E(xi x′i )−1 ⋅ 0 = 0
101
bn = (X′ X)−1 X′ y is asymptotically normal
The sequence {gi } = {xi εi } allows applying the CLT to ḡ = (1/n) ∑ xi εi :
√n ( ḡ − E(gi ) ) →d N (0, E(gi g′i ))
√n (bn − β) = [ (1/n) ∑ xi x′i ]−1 √n ḡ
Applying lemma 5:
An = [ (1/n) ∑ xi x′i ]−1 →p A = Σxx−1
xn = √n ḡ →d x ∼ N (0, E(gi g′i ))
⇒ √n (bn − β) →d A x ∼ N (0, Σxx−1 E(gi g′i ) Σxx−1 )
⇒ bn is CAN
102
White standard errors
tk = (bk − β̄k ) / √( (1/n) [ ( (1/n) ∑ xi x′i )−1 ( (1/n) ∑ e²i xi x′i ) ( (1/n) ∑ xi x′i )−1 ]kk )  ∼a  N (0, 1)
holds under H0 : βk = β̄k
Wald statistic:
W = (Rb − r)′ [ R ( AV̂ar(b)/n ) R′ ]−1 (Rb − r)  ∼a  χ²(#r )
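A minimal R sketch of White standard errors computed from the sandwich formula above on simulated heteroskedastic data (variable names and the form of the heteroskedasticity are made up); packaged robust-covariance routines such as sandwich::vcovHC with type "HC0" should give the same numbers.

  # White (heteroskedasticity-robust) standard errors by hand
  set.seed(12)
  n  <- 500
  x2 <- rnorm(n)
  y  <- 1 + 0.5 * x2 + rnorm(n, sd = 0.5 + abs(x2))   # heteroskedastic errors
  X  <- cbind(1, x2)
  b  <- drop(solve(t(X) %*% X, t(X) %*% y))
  e  <- drop(y - X %*% b)
  Sxx  <- crossprod(X) / n                  # (1/n) sum x_i x_i'
  Shat <- crossprod(X * e) / n              # (1/n) sum e_i^2 x_i x_i'
  Avar <- solve(Sxx) %*% Shat %*% solve(Sxx)
  se_white   <- sqrt(diag(Avar) / n)
  se_classic <- sqrt(sum(e^2) / (n - 2) * diag(solve(crossprod(X))))
  cbind(b = b, se_white = se_white, se_classic = se_classic)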
103
How to estimate AVar(b)
AVar(b) = Σxx−1 E(gi g′i ) Σxx−1   with gi = xi εi
(1/n) ∑_{i=1}^n xi x′i →p E(xi x′i )
⇒ AV̂ar(b) = [ (1/n) ∑_{i=1}^n xi x′i ]−1 Ŝ [ (1/n) ∑_{i=1}^n xi x′i ]−1 →p AVar(b) ,  with Ŝ = (1/n) ∑_{i=1}^n e²i xi x′i
104
Testing with conditional homoskedasticity
AV̂ar(b) = [ (1/n) ∑_{i=1}^n xi x′i ]−1 σ̂² ( (1/n) ∑_{i=1}^n xi x′i ) [ (1/n) ∑_{i=1}^n xi x′i ]−1 = σ̂² [ (1/n) ∑_{i=1}^n xi x′i ]−1
with σ̂² = (1/n) ∑_{i=1}^n e²i
Note: (1/n) ∑_{i=1}^n e²i is a biased but consistent estimate of σ²
105
7. Time Series Basics
(Stationarity and Ergodicity)
Hayashi p. 97-107
106
Time series dependence
In time series analysis the data exhibit a certain degree of dependence, and only one
realization of the data generating process is observed.
The CLT and WLLN above rely on i.i.d. data, but real-world time series are dependent.
Examples:
● Inflation rate
● Stock market returns
107
Parallel worlds: Ensemble means
If we were able to ‘run the world several times’, we would have different realizations of
the process at one point in time.
As such a repetition is not possible, we take the mean over the single realization of the process.
Key question:
Does (1/T) ∑_{t=1}^T xt →p E(X ) hold?
108
Stationarity restricts the heterogeneity of a s.p.
Strict stationarity:
The joint distribution of zi , zi1 , zi2 , ..., zir depends only on the relative position
i1 − i, i2 − i, ..., ir − i but not on i itself.
In other words: The joint distribution of (zi , zir ) is the same as the joint
distribution of (zj , zjr ) if i − ir = j − jr .
Weak stationarity:
109
Ergodicity restricts memory of stochastic process
lim ∣E [f (zi , zi+1 , ..., zi+k ) ⋅ g(zi+n , zi+n+1 , ..., zi+n+l )]∣
n→∞
= ∣E [f (zi , zi+1 , ..., zi+k )]∣ ⋅ ∣E [g(zi+n , zi+n+1 , ..., zi+n+l )]∣
Ergodic Theorem:
110
Martingale difference sequence
Stationarity and ergodicity are not enough for applying a CLT. To derive the
CAN property of OLS we assume that {gi } = {xi εi } is a martingale difference
sequence, i.e. E(gi ∣ gi−1 , gi−2 , . . .) = 0.
111
8. Generalized Least Squares
Hayashi p. 54-58
112
GLS Assumptions
113
Generalized Least Squares (GLS)
GLS estimator derived under the assumption that V(X) is known, symmetric,
and positive definite
Let V(X)−1 = C′ C
Cy = CXβ + Cε
ỹ = X̃β + ε̃
114
Least squares estimation of β (transformed data)
Problems:
● Difficult to work out the asymptotic properties of β̂ GLS
● In real world applications Var(ε∣X) not known
● If Var(ε∣X) is estimated the BLUE-property of β̂ GLS is lost
115
Special case of GLS - weighted least squares
E(εε′ ∣X) = Var(ε∣X) = σ² V(X)   with   V(X) = diag( V1 (X), V2 (X), . . . , Vn (X) )
As V(X)−1 = C′C:
⇒ C = diag( 1/√V1 (X), 1/√V2 (X), . . . , 1/√Vn (X) ) = diag( 1/s1 , 1/s2 , . . . , 1/sn )
⇒ β̂GLS = argmin_{β̃} ∑_{i=1}^n ( yi /si − β̃1 (1/si ) − β̃2 (xi2 /si ) − . . . − β̃K (xiK /si ) )²
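A short R sketch of the weighted least squares case above, with Vi(X) assumed known and, purely for illustration, equal to x²i2: the transformed-data regression and lm()'s weights argument give the same coefficients.

  # Weighted least squares with known V_i(X) = x2_i^2 (illustrative)
  set.seed(13)
  n  <- 300
  x2 <- runif(n, 1, 3)
  y  <- 1 + 0.5 * x2 + rnorm(n, sd = x2)     # sd proportional to x2, so V_i(X) = x2_i^2
  s  <- x2                                   # s_i = sqrt(V_i(X))
  ols   <- lm(y ~ x2)
  trans <- lm(I(y / s) ~ 0 + I(1 / s) + I(x2 / s))   # regression on the transformed data
  wls   <- lm(y ~ x2, weights = 1 / s^2)             # equivalent via the weights argument
  rbind(ols = coef(ols), transformed = unname(coef(trans)), weights = coef(wls))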
116
9. Multicollinearity
117
Exact multicollinearity
118
Effects of multicollinearity and solutions
Effects:
● Coefficients may have high standard errors and low significance levels
● Estimates may have the wrong sign
● Small changes in the data produce wide swings in the parameter estimates
Solutions:
● Increasing precision by collecting more data. (Costly!)
● Building a better-fitting model that leaves less unexplained.
● Excluding some regressors. (Dangerous! Omitted variable bias!)
119
10. Endogeneity
Hayashi p. 186-196
120
Omitted variable bias (OVB)
y = X1 β 1 + X2 β 2 + ε
Regression of y on X1
⇒ X2 gets into disturbance term
⇒ Omitted variable bias
OLS is biased:
● If β 2 ≠ 0 ⇒ (X′1 X1 )−1 X′1 X2 β 2 ≠ 0
● If (X′1 X1 )−1 X′1 X2 ≠ 0 ⇒ (X′1 X1 )−1 X′1 X2 β 2 ≠ 0
121
Endogeneity bias: Working example
qid = α0 + α1 pi + ui
qis = β0 + β1 pi + vi
122
From structural form to reduced form
pi = (β0 − α0 )/(α1 − β1 ) + (vi − ui )/(α1 − β1 )
qi = (α1 β0 − α0 β1 )/(α1 − β1 ) + (α1 vi − β1 ui )/(α1 − β1 )
⇒ Cov(pi , ui ) = − Var(ui )/(α1 − β1 )
123
With endogeneity OLS is not consistent
α̂1 = [ (1/n) ∑ (qi − q̄)(pi − p̄) ] / [ (1/n) ∑ (pi − p̄)² ]  →p  Cov(pi , qi )/Var(pi )
⇒ Cov(pi , qi )/Var(pi ) = α1 + Cov(pi , ui )/Var(pi ) ≠ α1
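A small R simulation of the market model with made-up parameters (demand slope α1 = −1, supply slope β1 = 1): the OLS regression of qi on pi estimates α1 + Cov(pi , ui)/Var(pi), not the demand slope.

  # Simultaneity: OLS on the demand equation is inconsistent
  set.seed(14)
  n  <- 100000
  a0 <- 10; a1 <- -1            # demand: q = a0 + a1 * p + u
  b0 <- 2;  b1 <-  1            # supply: q = b0 + b1 * p + v
  u  <- rnorm(n); v <- rnorm(n)
  p  <- (b0 - a0) / (a1 - b1) + (v - u) / (a1 - b1)    # reduced form for the price
  q  <- a0 + a1 * p + u
  c(true_a1 = a1,
    ols     = unname(coef(lm(q ~ p))[2]),
    plim    = a1 + cov(p, u) / var(p))     # OLS converges to a1 + Cov(p,u)/Var(p)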
124
Instruments for the market model
125
A basic macroeconomic model: Haavelmo (1943)
Cov(Ci , Yi )/Var(Yi ) = α1 + Cov(Yi , ui )/Var(Yi ) ≠ α1
126
Errors in variables
Observed variables:
● Yi = Yi∗ + yi
● Ci = Ci∗ + ci
● ci = kyi + ui
⇒ Solution: IV
127
11. Instrumental Variables
128
Solution for endogeneity problem: IV
Linear regression:
yi = x′i β + εi
But the assumption of predetermined regressors does not hold:
E(xi εi ) ≠ 0
Instrument vector: zi = (zi1 , zi2 , . . . , ziL )′
129
IV Assumptions
130
IV Assumptions
K = L ⇒ Σ−1ZX exists.
131
Deriving the IV-estimator (K = L)
−1
β = [ E(zi x′i ) ]−1 E(zi yi )
β̂IV = [ (1/n) ∑ zi x′i ]−1 (1/n) ∑ zi yi
132
Deriving the IV-estimator (K = L)
Applying the WLLN, the CLT and the useful lemmas it can be shown that the IV estimator β̂IV is CAN:
β̂IV →p β
√n ( β̂IV − β ) →d N( 0 , [E(zi x′i )]−1 E(ε²i zi z′i ) [ [E(zi x′i )]−1 ]′ )
AV̂ar(β̂IV ) = [ (1/n) ∑ zi x′i ]−1 ( (1/n) ∑ e²i zi z′i ) [ [ (1/n) ∑ zi x′i ]−1 ]′   with ei = yi − x′i β̂IV
V̂ar(β̂IV ) = AV̂ar(β̂IV ) / n
133
IV in the context of causal model
ysi = fi (s) = α + ρ s + ηi
yi = α + ρ Si + γ′Ai + νi ,   with ηi = γ′Ai + νi
134
IV in the context of causal model
“exclusion restriction”
yi = α + ρSi + ηi
figure out:
135
IV in the context of causal model
ρ = [ Cov(yi , zi )/Var(zi ) ] ∶ [ Cov(Si , zi )/Var(zi ) ] = Cov(yi , zi ) / Cov(Si , zi )
(PR coefficient of yi on zi divided by PR coefficient of Si on zi )
136
IV in the context of causal model
η̃i = ηi − E(ηi )
yi = α̃ + ρSi + η̃i
with α̃ = α + E(ηi )
and E(η˜i ) = 0
137
Special case of IV
zi = (1, zi )′ ,  xi = (1, Si )′ ,  β = (α̃, ρ)′ ,  εi = η̃i ,  yi = yi
β = [ E(zi x′i ) ]−1 E(zi yi ) = (α̃, ρ)′
ρ = Cov(yi , zi ) / Cov(Si , zi )
α̃ = E(yi ) − ρ E(Si )
β̂ = [ (1/n) ∑ zi x′i ]−1 (1/n) ∑ zi yi = (α̃̂ , ρ̂)′
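A short R sketch of the just-identified IV estimator for the schooling example (all data simulated; the instrument zi is constructed to satisfy the exclusion restriction): OLS is contaminated by the omitted ability term in ηi , while β̂IV and the Wald ratio Cov(yi , zi)/Cov(Si , zi) recover ρ.

  # Just-identified IV: rho = Cov(y, z) / Cov(S, z)
  set.seed(15)
  n <- 100000
  z <- rbinom(n, 1, 0.5)                  # binary instrument (illustrative)
  A <- rnorm(n)                           # unobserved ability, part of eta
  S <- 10 + 2 * z + A + rnorm(n)          # schooling depends on z and ability
  y <- 1 + 0.5 * S + A + rnorm(n)         # true rho = 0.5
  Z <- cbind(1, z); X <- cbind(1, S)
  b_iv <- drop(solve(t(Z) %*% X, t(Z) %*% y))   # [sum z_i x_i']^{-1} sum z_i y_i
  c(ols  = unname(coef(lm(y ~ S))[2]),
    iv   = b_iv[2],
    wald = cov(y, z) / cov(S, z))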
138
Causal model with covariates
yi = α′ xi + ρSi + ηi
Consider two PRs:
π21 / π11 = Cov(z̃i , yi ) / Cov(z̃i , Si ) = ρ
139
An alternative view: Two-Stage Least Squares (2SLS)
Empirical strategy:
140
3rd view IV
yi = α′ xi + ρSi + ηi
Instrument zi :
E(ηi zi ) = 0, E(ηi xi ) = 0
Redefine:  zi = (x′i , zi )′ ,  xi = (x′i , Si )′ ,  β = (α′, ρ)′
ηi = yi − α′xi − ρ Si (original) = yi − x′i β (redefined)
⇒ E(zi ηi ) = 0 (redefined)
141
3rd view IV
β = [ E(zi x′i ) ]−1 E(zi yi ) = (α′, ρ)′
β̂ = [ (1/n) ∑ zi x′i ]−1 (1/n) ∑ zi yi = (α̂′, ρ̂)′
⇒ use IV inference!
142
Hayashi in a nutshell
143
Hayashi in a nutshell
144
Hayashi in a nutshell
145