Lecture 8
Instrumental Variables
RS – Lecture 8
The IV Problem
• We start with our CLM:
y = Xβ + ε. (DGP)
- Let's pre-multiply the DGP by X':
X'y = X'Xβ + X'ε.
- We can interpret b as the solution obtained by first approximating
X'ε by zero, and then solving the k equations in k unknowns
X'y = X'X b (normal equations).
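As a numerical sanity check, the normal equations can be solved directly. A minimal sketch with simulated data (all names and numbers here are illustrative, not from the lecture; `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 2
X = rng.normal(size=(T, k))
beta = np.array([1.0, -0.5])
y = X @ beta + rng.normal(size=T)

# Solve the k normal equations X'y = (X'X) b for b
b = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as the explicit formula b = (X'X)^{-1} X'y
b_explicit = np.linalg.inv(X.T @ X) @ (X.T @ y)
```

`np.linalg.solve` is preferred over forming the inverse explicitly; both give the same b here.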
• Variables
y, X: endogenous variables – i.e., correlated with ε.
U & Z: exogenous variables – i.e., uncorrelated with ε.
U: included instruments, clean variables (“controls”).
Z: excluded instruments, IVs – i.e., they satisfy the relevance condition
and the validity condition, also referred to as the exclusion restriction.
(Excluded = not included in the structural equation.)
• Equations
Structural equation: Theory dictates this relation: it relates y and X
(or Y). It measures the causal effect of X on y, β; but the effect is
blurred by endogeneity.
First stage: Regression of X on the instrument, Z (it measures a
causal effect from Z to X).
Reduced form: Regression of y on the instrument is called the
reduced form (it measures the direct causal effect from Z to y).
Assumptions:
{xi, zi, εi} is a sequence of RVs, with:
E[X'X] = Qxx (pd and finite) (LLN: plim(X'X/T) = Qxx)
E[Z'Z] = Qzz (finite) (LLN: plim(Z'Z/T) = Qzz)
E[Z'X] = Qzx (pd and finite) (LLN: plim(Z'X/T) = Qzx)
E[Z'ε] = 0 (LLN: plim(Z'ε/T) = 0)
The second case (l > k) is the usual situation. We could throw away
l – k instruments, but throwing away information is never optimal.
IV Estimation
• Case 1: l = k – i.e., number of instruments = number of regressors.
To get the IV estimator, we start from the system of equations:
W'Z'X bIV = W'Z'y
- dim(Z) = dim(X) = Txk ⇒ Z'X is a kxk pd matrix.
- In this case, W is irrelevant; say, W = I. Then,
bIV = (Z'X)-1 Z'y
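A minimal simulation of Case 1 (one endogenous regressor, one instrument; the DGP parameters below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
z = rng.normal(size=T)                 # instrument: relevant and valid
v = rng.normal(size=T)
eps = 0.8 * v + rng.normal(size=T)     # eps correlated with v => x endogenous
x = 0.7 * z + v                        # first stage
y = 2.0 * x + eps                      # structural equation, beta = 2

Z, X = z.reshape(-1, 1), x.reshape(-1, 1)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # b_IV = (Z'X)^{-1} Z'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here
```

With this DGP, OLS converges to β + Cov(x, ε)/Var(x) ≈ 2.54, while bIV stays near the true β = 2.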
IV Estimators: Example
Simplest case: Linear model, two endogenous variables, one IV.
y1 = y2 β + ε – ε ~ N(0, σεε)
y2 = z π + v – v ~ N(0, σvv)
with reduced form:
y1 = z πβ + (vβ + ε) = z γ + ξ.
The parameter of interest is β (= γ/π).
• We estimate β with IV:
bIV = Σi (zi – z̄) y1,i / Σi (zi – z̄) y2,i
IV Estimators: Example
• To analyze the bias:
bIV = (z'y2)-1 z'y1 = β + (z'y2)-1 z'ε
plim(bIV) – β = plim(z'ε/T)/plim(z'y2/T) = Cov(z, ε)/Cov(z, y2)
• Even if Cov(z, ε) is small, the inconsistency can get large when π ≈ 0:
Cov(z, y2) = Cov(z, zπ + v) = π Var(z) + Cov(z, v) = π Var(z) ≈ 0
IV Estimators: Example
• When π is small, we say z is a weak instrument. It provides
information but, as we will see later, not enough.
• Note that even when π = 0, in finite samples the sample analogue of
Cov(z, y2) ≠ 0. This is not a very useful fact: the sampling variation in
Cov(z, y2) is not helpful to estimate β.
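The weak-instrument problem can be illustrated by a small Monte Carlo (a sketch under assumed parameter values; π = 0.05 makes z weak, π = 1 makes it strong):

```python
import numpy as np

def iv_mc(pi, beta=1.0, T=200, reps=2000, seed=2):
    """Monte Carlo sampling distribution of the simple IV estimator."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=T)
        v = rng.normal(size=T)
        eps = 0.9 * v + 0.4 * rng.normal(size=T)  # Cov(eps, v) > 0: endogeneity
        y2 = pi * z + v
        y1 = beta * y2 + eps
        zc = z - z.mean()
        est[r] = (zc @ y1) / (zc @ y2)            # sample Cov(z, y1)/Cov(z, y2)
    return est

strong = iv_mc(pi=1.0)    # strong instrument: tight around beta = 1
weak = iv_mc(pi=0.05)     # weak instrument: same T, wildly dispersed
```

With π small, the denominator of bIV is frequently near zero, so the sampling distribution of the weak-IV estimator is far more dispersed than the strong-IV one at the same T.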
• Interpretations of bIV
bIV = b2SLS = (X̂'X̂)-1 X̂'y – this is the 2SLS interpretation.
bIV = (X̂'X)-1 X̂'y – this is the usual IV, with Z = X̂.
Notes:
- In the 1st stage, any variable in X that is also in Z will achieve a
perfect fit (these X are clean), so that this variable is carried over
without modification to the second stage.
- In the 2nd stage, under the usual linear model for X: X = ZΠ + V,
y = Xβ + ε = X̂β + {ε + (X – X̂)β}
The second component of the error term is a source of finite sample
bias, but not inconsistency.
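The two stages can be sketched in a few lines (hypothetical simulated data; the constant plays the role of an included, "clean" column that the first stage reproduces exactly):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 4000
Z = rng.normal(size=(T, 2))                 # two excluded instruments (l = 2)
v = rng.normal(size=T)
eps = 0.7 * v + rng.normal(size=T)
x = Z @ np.array([0.6, 0.4]) + v            # first stage: X = Z Pi + V
X = np.column_stack([np.ones(T), x])        # structural regressors (const, x)
y = X @ np.array([0.5, 2.0]) + eps

# 1st stage: project X on (const, Z); the clean constant column is
# carried over without modification
W = np.column_stack([np.ones(T), Z])
Xhat = W @ np.linalg.lstsq(W, X, rcond=None)[0]

# 2nd stage: OLS of y on Xhat gives b_2SLS = (Xhat'Xhat)^{-1} Xhat'y
b_2sls = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)
```

Note that `Xhat[:, 0]` equals the constant column exactly, illustrating the first note above.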
IV Estimators: Identification
• Case 3: l < k – i.e., number of instruments < number of regressors.
- We cannot estimate β: we do not have enough information in Z to
estimate it.
- This is the identification problem. This is the case where we need to
rethink the estimation strategy.
- When we can estimate β, we say the model is identified. This
happens when l ≥ k.
Est. Var[b2SLS] = σε² (X̂'X̂)-1
Suppose only the first variable is correlated with ε. Under the
assumptions, plim(X'ε/n) = (γ1, 0, ..., 0)'. Then
plim b – β = plim(X'X/n)-1 (γ1, 0, ..., 0)'
= γ1 × (q11, q21, ..., qK1)'
= γ1 times the first column of Q-1
• Also, recall that the 1st stage introduces a source of finite sample
bias: the estimation of Π.
√T (bIV – β) →d N(0, σε² / (σX² ρXZ²))
• We will see that the small sample behavior of bIV will depend on the
nature of the model, the correlation between X and ε, and the
correlation between X and Z.
[Figure: sampling distributions of b2,OLS (n = 100) and b2,IV (n = 25).]
• b2,IV has a greater variance than b2,OLS. For small samples (say, n = 25
or 100) OLS may be better in terms of MSE. But, as n grows, b2,IV and
b2,OLS tend to their plims (b2,IV more slowly than b2,OLS, because it has a
larger variance).
[Figure: distributions of √n (b2,IV – b2) for n = 25, 100, and 3,200, with
the limiting normal distribution shown as a dashed red line.]
• We have the distribution of √n (b2,IV – b2) for n = 25, 100, and 3,200.
It also shows, as the dashed red line, the limiting normal distribution.
For n = 3,200, it is very close to the limiting distribution. Inference
would be OK with samples of this magnitude.
• For n = 25, 100, the tails are too fat. Inference would give rise to
excess instances of Type I error (over-rejection). The distortion for
small sample sizes is partly attributable to the low ρXZ = corr(X, Z) =
0.22 (= .5/sqrt(5.25)) (or weak instruments; common in IV estimation).
• Some size problem for small n. Low ρXZ slightly increases the size
problem.
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data
were downloaded from the website for Baltagi's text.
          Exogenous                Endogenous
OLS       Consistent, Efficient    Inconsistent
Note: If W contains X, then the 2SLS in the second and third steps
reduces to OLS.
Davidson and MacKinnon (1993) point out that the DWH test really
tests whether possible endogeneity of the right-hand-side variables
not contained in the instruments makes any difference to the
coefficient estimates.
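A variable-addition version of the Wu test can be sketched as follows (simulated data with assumed parameters; the t-stat on the first-stage fitted value x̂ is the test statistic, as in the WKSHAT row of the output below):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 2000
z = rng.normal(size=T)                  # instrument
v = rng.normal(size=T)
eps = 0.7 * v + rng.normal(size=T)
x = 0.8 * z + v                         # x is endogenous: Cov(x, eps) != 0
y = 1.0 * x + eps

# Step 1: first stage, fitted values xhat from regressing x on (const, z)
W = np.column_stack([np.ones(T), z])
xhat = W @ np.linalg.lstsq(W, x, rcond=None)[0]

# Step 2: augmented regression of y on (const, x, xhat); a significant
# coefficient on xhat signals endogeneity of x
A = np.column_stack([np.ones(T), x, xhat])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ b
s2 = (e @ e) / (T - A.shape[1])
se = np.sqrt(s2 * np.linalg.inv(A.T @ A).diagonal())
t_xhat = b[2] / se[2]                   # large |t| => reject exogeneity of x
```

In this DGP the endogeneity is strong, so |t_xhat| comes out far above conventional critical values.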
Wu Test (Greene)
+----------------------------------------------------+
| Ordinary least squares regression |
| LHS=LWAGE Mean = 6.676346 |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|Constant| -6.60400*** .50833 -12.992 .0000 |
|EXP | .01735*** .00057 30.235 .0000 19.8538|
|OCC | -.04375*** .01489 -2.937 .0033 .51116|
|ED | .07840*** .00275 28.489 .0000 12.8454|
|WKS | .00355*** .00114 3.120 .0018 46.8115|
|WKSHAT | .25176*** .01065 23.646 .0000 46.8115|
+--------+------------------------------------------------------------+
| Note: ***, **, * = Significance at 1%, 5%, 10% level. |
+---------------------------------------------------------------------+
Measurement Error
• DGP: y* = x*β + ε, ε ~ iid D(0, σε²)
- All of the CLM assumptions apply.
• Problem: x*, y* are not observed or measured correctly. x, y are
observed:
x = x* + u, u ~ iid D(0, σu²) – no correlation with ε, v
y = y* + v, v ~ iid D(0, σv²) – no correlation with ε, u
Measurement Error
CASE 1 - Only x* is measured with error:
y = y* = x*β + ε
y = (x – u)β + ε = xβ + (ε – βu) = xβ + w
E[x'w] = E[(x* + u)'(ε – βu)] = – βσu² ≠ 0
⇒ OLS is biased & inconsistent. We need IV!
Measurement Error
• Q: What happens when OLS is used – i.e., we regress y on x?
A: Least squares attenuation:
plim b = cov(x, y) / var(x) = cov(x* + u, x*β + ε) / var(x* + u)
= β var(x*) / (var(x*) + var(u)) < β
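A quick numerical check of the attenuation result (assumed numbers: β = 2, var(x*) = var(u) = 1, so plim b = β/2 = 1):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200_000
beta = 2.0
x_star = rng.normal(size=T)             # true regressor, var(x*) = 1
u = rng.normal(size=T)                  # measurement error, var(u) = 1
x = x_star + u                          # observed, error-ridden regressor
y = beta * x_star + rng.normal(size=T)

b = np.cov(x, y)[0, 1] / np.var(x)      # OLS slope of y on x
# attenuation: plim b = beta * var(x*)/(var(x*) + var(u)) = 2 * (1/2) = 1
```

The estimated slope is biased toward zero, landing near 1 rather than the true 2.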
Measurement Error
CASE 2 - Only y* is measured with error:
y* = y – v = x*β + ε = xβ + ε
⇒ y = xβ + ε + v = xβ + (ε + v)
• Q: What happens when y is regressed on x?
A: Nothing! We have our usual OLS problem, since ε and v are
independent of each other and of x*. The CLM assumptions are not
violated!
• OLS estimates αi, δi (= 1 – βi), γi (= θβi), ψi (but not βi directly!). H0
can be tested. (In general, smearing complicates the estimation.)
Multiple regression: y = β1 x1* + β2 x2* + ε
x1* is measured with error: x1 = x1* + u
x2 is measured without error.
The regression is estimated by least squares.
Popular myth #1: b1 is biased downward, b2 consistent.
Popular myth #2: All coefficients are biased toward zero.
Result for the simplest case. Let
σij = cov(xi*, xj*), i, j = 1, 2 (2x2 covariance matrix)
σ^ij = ijth element of the inverse of the covariance matrix
θ² = var(u)
For the least squares estimators:
plim b1 = β1 / (1 + θ²σ^11), plim b2 = β2 – β1 θ²σ^12 / (1 + θ²σ^11)
The effect is called “smearing.”
• Linear model: y = xβ + Uγ + ε
• H0: β = 0.
• We do not observe x*; we observe self-reported x. We need to find
an instrument to estimate the model.
• From the 2nd part: Once I know the effect of Z on X, I can throw Z
away.
• It is not difficult to find a Z that meets (2), the validity condition.
Many variables are not correlated with ε, the error term from the
CEO compensation structural equation.
Note: Deaton (2010) calls the variables in the examples external, since
they are not determined by the model. They may not be exogenous.
• Starting with Angrist (1990) and Angrist and Krueger (1991) (for us,
A&K), who study the effect on earnings of civilian work experience
and schooling, respectively, there has been an emphasis on using a Z
that can be defined by a natural experiment when the IV problem is
caused by omitted variables.
• “Natural” points out that we (the researchers) did not design the
episode to be analyzed, but can use it to identify causal relationships.
- Ideal experiment: Identify 2 similar cities and remove a city next to it.
• Condition (1) is the only one we can directly check, through the
first-stage regression, where we get π̂. Given that ε is unobservable,
the legitimacy of (2) is usually left to theory or common sense. A
researcher should also convince the audience about the validity of (3).
• That is, Z should affect earnings only through its effect on schooling.
• The data for the 1930-39 cohort show that men born earlier in the
year have lower schooling. QOB can be an instrument ⇒ there is a
first stage: x = π z + Dγ + v (z: dummy variable for QOB)
• Final 2SLS model interacted QOB with year of birth (30), state of
birth (150):
– OLS: bOLS = .0628 (s.e. = .0003) (large T ⇒ small SEs).
– 2SLS: b2SLS = .0811 (s.e. = .0109)
– Var[bIV] > Var[bOLS], as expected. (But, maybe too large?)
Then,
bIV = (E[y | Z = 1] – E[y | Z = 0]) / (E[x | Z = 1] – E[x | Z = 0])
• The Wald estimator is known as local IV or local average treatment
effect, LATE (under some assumptions, bIV = E[y(1) – y(0)|compliers]).
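A sketch of the Wald estimator with a binary instrument (illustrative DGP with β = 1.5; names are assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 50_000
z = rng.integers(0, 2, size=T)          # binary instrument
v = rng.normal(size=T)
eps = 0.8 * v + rng.normal(size=T)      # endogeneity through v
x = 1.0 * z + v                         # first stage
y = 1.5 * x + eps                       # structural equation

# Wald estimator: ratio of the two differences in group means
num = y[z == 1].mean() - y[z == 0].mean()
den = x[z == 1].mean() - x[z == 0].mean()
b_wald = num / den
```

The ratio of mean differences recovers β despite Cov(x, ε) ≠ 0, because z shifts x but not ε.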
• Implications:
– Gleser and Hwang (1987) and Dufour (1997) show that CIs and
tests based on t-tests and F (Wald) tests are not robust to weak IVs.
– The concern is not just theoretical: Numerical studies show that
coverage rates of conventional 2SLS CIs can be very poor when
IVs are weak, even if T is large.
• It turns out that LIML is a linear combination of the OLS and 2SLS
estimates (with the weights depending on the data), and the weights
happen to be such that they approximately eliminate the 2SLS bias.
• Intuition of the AR test:
- Subtract Yβ0 from the model:
y – Yβ0 = Yβ – Yβ0 + ε = Y(β – β0) + ε
- Substitute the 1st stage:
y – Yβ0 = (ZΠ + V)(β – β0) + ε
= ZΠ(β – β0) + V(β – β0) + ε
= ZΦ + W
• The AR stat is the usual F-stat for testing Φ = 0. Under the usual
assumptions (fixed regressors and normal errors), the AR stat follows
the usual F distribution.
• It turns out that the power of the AR test is not very good when
l > 1. The AR test tests whether Z enters the (y – Yβ0) equation. The
AR test sacrifices power: it ignores the restriction Φ = Π(β – β0).
• Low power leads to very wide CIs based on such tests. Kleibergen
(2002) and Moreira (2001) propose an LM test whose H0 rejection rate
is robust to weak IVs. (The LM test first estimates Π under H0: β = β0.)
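The AR statistic is just the F-stat for Φ = 0 in the regression of (y – Yβ0) on Z. A minimal sketch (simulated data, no constant, for simplicity):

```python
import numpy as np

def ar_fstat(y, Y, Z, beta0):
    """F-stat for Phi = 0 in the regression of (y - Y*beta0) on Z."""
    T, l = Z.shape
    r = y - Y * beta0
    phi, *_ = np.linalg.lstsq(Z, r, rcond=None)
    e = r - Z @ phi
    rss0, rss1 = r @ r, e @ e           # restricted vs. unrestricted SSR
    return ((rss0 - rss1) / l) / (rss1 / (T - l))

rng = np.random.default_rng(6)
T = 1000
Z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
eps = 0.6 * v + rng.normal(size=T)
Y = Z @ np.array([0.5, 0.5]) + v        # first stage
y = 1.0 * Y + eps                       # true beta = 1

F_at_truth = ar_fstat(y, Y, Z, beta0=1.0)   # ~ F(2, T-2): small
F_far_off = ar_fstat(y, Y, Z, beta0=0.0)    # Z enters via Pi(beta - beta0): large
```

Inverting the test (collecting the β0 values where the F-stat does not reject) gives a weak-IV-robust confidence set for β.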
• Bekker (1994) and Newey and Smith (2004) show that GMM-type
approaches to estimating structural parameters using instrumental
variables, which include IV and 2SLS, may have substantial bias when
l is not small relative to T.
Note: This bias can affect tests, for example, the Hausman test.
Excessive Overidentification
• Situation: l is much larger than k. Possible “overidentification.”
• While nobody will set l=T, a similar finite sample bias occurs in less
extreme cases. In general, as l → T, we see that b2SLS → bOLS.
Excessive Overidentification
• Angrist and Pischke (2009) report that “just-identified 2SLS is
approximately unbiased.” They report a simulation with weak IVs, using
OLS, just-identified IV and 2SLS with 20 IVs (and LIML too):
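The point can be illustrated with a simulation in the spirit of (but not reproducing) the A&P exercise: with 20 mostly irrelevant IVs, 2SLS drifts toward OLS, while just-identified IV stays roughly median-unbiased (all parameter values below are assumptions):

```python
import numpy as np

def tsls(y, x, Z):
    """2SLS slope with instrument matrix Z (no constant, for brevity)."""
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return (xhat @ y) / (xhat @ x)

rng = np.random.default_rng(7)
T, reps, beta = 200, 1000, 0.0
est_just = np.empty(reps)
est_many = np.empty(reps)
for r in range(reps):
    Z = rng.normal(size=(T, 20))
    v = rng.normal(size=T)
    eps = 0.8 * v + 0.6 * rng.normal(size=T)   # endogeneity
    x = 0.15 * Z[:, 0] + v                     # only the 1st IV is (weakly) relevant
    y = beta * x + eps
    est_just[r] = tsls(y, x, Z[:, :1])         # l = 1 (just-identified)
    est_many[r] = tsls(y, x, Z)                # l = 20 (excessive)

bias_just = np.median(est_just) - beta
bias_many = np.median(est_many) - beta         # pulled toward the OLS plim (~0.78)
```

The median of the 20-IV 2SLS estimates sits well away from the true β = 0, while the just-identified median stays close to it.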
Excessive Overidentification
• Angrist and Pischke (2009) also report coverage rates of 95% C.I.
(The coverage rate is the probability that a C.I. includes the true
parameter.) Coverage rates for OLS and 2SLS are poor.
• Detection:
– Visual – there is no test.
– Check b2SLS and bOLS. If they are similar, check that this is
not a result of “too many IVs.”
• Remedy:
– Fewer instruments? (Several methodological problems with this
idea). Donald and Newey (2001) consider this option.
– Jackknife estimation –see Ackerberg and Devereux (2009).
• Remedy:
– Some modifications of the DWH have been suggested under
weak instruments, see Hahn and Hausman (2002, 2005).
– Avoid endogenous weak instruments.
– General problem: It is not easy to find good instruments in
theory and in practice. Find natural experiments.
• For the simple case of one endogenous variable, the F-stat in the
1st stage can help to identify weak IVs. With many IVs, Stock and
Yogo (2005) provide rules of thumb regarding the weakness of the
IVs based on a statistic due to Cragg and Donald (1993).
• Large T will not help. A&K and Consumption CAPM tests have
very large samples!
• References:
- “Mostly Harmless Econometrics: An Empiricist's Companion,”
textbook by Angrist and Pischke (2009).
- “Instruments, Randomization, and Learning about Development,”
by Deaton (2010, JEL).
- “The Empirical Economist's Toolkit: From Models to Methods,” by
Panhans and Singleton (2015, Duke Working Paper).