Wooldridge Session 4
Jeff Wooldridge
Michigan State University
1. Introduction
2. General Setup and Quantities of Interest
3. Assumptions with Neglected Heterogeneity
4. Models with Heterogeneity and Endogeneity
5. Estimating Some Popular Models
1. Introduction
When panel data models contain unobserved heterogeneity and
omitted time-varying variables, control function methods can be used to
account for both problems.
Under fairly weak assumptions, one can obtain consistent, asymptotically
normal estimators of average structural functions, provided suitable
instruments are available.
Other issues with panel data: How to treat dynamics? Models with
lagged dependent variables are hard to estimate when heterogeneity and
other sources of endogeneity are present.
Approaches to handling unobserved heterogeneity:
1. Treat as parameters to estimate. Can work well with large T but with
small T can have incidental parameters problem. Bias adjustments are
available for parameters and average partial effects. Usually weak
dependence or even independence is assumed across the time
dimension.
2. Remove heterogeneity to obtain an estimating equation. Works for
simple linear models and a few nonlinear models (via conditional MLE
or a quasi-MLE variant). Cannot be done in general. Also, may not be
able to identify interesting partial effects.
3. Correlated Random Effects (Mundlak/Chamberlain). Requires some
restrictions on the distribution of heterogeneity, although these can be
nonparametric. Applies generally, does not impose restrictions on
dependence over time, allows estimation of average partial effects. Can
be easily combined with control function (CF) methods for endogeneity.
4. Can try to establish bounds rather than estimate parameters or APEs.
Chernozhukov, Fernández-Val, Hahn, and Newey (2009) is a recent
example.
2. General Setup and Quantities of Interest
Static, unobserved effects probit model for panel data with an omitted
time-varying variable $r_{it}$:
\[ P(y_{it} = 1 \mid x_{it}, c_i, r_{it}) = \Phi(x_{it}\beta + c_i + r_{it}), \quad t = 1, \ldots, T. \tag{1} \]
(ii) The magnitudes of the partial effects. These depend not only on the
value of the covariates, say $x_t$, but also on the value of the unobserved
heterogeneity. In the continuous covariate case,
\[ \frac{\partial P(y_t = 1 \mid x_t, c, r_t)}{\partial x_{tj}} = \beta_j \phi(x_t\beta + c + r_t). \tag{2} \]
Let $\{(x_{it}, y_{it}) : t = 1, \ldots, T\}$ be a random draw from the cross section.
Suppose we are interested in
\[ E(y_{it} \mid x_{it}, c_i, r_{it}) = m_t(x_{it}, c_i, r_{it}). \tag{3} \]
Partial effects are
\[ \theta_j(x_t, c, r_t) \equiv \frac{\partial m_t(x_t, c, r_t)}{\partial x_{tj}}, \tag{4} \]
or discrete changes.
How do we account for unobserved $(c_i, r_{it})$? If we know enough
about the distribution of $(c_i, r_{it})$ we can insert meaningful values for
$(c, r_t)$. For example, if $\mu_c = E(c_i)$ and $\mu_{r_t} = E(r_{it})$, then we can compute
the partial effect at the average (PEA),
\[ \mathrm{PEA}_j(x_t) = \theta_j(x_t, \mu_c, \mu_{r_t}). \tag{5} \]
Alternatively, we can obtain the average partial effect (APE) (or
population average effect) by averaging across the distribution of $(c_i, r_{it})$:
\[ \mathrm{APE}_j(x_t) = E_{(c_i, r_{it})}[\theta_j(x_t, c_i, r_{it})]. \tag{6} \]
The difference between (5) and (6) can be nontrivial. In some leading
cases, (6) is identified while (5) is not. (6) is closely related to the
notion of the average structural function (ASF) (Blundell and Powell
(2003)). The ASF is defined as
\[ \mathrm{ASF}_t(x_t) = E_{(c_i, r_{it})}[m_t(x_t, c_i, r_{it})]. \tag{7} \]
Passing the derivative through the expectation in (7) gives the APE.
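To get a feel for how far apart (5) and (6) can be, here is a minimal Python sketch. It assumes a probit response with a single covariate, no $r_t$ term, and $c \sim \mathrm{Normal}(0, \sigma_c^2)$; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: .cdf is Phi, .pdf is phi

def pea(x, beta, mu_c=0.0):
    """Partial effect at the average: plug c = E(c) into beta * phi(x*beta + c)."""
    return beta * STD.pdf(x * beta + mu_c)

def ape_closed_form(x, beta, sigma_c):
    """APE when c ~ Normal(0, sigma_c^2): the index is scaled by (1 + sigma_c^2)^(-1/2)."""
    scale = (1.0 + sigma_c ** 2) ** -0.5
    return beta * scale * STD.pdf(x * beta * scale)

def ape_monte_carlo(x, beta, sigma_c, draws=100_000, seed=0):
    """APE by simulation: average beta * phi(x*beta + c) over draws of c."""
    rng = random.Random(seed)
    total = sum(STD.pdf(x * beta + rng.gauss(0.0, sigma_c)) for _ in range(draws))
    return beta * total / draws

x, beta, sigma_c = 0.5, 1.0, 2.0  # assumed values, not from the lecture
print("PEA:", pea(x, beta))
print("APE (closed form):", ape_closed_form(x, beta, sigma_c))
print("APE (simulated):", ape_monte_carlo(x, beta, sigma_c))
```

With substantial heterogeneity the PEA is noticeably larger than the APE in this design, because the normal density is evaluated at the center of the distribution of $c$ rather than averaged over it.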
3. Assumptions with Neglected Heterogeneity
Exogeneity of Covariates
Cannot get by with just specifying a model for the contemporaneous
conditional distribution, $D(y_{it} \mid x_{it}, c_i)$.
The most useful definition of strict exogeneity for nonlinear panel
data models is
\[ D(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = D(y_{it} \mid x_{it}, c_i). \tag{8} \]
The conditional mean version is
\[ E(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = E(y_{it} \mid x_{it}, c_i). \tag{9} \]
The sequential exogeneity assumption is
\[ D(y_{it} \mid x_{i1}, \ldots, x_{it}, c_i) = D(y_{it} \mid x_{it}, c_i). \tag{10} \]
Conditional Independence
In linear models, serial dependence of idiosyncratic shocks is easily
dealt with, either by cluster robust inference or Generalized Least
Squares extensions of Fixed Effects and First Differencing. With
strictly exogenous covariates, serial correlation never results in
inconsistent estimation, even if improperly modeled. The situation is
different with most nonlinear models estimated by MLE.
Conditional independence (CI) (under strict exogeneity):
\[ D(y_{i1}, \ldots, y_{iT} \mid x_i, c_i) = \prod_{t=1}^{T} D(y_{it} \mid x_{it}, c_i). \tag{11} \]
In a parametric context, the CI assumption reduces our task to
specifying a model for $D(y_{it} \mid x_{it}, c_i)$, and then determining how to treat
the unobserved heterogeneity, $c_i$.
In random effects and correlated random effects frameworks (next section),
CI plays a critical role in being able to estimate the structural
parameters and the parameters in the distribution of $c_i$ (and therefore, in
estimating PEAs). In a broad class of popular models, CI plays no
essential role in estimating APEs.
Assumptions about the Unobserved Heterogeneity
Random Effects
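Generally stated, the key RE assumption is
\[ D(c_i \mid x_{i1}, \ldots, x_{iT}) = D(c_i). \tag{12} \]
Under (8) and (12), the APEs are nonparametrically identified from
\[ E(y_{it} \mid x_{it} = x_t). \tag{13} \]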
Correlated Random Effects
A CRE framework allows dependence between $c_i$ and $x_i$, but restricted
in some way. In a parametric setting, we specify a distribution for
$D(c_i \mid x_{i1}, \ldots, x_{iT})$, as in Chamberlain (1980, 1982), and much work
since. Distributional assumptions that lead to simple estimation
(homoskedastic normal with a linear conditional mean) can be
restrictive.
Possible to drop parametric assumptions and just assume
\[ D(c_i \mid x_i) = D(c_i \mid \bar{x}_i), \tag{14} \]
where $\bar{x}_i$ is the vector of time averages of $x_{it}$.
APEs are identified very generally. For example, under (14), a
consistent estimate of the average structural function is
\[ \widehat{\mathrm{ASF}}(x_t) = N^{-1} \sum_{i=1}^{N} \hat{q}_t(x_t, \bar{x}_i), \tag{15} \]
where $q_t(x_{it}, \bar{x}_i) = E(y_{it} \mid x_{it}, \bar{x}_i)$.
Need a random sample $\{\bar{x}_i : i = 1, \ldots, N\}$ for the averaging out to
work.
Fixed Effects
The label "fixed effects" is used differently by different researchers.
One view: $c_i$, $i = 1, \ldots, N$, are parameters to be estimated. Usually leads
to an incidental parameters problem.
Second meaning of "fixed effects": $D(c_i \mid x_i)$ is unrestricted and we
look for objective functions that do not depend on $c_i$ but still identify
the population parameters. Leads to conditional MLE if we can find
sufficient statistics $s_i$ such that
\[ D(y_{i1}, \ldots, y_{iT} \mid x_i, c_i, s_i) = D(y_{i1}, \ldots, y_{iT} \mid x_i, s_i). \tag{16} \]
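A standard textbook instance of (16), not taken from these slides, is the logit model with $T = 2$ and $s_i = y_{i1} + y_{i2}$. The Python sketch below (hypothetical function names and values, scalar covariate) checks numerically that the conditional probability given $s_i = 1$ does not depend on $c_i$.

```python
import math

def logistic(a):
    """Logistic cdf, Lambda(a)."""
    return 1.0 / (1.0 + math.exp(-a))

def joint_prob(y, x, beta, c):
    """P(y_1, y_2 | x, c) under conditional independence with logit responses."""
    p = 1.0
    for y_t, x_t in zip(y, x):
        p_t = logistic(x_t * beta + c)
        p *= p_t if y_t == 1 else 1.0 - p_t
    return p

def prob_given_sum_one(x, beta, c):
    """P(y = (0, 1) | y_1 + y_2 = 1, x, c); the sum is the sufficient statistic."""
    p01 = joint_prob((0, 1), x, beta, c)
    p10 = joint_prob((1, 0), x, beta, c)
    return p01 / (p01 + p10)

x, beta = (0.2, 1.5), 0.8  # assumed values for illustration
values = [prob_given_sum_one(x, beta, c) for c in (-3.0, 0.0, 2.5)]
closed_form = logistic((x[1] - x[0]) * beta)  # Lambda((x_2 - x_1) * beta)
print(values, closed_form)
```

The three conditional probabilities coincide with the closed form, which is why maximizing the conditional likelihood identifies $\beta$ without restricting $D(c_i \mid x_i)$.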
4. Models with Heterogeneity and Endogeneity
Let $y_{it1}$ be a scalar response, $y_{it2}$ a vector of endogenous variables, and
$z_{it1}$ exogenous variables, with unobserved heterogeneity $c_{i1}$ and omitted time-varying factors $r_{it1}$.
Sometimes we can eliminate $c_i$ and obtain an equation that can be
estimated by IV (linear, exponential). Generally not possible.
Now a CRE approach involves modeling $D(c_{i1} \mid z_i)$.
Generally, we need to model how $y_{it2}$ is related to $r_{it1}$.
Control function methods are convenient for allowing both.
Suppose $y_{it2}$ is a scalar and
\[ y_{it2} = m_{it2}(z_{it}, \bar{z}_i, \delta_2) + v_{it2} \]
\[ E(v_{it2} \mid z_i) = 0 \tag{20} \]
\[ D(r_{it1} \mid v_{it2}, z_i) = D(r_{it1} \mid v_{it2}) \]
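With suitable time-variation in the instruments, the assumptions in
(20) allow identification of the ASF if we assume a model for
$D(c_{i1} \mid z_i, v_{it2})$. We can then estimate
\[ E(y_{it1} \mid y_{it2}, z_i, v_{it2}) = E(y_{it1} \mid y_{it2}, z_{it1}, \bar{z}_i, v_{it2}) \equiv g_{t1}(y_{it2}, z_{it1}, \bar{z}_i, v_{it2}). \tag{21} \]
The ASF is now obtained by averaging out $(\bar{z}_i, v_{it2})$:
\[ \mathrm{ASF}(y_{t2}, z_{t1}) = E_{(\bar{z}_i, v_{it2})}[g_{t1}(y_{t2}, z_{t1}, \bar{z}_i, v_{it2})]. \]
The first equality in (21) uses a restriction of the form
\[ D(c_{i1} \mid z_i, v_{it2}) = D(c_{i1} \mid \bar{z}_i, v_{it2}). \]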
5. Estimating Some Popular Models
Linear Model with Endogeneity
Simplest model is
\[ y_{it1} = \alpha_1 y_{it2} + z_{it1}\delta_1 + c_{i1} + u_{it1} \equiv x_{it1}\beta_1 + c_{i1} + u_{it1} \tag{22} \]
\[ E(u_{it1} \mid z_i, c_{i1}) = 0 \]
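Easy to make inference robust to serial correlation and
heteroskedasticity in $u_{it1}$ (cluster-robust inference).
Test for (strict) exogeneity of $y_{it2}$:
(i) Estimate the reduced form of $y_{it2}$ by usual fixed effects:
\[ y_{it2} = z_{it}\pi_1 + c_{i2} + u_{it2}. \]
Get the FE residuals, $\hat{\ddot{u}}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\pi}_1$, where double dots denote time-demeaned variables.
(ii) Estimate the augmented equation
\[ y_{it1} = \alpha_1 y_{it2} + z_{it1}\delta_1 + \rho_1 \hat{\ddot{u}}_{it2} + c_{i1} + \mathrm{error}_{it} \tag{23} \]
by FE and test $H_0: \rho_1 = 0$ with a cluster-robust statistic.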
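The random effects IV (REIV) approach assumes $c_{i1}$ is uncorrelated with $z_i$,
and nominally imposes serial independence on $u_{it1}$.
A simple way to test the null that REIV is sufficient is a robust
Hausman test comparing REIV and FEIV:
estimate the equation by REIV with the time averages $\bar{z}_i$ of the instruments added as regressors, and test their significance.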
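Other than the rank condition, the key condition for FEIV to be
consistent is that the instruments, $z_{it}$, are strictly exogenous with
respect to $u_{it}$. With $T \ge 3$ time periods, this is easily tested as in the
usual FE case.
The augmented model is
\[ y_{it1} = x_{it1}\beta_1 + z_{i,t+1}\gamma_1 + c_{i1} + u_{it1}, \quad t = 1, \ldots, T - 1, \]
estimated by FEIV, with strict exogeneity of the instruments as the null $H_0: \gamma_1 = 0$.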
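The mechanics of FEIV with one endogenous regressor and one instrument can be sketched in a few lines of Python. The data-generating process below is entirely hypothetical (it is not the airfare data), and the sketch uses the just-identified IV formula on time-demeaned data rather than any particular package routine.

```python
import random

def demean(panel):
    """Within transformation: subtract each unit's time average from its rows."""
    return [[v - sum(row) / len(row) for v in row] for row in panel]

def iv_slope(y, x, z):
    """Just-identified IV slope, sum(z*y) / sum(z*x); passing z = x gives OLS."""
    num = sum(zv * yv for zr, yr in zip(z, y) for zv, yv in zip(zr, yr))
    den = sum(zv * xv for zr, xr in zip(z, x) for zv, xv in zip(zr, xr))
    return num / den

def simulate(n=2000, periods=4, alpha=-0.5, seed=1):
    """Assumed design: c_i affects everything; u_it enters y_it2 (endogeneity)."""
    rng = random.Random(seed)
    y1, y2, z = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        y1_i, y2_i, z_i = [], [], []
        for _ in range(periods):
            u = rng.gauss(0, 1)                           # structural error
            z_it = 0.5 * c + rng.gauss(0, 1)              # instrument: related to c, not to u
            y2_it = z_it + c + 0.8 * u + rng.gauss(0, 1)  # endogenous regressor
            y1_i.append(alpha * y2_it + c + u)
            y2_i.append(y2_it)
            z_i.append(z_it)
        y1.append(y1_i)
        y2.append(y2_i)
        z.append(z_i)
    return y1, y2, z

y1, y2, z = simulate()
y1d, y2d, zd = demean(y1), demean(y2), demean(z)
fe_ols = iv_slope(y1d, y2d, y2d)  # FE without IV: inconsistent in this design
fe_iv = iv_slope(y1d, y2d, zd)    # FEIV: instrument demeaned y2 with demeaned z
print("FE:", fe_ols, "FEIV:", fe_iv)
```

In this design the within transformation removes $c_{i1}$ but not the correlation between $y_{it2}$ and $u_{it1}$, so plain FE is biased toward zero while FEIV stays close to the assumed value of $-0.5$.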
Example: Estimating a Passenger Demand Function for Air Travel
$N = 1{,}149$, $T = 4$.
Uses route concentration for largest carrier as IV for log(fare).
. use airfare
. xtivreg lpassen ldist ldistsq y98 y99 y00 (lfare = concen), re theta
------------------------------------------------------------------------------
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.5078762 .229698 -2.21 0.027 -.958076 -.0576763
ldist | -1.504806 .6933147 -2.17 0.030 -2.863678 -.1459338
ldistsq | .1176013 .0546255 2.15 0.031 .0105373 .2246652
y98 | .0307363 .0086054 3.57 0.000 .0138699 .0476027
y99 | .0796548 .01038 7.67 0.000 .0593104 .0999992
y00 | .1325795 .0229831 5.77 0.000 .0875335 .1776255
_cons | 13.29643 2.626949 5.06 0.000 8.147709 18.44516
-----------------------------------------------------------------------------
sigma_u | .94920686
sigma_e | .16964171
rho | .96904799 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented: lfare
Instruments: ldist ldistsq y98 y99 y00 concen
------------------------------------------------------------------------------
. xtivreg2 lpassen ldist ldistsq y98 y99 y00 (lfare = concen), fe cluster(id)
Warning - collinearities detected
Vars dropped: ldist ldistsq
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.3015761 .6124127 -0.49 0.622 -1.501883 .8987307
y98 | .0257147 .0164094 1.57 0.117 -.0064471 .0578766
y99 | .0724166 .0250971 2.89 0.004 .0232272 .1216059
y00 | .1127914 .0620115 1.82 0.069 -.0087488 .2343316
------------------------------------------------------------------------------
Instrumented: lfare
Included instruments: y98 y99 y00
Excluded instruments: concen
------------------------------------------------------------------------------
. egen concenb = mean(concen), by(id)
. xtivreg lpassen ldist ldistsq y98 y99 y00 concenb (lfare = concen), re theta
------------------------------------------------------------------------------
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.3015761 .2764376 -1.09 0.275 -.8433838 .2402316
ldist | -1.148781 .6970189 -1.65 0.099 -2.514913 .2173514
ldistsq | .0772565 .0570609 1.35 0.176 -.0345808 .1890937
y98 | .0257147 .0097479 2.64 0.008 .0066092 .0448203
y99 | .0724165 .0119924 6.04 0.000 .0489118 .0959213
y00 | .1127914 .0274377 4.11 0.000 .0590146 .1665682
concenb | -.5933022 .1926313 -3.08 0.002 -.9708527 -.2157518
_cons | 12.0578 2.735977 4.41 0.000 6.695384 17.42022
-----------------------------------------------------------------------------
sigma_u | .85125514
sigma_e | .16964171
rho | .96180277 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented: lfare
Instruments: ldist ldistsq y98 y99 y00 concenb concen
------------------------------------------------------------------------------
. ivreg lpassen ldist ldistsq y98 y99 y00 concenb (lfare = concen), cluster(id)
. * Now test whether instrument (concen) is strictly exogenous.
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.8520992 .3211832 -2.65 0.008 -1.481607 -.2225917
y98 | .0416985 .0098066 4.25 0.000 .0224778 .0609192
y99 | .0948286 .014545 6.52 0.000 .066321 .1233363
concen_p1 | .1555725 .0814452 1.91 0.056 -.0040571 .3152021
------------------------------------------------------------------------------
Instrumented: lfare
Included instruments: y98 y99 concen_p1
Excluded instruments: concen
------------------------------------------------------------------------------
. * What if we just use fixed effects without IV?
F(4,1148) = 121.85
corr(u_i, Xb) = -0.3249 Prob > F = 0.0000
. * Test formally for endogeneity of lfare in FE:
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.301576 .4829734 -0.62 0.532 -1.249185 .6460335
y98 | .0257147 .0131382 1.96 0.051 -.0000628 .0514923
y99 | .0724165 .0197133 3.67 0.000 .0337385 .1110946
y00 | .1127914 .048597 2.32 0.020 .0174425 .2081403
u2h | -.8616344 .5278388 -1.63 0.103 -1.897271 .1740025
_cons | 7.501007 2.441322 3.07 0.002 2.711055 12.29096
-----------------------------------------------------------------------------
It turns out that the FE2SLS estimator is robust to random coefficients
on $x_{it1}$, but one should include a full set of time dummies
(Murtazashvili and Wooldridge, 2005).
Can model random coefficients and use a CF approach:
\[ y_{it1} = c_{i1} + x_{it1} b_{i1} + u_{it1} \]
\[ y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2} \]
(1) Regress $y_{it2}$ on $1, z_{it}, \bar{z}_i$ and obtain residuals $\hat{v}_{it2}$.
(2) Regress
Binary and Fractional Response
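Unobserved effects (UE) probit model with exogenous variables. For a
binary or fractional $y_{it}$,
\[ E(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}\beta + c_i), \quad t = 1, \ldots, T. \tag{26} \]
Model the heterogeneity as
\[ c_i = \psi + \bar{x}_i\xi + a_i, \quad a_i \mid x_i \sim \mathrm{Normal}(0, \sigma_a^2). \tag{27} \]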
In the binary response case under serial independence, all parameters are
identified and MLE (Stata: xtprobit) can be used. Just add the time
averages $\bar{x}_i$ as an additional set of regressors. Then
$\hat{\mu}_c = \hat{\psi} + \bar{x}\hat{\xi}$ and
\[ \hat{\sigma}_c^2 = \hat{\xi}' \left[ N^{-1} \sum_{i=1}^{N} (\bar{x}_i - \bar{x})'(\bar{x}_i - \bar{x}) \right] \hat{\xi} + \hat{\sigma}_a^2. \]
Can evaluate PEs at, say, $\hat{\mu}_c \pm k\hat{\sigma}_c$.
Only under restrictive assumptions does $c_i$ have an unconditional
normal distribution, although it becomes more reasonable as $T$ gets
large.
Simple to test $H_0: \xi = 0$ as the null that $c_i$ and $\bar{x}_i$ are independent.
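The APEs are identified from the ASF, estimated as
\[ \widehat{\mathrm{ASF}}(x_t) = N^{-1} \sum_{i=1}^{N} \Phi(x_t\hat{\beta}_a + \hat{\psi}_a + \bar{x}_i\hat{\xi}_a), \tag{28} \]
where the $a$ subscript denotes scaling by $(1 + \sigma_a^2)^{-1/2}$. The scaled coefficients are those appearing in
\[ E(y_{it} \mid x_i) = \Phi(x_{it}\beta_a + \psi_a + \bar{x}_i\xi_a). \tag{29} \]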
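To see where the scaled coefficients come from, the following short derivation integrates $a_i$ out of the response function. It is a sketch that takes (26), (27), and strict exogeneity as given, and introduces an auxiliary standard normal variable $e_{it}$, independent of $(a_i, x_i)$, purely as a device.

```latex
\begin{aligned}
E(y_{it} \mid x_i)
  &= E\left[ \Phi(x_{it}\beta + \psi + \bar{x}_i\xi + a_i) \mid x_i \right] \\
  &= P\left( e_{it} - a_i \le x_{it}\beta + \psi + \bar{x}_i\xi \mid x_i \right),
     \qquad e_{it} - a_i \mid x_i \sim \mathrm{Normal}(0,\, 1 + \sigma_a^2) \\
  &= \Phi\left( \frac{x_{it}\beta + \psi + \bar{x}_i\xi}{(1 + \sigma_a^2)^{1/2}} \right)
   = \Phi(x_{it}\beta_a + \psi_a + \bar{x}_i\xi_a).
\end{aligned}
```

Because only the scaled index appears in this conditional mean, pooled methods that use $E(y_{it} \mid x_i)$ alone recover $\beta_a$, $\psi_a$, and $\xi_a$, which is all that the ASF requires.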
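A more radical suggestion, but in the spirit of Altonji and Matzkin
(2005), is to just use a flexible model for $E(y_{it} \mid x_{it}, \bar{x}_i)$ directly, say,
\[ E(y_{it} \mid x_{it}, \bar{x}_i) = \Phi\left(\theta_t + x_{it}\beta + \bar{x}_i\gamma + (\bar{x}_i \otimes \bar{x}_i)\delta + (x_{it} \otimes \bar{x}_i)\eta\right). \tag{30} \]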
In any nonlinear model using the Mundlak assumption
$D(c_i \mid x_i) = D(c_i \mid \bar{x}_i)$, if $T \ge 3$ one can include lead values, $w_{i,t+1}$, to simply
test strict exogeneity.
Example: Married Women's Labor Force Participation: $N = 5{,}663$,
$T = 5$ (four-month intervals).
Following results include a full set of time period dummies (not
reported).
The APEs are directly comparable across models, and can be
compared with the linear model coefficients.
LFP (1) (2) (3) (4) (5)
.031 .071
.035 .079
Probit with Endogenous Explanatory Variables
Represent endogeneity as an omitted, time-varying variable, in
addition to unobserved heterogeneity:
Papke and Wooldridge (2008, Journal of Econometrics): Use a
Chamberlain-Mundlak approach, but only relating the heterogeneity to
all strictly exogenous variables:
\[ c_{i1} = \psi_1 + \bar{z}_i\xi_1 + a_{i1}, \quad D(a_{i1} \mid z_i) = D(a_{i1}). \]
Only need
\[ E(y_{it1} \mid y_{it2}, z_i, c_{i1}, v_{it1}) = \Phi(x_{it1}\beta_1 + c_{i1} + v_{it1}), \tag{32} \]
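Assume a linear reduced form for $y_{it2}$:
\[ y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}, \quad t = 1, \ldots, T \tag{34} \]
\[ D(v_{it2} \mid z_i) = D(v_{it2}) \]
[Easy to allow the coefficient on $v_{it2}$ in the control function to change over time; just interact time dummies
with $v_{it2}$.]
Assumptions effectively rule out discreteness in $y_{it2}$.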
Write
\[ v_{it1} = \eta_1 v_{it2} + e_{it1} \]
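Two-step procedure (Papke and Wooldridge, 2008):
(1) Estimate the reduced form for $y_{it2}$ (pooled or for each $t$
separately). Obtain the residuals, $\hat{v}_{it2}$.
(2) Use the probit QMLE to estimate the scaled versions of $\beta_1$, $\psi_1$, $\xi_1$ and $\eta_1$.
How do we interpret the scaled estimates? They give directions of
effects. Conveniently, they also index the APEs. For given $y_2$ and $z_1$,
average out $\bar{z}_i$ and $\hat{v}_{it2}$ (for each $t$):
\[ N^{-1} \sum_{i=1}^{N} \Phi\left(\hat{\alpha}_1 y_{t2} + z_{t1}\hat{\delta}_1 + \hat{\psi}_1 + \bar{z}_i\hat{\xi}_1 + \hat{\eta}_1\hat{v}_{it2}\right), \]
where $\hat{\beta}_1 = (\hat{\alpha}_1, \hat{\delta}_1')'$ and all hats denote the scaled estimates.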
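The averaging step can be sketched in Python as follows. Everything here is illustrative: the scaled estimates, the time averages, and the first-stage residuals are made-up numbers, all variables are scalars, and the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: .cdf is Phi, .pdf is phi

def index(y2, z1, zb, v, alpha, delta, psi, xi, eta):
    """Probit index at fixed (y2, z1) for one unit's (zbar_i, v2hat_i)."""
    return alpha * y2 + z1 * delta + psi + zb * xi + eta * v

def asf_probit(y2, z1, zbar, v2hat, **est):
    """Average Phi(index) over the sample values of (zbar_i, v2hat_i)."""
    values = [STD.cdf(index(y2, z1, zb, v, **est)) for zb, v in zip(zbar, v2hat)]
    return sum(values) / len(values)

def ape_y2(y2, z1, zbar, v2hat, step=1e-4, **est):
    """APE of y2: central finite difference of the estimated ASF."""
    up = asf_probit(y2 + step, z1, zbar, v2hat, **est)
    down = asf_probit(y2 - step, z1, zbar, v2hat, **est)
    return (up - down) / (2 * step)

# Hypothetical scaled estimates and data, for illustration only.
est = dict(alpha=0.6, delta=-0.2, psi=0.1, xi=0.3, eta=-0.4)
zbar = [0.2, -0.1, 0.5, 0.0]
v2hat = [0.3, -0.2, 0.1, -0.2]

asf = asf_probit(1.0, 0.5, zbar, v2hat, **est)
ape = ape_y2(1.0, 0.5, zbar, v2hat, **est)
print(asf, ape)
```

The finite difference is used only to keep the sketch short; for a continuous $y_{t2}$ the same APE is the sample average of $\hat{\alpha}_1 \phi(\cdot)$ evaluated at the same index.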
Application: Effects of Spending on Test Pass Rates
$N = 501$ school districts, $T = 7$ time periods.
Once pre-policy spending is controlled for, instrument spending with
the foundation grant.
Initial spending takes the place of the time average of IVs.
49
. * First, linear model:
. * Get reduced form residuals for fractional probit:
. glm math4 lavgrexp v2hat lunch alunch lenroll alenroll y96-y01 lexppp94
le94y96-le94y01, fa(bin) link(probit) cluster(distid)
note: math4 has non-integer values
. margeff
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .5830163 .2203345 2.65 0.008 .1511686 1.014864
v2hat | -.4641533 .242971 -1.91 0.056 -.9403678 .0120611
lunch | -.1003741 .0716361 -1.40 0.161 -.2407782 .04003
alunch | -.3754579 .0734083 -5.11 0.000 -.5193355 -.2315803
lenroll | .0962161 .0665257 1.45 0.148 -.0341719 .2266041
alenroll | -.0980059 .0669786 -1.46 0.143 -.2292817 .0332698
...
------------------------------------------------------------------------------
Count and Other Multiplicative Models
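Conditional mean with multiplicative heterogeneity:
\[ E(y_{it} \mid x_{it}, c_i) = c_i \exp(x_{it}\beta) \tag{36} \]
Strict exogeneity in the conditional mean, as in (9):
\[ E(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = E(y_{it} \mid x_{it}, c_i), \tag{37} \]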
The FE Poisson estimator is the conditional MLE derived under
Poisson and conditional independence assumptions. It is one of the rare
cases where treating the $c_i$ as parameters to estimate gives a consistent
estimator of $\beta$.
The FE Poisson estimator is fully robust to any distributional failure
and serial correlation. $y_{it}$ does not even have to be a count
variable! Fully robust inference is easy (xtpqml in Stata).
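Why $c_i$ drops out can be seen from the form of the FE Poisson objective: conditional on $\sum_t y_{it}$, the Poisson outcomes are multinomial with probabilities $p_{it}(\beta) = \exp(x_{it}\beta) / \sum_r \exp(x_{ir}\beta)$. The Python sketch below (scalar covariate, hypothetical names and numbers) computes these shares and the per-unit log-likelihood kernel.

```python
import math

def shares(x, beta, c=1.0):
    """p_it = c*exp(x_it*beta) / sum_r c*exp(x_ir*beta); c cancels in the ratio."""
    means = [c * math.exp(x_t * beta) for x_t in x]
    total = sum(means)
    return [m / total for m in means]

def cond_loglik_i(y, x, beta):
    """Kernel of the conditional (multinomial) log-likelihood for one unit."""
    return sum(y_t * math.log(p_t) for y_t, p_t in zip(y, shares(x, beta)))

# Illustrative values; y is deliberately not integer valued.
x_i, y_i, beta = [0.1, 0.4, -0.3], [2.0, 0.5, 1.0], 0.7
small_c = shares(x_i, beta, c=0.1)
large_c = shares(x_i, beta, c=50.0)
print(small_c, large_c, cond_loglik_i(y_i, x_i, beta))
```

The shares are identical for any positive $c$, so the objective depends on $\beta$ only, and it remains well defined for nonnegative non-integer outcomes.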
For endogeneity there are control function and GMM approaches,
with the former being more convenient but imposing more restrictions.
CF uses same approach as before.
Start with an omitted variables formulation:
\[ E(y_{it1} \mid y_{it2}, z_i, c_{i1}, r_{it1}) = \exp(x_{it1}\beta_1 + c_{i1} + r_{it1}). \tag{38} \]
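If $y_{it2}$ is (roughly) continuous we might specify
\[ y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}. \]
Also write
\[ c_{i1} = \psi_1 + \bar{z}_i\xi_1 + a_{i1} \]
so that, substituting into (38),
\[ E(y_{it1} \mid y_{it2}, z_i, a_{i1}, r_{it1}) = \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1 + v_{it1}), \quad v_{it1} \equiv a_{i1} + r_{it1}. \]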
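Reasonable (but not completely general) to assume $(v_{it1}, v_{it2})$ is
independent of $z_i$.
If we specify $E[\exp(v_{it1}) \mid v_{it2}] = \exp(\eta_1 + \rho_1 v_{it2})$ (as would be true
under joint normality), we obtain the estimating equation
\[ E(y_{it1} \mid y_{it2}, z_i, v_{it2}) = \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2}), \]
where the intercept $\psi_1$ now absorbs $\eta_1$.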
Now apply a simple two-step method. (1) Obtain the residuals $\hat{v}_{it2}$
from the pooled OLS estimation of $y_{it2}$ on $1, z_{it}, \bar{z}_i$ across $t$ and $i$. (2) Use a
pooled QMLE (perhaps the Poisson or NegBin II) to estimate the
exponential function, where $(\bar{z}_i, \hat{v}_{it2})$ are explanatory variables along
with $x_{it1}$. (As usual, a full set of time period dummies is a good idea
in the first and second steps.)
Note that $y_{it2}$ is not strictly exogenous in the estimating equation, and
so GLS-type methods that account for serial correlation should not be used.
GMM with carefully constructed moments could be.
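Estimating the ASF is straightforward:
\[ \widehat{\mathrm{ASF}}_t(y_{t2}, z_{t1}) = N^{-1} \sum_{i=1}^{N} \exp\left(\hat{\psi}_1 + x_{t1}\hat{\beta}_1 + \bar{z}_i\hat{\xi}_1 + \hat{\rho}_1\hat{v}_{it2}\right). \]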
A GMM approach which slightly extends Windmeijer (2002)
modifies the moment conditions under a sequential exogeneity
assumption on instruments and applies to models with lagged
dependent variables.
Write the model as
\[ y_{it} = c_i \exp(x_{it}\beta) r_{it} \tag{40} \]
\[ E(r_{it} \mid z_{it}, \ldots, z_{i1}, c_i) = 1, \tag{41} \]
Now start with the transformation
\[ \frac{y_{it}}{\exp(x_{it}\beta)} - \frac{y_{i,t+1}}{\exp(x_{i,t+1}\beta)} = c_i (r_{it} - r_{i,t+1}). \tag{42} \]
Using the moment conditions
\[ E\left[ \frac{y_{it}}{\exp(x_{it}\beta)} - \frac{y_{i,t+1}}{\exp(x_{i,t+1}\beta)} \,\Big|\, z_{it}, \ldots, z_{i1} \right] = 0, \quad t = 1, \ldots, T - 1 \tag{43} \]
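So, the modified moment conditions are
\[ E\left[ \frac{y_{it}}{\exp[(x_{it} - \mu_x)\beta]} - \frac{y_{i,t+1}}{\exp[(x_{i,t+1} - \mu_x)\beta]} \,\Big|\, z_{it}, \ldots, z_{i1} \right] = 0, \tag{44} \]
with $\mu_x$ a centering vector for the covariates.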
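A quick simulation can confirm that the quasi-differenced residual in (42) has mean zero against sequentially exogenous instruments even when the covariate responds to past shocks. The Python sketch below uses $T = 2$, a scalar covariate that serves as its own instrument, and an assumed data-generating process; none of the numbers come from the lecture.

```python
import math
import random

def quasi_diff(y_t, y_next, x_t, x_next, beta):
    """Residual from (42): y_t / exp(x_t*beta) - y_{t+1} / exp(x_{t+1}*beta)."""
    return y_t / math.exp(x_t * beta) - y_next / math.exp(x_next * beta)

def sample_moments(beta, beta_true=0.5, n=20_000, seed=2):
    """Sample analogues of E[e] and E[x_1 * e] for the t = 1 moment condition."""
    rng = random.Random(seed)
    m_one, m_x1 = 0.0, 0.0
    for _ in range(n):
        a = rng.gauss(0, 1)
        c = math.exp(0.5 * a)                   # heterogeneity, correlated with x
        x1 = 0.5 * a + rng.gauss(0, 1)
        r1 = rng.expovariate(1.0)               # multiplicative shock with mean 1
        # Feedback: x2 reacts to r1, so x is sequentially but not strictly exogenous.
        x2 = 0.5 * x1 + 0.3 * (r1 - 1.0) + rng.gauss(0, 1)
        r2 = rng.expovariate(1.0)
        y1 = c * math.exp(beta_true * x1) * r1
        y2 = c * math.exp(beta_true * x2) * r2
        e = quasi_diff(y1, y2, x1, x2, beta)
        m_one += e
        m_x1 += x1 * e
    return m_one / n, m_x1 / n

at_truth = sample_moments(beta=0.5)
print(at_truth)
```

At the true parameter both sample moments are close to zero. A GMM estimator would choose $\beta$ to bring such moments as close to zero as possible across all available periods and instruments.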