Wooldridge Session 4

Panel Data Models with Heterogeneity and Endogeneity
Jeff Wooldridge
Michigan State University
Programme Evaluation for Policy Analysis

Institute for Fiscal Studies
June 2012
1. Introduction
2. General Setup and Quantities of Interest
3. Assumptions with Neglected Heterogeneity
4. Models with Heterogeneity and Endogeneity
5. Estimating Some Popular Models
1. Introduction
When panel data models contain unobserved heterogeneity and
omitted time-varying variables, control function methods can be used to
account for both problems.
Under fairly weak assumptions, we can obtain consistent, asymptotically
normal estimators of average structural functions provided suitable
instruments are available.
Other issues with panel data: How to treat dynamics? Models with
lagged dependent variables are hard to estimate when heterogeneity and
other sources of endogeneity are present.
Approaches to handling unobserved heterogeneity:
1. Treat as parameters to estimate. Can work well with large T but with
small T can have incidental parameters problem. Bias adjustments are
available for parameters and average partial effects. Usually weak
dependence or even independence is assumed across the time
dimension.
2. Remove heterogeneity to obtain an estimating equation. Works for
simple linear models and a few nonlinear models (via conditional MLE
or a quasi-MLE variant). Cannot be done in general. Also, may not be
able to identify interesting partial effects.
Correlated Random Effects: Mundlak/Chamberlain. Requires some
restrictions on distribution of heterogeneity, although these can be
nonparametric. Applies generally, does not impose restrictions on
dependence over time, allows estimation of average partial effects. Can
be easily combined with CF methods for endogeneity.
Can try to establish bounds rather than estimate parameters or APEs.
Chernozhukov, Fernández-Val, Hahn, and Newey (2009) is a recent
example.
2. General Setup and Quantities of Interest
Static, unobserved effects probit model for panel data with an omitted
time-varying variable $r_{it}$:
$$P(y_{it} = 1 \mid x_{it}, c_i, r_{it}) = \Phi(x_{it}\beta + c_i + r_{it}), \quad t = 1, \ldots, T. \tag{1}$$
What are the quantities of interest for most purposes?

(i) The elements of $\beta$, the $\beta_j$. These give the directions of the partial
effects of the covariates on the response probability. For any two
continuous covariates, the ratio of coefficients, $\beta_j/\beta_h$, is identical to the
ratio of partial effects (and the ratio does not depend on the covariates
or unobserved heterogeneity, $c_i$).
(ii) The magnitudes of the partial effects. These depend not only on the
value of the covariates, say $x_t$, but also on the value of the unobserved
heterogeneity. In the continuous covariate case,
$$\frac{\partial P(y_t = 1 \mid x_t, c, r_t)}{\partial x_{tj}} = \beta_j \phi(x_t\beta + c + r_t). \tag{2}$$
Questions: (a) Assuming we can estimate $\beta$, what should we do about
the unobservables $(c, r_t)$? (b) If we can only estimate $\beta$ up to scale, can
we still learn something useful about magnitudes of partial effects? (c)
What kinds of assumptions do we need to estimate partial effects?
Let $\{(x_{it}, y_{it}) : t = 1, \ldots, T\}$ be a random draw from the cross section.
Suppose we are interested in
$$E(y_{it} \mid x_{it}, c_i, r_{it}) = m_t(x_{it}, c_i, r_{it}). \tag{3}$$
$c_i$ can be a vector of unobserved heterogeneity, $r_{it}$ a vector of omitted
time-varying variables.
Partial effects: if $x_{tj}$ is continuous, then
$$\theta_j(x_t, c, r_t) \equiv \frac{\partial m_t(x_t, c, r_t)}{\partial x_{tj}}, \tag{4}$$
or discrete changes.
How do we account for the unobserved $(c_i, r_{it})$? If we know enough
about the distribution of $(c_i, r_{it})$ we can insert meaningful values for
$(c, r_t)$. For example, if $\mu_c = E(c_i)$ and $\mu_{r_t} = E(r_{it})$, then we can compute
the partial effect at the average (PEA),
$$PEA_j(x_t) = \theta_j(x_t, \mu_c, \mu_{r_t}). \tag{5}$$
Of course, we need to estimate the function $m_t$ and $(\mu_c, \mu_{r_t})$. If we can
estimate the distribution of $(c_i, r_{it})$, or features in addition to its mean,
we can insert different quantiles, or a certain number of standard
deviations from the mean.
Alternatively, we can obtain the average partial effect (APE) (or
population average effect) by averaging across the distribution of $(c_i, r_{it})$:
$$APE(x_t) = E_{(c_i, r_{it})}[\theta_j(x_t, c_i, r_{it})]. \tag{6}$$
The difference between (5) and (6) can be nontrivial. In some leading
cases, (6) is identified while (5) is not. (6) is closely related to the
notion of the average structural function (ASF) (Blundell and Powell
(2003)). The ASF is defined as
$$ASF_t(x_t) = E_{(c_i, r_{it})}[m_t(x_t, c_i, r_{it})]. \tag{7}$$
Passing the derivative through the expectation in (7) gives the APE.
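To see how far apart the PEA and the APE can be, here is a minimal Python sketch for a scalar-covariate probit. It makes simplifying assumptions that are not in the slides: the omitted variable is absorbed into the heterogeneity, and that heterogeneity is Normal(0, sigma^2). The function names and the parameter values are illustrative only.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def partial_effect(x, c, beta):
    """theta(x, c) = beta * phi(x*beta + c) for a scalar continuous covariate."""
    return beta * norm_pdf(x * beta + c)

def ape_numeric(x, beta, sigma, n_grid=20001, width=8.0):
    """Average theta(x, c) over c ~ Normal(0, sigma^2) by trapezoid-rule quadrature."""
    lo = -width * sigma
    h = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        c = lo + k * h
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * partial_effect(x, c, beta) * norm_pdf(c / sigma) / sigma
    return total * h

def ape_closed_form(x, beta, sigma):
    """Under normal heterogeneity the APE uses coefficients scaled by (1 + sigma^2)^(-1/2)."""
    s = math.sqrt(1.0 + sigma ** 2)
    return (beta / s) * norm_pdf(x * beta / s)

# Illustrative (hypothetical) values.
beta, sigma, x = 1.0, 2.0, 0.5
pea = partial_effect(x, 0.0, beta)   # partial effect at the mean of c (zero here)
ape = ape_numeric(x, beta, sigma)    # partial effect averaged over c
print(pea, ape, ape_closed_form(x, beta, sigma))
```

With these values the PEA is roughly twice the APE, which illustrates why the choice between (5) and (6) matters in practice.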
3. Assumptions with Neglected Heterogeneity
Exogeneity of Covariates
Cannot get by with just specifying a model for the contemporaneous
conditional distribution, $D(y_{it} \mid x_{it}, c_i)$.
The most useful definition of strict exogeneity for nonlinear panel
data models is
$$D(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = D(y_{it} \mid x_{it}, c_i). \tag{8}$$
Chamberlain (1984) labeled (8) strict exogeneity conditional on the
unobserved effects $c_i$. Conditional mean version:
$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = E(y_{it} \mid x_{it}, c_i). \tag{9}$$
The sequential exogeneity assumption is
$$D(y_{it} \mid x_{i1}, \ldots, x_{it}, c_i) = D(y_{it} \mid x_{it}, c_i). \tag{10}$$
It is much more difficult to allow sequential exogeneity in nonlinear
models. (Most progress has been made for lagged dependent variables
or specific functional forms, such as exponential.)
Neither strict nor sequential exogeneity allows for contemporaneous
endogeneity of one or more elements of $x_{it}$, where, say, $x_{itj}$ is correlated
with time-varying unobservables that affect $y_{it}$.
Conditional Independence
In linear models, serial dependence of idiosyncratic shocks is easily
dealt with, either by cluster robust inference or Generalized Least
Squares extensions of Fixed Effects and First Differencing. With
strictly exogenous covariates, serial correlation never results in
inconsistent estimation, even if improperly modeled. The situation is
different with most nonlinear models estimated by MLE.
Conditional independence (CI) (under strict exogeneity):
$$D(y_{i1}, \ldots, y_{iT} \mid x_i, c_i) = \prod_{t=1}^{T} D(y_{it} \mid x_{it}, c_i). \tag{11}$$
In a parametric context, the CI assumption reduces our task to
specifying a model for $D(y_{it} \mid x_{it}, c_i)$, and then determining how to treat
the unobserved heterogeneity, $c_i$.
In random effects and correlated random effects frameworks (next section),
CI plays a critical role in being able to estimate the structural
parameters and the parameters in the distribution of $c_i$ (and therefore, in
estimating PEAs). In a broad class of popular models, CI plays no
essential role in estimating APEs.
Assumptions about the Unobserved Heterogeneity
Random Effects
Generally stated, the key RE assumption is
$$D(c_i \mid x_{i1}, \ldots, x_{iT}) = D(c_i). \tag{12}$$
Under (12), the APEs are actually nonparametrically identified from
$$E(y_{it} \mid x_{it} = x_t). \tag{13}$$
In some leading cases (RE probit and RE Tobit with heterogeneity
normally distributed), if we want PEs for different values of $c$, we must
assume more: strict exogeneity, conditional independence, and (12)
with a parametric distribution for $D(c_i)$.
Correlated Random Effects
A CRE framework allows dependence between $c_i$ and $x_i$, but restricted
in some way. In a parametric setting, we specify a distribution for
$D(c_i \mid x_{i1}, \ldots, x_{iT})$, as in Chamberlain (1980, 1982), and much work
since. Distributional assumptions that lead to simple estimation
(homoskedastic normal with a linear conditional mean) can be
restrictive.
Possible to drop parametric assumptions and just assume
$$D(c_i \mid x_i) = D(c_i \mid \bar{x}_i), \tag{14}$$
without restricting $D(c_i \mid \bar{x}_i)$. Altonji and Matzkin (2005, Econometrica).
Other functions of $\{x_{it} : t = 1, \ldots, T\}$ are possible.
APEs are identified very generally. For example, under (14), a
consistent estimate of the average structural function is
$$\widehat{ASF}(x_t) = N^{-1} \sum_{i=1}^{N} \hat{q}_t(x_t, \bar{x}_i), \tag{15}$$
where $q_t(x_{it}, \bar{x}_i) = E(y_{it} \mid x_{it}, \bar{x}_i)$.
Need a random sample $\{\bar{x}_i : i = 1, \ldots, N\}$ for the averaging out to
work.
Fixed Effects
The label "fixed effects" is used differently by different researchers.
One view: $c_i$, $i = 1, \ldots, N$ are parameters to be estimated. Usually leads
to an incidental parameters problem.
Second meaning of fixed effects: $D(c_i \mid x_i)$ is unrestricted and we
look for objective functions that do not depend on $c_i$ but still identify
the population parameters. Leads to conditional MLE if we can find
sufficient statistics $s_i$ such that
$$D(y_{i1}, \ldots, y_{iT} \mid x_i, c_i, s_i) = D(y_{i1}, \ldots, y_{iT} \mid x_i, s_i). \tag{16}$$
Conditional Independence is usually maintained.
Key point: PEAs and APEs are generally unidentified.
4. Models with Heterogeneity and Endogeneity
Let $y_{it1}$ be a scalar response, $y_{it2}$ a vector of endogenous variables,
$z_{it1}$ exogenous variables, and we have
$$E(y_{it1} \mid y_{it2}, z_{it1}, c_{i1}, r_{it1}) = m_{t1}(y_{it2}, z_{it1}, c_{i1}, r_{it1}). \tag{17}$$
$y_{it2}$ is allowed to be correlated with $r_{it1}$ (as well as with $c_{i1}$).
The vector of exogenous variables $\{z_{it} : t = 1, \ldots, T\}$, with $z_{it1} \subset z_{it}$,
is strictly exogenous in the sense that
$$E(y_{it1} \mid y_{it2}, z_i, c_{i1}, r_{it1}) = E(y_{it1} \mid y_{it2}, z_{it1}, c_{i1}, r_{it1}) \tag{18}$$
$$D(r_{it1} \mid z_i, c_{i1}) = D(r_{it1}). \tag{19}$$
Sometimes we can eliminate $c_{i1}$ and obtain an equation that can be
estimated by IV (linear, exponential). Generally not possible.
Now a CRE approach involves modeling $D(c_{i1} \mid z_i)$.
Generally, we need to model how $y_{it2}$ is related to $r_{it1}$.
Control function methods are convenient for allowing both.
Suppose $y_{it2}$ is a scalar and
$$y_{it2} = m_{it2}(z_{it}, \bar{z}_i, \delta_2) + v_{it2}$$
$$E(v_{it2} \mid z_i) = 0 \tag{20}$$
$$D(r_{it1} \mid v_{it2}, z_i) = D(r_{it1} \mid v_{it2})$$
With suitable time-variation in the instruments, the assumptions in
(20) allow identification of the ASF if we assume a model for
$$D(c_{i1} \mid z_i, v_{it2}).$$
Generally, we can estimate
$$E(y_{it1} \mid y_{it2}, z_i, v_{it2}) = E(y_{it1} \mid y_{it2}, z_{it1}, \bar{z}_i, v_{it2}) \equiv g_{t1}(y_{it2}, z_{it1}, \bar{z}_i, v_{it2}). \tag{21}$$
The ASF is now obtained by averaging out $(\bar{z}_i, v_{it2})$:
$$ASF(y_{t2}, z_{t1}) = E_{(\bar{z}_i, v_{it2})}[g_{t1}(y_{t2}, z_{t1}, \bar{z}_i, v_{it2})].$$
Most of this can be fully nonparametric (Altonji and Matzkin, 2005;
Blundell and Powell, 2003) although some restriction is needed on
$D(c_{i1} \mid z_i, v_{it2})$, such as
$$D(c_{i1} \mid z_i, v_{it2}) = D(c_{i1} \mid \bar{z}_i, v_{it2}).$$
With $T$ sufficiently large we can add other features of
$\{z_{it} : t = 1, \ldots, T\}$ to $\bar{z}_i$.
5. Estimating Some Popular Models
Linear Model with Endogeneity
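Simplest model is
$$y_{it1} = \alpha_1 y_{it2} + z_{it1}\delta_1 + c_{i1} + u_{it1} \equiv x_{it1}\beta_1 + c_{i1} + u_{it1} \tag{22}$$
$$E(u_{it1} \mid z_i, c_{i1}) = 0.$$
The fixed effects 2SLS estimator is common. Deviate variables from
time averages to remove $c_{i1}$, then apply IV:
$$\ddot{y}_{it1} = \ddot{x}_{it1}\beta_1 + \ddot{u}_{it1}$$
$$\ddot{z}_{it} = z_{it} - \bar{z}_i.$$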
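The within transformation followed by IV can be sketched in a few lines of Python. This is only an illustration under strong simplifications that are not in the slides: one endogenous regressor, one instrument (so the estimator is just identified), a balanced toy panel, and no idiosyncratic error. All names and numbers are hypothetical.

```python
def demean_by_unit(values, ids):
    """Within transformation: subtract each unit's time average."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for v, i in zip(values, ids)]

def fe_iv_slope(y, x, z, ids):
    """Just-identified FEIV slope: sum(zdd * ydd) / sum(zdd * xdd)."""
    ydd, xdd, zdd = (demean_by_unit(v, ids) for v in (y, x, z))
    num = sum(a * b for a, b in zip(zdd, ydd))
    den = sum(a * b for a, b in zip(zdd, xdd))
    return num / den

# Toy panel with N = 3 units and T = 3 periods (made-up numbers).
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
z = [0.0, 1.0, 2.0, 1.0, 3.0, 2.0, 0.5, 0.0, 2.5]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 1.0, 2.0, 1.0, 5.0]
c = {1: 10.0, 2: -5.0, 3: 3.0}          # unit effects, removed by demeaning
y = [2.0 * xi + c[i] for xi, i in zip(x, ids)]

b_feiv = fe_iv_slope(y, x, z, ids)
print(b_feiv)
```

Because the toy outcome has no idiosyncratic error, the estimator recovers the slope of 2 regardless of the unit effects, which is the point of the demeaning step.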
Easy to make inference robust to serial correlation and
heteroskedasticity in $u_{it1}$. (Cluster-robust inference.)
Test for (strict) exogeneity of $y_{it2}$:
(i) Estimate the reduced form of $y_{it2}$ by usual fixed effects:
$$y_{it2} = z_{it}\Pi_1 + c_{i2} + u_{it2}.$$
Get the FE residuals, $\hat{\ddot{u}}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\Pi}_1$.
(ii) Estimate the augmented equation
$$y_{it1} = \alpha_1 y_{it2} + z_{it1}\delta_1 + \rho_1\hat{\ddot{u}}_{it2} + c_{i1} + error_{it} \tag{23}$$
by FE and use a cluster-robust test of $H_0: \rho_1 = 0$.
The random effects IV approach assumes $c_{i1}$ is uncorrelated with $z_i$,
and nominally imposes serial independence on $u_{it1}$.
Simple way to test the null that REIV is sufficient: a robust
Hausman test comparing REIV and FEIV.
Estimate
$$y_{it1} = \alpha_1 + x_{it1}\beta_1 + \bar{z}_i\lambda_1 + a_{i1} + u_{it1} \tag{24}$$
by REIV, using instruments $(1, z_{it}, \bar{z}_i)$. The estimator of $\beta_1$ is the FEIV
estimator.
Test $H_0: \lambda_1 = 0$, preferably using a fully robust test. A rejection is
evidence that the IVs are correlated with $c_{i1}$, and we should use FEIV.
Other than the rank condition, the key condition for FEIV to be
consistent is that the instruments, $z_{it}$, are strictly exogenous with
respect to $u_{it1}$. With $T \geq 3$ time periods, this is easily tested as in the
usual FE case.
The augmented model is
$$y_{it1} = x_{it1}\beta_1 + z_{i,t+1}\gamma_1 + c_{i1} + u_{it1}, \quad t = 1, \ldots, T - 1,$$
and we estimate it by FEIV, using instruments $(z_{it}, z_{i,t+1})$.
Use a fully robust Wald test of $H_0: \gamma_1 = 0$. Can be selective about
which leads to include.
Example: Estimating a Passenger Demand Function for Air Travel
$N = 1{,}149$, $T = 4$.
Uses route concentration for largest carrier as IV for log(fare).
. use airfare
. * Reduced form for lfare; concen is the IV.
. xtreg lfare concen ldist ldistsq y98 y99 y00, fe cluster(id)
(Std. Err. adjusted for 1149 clusters in id)

------------------------------------------------------------------------------
| Robust
lfare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
concen | .168859 .0494587 3.41 0.001 .0718194 .2658985
ldist | (dropped)
ldistsq | (dropped)
y98 | .0228328 .004163 5.48 0.000 .0146649 .0310007
y99 | .0363819 .0051275 7.10 0.000 .0263215 .0464422
y00 | .0977717 .0055054 17.76 0.000 .0869698 .1085735
_cons | 4.953331 .0296765 166.91 0.000 4.895104 5.011557
-----------------------------------------------------------------------------
sigma_u | .43389176
sigma_e | .10651186
rho | .94316439 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. xtivreg lpassen ldist ldistsq y98 y99 y00 (lfare = concen), re theta
G2SLS random-effects IV regression Number of obs = 4596
Group variable: id Number of groups = 1149
R-sq: within = 0.4075 Obs per group: min = 4
between = 0.0542 avg = 4.0
overall = 0.0641 max = 4
Wald chi2(6) = 231.10
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .91099494
------------------------------------------------------------------------------
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.5078762 .229698 -2.21 0.027 -.958076 -.0576763
ldist | -1.504806 .6933147 -2.17 0.030 -2.863678 -.1459338
ldistsq | .1176013 .0546255 2.15 0.031 .0105373 .2246652
y98 | .0307363 .0086054 3.57 0.000 .0138699 .0476027
y99 | .0796548 .01038 7.67 0.000 .0593104 .0999992
y00 | .1325795 .0229831 5.77 0.000 .0875335 .1776255
_cons | 13.29643 2.626949 5.06 0.000 8.147709 18.44516
-----------------------------------------------------------------------------
sigma_u | .94920686
sigma_e | .16964171
------------------------------------------------------------------------------
Instrumented: lfare
Instruments: ldist ldistsq y98 y99 y00 concen
------------------------------------------------------------------------------
. * The quasi-time-demeaning parameter is quite large: .911 ("theta").
. xtivreg2 lpassen ldist ldistsq y98 y99 y00 (lfare = concen), fe cluster(id)
Warning - collinearities detected
Vars dropped: ldist ldistsq
FIXED EFFECTS ESTIMATION

------------------------
Number of groups = 1149 Obs per group: min = 4
avg = 4.0
max = 4
Number of clusters (id) = 1149 Number of obs = 4596
F( 4, 1148) = 26.07
Prob > F = 0.0000
Total (centered) SS = 128.0991685 Centered R2 = 0.2265
Total (uncentered) SS = 128.0991685 Uncentered R2 = 0.2265
Residual SS = 99.0837238 Root MSE = .1695
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.3015761 .6124127 -0.49 0.622 -1.501883 .8987307
y98 | .0257147 .0164094 1.57 0.117 -.0064471 .0578766
y99 | .0724166 .0250971 2.89 0.004 .0232272 .1216059
y00 | .1127914 .0620115 1.82 0.069 -.0087488 .2343316
------------------------------------------------------------------------------
Instrumented: lfare
Included instruments: y98 y99 y00
Excluded instruments: concen
------------------------------------------------------------------------------
. egen concenb = mean(concen), by(id)
. xtivreg lpassen ldist ldistsq y98 y99 y00 concenb (lfare = concen), re theta
G2SLS random-effects IV regression Number of obs = 4596
Wald chi2(7) = 218.80
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
theta = .90084889
------------------------------------------------------------------------------
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.3015761 .2764376 -1.09 0.275 -.8433838 .2402316
ldist | -1.148781 .6970189 -1.65 0.099 -2.514913 .2173514
ldistsq | .0772565 .0570609 1.35 0.176 -.0345808 .1890937
y98 | .0257147 .0097479 2.64 0.008 .0066092 .0448203
y99 | .0724165 .0119924 6.04 0.000 .0489118 .0959213
y00 | .1127914 .0274377 4.11 0.000 .0590146 .1665682
concenb | -.5933022 .1926313 -3.08 0.002 -.9708527 -.2157518
_cons | 12.0578 2.735977 4.41 0.000 6.695384 17.42022
-----------------------------------------------------------------------------
sigma_u | .85125514
sigma_e | .16964171
------------------------------------------------------------------------------
Instrumented: lfare
Instruments: ldist ldistsq y98 y99 y00 concenb concen
------------------------------------------------------------------------------
. ivreg lpassen ldist ldistsq y98 y99 y00 concenb (lfare = concen), cluster(id)
Instrumental variables (2SLS) regression Number of obs = 4596
F( 7, 1148) = 20.28
Prob > F = 0.0000
R-squared = 0.0649
Root MSE = .85549
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.3015769 .6131465 -0.49 0.623 -1.50459 .9014366
ldist | -1.148781 .8809895 -1.30 0.193 -2.877312 .5797488
ldistsq | .0772566 .0811787 0.95 0.341 -.0820187 .2365319
y98 | .0257148 .0164291 1.57 0.118 -.0065196 .0579491
y99 | .0724166 .0251272 2.88 0.004 .0231163 .1217169
y00 | .1127915 .0620858 1.82 0.070 -.0090228 .2346058
concenb | -.5933019 .2963723 -2.00 0.046 -1.174794 -.0118099
_cons | 12.05781 4.360868 2.77 0.006 3.50164 20.61397
------------------------------------------------------------------------------
Instrumented: lfare
Instruments: ldist ldistsq y98 y99 y00 concenb concen
------------------------------------------------------------------------------
. * Now test whether instrument (concen) is strictly exogenous.
. xtivreg2 lpassen y98 y99 concen_p1 (lfare = concen), fe cluster(id)
FIXED EFFECTS ESTIMATION

------------------------
Number of groups = 1149 Obs per group: min = 3
avg = 3.0
max = 3
Number of clusters (id) = 1149 Number of obs = 3447
F( 4, 1148) = 33.41
Prob > F = 0.0000
Total (centered) SS = 67.47207834 Centered R2 = 0.4474
Total (uncentered) SS = 67.47207834 Uncentered R2 = 0.4474
Residual SS = 37.28476721 Root MSE = .1274
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.8520992 .3211832 -2.65 0.008 -1.481607 -.2225917
y98 | .0416985 .0098066 4.25 0.000 .0224778 .0609192
y99 | .0948286 .014545 6.52 0.000 .066321 .1233363
concen_p1 | .1555725 .0814452 1.91 0.056 -.0040571 .3152021
------------------------------------------------------------------------------
Instrumented: lfare
Included instruments: y98 y99 concen_p1
Excluded instruments: concen
------------------------------------------------------------------------------
. * What if we just use fixed effects without IV?
. xtreg lpassen lfare y98 y99 y00, fe cluster(id)
Fixed-effects (within) regression Number of obs = 4596
R-sq: within = 0.4507 Obs per group: min = 4
between = 0.0487 avg = 4.0
overall = 0.0574 max = 4
F(4,1148) = 121.85
corr(u_i, Xb) = -0.3249 Prob > F = 0.0000
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -1.155039 .1086574 -10.63 0.000 -1.368228 -.9418496
y98 | .0464889 .0049119 9.46 0.000 .0368516 .0561262
y99 | .1023612 .0063141 16.21 0.000 .0899727 .1147497
y00 | .1946548 .0097099 20.05 0.000 .1756036 .213706
_cons | 11.81677 .55126 21.44 0.000 10.73518 12.89836
-----------------------------------------------------------------------------
sigma_u | .89829067
sigma_e | .14295339
------------------------------------------------------------------------------
. * Test formally for endogeneity of lfare in FE:
. qui areg lfare concen y98 y99 y00, absorb(id)
. predict u2h, resid
. xtreg lpassen lfare y98 y99 y00 u2h, fe cluster(id)
------------------------------------------------------------------------------
| Robust
lpassen | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfare | -.301576 .4829734 -0.62 0.532 -1.249185 .6460335
y98 | .0257147 .0131382 1.96 0.051 -.0000628 .0514923
y99 | .0724165 .0197133 3.67 0.000 .0337385 .1110946
y00 | .1127914 .048597 2.32 0.020 .0174425 .2081403
u2h | -.8616344 .5278388 -1.63 0.103 -1.897271 .1740025
_cons | 7.501007 2.441322 3.07 0.002 2.711055 12.29096
-----------------------------------------------------------------------------
. * p-value is about .10, so not strong evidence even though FE and
. * FEIV estimates are quite different.
Turns out that the FE2SLS estimator is robust to random coefficients
on $x_{it1}$, but one should include a full set of time dummies
(Murtazashvili and Wooldridge, 2005).
Can model random coefficients and use a CF approach:
$$y_{it1} = c_{i1} + x_{it1}b_{i1} + u_{it1}$$
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}.$$
If we assume $E(c_{i1} \mid z_i, v_{it2})$ and $E(b_{i1} \mid z_i, v_{it2})$ are linear in $(\bar{z}_i, v_{it2})$ and
$E(u_{it1} \mid z_i, v_{it2})$ is linear in $v_{it2}$, we can show
$$E(y_{it1} \mid z_i, v_{it2}) = \alpha_1 + x_{it1}\beta_1 + \bar{z}_i\lambda_1 + \rho_1 v_{it2} + [(\bar{z}_i - \mu_{\bar{z}}) \otimes x_{it1}]\xi_1 + (v_{it2} \otimes x_{it1})\eta_1. \tag{25}$$
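(1) Regress $y_{it2}$ on $1, z_{it}, \bar{z}_i$ and obtain residuals $\hat{v}_{it2}$.
(2) Regress
$$y_{it1} \text{ on } 1, \; x_{it1}, \; \bar{z}_i, \; \hat{v}_{it2}, \; (\bar{z}_i - \bar{z}) \otimes x_{it1}, \; \hat{v}_{it2} \otimes x_{it1}.$$
Probably include time dummies in both stages.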
Binary and Fractional Response
Unobserved effects (UE) probit model with exogenous variables. For a
binary or fractional $y_{it}$,
$$E(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}\beta + c_i), \quad t = 1, \ldots, T. \tag{26}$$
Assume strict exogeneity (conditional on $c_i$) and the Chamberlain-Mundlak
device:
$$c_i = \psi + \bar{x}_i\xi + a_i, \quad a_i \mid x_i \sim \text{Normal}(0, \sigma_a^2). \tag{27}$$
In the binary response case under serial independence, all parameters are
identified and MLE (Stata: xtprobit) can be used. Just add the time
averages $\bar{x}_i$ as an additional set of regressors. Then $\hat{\mu}_c = \hat{\psi} + \bar{x}\hat{\xi}$ and
$$\hat{\sigma}_c^2 = \hat{\xi}'\left[N^{-1}\sum_{i=1}^{N}(\bar{x}_i - \bar{x})'(\bar{x}_i - \bar{x})\right]\hat{\xi} + \hat{\sigma}_a^2.$$
Can evaluate PEs at, say, $\hat{\mu}_c \pm k\hat{\sigma}_c$.
Only under restrictive assumptions does $c_i$ have an unconditional
normal distribution, although it becomes more reasonable as $T$ gets
large.
Simple to test $H_0: \xi = 0$ as null that $c_i$, $\bar{x}_i$ are independent.
The APEs are identified from the ASF, estimated as
$$\widehat{ASF}(x_t) = N^{-1}\sum_{i=1}^{N}\Phi(x_t\hat{\beta}_a + \hat{\psi}_a + \bar{x}_i\hat{\xi}_a) \tag{28}$$
where, for example, $\beta_a = \beta/(1 + \sigma_a^2)^{1/2}$.
For binary or fractional response, APEs are identified without the
conditional serial independence assumption. Use pooled Bernoulli
quasi-MLE (Stata: glm) or generalized estimating equations (Stata:
xtgee) to estimate scaled coefficients based on
$$E(y_{it} \mid x_i) = \Phi(x_{it}\beta_a + \psi_a + \bar{x}_i\xi_a). \tag{29}$$
(Time dummies have been suppressed for simplicity.)
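Once scaled coefficients are in hand, the ASF and the APE are just sample averages over the time averages. The following Python sketch shows that averaging step for a scalar covariate. The coefficient values and the list of time averages are hypothetical placeholders, not estimates from any data set, and the function names are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def asf(x_t, xbar, beta_a, psi_a, xi_a):
    """Sample average of Phi(x_t*beta_a + psi_a + xbar_i*xi_a) over units i."""
    return sum(norm_cdf(x_t * beta_a + psi_a + xb * xi_a) for xb in xbar) / len(xbar)

def ape(x_t, xbar, beta_a, psi_a, xi_a):
    """Derivative of the estimated ASF with respect to x_t."""
    avg_pdf = sum(norm_pdf(x_t * beta_a + psi_a + xb * xi_a) for xb in xbar) / len(xbar)
    return beta_a * avg_pdf

# Hypothetical scaled coefficients and unit-level time averages.
beta_a, psi_a, xi_a = 0.8, -0.3, 0.5
xbar = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
x_t = 0.25

asf_hat = asf(x_t, xbar, beta_a, psi_a, xi_a)
ape_hat = ape(x_t, xbar, beta_a, psi_a, xi_a)

# Numerical derivative of the ASF as a cross-check on the APE formula.
h = 1e-5
fd = (asf(x_t + h, xbar, beta_a, psi_a, xi_a) - asf(x_t - h, xbar, beta_a, psi_a, xi_a)) / (2.0 * h)
print(asf_hat, ape_hat, fd)
```

The time averages are held fixed while `x_t` varies, which is what "averaging out" the heterogeneity means here.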
A more radical suggestion, but in the spirit of Altonji and Matzkin
(2005), is to just use a flexible model for $E(y_{it} \mid x_{it}, \bar{x}_i)$ directly, say,
$$E(y_{it} \mid x_{it}, \bar{x}_i) = \Phi[\theta_t + x_{it}\beta + \bar{x}_i\gamma + (\bar{x}_i \otimes \bar{x}_i)\delta + (x_{it} \otimes \bar{x}_i)\eta]. \tag{30}$$
Just average out over $\bar{x}_i$ to get APEs.
If we have a binary response, start with
$$P(y_{it} = 1 \mid x_{it}, c_i) = \Lambda(x_{it}\beta + c_i), \tag{31}$$
and assume CI, we can estimate $\beta$ by FE logit without restricting
$D(c_i \mid x_i)$.
In any nonlinear model using the Mundlak assumption
$D(c_i \mid x_i) = D(c_i \mid \bar{x}_i)$, if $T \geq 3$ we can include lead values, $w_{i,t+1}$, to simply
test strict exogeneity.
Example: Married Women's Labor Force Participation: $N = 5{,}663$,
$T = 5$ (four-month intervals).
Following results include a full set of time period dummies (not
reported).
The APEs are directly comparable across models, and can be
compared with the linear model coefficients.
LFP                (1)       (2)                 (3)                 (4)                 (5)
Model              Linear    Probit              CRE Probit          CRE Probit          FE Logit
Est. Method        FE        Pooled MLE          Pooled MLE          MLE                 MLE
                   Coef.     Coef.     APE       Coef.     APE       Coef.     APE       Coef.
kids               -.0389    -.199     -.0660    -.117     -.0389    -.317     -.0403    -.644
                   (.0092)   (.015)    (.0048)   (.027)    (.0085)   (.062)    (.0104)   (.125)
lhinc              -.0089    -.211     -.0701    -.029     -.0095    -.078     -.0099    -.184
                   (.0046)   (.024)    (.0079)   (.014)    (.0048)   (.041)    (.0055)   (.083)
kids (time avg.)                                 -.086               -.210
                                                 (.031)              (.071)
lhinc (time avg.)                                -.250               -.646
                                                 (.035)              (.079)
Probit with Endogenous Explanatory Variables
Represent endogeneity as an omitted, time-varying variable, in
addition to unobserved heterogeneity:
$$P(y_{it1} = 1 \mid y_{it2}, z_i, c_{i1}, r_{it1}) = P(y_{it1} = 1 \mid y_{it2}, z_{it1}, c_{i1}, r_{it1}) = \Phi(x_{it1}\beta_1 + c_{i1} + r_{it1}).$$
Elements of $z_{it}$ are assumed strictly exogenous, and we have at least
one exclusion restriction: $z_{it} = (z_{it1}, z_{it2})$.
Papke and Wooldridge (2008, Journal of Econometrics): Use a
Chamberlain-Mundlak approach, but only relating the heterogeneity to
all strictly exogenous variables:
$$c_{i1} = \psi_1 + \bar{z}_i\xi_1 + a_{i1}, \quad D(a_{i1} \mid z_i) = D(a_{i1}).$$
Even before we specify $D(a_{i1})$, this is restrictive because it assumes,
in particular, that $E(c_{i1} \mid z_i)$ is linear in $\bar{z}_i$ and that $Var(c_{i1} \mid z_i)$ is constant.
Using nonparametrics can get by with less, such as
$$D(c_{i1} \mid z_i) = D(c_{i1} \mid \bar{z}_i).$$
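Only need
$$E(y_{it1} \mid y_{it2}, z_i, c_{i1}, r_{it1}) = \Phi(x_{it1}\beta_1 + c_{i1} + r_{it1}), \tag{32}$$
so applies to fractional response.
Need to obtain an estimating equation. First, note that
$$E(y_{it1} \mid y_{it2}, z_i, a_{i1}, r_{it1}) = \Phi(x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + a_{i1} + r_{it1}) \equiv \Phi(x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + v_{it1}), \tag{33}$$
with $v_{it1} \equiv a_{i1} + r_{it1}$.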
Assume a linear reduced form for $y_{it2}$:
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}, \quad t = 1, \ldots, T \tag{34}$$
$$D(v_{it2} \mid z_i) = D(v_{it2})$$
(and we might allow for time-varying coefficients).
Next, assume
$$v_{it1} \mid (z_i, v_{it2}) \sim \text{Normal}(\eta_1 v_{it2}, \tau_1^2), \quad t = 1, \ldots, T.$$
[Easy to allow $\eta_1$ to change over time; just have time dummies interact
with $v_{it2}$.]
Assumptions effectively rule out discreteness in $y_{it2}$.
Write
$$v_{it1} = \eta_1 v_{it2} + e_{it1}$$
where $e_{it1}$ is independent of $(z_i, v_{it2})$ (and, therefore, of $y_{it2}$) and
normally distributed. Again, using a standard mixing property of the
normal distribution,
$$E(y_{it1} \mid y_{it2}, z_i, v_{it2}) = \Phi(x_{it1}\beta_{e1} + \psi_{e1} + \bar{z}_i\xi_{e1} + \eta_{e1}v_{it2}) \tag{35}$$
where the subscript $e$ denotes division by $(1 + \tau_1^2)^{1/2}$.
Identification comes off of the exclusion of the time-varying
exogenous variables $z_{it2}$.
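The mixing property used here is that, for an independent error e distributed Normal(0, tau^2), E[Phi(a + e)] = Phi(a / (1 + tau^2)^{1/2}). A short Python check by numerical integration makes the rescaling concrete. The values of `a` and `tau` are arbitrary illustrations and the function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mixed_probit_mean(a, tau, n_grid=20001, width=8.0):
    """E[Phi(a + e)] for e ~ Normal(0, tau^2), by trapezoid-rule quadrature."""
    lo = -width * tau
    h = 2.0 * width * tau / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        e = lo + k * h
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * norm_cdf(a + e) * norm_pdf(e / tau) / tau
    return total * h

a, tau = 0.6, 1.5                                  # arbitrary index value and error s.d.
lhs = mixed_probit_mean(a, tau)                    # integrate the error out numerically
rhs = norm_cdf(a / math.sqrt(1.0 + tau ** 2))      # scaled-index closed form
print(lhs, rhs)
```

The two numbers agree to numerical precision, which is why only the scaled coefficients are needed for average effects.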
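Two step procedure (Papke and Wooldridge, 2008):
(1) Estimate the reduced form for $y_{it2}$ (pooled or for each $t$
separately). Obtain the residuals, $\hat{v}_{it2}$.
(2) Use the probit QMLE to estimate $\beta_{e1}$, $\psi_{e1}$, $\xi_{e1}$ and $\eta_{e1}$.
How do we interpret the scaled estimates? They give directions of
effects. Conveniently, they also index the APEs. For given $y_{t2}$ and $z_{t1}$,
average out $\bar{z}_i$ and $\hat{v}_{it2}$ (for each $t$), writing $x_{t1}\beta_{e1} = \alpha_{e1}y_{t2} + z_{t1}\delta_{e1}$:
$$\hat{\alpha}_{e1} \cdot \left[N^{-1}\sum_{i=1}^{N}\phi(\hat{\alpha}_{e1}y_{t2} + z_{t1}\hat{\delta}_{e1} + \hat{\psi}_{e1} + \bar{z}_i\hat{\xi}_{e1} + \hat{\eta}_{e1}\hat{v}_{it2})\right].$$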
Application: Effects of Spending on Test Pass Rates
$N = 501$ school districts, $T = 7$ time periods.
Once pre-policy spending is controlled for, instrument spending with
the foundation grant.
Initial spending takes the place of the time average of IVs.
. * First, linear model:
. ivreg math4 lunch alunch lenroll alenroll y96-y01 lexppp94 le94y96-le94y01 (lavgrexp = lfound lfndy96-lfndy01), cluster(distid)
Instrumental variables (2SLS) regression Number of obs = 3507
F( 18, 500) = 107.05
Prob > F = 0.0000
R-squared = 0.4134
Root MSE = .11635
(Std. Err. adjusted for 501 clusters in distid)
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .5545247 .2205466 2.51 0.012 .1212123 .987837
lunch | -.0621991 .0742948 -0.84 0.403 -.2081675 .0837693
alunch | -.4207815 .0758344 -5.55 0.000 -.5697749 -.2717882
lenroll | .0463616 .0696215 0.67 0.506 -.0904253 .1831484
alenroll | -.049052 .070249 -0.70 0.485 -.1870716 .0889676
y96 | -1.085453 .2736479 -3.97 0.000 -1.623095 -.5478119
...
y01 | -.704579 .7310773 -0.96 0.336 -2.140941 .7317831
lexppp94 | -.4343213 .2189488 -1.98 0.048 -.8644944 -.0041482
le94y96 | .1253255 .0318181 3.94 0.000 .0628119 .1878392
...
le94y01 | .0865874 .0816732 1.06 0.290 -.0738776 .2470524
_cons | -.334823 .2593105 -1.29 0.197 -.8442955 .1746496
------------------------------------------------------------------------------
Instrumented: lavgrexp
Instruments: lunch alunch lenroll alenroll y96 y97 y98 y99 y00 y01
lexppp94 le94y96 le94y97 le94y98 le94y99 le94y00 le94y01
lfound lfndy96 lfndy97 lfndy98 lfndy99 lfndy00 lfndy01
------------------------------------------------------------------------------
. * Get reduced form residuals for fractional probit:
. reg lavgrexp lfound lfndy96-lfndy01 lunch alunch lenroll alenroll y96-y01 lexppp94 le94y96-le94y01, cluster(distid)
Linear regression Number of obs = 3507
F( 24, 500) = 1174.57
Prob > F = 0.0000
R-squared = 0.9327
Root MSE = .03987
------------------------------------------------------------------------------
| Robust
lavgrexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------------------------------------------------------
lfound | .2447063 .0417034 5.87 0.000 .1627709 .3266417
lfndy96 | .0053951 .0254713 0.21 0.832 -.044649 .0554391
lfndy97 | -.0059551 .0401705 -0.15 0.882 -.0848789 .0729687
lfndy98 | .0045356 .0510673 0.09 0.929 -.0957972 .1048685
lfndy99 | .0920788 .0493854 1.86 0.063 -.0049497 .1891074
lfndy00 | .1364484 .0490355 2.78 0.006 .0401074 .2327894
lfndy01 | .2364039 .0555885 4.25 0.000 .127188 .3456198
...
_cons | .1632959 .0996687 1.64 0.102 -.0325251 .359117
------------------------------------------------------------------------------
. predict v2hat, resid

(1503 missing values generated)
. glm math4 lavgrexp v2hat lunch alunch lenroll alenroll y96-y01 lexppp94
le94y96-le94y01, fa(bin) link(probit) cluster(distid)
note: math4 has non-integer values
Generalized linear models No. of obs = 3507
Optimization : ML Residual df = 3487
Scale parameter = 1
Deviance = 236.0659249 (1/df) Deviance = .0676989
Pearson = 223.3709371 (1/df) Pearson = .0640582
Variance function: V(u) = u*(1-u/1) [Binomial]
Link function : g(u) = invnorm(u) [Probit]
------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | 1.731039 .6541194 2.65 0.008 .4489886 3.013089
v2hat | -1.378126 .720843 -1.91 0.056 -2.790952 .0347007
lunch | -.2980214 .2125498 -1.40 0.161 -.7146114 .1185686
alunch | -1.114775 .2188037 -5.09 0.000 -1.543623 -.685928
lenroll | .2856761 .197511 1.45 0.148 -.1014383 .6727905
alenroll | -.2909903 .1988745 -1.46 0.143 -.6807771 .0987966
...
_cons | -2.455592 .7329693 -3.35 0.001 -3.892185 -1.018998
------------------------------------------------------------------------------
. margeff
Average partial effects after glm
y = Pr(math4)
------------------------------------------------------------------------------
variable | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------------------------------------------------------
lavgrexp | .5830163 .2203345 2.65 0.008 .1511686 1.014864
v2hat | -.4641533 .242971 -1.91 0.056 -.9403678 .0120611
lunch | -.1003741 .0716361 -1.40 0.161 -.2407782 .04003
alunch | -.3754579 .0734083 -5.11 0.000 -.5193355 -.2315803
lenroll | .0962161 .0665257 1.45 0.148 -.0341719 .2266041
alenroll | -.0980059 .0669786 -1.46 0.143 -.2292817 .0332698
...
------------------------------------------------------------------------------
. * These standard errors do not account for the first-stage estimation.

. * Can use the panel bootstrap. Might also look for partial effects at
. * different parts of the spending distribution.
Count and Other Multiplicative Models
Conditional mean with multiplicative heterogeneity:
$$E(y_{it} \mid x_{it}, c_i) = c_i\exp(x_{it}\beta) \tag{36}$$
where $c_i \geq 0$. Under strict exogeneity in the mean,
$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = E(y_{it} \mid x_{it}, c_i), \tag{37}$$
the fixed effects Poisson estimator is attractive: it does not restrict
$D(y_{it} \mid x_i, c_i)$, $D(c_i \mid x_i)$, or serial dependence.
The FE Poisson estimator is the conditional MLE derived under
Poisson and conditional independence assumptions. It is one of the rare
cases where treating the $c_i$ as parameters to estimate gives a consistent
estimator of $\beta$.
The FE Poisson estimator is fully robust to any distributional failure
and serial correlation. $y_{it}$ does not even have to be a count
variable! Fully robust inference is easy (xtpqml in Stata).
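The robustness claim can be illustrated with a minimal Python sketch of the FE Poisson first-order condition for a scalar coefficient. It relies on the standard multinomial form of the conditional quasi-log-likelihood, in which the heterogeneity cancels from the within-unit shares. The toy data are made up and deliberately non-integer, and the bisection solver is just a convenient choice for one parameter, not how any particular package computes the estimator.

```python
import math

def fe_poisson_score(beta, panel):
    """Score of the FE Poisson quasi-log-likelihood for scalar beta.

    panel is a list of (x_list, y_list) pairs, one per unit. The unit effect
    never appears: only within-unit shares exp(x_it*beta) / sum_r exp(x_ir*beta).
    """
    s = 0.0
    for xs, ys in panel:
        w = [math.exp(x * beta) for x in xs]
        tot_w, n_i = sum(w), sum(ys)
        s += sum(x * (y - n_i * wi / tot_w) for x, y, wi in zip(xs, ys, w))
    return s

def solve_beta(panel, lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on the score, which is decreasing in beta (concave objective)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fe_poisson_score(mid, panel) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Toy data: y_it = c_i * exp(0.7 * x_it), so y is continuous, not a count.
beta0 = 0.7
x_by_unit = [[0.0, 1.0, 2.0], [1.0, -1.0, 0.5], [2.0, 0.0, 1.0]]
c_by_unit = [3.2, 0.4, 11.0]
panel = [(xs, [c * math.exp(beta0 * x) for x in xs])
         for xs, c in zip(x_by_unit, c_by_unit)]

beta_hat = solve_beta(panel)
print(beta_hat)
```

The solver recovers 0.7 even though the outcomes are not counts and the unit effects differ by more than an order of magnitude, because only the conditional mean specification matters.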
For endogeneity there are control function and GMM approaches,
with the former being more convenient but imposing more restrictions.
CF uses same approach as before.
Start with an omitted variables formulation:
$$E(y_{it1} \mid y_{it2}, z_i, c_{i1}, r_{it1}) = \exp(x_{it1}\beta_1 + c_{i1} + r_{it1}). \tag{38}$$
The $z_{it}$, including the excluded instruments, are assumed to be
strictly exogenous here.
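If $y_{it2}$ is (roughly) continuous we might specify
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}.$$
Also write
$$c_{i1} = \psi_1 + \bar{z}_i\xi_1 + a_{i1}$$
so that
$$E(y_{it1} \mid y_{it2}, z_i, v_{it1}) = \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1 + v_{it1}),$$
where $v_{it1} = a_{i1} + r_{it1}$.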
57
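Reasonable (but not completely general) to assume $(v_{it1}, v_{it2})$ is
independent of $z_i$.
If we specify $E[\exp(v_{it1}) \mid v_{it2}] = \exp(\eta_1 + \rho_1 v_{it2})$ (as would be true
under joint normality), we obtain the estimating equation
$$E(y_{it1} \mid y_{it2}, z_i, v_{it2}) = \exp(\alpha_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2}), \tag{39}$$
where $\alpha_1$ collects the intercepts.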
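
The step to (39) can be spelled out with iterated expectations. In this sketch the labels $\psi_1$ (intercept in the equation for $c_{i1}$), $\eta_1$ and $\rho_1$ (intercept and slope in the conditional mean of $\exp(v_{it1})$), and $\alpha_1$ are notational assumptions made here for illustration. The second equality uses the fact that $y_{it2}$ is a function of $(z_i, v_{it2})$ together with independence of $(v_{it1}, v_{it2})$ from $z_i$.

```latex
\begin{aligned}
E(y_{it1} \mid y_{it2}, z_i, v_{it2})
  &= \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1)\,
     E[\exp(v_{it1}) \mid y_{it2}, z_i, v_{it2}] \\
  &= \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1)\,
     E[\exp(v_{it1}) \mid v_{it2}] \\
  &= \exp\big((\psi_1 + \eta_1) + x_{it1}\beta_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2}\big).
\end{aligned}
```

So $\alpha_1 = \psi_1 + \eta_1$. For the joint normality case: if $v_{it1} = \rho_1 v_{it2} + e_{it1}$ with $e_{it1} \sim \mathrm{Normal}(0, \sigma_e^2)$ independent of $v_{it2}$, then $E[\exp(v_{it1}) \mid v_{it2}] = \exp(\sigma_e^2/2 + \rho_1 v_{it2})$, so that $\eta_1 = \sigma_e^2/2$.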
58
Now apply a simple two-step method. (1) Obtain the residuals $\hat{v}_{it2}$
from the pooled OLS estimation of $y_{it2}$ on $1, z_{it}, \bar{z}_i$ across $t$ and $i$. (2) Use a
pooled QMLE (perhaps the Poisson or NegBin II) to estimate the
exponential function, where $(\bar{z}_i, \hat{v}_{it2})$ are explanatory variables along
with $x_{it1}$. (As usual, a full set of time period dummies is a good idea
in the first and second steps.)
Note that $y_{it2}$ is not strictly exogenous in the estimating equation, and
so GLS-type methods that account for serial correlation should not be used.
GMM with carefully constructed moments could be.
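
The two steps can be sketched on simulated data as follows. Everything here is an illustrative assumption: the data-generating process, the parameter values, the helper names (`solve`, `gram`, `cross`, `pooled_ols`, `pooled_poisson_qmle`), and the use of plain Newton iterations for the Poisson quasi-likelihood. Time dummies are omitted to keep the sketch short, and no standard errors are computed; in practice they should be cluster-robust and account for the first step (for example by the panel bootstrap).

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def gram(X, w):
    """Return X' diag(w) X."""
    k = len(X[0])
    return [[sum(r[a] * r[c] * wi for r, wi in zip(X, w)) for c in range(k)]
            for a in range(k)]

def cross(X, v):
    """Return X' v."""
    return [sum(r[a] * vi for r, vi in zip(X, v)) for a in range(len(X[0]))]

def pooled_ols(X, y):
    return solve(gram(X, [1.0] * len(y)), cross(X, y))

def pooled_poisson_qmle(X, y, iters=50):
    """Newton iterations for the pooled Poisson quasi-log-likelihood."""
    b = [math.log(sum(y) / len(y))] + [0.0] * (len(X[0]) - 1)  # column 0 = constant
    for _ in range(iters):
        mu = [math.exp(sum(bj * xj for bj, xj in zip(b, r))) for r in X]
        step = solve(gram(X, mu), cross(X, [yi - m for yi, m in zip(y, mu)]))
        b = [bj + s for bj, s in zip(b, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return b

# Simulated panel (illustration only): N units, T periods, scalar instrument z_it.
rng = random.Random(0)
N, T = 300, 3
rows = []
for i in range(N):
    z = [rng.uniform(-1, 1) for _ in range(T)]
    zbar = sum(z) / T
    for t in range(T):
        v2 = rng.uniform(-0.5, 0.5)
        y2 = 0.5 * z[t] + 0.3 * zbar + v2
        # y2 is endogenous because v2 also enters the mean of y1.
        y1 = math.exp(0.2 + 0.4 * y2 + 0.1 * zbar + 0.3 * v2) * rng.uniform(0.5, 1.5)
        rows.append((y1, y2, z[t], zbar))

# Step 1: pooled OLS of y2 on (1, z_it, zbar_i); keep the residuals.
X1 = [[1.0, z, zb] for _, _, z, zb in rows]
y2 = [r[1] for r in rows]
d = pooled_ols(X1, y2)
v2hat = [y - sum(dj * xj for dj, xj in zip(d, x)) for y, x in zip(y2, X1)]

# Step 2: pooled Poisson QMLE of y1 on (1, y2, zbar_i, v2hat).
X2 = [[1.0, r[1], r[3], v] for r, v in zip(rows, v2hat)]
b = pooled_poisson_qmle(X2, [r[0] for r in rows])
```

Here `b[1]` estimates the coefficient on $y_{it2}$ (0.4 in the simulation) and `b[3]` is the coefficient on the control function, whose robust t statistic would be the exogeneity test.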
59
Estimating the ASF is straightforward:
$$\widehat{ASF}_t(y_{t2}, z_{t1}) = N^{-1} \sum_{i=1}^{N} \exp(\hat{\alpha}_1 + x_{t1}\hat{\beta}_1 + \bar{z}_i\hat{\xi}_1 + \hat{\rho}_1 \hat{v}_{it2});$$
that is, we average out $(\bar{z}_i, \hat{v}_{it2})$.

Test the null of contemporaneous exogeneity of $y_{it2}$ by using a fully
robust t statistic on $\hat{v}_{it2}$.
Can allow more flexibility by interacting $(\bar{z}_i, \hat{v}_{it2})$ with $x_{it1}$, or even just
year dummies.
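
The averaging in the ASF estimator can be sketched as a short function. The coefficient values and control values below are hypothetical placeholders rather than estimates from any data set, and the dictionary keys are names chosen for this illustration.

```python
import math

def asf(x_t1, coef, controls):
    """Average structural function at a fixed value of the regressors.

    x_t1     : list with the chosen values of (y_t2, z_t1)
    coef     : dict with keys 'alpha', 'beta' (list), 'xi' (list), 'rho'
    controls : list of (zbar_i, v2hat_it) pairs for period t, zbar_i as a list
    """
    fixed = coef["alpha"] + sum(b * x for b, x in zip(coef["beta"], x_t1))
    total = 0.0
    for zbar, v2hat in controls:
        het = sum(k * z for k, z in zip(coef["xi"], zbar)) + coef["rho"] * v2hat
        total += math.exp(fixed + het)
    return total / len(controls)

# Hypothetical estimates and control values (not from the slides).
coef = {"alpha": 0.1, "beta": [0.4], "xi": [0.2], "rho": 0.3}
controls = [([0.0], 0.0), ([1.0], -0.5), ([-1.0], 0.5)]
value = asf([1.0], coef, controls)
```

Evaluating `asf` at two values of `x_t1` and differencing gives a discrete average partial effect.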
60
A GMM approach which slightly extends Windmeijer (2002)
modifies the moment conditions under a sequential exogeneity
assumption on instruments and applies to models with lagged
dependent variables.
Write the model as
$$y_{it} = c_i \exp(x_{it}\beta)\, r_{it} \tag{40}$$
$$E(r_{it} \mid z_{it}, \ldots, z_{i1}, c_i) = 1, \tag{41}$$
which contains the case of sequentially exogenous regressors as a
special case ($z_{it} = x_{it}$).
61
Now start with the transformation
$$\frac{y_{it}}{\exp(x_{it}\beta)} - \frac{y_{i,t+1}}{\exp(x_{i,t+1}\beta)} = c_i (r_{it} - r_{i,t+1}). \tag{42}$$
Can easily show that
$$E[c_i (r_{it} - r_{i,t+1}) \mid z_{it}, \ldots, z_{i1}] = 0, \quad t = 1, \ldots, T-1.$$
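
The zero conditional mean follows from two applications of iterated expectations, each using the sequential exogeneity condition (41), once at period $t$ and once at period $t+1$:

```latex
\begin{aligned}
E(c_i r_{it} \mid z_{it}, \ldots, z_{i1})
  &= E\big[c_i\, E(r_{it} \mid z_{it}, \ldots, z_{i1}, c_i) \mid z_{it}, \ldots, z_{i1}\big]
   = E(c_i \mid z_{it}, \ldots, z_{i1}), \\
E(c_i r_{i,t+1} \mid z_{it}, \ldots, z_{i1})
  &= E\big[c_i\, E(r_{i,t+1} \mid z_{i,t+1}, \ldots, z_{i1}, c_i) \mid z_{it}, \ldots, z_{i1}\big]
   = E(c_i \mid z_{it}, \ldots, z_{i1}).
\end{aligned}
```

Subtracting the second line from the first gives zero. Note that the instruments are dated $t$ and earlier even though the transformation involves period $t+1$.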
62
Using the moment conditions
$$E\left[\frac{y_{it}}{\exp(x_{it}\beta)} - \frac{y_{i,t+1}}{\exp(x_{i,t+1}\beta)} \,\Big|\, z_{it}, \ldots, z_{i1}\right] = 0, \quad t = 1, \ldots, T-1 \tag{43}$$
generally causes computational problems. For example, if $x_{itj} \ge 0$ for
some $j$ and all $i$ and $t$ (for example, if $x_{itj}$ is a time dummy), then the
moment conditions can be made arbitrarily close to zero by choosing $\beta_j$
larger and larger.
Windmeijer (2002, Economics Letters) suggested multiplying
through by $\exp(\mu_x\beta)$, where $\mu_x = T^{-1}\sum_{r=1}^{T} E(x_{ir})$.
63
So, the modified moment conditions are
$$E\left[\frac{y_{it}}{\exp((x_{it}-\mu_x)\beta)} - \frac{y_{i,t+1}}{\exp((x_{i,t+1}-\mu_x)\beta)} \,\Big|\, z_{it}, \ldots, z_{i1}\right] = 0. \tag{44}$$
As a practical matter, replace $\mu_x$ with the overall sample average,
$$\bar{x} = (NT)^{-1}\sum_{i=1}^{N}\sum_{r=1}^{T} x_{ir}. \tag{45}$$
The deviated variables, $x_{it} - \bar{x}$, will always take on positive and
negative values, and this seems to solve the GMM computational
problem.
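
The computational problem and its fix can be seen in a small numerical sketch. The toy panel below is hypothetical, uses a single strictly positive regressor and a constant instrument, and only evaluates the sample moment at a few values of $\beta$ instead of running GMM.

```python
import math

# Hypothetical toy panel: 4 units, T = 2, one strictly positive regressor.
x = [(0.5, 1.0), (1.5, 2.0), (1.0, 1.5), (2.0, 1.0)]
y = [(1.0, 2.0), (3.0, 5.0), (2.0, 2.0), (4.0, 2.0)]

def sample_moment(b, center=0.0):
    """Sample analogue of the t = 1 moment with a constant instrument.

    center = 0 gives the raw transformation; center = x_bar gives the
    version with deviated regressors.
    """
    total = 0.0
    for (x1, x2), (y1, y2) in zip(x, y):
        total += y1 / math.exp((x1 - center) * b) - y2 / math.exp((x2 - center) * b)
    return total / len(x)

x_bar = sum(v for pair in x for v in pair) / (2 * len(x))  # overall average

# Absolute sample moments at beta = 0, 5, 50.
raw = [abs(sample_moment(b)) for b in (0.0, 5.0, 50.0)]
centered = [abs(sample_moment(b, center=x_bar)) for b in (0.0, 5.0, 50.0)]
```

With the raw transformation the moment shrinks toward zero as $\beta$ grows, so a GMM objective built from it can be driven down by sending $\beta$ to infinity. After centering, some deviated regressors are negative and the moment no longer vanishes for large $\beta$. The centered moment equals the raw one multiplied by $\exp(\bar{x}\beta)$, so the two have the same finite roots.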
64

